
helion.Settings#

class helion.Settings(**settings)[source]#

Settings can be passed to hl.kernel as kwargs and control the behavior of the compilation process. Unlike a Config, settings are not auto-tuned and are set directly by the user.
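As a sketch of the intended usage (assuming the usual helion imports and a CUDA device; the kernel body and the hl.tile loop are purely illustrative):

```python
import torch
import helion
import helion.language as hl

# Settings are passed as keyword arguments to the kernel decorator.
# Unlike Config fields, they are never modified by the autotuner.
@helion.kernel(static_shapes=True, autotune_effort="quick")
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(x.size()):
        out[tile] = x[tile] + y[tile]
    return out
```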

Parameters:

settings (object) –


Methods

__init__(**settings)

Initialize the Settings object with the provided dictionary of settings.

check_autotuning_disabled()

Return type: None

get_rebenchmark_threshold()

Get the effective rebenchmark threshold.

to_dict()

Convert the Settings object to a dictionary.

Attributes

backend

Code generation backend.

ignore_warnings

Subtypes of exc.BaseWarning to ignore when compiling.

index_dtype

The dtype to use for index variables.

dot_precision

Precision for dot products, see triton.language.dot.

fast_math

If True, enable fast math approximations (Helion-level and Inductor-level).

static_shapes

If True, use static shapes for all tensors.

persistent_reserved_sms

Number of streaming multiprocessors to reserve when launching persistent kernels.

autotune_force_persistent

If True, restrict pid_type choices to persistent kernels only during config selection.

autotune_log_level

Log level for autotuning using Python logging levels.

autotune_log

Base filename for autotune logs.

autotune_compile_timeout

Timeout for Triton compilation in seconds used for autotuning.

autotune_precompile

Autotuner precompile mode: 'fork', 'spawn', or falsy/None to disable.

autotune_precompile_jobs

Maximum number of concurrent Triton precompile processes; defaults to the CPU count.

autotune_random_seed

Seed used for autotuner random number generation.

autotune_accuracy_check

If True, validate candidate configs against the baseline kernel output before accepting them during autotuning.

autotune_rebenchmark_threshold

If a config is within threshold*best_perf, re-benchmark it to avoid outliers.

autotune_search_acf

List of PTXAS Advanced Controls Files (ACFs) to search during autotuning.

autotune_progress_bar

If True, show progress bar during autotuning.

autotune_max_generations

Override the maximum number of generations for Pattern Search and Differential Evolution Search autotuning algorithms with HELION_AUTOTUNE_MAX_GENERATIONS=N or @helion.kernel(autotune_max_generations=N).

autotune_ignore_errors

If True, skip logging and raising autotune errors.

autotune_adaptive_timeout

If True, set a smaller compile-timeout threshold for Triton compilation, based on a quantile of initial compile times (with a lower bound).

print_output_code

If True, print the output code of the kernel to stderr.

print_repro

If True, print Helion kernel code, config, and caller code to stderr as a standalone repro script.

output_origin_lines

If True, annotate generated Triton code with source-origin comments.

force_autotune

If True, force autotuning even if a config is provided.

autotune_config_overrides

Dictionary of config key/value pairs forced during autotuning.

allow_warp_specialize

If True, allow warp specialization for tl.range calls on CUDA devices.

debug_dtype_asserts

If True, emit tl.static_assert checks for dtype after each device node.

ref_mode

Reference mode for kernel execution.

autotuner_fn

Function to create an autotuner.

autotune_effort

Autotuning effort preset.

autotune_baseline_fn

Custom baseline function for computing baseline output during autotuning.

autotune_baseline_atol

Absolute tolerance for baseline output comparison during autotuning accuracy checks.

autotune_baseline_rtol

Relative tolerance for baseline output comparison during autotuning accuracy checks.

autotune_baseline_accuracy_check_fn

Custom accuracy check function for comparing autotuning candidate outputs against the baseline.

autotune_cache

The name of the autotuner cache class to use.

autotune_benchmark_fn

Custom benchmark function for rebenchmarking during autotuning.

autotune_best_available_max_configs

Maximum number of cached configs to use for FROM_BEST_AVAILABLE initial population strategy.

autotune_best_available_max_cache_scan

Maximum number of cache files to scan when searching for matching configs in FROM_BEST_AVAILABLE strategy.

autotune_initial_population_strategy

Override the initial population strategy for autotuning: 'from_random', 'from_default', or 'from_best_available'.

__init__(**settings)[source]#

Initialize the Settings object with the provided dictionary of settings.

Parameters:

settings (object) –

to_dict()[source]#

Convert the Settings object to a dictionary.

Returns:

A dictionary representation of the Settings object.

Return type:

dict[str, object]

check_autotuning_disabled()[source]#

Return type:

None

get_rebenchmark_threshold()[source]#

Get the effective rebenchmark threshold. Uses the explicit setting if provided, otherwise falls back to the effort profile default.

Returns:

The rebenchmark threshold value.

Return type:

float

backend: Literal['triton', 'pallas', 'cute', 'tileir', 'metal']#

Code generation backend. One of 'triton' (default), 'pallas' (JAX/Pallas), 'cute' (CUTLASS CuTe DSL), 'tileir', or 'metal' (Apple Metal MSL). Set HELION_BACKEND=<backend> to override.

ignore_warnings: list[type[BaseWarning]]#

Subtypes of exc.BaseWarning to ignore when compiling. Set HELION_IGNORE_WARNINGS=WarningA,WarningB (names from helion.exc) to configure via env.

index_dtype: dtype | None#

The dtype to use for index variables. Default auto-selects torch.int32 or torch.int64 based on input sizes. Override with HELION_INDEX_DTYPE=<dtype> (or set to ‘auto’).
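The auto-selection behavior can be illustrated with a dependency-free sketch (the exact heuristic Helion uses may differ; dtype names are returned as strings here to keep the example stdlib-only):

```python
INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit index can address

def pick_index_dtype(*sizes: int) -> str:
    """Illustrative auto-selection: use int32 unless some dimension's
    size could overflow a signed 32-bit index variable."""
    return "torch.int32" if all(s <= INT32_MAX for s in sizes) else "torch.int64"

print(pick_index_dtype(4096, 4096))  # small tensors fit in int32
print(pick_index_dtype(2**32))       # oversized dimension needs int64
```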

dot_precision: Literal['tf32', 'tf32x3', 'ieee']#

Precision for dot products, see triton.language.dot. Can be ‘tf32’, ‘tf32x3’, or ‘ieee’.

fast_math: bool#

If True, enable fast math approximations (Helion-level and Inductor-level). May reduce numerical precision. Set HELION_FAST_MATH=1 to enable.

static_shapes: bool#

If True, use static shapes for all tensors. This is a performance optimization. Set HELION_STATIC_SHAPES=0 to disable.

persistent_reserved_sms: int#

Number of streaming multiprocessors to reserve when launching persistent kernels. Set HELION_PERSISTENT_RESERVED_SMS=N (default 0) or pass persistent_reserved_sms=N to helion.kernel.

autotune_force_persistent: bool#

If True, restrict pid_type choices to persistent kernels only during config selection. Set HELION_AUTOTUNE_FORCE_PERSISTENT=1 to force persistent kernel autotuning globally.

autotune_log_level: int#

Log level for autotuning using Python logging levels. Default is logging.INFO. Use HELION_AUTOTUNE_LOG_LEVEL to override or set 0 to disable output.
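A minimal sketch of how such an env override maps onto Python logging levels (the function name is hypothetical; Helion's actual parser may also accept level names):

```python
import logging
import os

def resolve_autotune_log_level(default: int = logging.INFO) -> int:
    """Illustrative: read HELION_AUTOTUNE_LOG_LEVEL as an integer Python
    logging level; 0 disables autotune output entirely."""
    raw = os.environ.get("HELION_AUTOTUNE_LOG_LEVEL")
    return default if raw is None else int(raw)

os.environ["HELION_AUTOTUNE_LOG_LEVEL"] = "0"
print(resolve_autotune_log_level())  # 0 -> autotune output disabled
```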

autotune_log: str | None#

Base filename for autotune logs. Set HELION_AUTOTUNE_LOG=/tmp/run to write /tmp/run.csv and /tmp/run.log with per-config metrics and debug logs.

autotune_compile_timeout: int#

Timeout for Triton compilation in seconds used for autotuning. Default is 60 seconds.

autotune_precompile: Optional[Literal['spawn', 'fork']]#

Autotuner precompile mode: 'fork', 'spawn', or falsy/None to disable. Defaults to 'fork' on non-Windows platforms.

autotune_precompile_jobs: int | None#

Maximum number of concurrent Triton precompile processes; defaults to the CPU count.

autotune_random_seed: int#

Seed used for autotuner random number generation. Defaults to HELION_AUTOTUNE_RANDOM_SEED or a time-based seed.

autotune_accuracy_check: bool#

If True, validate candidate configs against the baseline kernel output before accepting them during autotuning.

autotune_rebenchmark_threshold: float | None#

If a config is within threshold*best_perf, re-benchmark it to avoid outliers. Defaults to effort profile value. Set HELION_REBENCHMARK_THRESHOLD to override.
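The "within threshold*best_perf" rule can be sketched as follows (a stdlib-only illustration of the stated condition, not Helion's actual implementation):

```python
def needs_rebenchmark(config_perf: float, best_perf: float, threshold: float) -> bool:
    """Illustrative rebenchmark rule: a config whose measured time is
    within threshold * best_perf is re-measured, so a single noisy run
    cannot crown an outlier as the winner."""
    return config_perf <= threshold * best_perf

# A config 10% slower than the best is re-benchmarked at threshold 1.5;
# a config 2x slower is discarded without a second measurement.
print(needs_rebenchmark(1.1, 1.0, 1.5))  # True
print(needs_rebenchmark(2.0, 1.0, 1.5))  # False
```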

autotune_search_acf: list[str]#

List of PTXAS Advanced Controls Files (ACFs) to search during autotuning. ACFs are highly specialized configurations for specific hardware and use cases; when autotuning with ACFs, default -O3 is always considered. Empty list disables.

autotune_progress_bar: bool#

If True, show progress bar during autotuning. Default is True. Set HELION_AUTOTUNE_PROGRESS_BAR=0 to disable.

autotune_max_generations: int | None#

Override the maximum number of generations for Pattern Search and Differential Evolution Search autotuning algorithms with HELION_AUTOTUNE_MAX_GENERATIONS=N or @helion.kernel(autotune_max_generations=N).

autotune_ignore_errors: bool#

If True, skip logging and raising autotune errors. Set HELION_AUTOTUNE_IGNORE_ERRORS=1 to enable globally.

autotune_adaptive_timeout: bool#

If True, set a smaller compile-timeout threshold for Triton compilation, based on a quantile of initial compile times (with a lower bound). Lower bound and quantile are set by the effort profile. Set HELION_AUTOTUNE_ADAPTIVE_TIMEOUT=0 to disable.

print_output_code: bool#

If True, print the output code of the kernel to stderr.

print_repro: bool#

If True, print Helion kernel code, config, and caller code to stderr as a standalone repro script.

output_origin_lines: bool#

If True, annotate generated Triton code with source-origin comments. Set HELION_OUTPUT_ORIGIN_LINES=0 to disable.

force_autotune: bool#

If True, force autotuning even if a config is provided.

autotune_config_overrides: dict[str, object]#

Dictionary of config key/value pairs forced during autotuning. Accepts HELION_AUTOTUNE_CONFIG_OVERRIDES='{"num_warps": 4}'.
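Since the env var holds a JSON object, parsing it reduces to a `json.loads` call; this is a hedged sketch of that step (the function name is hypothetical):

```python
import json
import os

def parse_config_overrides() -> dict:
    """Illustrative: HELION_AUTOTUNE_CONFIG_OVERRIDES holds a JSON object
    whose key/value pairs are forced onto every candidate config."""
    raw = os.environ.get("HELION_AUTOTUNE_CONFIG_OVERRIDES", "{}")
    overrides = json.loads(raw)
    if not isinstance(overrides, dict):
        raise ValueError("expected a JSON object of config overrides")
    return overrides

os.environ["HELION_AUTOTUNE_CONFIG_OVERRIDES"] = '{"num_warps": 4}'
print(parse_config_overrides())  # {'num_warps': 4}
```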

allow_warp_specialize: bool#

If True, allow warp specialization for tl.range calls on CUDA devices.

debug_dtype_asserts: bool#

If True, emit tl.static_assert checks for dtype after each device node.

ref_mode: RefMode#

Reference mode for kernel execution. Can be RefMode.OFF or RefMode.EAGER.

autotuner_fn: AutotunerFunction#

Function to create an autotuner. Override by passing a callable to @helion.kernel(…, autotuner_fn=…).

autotune_effort: Literal['none', 'quick', 'full']#

Autotuning effort preset. One of ‘none’, ‘quick’, ‘full’.

autotune_baseline_fn: Optional[Callable[..., object]]#

Custom baseline function for computing baseline output during autotuning. If provided, this function will be called instead of running the default config. Should have the same signature as the kernel function. Pass as @helion.kernel(…, autotune_baseline_fn=my_baseline_fn).

autotune_baseline_atol: float | None#

Absolute tolerance for baseline output comparison during autotuning accuracy checks. Defaults to 1e-2, or 0.0 for fp8 dtypes (automatic bitwise comparison). Pass as @helion.kernel(…, autotune_baseline_atol=1e-3).

autotune_baseline_rtol: float | None#

Relative tolerance for baseline output comparison during autotuning accuracy checks. Defaults to 1e-2, or 0.0 for fp8 dtypes (automatic bitwise comparison). Pass as @helion.kernel(…, autotune_baseline_rtol=1e-3).

autotune_baseline_accuracy_check_fn: Optional[Callable[[object, object], None]]#

Custom accuracy check function for comparing autotuning candidate outputs against the baseline. Signature: (actual: object, expected: object) -> None. Should raise AssertionError on mismatch. When set, replaces the default torch.testing.assert_close-based check (the atol/rtol settings are ignored). Useful for scenarios where a small fraction of elements may have large relative differences, e.g. checking that the mismatch percentage < X AND the max relative diff < Y. A built-in utility helion._testing.assert_close_with_mismatch_tolerance is provided for this common pattern; use functools.partial(assert_close_with_mismatch_tolerance, ...) to customize thresholds. Pass as @helion.kernel(..., autotune_baseline_accuracy_check_fn=my_check_fn).
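A minimal stand-in for such a check function, operating on plain Python sequences rather than tensors (the real helper is helion._testing.assert_close_with_mismatch_tolerance; this sketch only mirrors the documented contract of raising AssertionError on mismatch):

```python
def check_with_mismatch_tolerance(actual, expected, *,
                                  rel_tol: float = 1e-2,
                                  max_mismatch_frac: float = 0.01) -> None:
    """Illustrative accuracy check: raise AssertionError when more than
    max_mismatch_frac of elements differ by more than rel_tol relatively."""
    mismatched = sum(
        1 for a, e in zip(actual, expected)
        if abs(a - e) > rel_tol * max(abs(e), 1e-12)
    )
    frac = mismatched / max(len(expected), 1)
    if frac > max_mismatch_frac:
        raise AssertionError(f"{frac:.1%} of elements mismatched")

# One bad element out of 200 is 0.5% <= 1%, so the check passes silently.
expected = [1.0] * 200
noisy = expected.copy()
noisy[0] = 2.0
check_with_mismatch_tolerance(noisy, expected)
```

Such a function could then be passed as @helion.kernel(..., autotune_baseline_accuracy_check_fn=check_with_mismatch_tolerance).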

autotune_cache: str#

The name of the autotuner cache class to use. Set HELION_AUTOTUNE_CACHE=StrictLocalAutotuneCache to enable strict caching. Defaults to ‘LocalAutotuneCache’.

autotune_benchmark_fn: Optional[Callable[..., list[float]]]#

Custom benchmark function for rebenchmarking during autotuning. Should have the following signature: (fns: list[Callable[[], object]], *, repeat: int, desc: str | None = None) -> list[float]. If None (default), uses the built-in benchmark function.
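A simple wall-clock implementation matching the documented signature (a sketch only; the built-in benchmark function is better suited to GPU timing, where wall-clock around an async launch is misleading):

```python
import time
from typing import Callable, Optional

def simple_benchmark(fns: list, *, repeat: int,
                     desc: Optional[str] = None) -> list:
    """Illustrative benchmark fn: return the best of `repeat` wall-clock
    timings (in seconds) for each candidate callable."""
    results = []
    for fn in fns:
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        results.append(best)
    return results

timings = simple_benchmark([lambda: sum(range(1000))], repeat=3)
print(len(timings))  # one timing per candidate
```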

autotune_best_available_max_configs: int#

Maximum number of cached configs to use for FROM_BEST_AVAILABLE initial population strategy. Set HELION_BEST_AVAILABLE_MAX_CONFIGS=N to override. Default is 20.

autotune_best_available_max_cache_scan: int#

Maximum number of cache files to scan when searching for matching configs in FROM_BEST_AVAILABLE strategy. Set HELION_BEST_AVAILABLE_MAX_CACHE_SCAN=N to override. Default is 500.

autotune_initial_population_strategy: Optional[Literal['from_random', 'from_default', 'from_best_available']]#

Override the initial population strategy for autotuning. Valid values: 'from_random', 'from_default', 'from_best_available'. When set, takes precedence over the HELION_AUTOTUNER_INITIAL_POPULATION env var and the effort profile default.