Settings#

The Settings class controls compilation behavior and debugging options for Helion kernels.

class helion.Settings(**settings)[source]#

Bases: _Settings

Settings can be passed to hl.kernel as kwargs and control the behavior of the compilation process. Unlike a Config, settings are not auto-tuned; they are set by the user.

Parameters:: settings (object) –

__init__(**settings)[source]#

Initialize the Settings object with the provided dictionary of settings.

Parameters:: settings (object) –

to_dict()[source]#

Convert the Settings object to a dictionary.

Returns:: A dictionary representation of the Settings object.
Return type:: dict[str, object]

check_autotuning_disabled()[source]#

Return type:: None

get_rebenchmark_threshold()[source]#

Get the effective rebenchmark threshold. Uses the explicit setting if provided, otherwise falls back to the effort profile default.

Returns:: The rebenchmark threshold value.
Return type:: float

ignore_warnings: list[type[BaseWarning]]#: Subtypes of exc.BaseWarning to ignore when compiling. Set HELION_IGNORE_WARNINGS=WarningA,WarningB (names from helion.exc) to configure via env.

index_dtype: dtype | None#: The dtype to use for index variables. Default auto-selects torch.int32 or torch.int64 based on input sizes. Override with HELION_INDEX_DTYPE=<dtype> (or set to ‘auto’).

dot_precision: Literal['tf32', 'tf32x3', 'ieee']#: Precision for dot products, see triton.language.dot. Can be ‘tf32’, ‘tf32x3’, or ‘ieee’.

static_shapes: bool#: If True, use static shapes for all tensors. This is a performance optimization. Set HELION_STATIC_SHAPES=0 to disable.

persistent_reserved_sms: int#: Number of streaming multiprocessors to reserve when launching persistent kernels. Set HELION_PERSISTENT_RESERVED_SMS=N (default 0) or pass persistent_reserved_sms=N to helion.kernel.

autotune_force_persistent: bool#: If True, restrict pid_type choices to persistent kernels only during config selection. Set HELION_AUTOTUNE_FORCE_PERSISTENT=1 to force persistent kernel autotuning globally.

autotune_log_level: int#: Log level for autotuning using Python logging levels. Default is logging.INFO. Use HELION_AUTOTUNE_LOG_LEVEL to override or set 0 to disable output.

autotune_log: str | None#: Base filename for autotune logs. Set HELION_AUTOTUNE_LOG=/tmp/run to write /tmp/run.csv and /tmp/run.log with per-config metrics and debug logs.

autotune_compile_timeout: int#: Timeout for Triton compilation in seconds used for autotuning. Default is 60 seconds.

autotune_precompile: Optional[Literal['spawn', 'fork']]#: Autotuner precompile mode: ‘fork’, ‘spawn’, or falsy/None to disable. Defaults to ‘fork’ on non-Windows platforms.

autotune_precompile_jobs: int | None#: Maximum concurrent Triton precompile processes, default to cpu count.

autotune_random_seed: int#: Seed used for autotuner random number generation. Defaults to HELION_AUTOTUNE_RANDOM_SEED or a time-based seed.

autotune_accuracy_check: bool#: If True, validate candidate configs against the baseline kernel output before accepting them during autotuning.

autotune_rebenchmark_threshold: float | None#: If a config is within threshold*best_perf, re-benchmark it to avoid outliers. Defaults to effort profile value. Set HELION_REBENCHMARK_THRESHOLD to override.

autotune_progress_bar: bool#: If True, show progress bar during autotuning. Default is True. Set HELION_AUTOTUNE_PROGRESS_BAR=0 to disable.

autotune_max_generations: int | None#: Override the maximum number of generations for Pattern Search and Differential Evolution Search autotuning algorithms with HELION_AUTOTUNE_MAX_GENERATIONS=N or @helion.kernel(autotune_max_generations=N).

autotune_ignore_errors: bool#: If True, skip logging and raising autotune errors. Set HELION_AUTOTUNE_IGNORE_ERRORS=1 to enable globally.

print_output_code: bool#: If True, print the output code of the kernel to stderr.

print_repro: bool#: If True, print Helion kernel code, config, and caller code to stderr as a standalone repro script.

output_origin_lines: bool#: If True, annotate generated Triton code with source-origin comments. Set HELION_OUTPUT_ORIGIN_LINES=0 to disable.

force_autotune: bool#: If True, force autotuning even if a config is provided.

autotune_config_overrides: dict[str, object]#: Dictionary of config key/value pairs forced during autotuning. Accepts HELION_AUTOTUNE_CONFIG_OVERRIDES='{"num_warps": 4}'.

allow_warp_specialize: bool#: If True, allow warp specialization for tl.range calls on CUDA devices.

debug_dtype_asserts: bool#: If True, emit tl.static_assert checks for dtype after each device node.

ref_mode: RefMode#: Reference mode for kernel execution. Can be RefMode.OFF or RefMode.EAGER.

autotuner_fn: AutotunerFunction#: Function to create an autotuner. Override by passing a callable to @helion.kernel(…, autotuner_fn=…).

autotune_effort: Literal['none', 'quick', 'full']#: Autotuning effort preset. One of ‘none’, ‘quick’, ‘full’.

autotune_baseline_fn: Optional[Callable[..., object]]#: Custom baseline function for computing baseline output during autotuning. If provided, this function will be called instead of running the default config. Should have the same signature as the kernel function. Pass as @helion.kernel(…, autotune_baseline_fn=my_baseline_fn).

autotune_baseline_atol: float | None#: Absolute tolerance for baseline output comparison during autotuning accuracy checks. Defaults to 1e-2, or 0.0 for fp8 dtypes (automatic bitwise comparison). Pass as @helion.kernel(…, autotune_baseline_atol=1e-3).

autotune_baseline_rtol: float | None#: Relative tolerance for baseline output comparison during autotuning accuracy checks. Defaults to 1e-2, or 0.0 for fp8 dtypes (automatic bitwise comparison). Pass as @helion.kernel(…, autotune_baseline_rtol=1e-3).

autotune_cache: str#: The name of the autotuner cache class to use. Set HELION_AUTOTUNE_CACHE=StrictLocalAutotuneCache to enable strict caching. Defaults to ‘LocalAutotuneCache’.

autotune_benchmark_fn: Optional[Callable[..., list[float]]]#: Custom benchmark function for rebenchmarking during autotuning. Should have the following signature: (fns: list[Callable[[], object]], *, repeat: int, desc: str | None = None) -> list[float]. If None (default), uses the built-in benchmark function.

Overview#

Settings control the compilation process and development environment for Helion kernels.

Key Characteristics#

Not autotuned: Settings remain constant across all kernel configurations
Meta-compilation: Control the compilation process itself, debugging output, and development features
Environment-driven: Often configured via environment variables
Development-focused: Primarily used for debugging, logging, and development workflow optimization

Settings vs Config#

Aspect	Settings	Config
Purpose	Control compilation behavior	Control execution performance
Autotuning	❌ Never autotuned	✅ Automatically optimized
Examples	`print_output_code`, `autotune_effort`	`block_sizes`, `num_warps`
When to use	Development, debugging, environment setup	Performance optimization

Settings can be configured via:

Environment variables
Keyword arguments to @helion.kernel

If both are provided, decorator arguments take precedence.

Note

Helion reads the environment variables for Settings when the @helion.kernel decorator defines the function (typically at import time). One can modify Kernel.settings to change settings for an already defined kernel.
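The precedence rule and the definition-time read described above can be illustrated with a toy decorator. This is a minimal sketch under stated assumptions, not Helion's implementation: toy_kernel and the TOY_PRINT_OUTPUT_CODE variable are hypothetical names chosen for illustration only.

```python
import os


def toy_kernel(**overrides):
    """Toy stand-in for a settings-aware decorator (hypothetical, not Helion's code)."""
    # The environment is read once, when the decorator expression is evaluated.
    settings = {
        "print_output_code": os.environ.get("TOY_PRINT_OUTPUT_CODE", "0") == "1",
    }
    # Keyword arguments passed to the decorator win over the environment.
    settings.update(overrides)

    def decorate(fn):
        fn.settings = settings
        return fn

    return decorate


os.environ["TOY_PRINT_OUTPUT_CODE"] = "1"


@toy_kernel()
def from_env(): ...


@toy_kernel(print_output_code=False)
def from_kwarg(): ...


# Changing the environment after definition does not affect existing kernels.
os.environ["TOY_PRINT_OUTPUT_CODE"] = "0"
```

In this sketch from_env keeps the value it saw at definition time, and from_kwarg ignores the environment entirely because the keyword argument was given explicitly.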

Configuration Examples#

Using Environment Variables#

env HELION_PRINT_OUTPUT_CODE=1 HELION_AUTOTUNE_EFFORT=none python my_kernel.py

Using Decorator Arguments#

import torch
import helion
import helion.language as hl

@helion.kernel(
    autotune_effort="none",           # Skip autotuning
    print_output_code=True,            # Debug: show generated Triton code
    print_repro=True,                  # Debug: show Helion kernel code, config, and caller code as a standalone repro script
)
def my_kernel(x: torch.Tensor) -> torch.Tensor:
    result = torch.zeros_like(x)
    for i in hl.grid(x.size(0)):
        result[i] = x[i] * 2
    return result

Settings Reference#

Core Compilation Settings#

Settings.index_dtype: dtype | None#

The dtype to use for index variables. Default auto-selects torch.int32 or torch.int64 based on input sizes. Override with HELION_INDEX_DTYPE=<dtype> (or set to ‘auto’).

The data type used for index variables in generated code. By default Helion auto-selects between torch.int32 and torch.int64 based on whether any input tensor exceeds torch.iinfo(torch.int32).max elements. Override via HELION_INDEX_DTYPE=<dtype> (or set it to auto to keep the automatic behavior).
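The automatic selection rule can be sketched in plain Python. This is an illustration of the rule as described, not Helion's code: pick_index_dtype is a hypothetical helper, and it returns dtype names as strings so the example runs without PyTorch installed.

```python
# Same value as torch.iinfo(torch.int32).max
INT32_MAX = 2**31 - 1


def pick_index_dtype(numels):
    """Return "int64" if any tensor has more elements than int32 can index.

    `numels` is an iterable of element counts, one per input tensor.
    Hypothetical helper for illustration only.
    """
    return "int64" if any(n > INT32_MAX for n in numels) else "int32"


small = pick_index_dtype([1024, 4096])
large = pick_index_dtype([1024, 2**31])
```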

Settings.dot_precision: Literal['tf32', 'tf32x3', 'ieee']#

Precision for dot products, see triton.language.dot. Can be ‘tf32’, ‘tf32x3’, or ‘ieee’.

Precision mode for dot product operations. Default is "tf32". Controlled by the TRITON_F32_DEFAULT environment variable.

Settings.static_shapes: bool#

If True, use static shapes for all tensors. This is a performance optimization. Set HELION_STATIC_SHAPES=0 to disable.

When enabled, tensor shapes are treated as compile-time constants for optimization. Default is True. Set HELION_STATIC_SHAPES=0 to disable this if you need a compiled kernel instance to serve many shape variants.

Settings.persistent_reserved_sms: int#

Number of streaming multiprocessors to reserve when launching persistent kernels. Set HELION_PERSISTENT_RESERVED_SMS=N (default 0) or pass persistent_reserved_sms=N to helion.kernel.

Reserve this many streaming multiprocessors when launching persistent kernels. Default is 0 (use all SMs). Configure globally with HELION_PERSISTENT_RESERVED_SMS or per-kernel via @helion.kernel(..., persistent_reserved_sms=N).

Autotuning Settings#

Settings.force_autotune: bool#

If True, force autotuning even if a config is provided.

Force autotuning even when explicit configs are provided. Default is False. Controlled by HELION_FORCE_AUTOTUNE=1.

Settings.autotune_force_persistent: bool#

If True, restrict pid_type choices to persistent kernels only during config selection. Set HELION_AUTOTUNE_FORCE_PERSISTENT=1 to force persistent kernel autotuning globally.

Restrict pid_type choices to the persistent strategies ("persistent_blocked" or "persistent_interleaved"). Default is False. Enable globally with HELION_AUTOTUNE_FORCE_PERSISTENT=1 or per kernel via @helion.kernel(..., autotune_force_persistent=True).

Settings.autotune_log_level: int#

Log level for autotuning using Python logging levels. Default is logging.INFO. Use HELION_AUTOTUNE_LOG_LEVEL to override or set 0 to disable output.

Controls verbosity of autotuning output using Python logging levels:

logging.CRITICAL: No autotuning output
logging.WARNING: Only warnings and errors
logging.INFO: Standard progress messages (default)
logging.DEBUG: Verbose debugging output

You can also use 0 to completely disable all autotuning output. Controlled by HELION_AUTOTUNE_LOG_LEVEL.
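Python's logging levels are plain integers, so the levels listed above correspond to the numeric values computed below. Whether HELION_AUTOTUNE_LOG_LEVEL also accepts level names is not stated here, so this sketch only shows the standard-library mapping.

```python
import logging

# Numeric values of the standard logging levels mentioned above.
levels = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "CRITICAL")
}

for name, value in levels.items():
    print(f"{name:<8} = {value}")
```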

Settings.autotune_log: str | None#

Base filename for autotune logs. Set HELION_AUTOTUNE_LOG=/tmp/run to write /tmp/run.csv and /tmp/run.log with per-config metrics and debug logs.

When set, Helion writes per-config autotuning telemetry (config index, generation, status, perf, compile time, timestamp, config JSON) to <value>.csv and mirrors the autotune log output to <value>.log for population-based autotuners (currently PatternSearch and DifferentialEvolution). Controlled by HELION_AUTOTUNE_LOG.

Settings.autotune_compile_timeout: int#

Timeout for Triton compilation in seconds used for autotuning. Default is 60 seconds.

Timeout in seconds for Triton compilation during autotuning. Default is 60. Controlled by HELION_AUTOTUNE_COMPILE_TIMEOUT.

Settings.autotune_precompile: Optional[Literal['spawn', 'fork']]#

Autotuner precompile mode: ‘fork’, ‘spawn’, or falsy/None to disable. Defaults to ‘fork’ on non-Windows platforms.

Select the autotuner precompile mode, which adds parallelism and checks for errors/timeouts. "fork" (default) is faster but does not include the error-check run; "spawn" runs kernel warm-up in a fresh process, including a run to check for errors; None disables precompile checks altogether. Controlled by HELION_AUTOTUNE_PRECOMPILE.

Settings.autotune_random_seed: int#

Seed used for autotuner random number generation. Defaults to HELION_AUTOTUNE_RANDOM_SEED or a time-based seed.

Seed used for autotuner random number generation. Defaults to HELION_AUTOTUNE_RANDOM_SEED if set, otherwise a time-based value.

Settings.autotune_precompile_jobs: int | None#

Maximum concurrent Triton precompile processes, default to cpu count.

Cap the number of concurrent Triton precompile subprocesses. None (default) uses the machine CPU count. Controlled by HELION_AUTOTUNE_PRECOMPILE_JOBS. When using "spawn" precompile mode, Helion may automatically lower this cap if free GPU memory is limited.

Settings.autotune_max_generations: int | None#

Override the maximum number of generations for Pattern Search and Differential Evolution Search autotuning algorithms with HELION_AUTOTUNE_MAX_GENERATIONS=N or @helion.kernel(autotune_max_generations=N).

Override the default number of generations set for Pattern Search and Differential Evolution Search autotuning algorithms with HELION_AUTOTUNE_MAX_GENERATIONS=N or @helion.kernel(autotune_max_generations=N).

Lower values result in faster autotuning but may find less optimal configurations.

Settings.autotune_ignore_errors: bool#

If True, skip logging and raising autotune errors. Set HELION_AUTOTUNE_IGNORE_ERRORS=1 to enable globally.

Continue autotuning even when candidate configurations raise recoverable runtime errors (for example, GPU out-of-memory). Default is False. Controlled by HELION_AUTOTUNE_IGNORE_ERRORS.

Settings.autotune_accuracy_check: bool#

If True, validate candidate configs against the baseline kernel output before accepting them during autotuning.

Validate each candidate configuration against a baseline output before accepting it. Default is True. Controlled by HELION_AUTOTUNE_ACCURACY_CHECK.

Settings.autotune_baseline_atol: float | None#

Absolute tolerance for baseline output comparison during autotuning accuracy checks. Defaults to 1e-2, or 0.0 for fp8 dtypes (automatic bitwise comparison). Pass as @helion.kernel(…, autotune_baseline_atol=1e-3).

Absolute tolerance for baseline output comparison during autotune accuracy checks. Default is 1e-2.

Settings.autotune_baseline_rtol: float | None#

Relative tolerance for baseline output comparison during autotuning accuracy checks. Defaults to 1e-2, or 0.0 for fp8 dtypes (automatic bitwise comparison). Pass as @helion.kernel(…, autotune_baseline_rtol=1e-3).

Relative tolerance for baseline output comparison during autotune accuracy checks. Default is 1e-2.
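To make the roles of the two tolerances concrete, the sketch below uses the common combined criterion |actual - expected| <= atol + rtol * |expected|, the same form used by torch.allclose. That Helion's accuracy check combines them in exactly this way is an assumption; within_tolerance is a hypothetical scalar helper.

```python
def within_tolerance(actual, expected, atol=1e-2, rtol=1e-2):
    """Scalar closeness check combining absolute and relative tolerance.

    Hypothetical helper; the combination rule is an assumption.
    """
    return abs(actual - expected) <= atol + rtol * abs(expected)


close_enough = within_tolerance(100.5, 100.0)  # allowed error: 0.01 + 1.0
too_far = within_tolerance(102.0, 100.0)

# With both tolerances at 0.0 only exact equality passes.
exact_only = within_tolerance(1.0, 1.0, atol=0.0, rtol=0.0)
```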

Settings.autotune_rebenchmark_threshold: float | None#

If a config is within threshold*best_perf, re-benchmark it to avoid outliers. Defaults to effort profile value. Set HELION_REBENCHMARK_THRESHOLD to override.

Controls how aggressively Helion re-runs promising configs to avoid outliers. Default is 1.5 (re-benchmark anything within 1.5x of the best).
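A minimal sketch of the selection rule, assuming lower timings are better and that "within" includes the boundary. The function name and the timing data are hypothetical; this is not the autotuner's actual code.

```python
def needs_rebenchmark(timings_ms, threshold=1.5):
    """Return the names of configs whose time is within threshold * best.

    `timings_ms` maps a config name to its measured time in milliseconds.
    Hypothetical helper for illustration only.
    """
    best = min(timings_ms.values())
    return sorted(name for name, t in timings_ms.items() if t <= threshold * best)


# Made-up timings for three candidate configs.
timings = {"cfg_a": 1.0, "cfg_b": 1.4, "cfg_c": 2.0}
selected = needs_rebenchmark(timings)
```

With a threshold of 1.5, cfg_c (2.0 ms against a best of 1.0 ms) falls outside the window and would not be re-run.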

Settings.autotune_progress_bar: bool#

If True, show progress bar during autotuning. Default is True. Set HELION_AUTOTUNE_PROGRESS_BAR=0 to disable.

Toggle the interactive progress bar during autotuning. Default is True. Controlled by HELION_AUTOTUNE_PROGRESS_BAR.

Settings.autotune_config_overrides: dict[str, object]#

Dictionary of config key/value pairs forced during autotuning. Accepts HELION_AUTOTUNE_CONFIG_OVERRIDES='{"num_warps": 4}'.

Dict of config key/value pairs to force during autotuning. Useful for disabling problematic candidates or pinning experimental options. Provide JSON via HELION_AUTOTUNE_CONFIG_OVERRIDES='{"num_warps": 4}' for global overrides.
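The effect of an override can be sketched as a JSON parse followed by a dictionary merge in which the override wins. The candidate config below is made up, and how Helion applies overrides internally is an assumption.

```python
import json

# The value as it would appear in HELION_AUTOTUNE_CONFIG_OVERRIDES.
raw = '{"num_warps": 4}'
overrides = json.loads(raw)

# A made-up candidate config proposed by the autotuner.
candidate = {"block_sizes": [64, 64], "num_warps": 8}

# Keys in `overrides` replace the candidate's values; other keys are untouched.
forced = {**candidate, **overrides}
```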

Settings.autotune_effort: Literal['none', 'quick', 'full']#

Autotuning effort preset. One of ‘none’, ‘quick’, ‘full’.

Select the autotuning effort preset. Available values:

"none" – skip autotuning and run the default configuration.
"quick" – limited search for faster runs with decent performance.
"full" – exhaustive autotuning (current default behavior).

Users can still override individual autotune_* settings; explicit values win over the preset. Controlled by HELION_AUTOTUNE_EFFORT.

Autotuning Cache#

Helion stores the best-performing configs discovered during autotuning in an on-disk cache so subsequent runs can skip the search.

HELION_CACHE_DIR: Override the directory used to store cache entries. Defaults to PyTorch’s torch._inductor cache path (typically /tmp/torchinductor_$USER/helion).
HELION_SKIP_CACHE: Set to 1 to ignore cached entries and force the autotuner to re-run even if a matching artifact exists.

See helion.autotuner.LocalAutotuneCache for details on cache keys and behavior.

Debugging and Development#

Settings.print_output_code: bool#

If True, print the output code of the kernel to stderr.

Print generated Triton code to stderr. Default is False. Controlled by HELION_PRINT_OUTPUT_CODE=1.

Settings.print_repro: bool#

If True, print Helion kernel code, config, and caller code to stderr as a standalone repro script.

Print Helion kernel code, config, and caller code to stderr as a standalone repro script. Default is False. Controlled by HELION_PRINT_REPRO=1.

Settings.output_origin_lines: bool#

If True, annotate generated Triton code with source-origin comments. Set HELION_OUTPUT_ORIGIN_LINES=0 to disable.

Annotate generated Triton code with # src[<file>:<line>] comments indicating the originating Helion statements. Default is True. Controlled by HELION_OUTPUT_ORIGIN_LINES (set to 0 to disable).

Settings.ignore_warnings: list[type[BaseWarning]]#

Subtypes of exc.BaseWarning to ignore when compiling. Set HELION_IGNORE_WARNINGS=WarningA,WarningB (names from helion.exc) to configure via env.

List of warning types to suppress during compilation. Default is an empty list. Accepts comma-separated warning class names from helion.exc via HELION_IGNORE_WARNINGS (for example, HELION_IGNORE_WARNINGS=TensorOperationInWrapper).
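Turning a comma-separated value into a list of warning classes might look like the sketch below. The classes here are local stand-ins so the example runs without Helion; ExampleWarning and parse_ignore_warnings are hypothetical names, and the real lookup against helion.exc may differ.

```python
class BaseWarning(Exception):
    """Local stand-in for helion.exc.BaseWarning."""


class TensorOperationInWrapper(BaseWarning):
    """Local stand-in for the class of the same name in helion.exc."""


class ExampleWarning(BaseWarning):
    """Hypothetical warning class used only in this sketch."""


KNOWN = {cls.__name__: cls for cls in (TensorOperationInWrapper, ExampleWarning)}


def parse_ignore_warnings(value):
    """Map 'NameA,NameB' to the matching warning classes."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in KNOWN]
    if unknown:
        raise ValueError(f"unknown warning names: {unknown}")
    return [KNOWN[name] for name in names]


ignored = parse_ignore_warnings("TensorOperationInWrapper, ExampleWarning")
```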

Settings.debug_dtype_asserts: bool#

If True, emit tl.static_assert checks for dtype after each device node.

Emit tl.static_assert dtype checks after each lowering step. Default is False. Controlled by HELION_DEBUG_DTYPE_ASSERTS.

Device Execution Modes#

Settings.allow_warp_specialize: bool#

If True, allow warp specialization for tl.range calls on CUDA devices.

Allow warp specialization for tl.range calls. Default is True. Controlled by HELION_ALLOW_WARP_SPECIALIZE.

Settings.ref_mode: RefMode#

Reference mode for kernel execution. Can be RefMode.OFF or RefMode.EAGER.

Select the reference execution strategy. RefMode.OFF runs compiled kernels (default); RefMode.EAGER runs the interpreter for debugging. Controlled by HELION_INTERPRET.

Autotuner Hooks#

Settings.autotuner_fn: AutotunerFunction#

Function to create an autotuner. Override by passing a callable to @helion.kernel(…, autotuner_fn=…).

Override the callable that constructs autotuner instances. Accepts the same signature as helion.runtime.settings.default_autotuner_fn(). Pass a replacement callable via @helion.kernel(..., autotuner_fn=...) or helion.kernel(autotuner_fn=...) at definition time.

Settings.autotune_benchmark_fn: Optional[Callable[..., list[float]]]#