Settings#
The Settings class controls compilation behavior and debugging options for Helion kernels.
- class helion.Settings(**settings)[source]#
Bases: _Settings
Settings can be passed to hl.kernel as kwargs and control the behavior of the compilation process. Unlike a Config, settings are not auto-tuned; they are set by the user.
- Parameters:
settings (object) –
- __init__(**settings)[source]#
Initialize the Settings object with the provided dictionary of settings.
- Parameters:
settings (object) –
- get_rebenchmark_threshold()[source]#
Get the effective rebenchmark threshold. Uses the explicit setting if provided, otherwise falls back to the effort profile default.
- Returns:
The rebenchmark threshold value.
- Return type:
-
ignore_warnings:
list[type[BaseWarning]]# Subtypes of exc.BaseWarning to ignore when compiling. Set HELION_IGNORE_WARNINGS=WarningA,WarningB (names from helion.exc) to configure via env.
-
index_dtype:
dtype# The dtype to use for index variables. Default is torch.int32. Override with HELION_INDEX_DTYPE=torch.int64, etc.
-
dot_precision:
Literal['tf32','tf32x3','ieee']# Precision for dot products, see triton.language.dot. Can be ‘tf32’, ‘tf32x3’, or ‘ieee’.
-
static_shapes:
bool# If True, use static shapes for all tensors. This is a performance optimization. Set HELION_STATIC_SHAPES=0 to disable.
-
autotune_log_level:
int# Log level for autotuning using Python logging levels. Default is logging.INFO. Use HELION_AUTOTUNE_LOG_LEVEL to override or set 0 to disable output.
-
autotune_compile_timeout:
int# Timeout for Triton compilation in seconds used for autotuning. Default is 60 seconds.
-
autotune_precompile:
Optional[Literal['spawn','fork']]# Autotuner precompile mode: ‘fork’, ‘spawn’, or falsy/None to disable. Defaults to ‘fork’ on non-Windows platforms.
-
autotune_precompile_jobs:
int|None# Maximum concurrent Triton precompile processes, default to cpu count.
-
autotune_random_seed:
int# Seed used for autotuner random number generation. Defaults to HELION_AUTOTUNE_RANDOM_SEED or a time-based seed.
-
autotune_accuracy_check:
bool# If True, validate candidate configs against the baseline kernel output before accepting them during autotuning.
-
autotune_rebenchmark_threshold:
float|None# If a config is within threshold*best_perf, re-benchmark it to avoid outliers. Defaults to effort profile value. Set HELION_REBENCHMARK_THRESHOLD to override.
-
autotune_progress_bar:
bool# If True, show progress bar during autotuning. Default is True. Set HELION_AUTOTUNE_PROGRESS_BAR=0 to disable.
-
autotune_max_generations:
int|None# Override the maximum number of generations for Pattern Search and Differential Evolution Search autotuning algorithms with HELION_AUTOTUNE_MAX_GENERATIONS=N or @helion.kernel(autotune_max_generations=N).
-
autotune_ignore_errors:
bool# If True, skip logging and raising autotune errors. Set HELION_AUTOTUNE_IGNORE_ERRORS=1 to enable globally.
-
print_repro:
bool# If True, print Helion kernel code, config, and caller code to stderr as a standalone repro script.
-
output_origin_lines:
bool# If True, annotate generated Triton code with source-origin comments. Set HELION_OUTPUT_ORIGIN_LINES=0 to disable.
-
autotune_config_overrides:
dict[str,object]# Dictionary of config key/value pairs forced during autotuning. Accepts HELION_AUTOTUNE_CONFIG_OVERRIDES='{"num_warps": 4}'.
-
ref_mode:
RefMode# Reference mode for kernel execution. Can be RefMode.OFF or RefMode.EAGER.
-
autotuner_fn:
AutotunerFunction# Function to create an autotuner. Override by passing a callable to @helion.kernel(…, autotuner_fn=…).
-
autotune_effort:
Literal['none','quick','full']# Autotuning effort preset. One of ‘none’, ‘quick’, ‘full’.
-
autotune_baseline_fn:
Optional[Callable[...,object]]# Custom baseline function for computing baseline output during autotuning. If provided, this function will be called instead of running the default config. Should have the same signature as the kernel function. Pass as @helion.kernel(…, autotune_baseline_fn=my_baseline_fn).
Overview#
Settings control the compilation process and development environment for Helion kernels.
Key Characteristics#
Not autotuned: Settings remain constant across all kernel configurations
Meta-compilation: Control the compilation process itself, debugging output, and development features
Environment-driven: Often configured via environment variables
Development-focused: Primarily used for debugging, logging, and development workflow optimization
Settings vs Config#
| Aspect | Settings | Config |
|---|---|---|
| Purpose | Control compilation behavior | Control execution performance |
| Autotuning | ❌ Never autotuned | ✅ Automatically optimized |
| Examples | | |
| When to use | Development, debugging, environment setup | Performance optimization |
Settings can be configured via:
Environment variables
Keyword arguments to @helion.kernel
If both are provided, decorator arguments take precedence.
Note
Helion reads the environment variables for Settings when the
@helion.kernel decorator defines the function (typically at import
time). One can modify Kernel.settings to change settings
for an already defined kernel.
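The precedence and timing rules above can be sketched in plain Python. This is only an illustration, not Helion's implementation: the `snapshot_settings` helper is hypothetical, and the way it parses `"1"` as a boolean is an assumption. The two defaults are the ones this page documents.

```python
# Defaults as documented on this page for these two settings.
DEFAULTS = {"autotune_effort": "full", "print_output_code": False}


def snapshot_settings(env, **kwargs):
    """Hypothetical helper: resolve settings once, as a decorator would
    at function-definition time."""
    resolved = dict(DEFAULTS)
    # Environment variables override the built-in defaults.
    if "HELION_AUTOTUNE_EFFORT" in env:
        resolved["autotune_effort"] = env["HELION_AUTOTUNE_EFFORT"]
    if "HELION_PRINT_OUTPUT_CODE" in env:
        # Assumption: "1" means enabled, anything else means disabled.
        resolved["print_output_code"] = env["HELION_PRINT_OUTPUT_CODE"] == "1"
    # Explicit decorator keyword arguments override the environment.
    resolved.update(kwargs)
    return resolved


env = {"HELION_AUTOTUNE_EFFORT": "quick", "HELION_PRINT_OUTPUT_CODE": "1"}
settings = snapshot_settings(env, autotune_effort="none")

# Changing the environment after the snapshot has no effect on it.
env["HELION_AUTOTUNE_EFFORT"] = "full"
print(settings)
```

The practical consequence is that environment variables must be set before the module that defines the kernel is imported.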
Configuration Examples#
Using Environment Variables#
env HELION_PRINT_OUTPUT_CODE=1 HELION_AUTOTUNE_EFFORT=none python my_kernel.py
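The same launch can be done from Python by passing a modified environment to a child process. The sketch below uses only the standard library. The child here is a stand-in one-liner that echoes one variable, so the example runs without Helion installed; in practice the command would be the script that defines the kernels.

```python
import os
import subprocess
import sys

# Copy the current environment and add the two Helion variables.
child_env = dict(
    os.environ,
    HELION_PRINT_OUTPUT_CODE="1",
    HELION_AUTOTUNE_EFFORT="none",
)

# Stand-in child process; replace the "-c" probe with your own script path.
probe = "import os; print(os.environ['HELION_AUTOTUNE_EFFORT'])"
result = subprocess.run(
    [sys.executable, "-c", probe],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
seen = result.stdout.strip()
print(seen)
```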
Using Decorator Arguments#
import torch
import helion
import helion.language as hl

@helion.kernel(
    autotune_effort="none",  # Skip autotuning
    print_output_code=True,  # Debug: show generated Triton code
    print_repro=True,  # Debug: show Helion kernel code, config, and caller code as a standalone repro script
)
def my_kernel(x: torch.Tensor) -> torch.Tensor:
    result = torch.zeros_like(x)
    for i in hl.grid(x.size(0)):
        result[i] = x[i] * 2
    return result
Settings Reference#
Core Compilation Settings#
-
Settings.index_dtype:
dtype# The dtype to use for index variables. Default is torch.int32. Override with HELION_INDEX_DTYPE=torch.int64, etc.
The data type used for index variables in generated code. Default is torch.int32. Override via HELION_INDEX_DTYPE=int64 (or any torch.<dtype> name).
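A rough way to judge when the int32 default is too narrow is to check whether the largest linear offset into a tensor still fits in a signed 32-bit integer. The helper below is a hypothetical sketch that assumes a contiguous layout; it says nothing about how Helion itself decides.

```python
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def fits_int32(shape):
    """Return True if every linear offset into a contiguous tensor of
    this shape fits in a signed 32-bit integer."""
    largest_offset = math.prod(shape) - 1
    return largest_offset <= INT32_MAX


small = fits_int32((1024, 1024, 1024))  # 2**30 elements
large = fits_int32((4096, 1024, 1024))  # 2**32 elements
print(small, large)
```

Under this assumption, a shape like the second one would be a candidate for HELION_INDEX_DTYPE=int64.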
-
Settings.dot_precision:
Literal['tf32','tf32x3','ieee']# Precision for dot products, see triton.language.dot. Can be ‘tf32’, ‘tf32x3’, or ‘ieee’.
Precision mode for dot product operations. Default is "tf32". Controlled by the TRITON_F32_DEFAULT environment variable.
-
Settings.static_shapes:
bool# If True, use static shapes for all tensors. This is a performance optimization. Set HELION_STATIC_SHAPES=0 to disable.
When enabled, tensor shapes are treated as compile-time constants for optimization. Default is True. Set HELION_STATIC_SHAPES=0 to disable it if you need a compiled kernel instance to serve many shape variants.
Autotuning Settings#
-
Settings.force_autotune:
bool# If True, force autotuning even if a config is provided.
Force autotuning even when explicit configs are provided. Default is False. Controlled by HELION_FORCE_AUTOTUNE=1.
-
Settings.autotune_log_level:
int# Log level for autotuning using Python logging levels. Default is logging.INFO. Use HELION_AUTOTUNE_LOG_LEVEL to override or set 0 to disable output.
Controls verbosity of autotuning output using Python logging levels:
logging.CRITICAL: No autotuning output
logging.WARNING: Only warnings and errors
logging.INFO: Standard progress messages (default)
logging.DEBUG: Verbose debugging output
You can also use 0 to completely disable all autotuning output. Controlled by HELION_AUTOTUNE_LOG_LEVEL.
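The levels listed above are the standard integers from Python's logging module. The sketch below prints them and models the filtering rule described here, including the special case of 0. The `would_emit` helper is hypothetical and only mirrors the behavior this page documents.

```python
import logging

# Numeric values of the standard levels named above.
levels = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "CRITICAL")
}
print(levels)


def would_emit(message_level, configured_level):
    """Hypothetical model: 0 silences everything, otherwise a message is
    shown when its level is at least the configured level."""
    if configured_level == 0:
        return False
    return message_level >= configured_level


print(would_emit(logging.WARNING, logging.INFO))  # warning under INFO
print(would_emit(logging.DEBUG, logging.INFO))    # debug under INFO
```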
-
Settings.autotune_compile_timeout:
int# Timeout for Triton compilation in seconds used for autotuning. Default is 60 seconds.
Timeout in seconds for Triton compilation during autotuning. Default is 60. Controlled by HELION_AUTOTUNE_COMPILE_TIMEOUT.
-
Settings.autotune_precompile:
Optional[Literal['spawn','fork']]# Autotuner precompile mode: ‘fork’, ‘spawn’, or falsy/None to disable. Defaults to ‘fork’ on non-Windows platforms.
Select the autotuner precompile mode, which adds parallelism and checks for errors/timeouts. "fork" (default) is faster but does not include the error check run; "spawn" runs kernel warm-up in a fresh process, including a run to check for errors; None disables precompile checks altogether. Controlled by HELION_AUTOTUNE_PRECOMPILE.
-
Settings.autotune_random_seed:
int# Seed used for autotuner random number generation. Defaults to HELION_AUTOTUNE_RANDOM_SEED or a time-based seed.
Seed used for autotuner random number generation. Defaults to HELION_AUTOTUNE_RANDOM_SEED if set, otherwise a time-based value.
-
Settings.autotune_precompile_jobs:
int|None# Maximum concurrent Triton precompile processes, default to cpu count.
Cap the number of concurrent Triton precompile subprocesses. None (default) uses the machine CPU count. Controlled by HELION_AUTOTUNE_PRECOMPILE_JOBS. When using "spawn" precompile mode, Helion may automatically lower this cap if free GPU memory is limited.
-
Settings.autotune_max_generations:
int|None# Override the maximum number of generations for Pattern Search and Differential Evolution Search autotuning algorithms with HELION_AUTOTUNE_MAX_GENERATIONS=N or @helion.kernel(autotune_max_generations=N).
Override the default number of generations set for Pattern Search and Differential Evolution Search autotuning algorithms with HELION_AUTOTUNE_MAX_GENERATIONS=N or @helion.kernel(autotune_max_generations=N).
Lower values result in faster autotuning but may find less optimal configurations.
-
Settings.autotune_ignore_errors:
bool# If True, skip logging and raising autotune errors. Set HELION_AUTOTUNE_IGNORE_ERRORS=1 to enable globally.
Continue autotuning even when candidate configurations raise recoverable runtime errors (for example, GPU out-of-memory). Default is False. Controlled by HELION_AUTOTUNE_IGNORE_ERRORS.
-
Settings.autotune_accuracy_check:
bool# If True, validate candidate configs against the baseline kernel output before accepting them during autotuning.
Validate each candidate configuration against a baseline output before accepting it. Default is True. Controlled by HELION_AUTOTUNE_ACCURACY_CHECK.
-
Settings.autotune_rebenchmark_threshold:
float|None# If a config is within threshold*best_perf, re-benchmark it to avoid outliers. Defaults to effort profile value. Set HELION_REBENCHMARK_THRESHOLD to override.
Controls how aggressively Helion re-runs promising configs to avoid outliers. Default is 1.5 (re-benchmark anything within 1.5x of the best).
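To make the "within threshold*best_perf" rule concrete, the sketch below picks the configs that would qualify for a second measurement. The function name, the config names, and the timings are made up for illustration, and treating lower time as better is an assumption.

```python
def needs_rebenchmark(timings_ms, threshold):
    """Return the config names whose time is within threshold * best,
    where best is the smallest (fastest) measured time."""
    best = min(timings_ms.values())
    return sorted(
        name for name, t in timings_ms.items() if t <= threshold * best
    )


# Made-up measurements in milliseconds.
timings = {"cfg_a": 1.00, "cfg_b": 1.40, "cfg_c": 1.60, "cfg_d": 3.20}
selected = needs_rebenchmark(timings, 1.5)
print(selected)
```

With a threshold of 1.5 and a best time of 1.00 ms, only configs at or below 1.50 ms are selected; a larger threshold widens that set at the cost of more benchmarking time.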
-
Settings.autotune_progress_bar:
bool# If True, show progress bar during autotuning. Default is True. Set HELION_AUTOTUNE_PROGRESS_BAR=0 to disable.
Toggle the interactive progress bar during autotuning. Default is True. Controlled by HELION_AUTOTUNE_PROGRESS_BAR.
-
Settings.autotune_config_overrides:
dict[str,object]# Dictionary of config key/value pairs forced during autotuning. Accepts HELION_AUTOTUNE_CONFIG_OVERRIDES='{"num_warps": 4}'.
Dict of config key/value pairs to force during autotuning. Useful for disabling problematic candidates or pinning experimental options. Provide JSON via HELION_AUTOTUNE_CONFIG_OVERRIDES='{"num_warps": 4}' for global overrides.
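Because the variable carries JSON, it is usually safer to generate the value than to type it by hand. The sketch below builds the string with the standard json module and quotes it for a POSIX shell. It only shows how to produce and round-trip the value; how Helion consumes it is not modeled here.

```python
import json
import shlex

overrides = {"num_warps": 4}

# Serialize the dict to the JSON text the environment variable expects.
value = json.dumps(overrides)

# Quote it so a POSIX shell passes the JSON through unchanged.
shell_assignment = "HELION_AUTOTUNE_CONFIG_OVERRIDES=" + shlex.quote(value)
print(shell_assignment)

# Round trip: parsing the value gives back the original dict.
parsed = json.loads(value)
```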
-
Settings.autotune_effort:
Literal['none','quick','full']# Autotuning effort preset. One of ‘none’, ‘quick’, ‘full’.
Select the autotuning effort preset. Available values:
"none" – skip autotuning and run the default configuration.
"quick" – limited search for faster runs with decent performance.
"full" – exhaustive autotuning (current default behavior).
Users can still override individual autotune_* settings; explicit values win over the preset. Controlled by HELION_AUTOTUNE_EFFORT.
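The rule that explicit values win over the preset can be sketched as a dictionary merge. Every number in `PRESETS` below is invented for illustration and does not reflect Helion's real effort profiles; the `effective` helper is likewise hypothetical.

```python
# Invented preset values, for illustration only.
PRESETS = {
    "none": {"max_generations": 0, "rebenchmark_threshold": None},
    "quick": {"max_generations": 5, "rebenchmark_threshold": 1.2},
    "full": {"max_generations": 20, "rebenchmark_threshold": 1.5},
}


def effective(effort, **explicit):
    """Start from the preset, then apply explicitly provided values.
    A value of None means 'not set', so the preset value is kept."""
    merged = dict(PRESETS[effort])
    merged.update({k: v for k, v in explicit.items() if v is not None})
    return merged


from_preset = effective("quick")
overridden = effective("quick", max_generations=2)
print(from_preset, overridden)
```

The same fallback shape applies to get_rebenchmark_threshold() as described on this page: use the explicit setting if provided, otherwise the effort profile default.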
Autotuning Cache#
Helion stores the best-performing configs discovered during autotuning in an on-disk cache so subsequent runs can skip the search.
HELION_CACHE_DIR: Override the directory used to store cache entries. Defaults to PyTorch’s torch._inductor cache path (typically /tmp/torchinductor_$USER/helion).
HELION_SKIP_CACHE: Set to 1 to ignore cached entries and force the autotuner to re-run even if a matching artifact exists.
See helion.autotuner.LocalAutotuneCache for details on cache keys and behavior.
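The two variables can be modeled with a couple of small helpers. This is a sketch under assumptions: the real default directory comes from PyTorch's inductor cache logic, which is approximated here by the "typical" path quoted above, and both function names are hypothetical.

```python
import tempfile
from pathlib import Path


def cache_dir(env):
    """Hypothetical resolver: honor HELION_CACHE_DIR, else approximate
    the typical <tmp>/torchinductor_<user>/helion location."""
    override = env.get("HELION_CACHE_DIR")
    if override:
        return Path(override)
    user = env.get("USER", "unknown")
    return Path(tempfile.gettempdir()) / f"torchinductor_{user}" / "helion"


def should_use_cache(env):
    """Hypothetical check: cached entries are ignored when
    HELION_SKIP_CACHE is set to 1."""
    return env.get("HELION_SKIP_CACHE") != "1"


print(cache_dir({"HELION_CACHE_DIR": "/data/helion_cache"}))
print(cache_dir({"USER": "alice"}))
print(should_use_cache({"HELION_SKIP_CACHE": "1"}))
```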
Debugging and Development#
-
Settings.print_output_code:
bool# If True, print the output code of the kernel to stderr.
Print generated Triton code to stderr. Default is False. Controlled by HELION_PRINT_OUTPUT_CODE=1.
-
Settings.print_repro:
bool# If True, print Helion kernel code, config, and caller code to stderr as a standalone repro script.
Print Helion kernel code, config, and caller code to stderr as a standalone repro script. Default is False. Controlled by HELION_PRINT_REPRO=1.
-
Settings.output_origin_lines:
bool# If True, annotate generated Triton code with source-origin comments. Set HELION_OUTPUT_ORIGIN_LINES=0 to disable.
Annotate generated Triton code with # src[<file>:<line>] comments indicating the originating Helion statements. Default is True. Controlled by HELION_OUTPUT_ORIGIN_LINES (set to 0 to disable).
-
Settings.ignore_warnings:
list[type[BaseWarning]]# Subtypes of exc.BaseWarning to ignore when compiling. Set HELION_IGNORE_WARNINGS=WarningA,WarningB (names from helion.exc) to configure via env.
List of warning types to suppress during compilation. Default is an empty list. Accepts comma-separated warning class names from helion.exc via HELION_IGNORE_WARNINGS (for example, HELION_IGNORE_WARNINGS=TensorOperationInWrapper).
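The comma-separated format can be illustrated with a small parser. The helper is hypothetical and stops at the names: resolving each name to a class in helion.exc is left out so the example runs without Helion installed, and tolerating spaces around commas is an assumption.

```python
def parse_warning_names(value):
    """Split a comma-separated list of warning class names, dropping
    empty entries and surrounding whitespace."""
    return [name.strip() for name in value.split(",") if name.strip()]


single = parse_warning_names("TensorOperationInWrapper")
several = parse_warning_names("WarningA, WarningB")
empty = parse_warning_names("")
print(single, several, empty)
```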
Device Execution Modes#
-
Settings.allow_warp_specialize:
bool# If True, allow warp specialization for tl.range calls on CUDA devices.
Allow warp specialization for tl.range calls. Default is True. Controlled by HELION_ALLOW_WARP_SPECIALIZE.
-
Settings.ref_mode:
RefMode# Reference mode for kernel execution. Can be RefMode.OFF or RefMode.EAGER.
Select the reference execution strategy. RefMode.OFF runs compiled kernels (default); RefMode.EAGER runs the interpreter for debugging. Controlled by HELION_INTERPRET.
Autotuner Hooks#
-
Settings.autotuner_fn:
AutotunerFunction# Function to create an autotuner. Override by passing a callable to @helion.kernel(…, autotuner_fn=…).
Override the callable that constructs autotuner instances. Accepts the same signature as helion.runtime.settings.default_autotuner_fn(). Pass a replacement callable via @helion.kernel(..., autotuner_fn=...) or helion.kernel(autotuner_fn=...) at definition time.
Built-in values for HELION_AUTOTUNER include "PatternSearch", "DifferentialEvolutionSearch", "FiniteSearch", and "RandomSearch".
Functions#
Environment Variable Reference#
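| Environment Variable | Maps To | Description |
|---|---|---|
| TRITON_F32_DEFAULT | dot_precision | Sets default floating-point precision for Triton dot products ( |
| HELION_INDEX_DTYPE | index_dtype | Choose the default index dtype (accepts any |
| | | Set to |
| HELION_FORCE_AUTOTUNE | force_autotune | Force the autotuner to run even when explicit configs are provided. |
| | | Hard-disable autotuning; kernels must supply explicit configs when this is |
| HELION_AUTOTUNE_COMPILE_TIMEOUT | autotune_compile_timeout | Maximum seconds to wait for Triton compilation during autotuning. |
| HELION_AUTOTUNE_LOG_LEVEL | autotune_log_level | Adjust logging verbosity; accepts names like |
| HELION_AUTOTUNE_PRECOMPILE | autotune_precompile | Select the autotuner precompile mode ( |
| HELION_AUTOTUNE_PRECOMPILE_JOBS | autotune_precompile_jobs | Cap the number of concurrent Triton precompile subprocesses. |
| HELION_AUTOTUNE_RANDOM_SEED | autotune_random_seed | Seed used for randomized autotuning searches. |
| HELION_AUTOTUNE_MAX_GENERATIONS | autotune_max_generations | Upper bound on generations for Pattern Search and Differential Evolution. |
| HELION_AUTOTUNE_ACCURACY_CHECK | autotune_accuracy_check | Toggle baseline validation for candidate configs. |
| HELION_AUTOTUNE_EFFORT | autotune_effort | Select autotuning preset ( |
| HELION_REBENCHMARK_THRESHOLD | autotune_rebenchmark_threshold | Re-run configs whose performance is within a multiplier of the current best. |
| HELION_AUTOTUNE_PROGRESS_BAR | autotune_progress_bar | Enable or disable the progress bar UI during autotuning. |
| HELION_AUTOTUNE_IGNORE_ERRORS | autotune_ignore_errors | Continue autotuning even when recoverable runtime errors occur. |
| HELION_AUTOTUNE_CONFIG_OVERRIDES | autotune_config_overrides | Supply JSON forcing particular autotuner config key/value pairs. |
| HELION_CACHE_DIR | | Override the on-disk directory used for cached autotuning artifacts. |
| | | When set to |
| | | When set to |
| HELION_PRINT_OUTPUT_CODE | print_output_code | Print generated Triton code to stderr for inspection. |
| HELION_PRINT_REPRO | print_repro | Print Helion kernel code, config, and caller code to stderr as a standalone repro script. |
| | | Include |
| HELION_IGNORE_WARNINGS | ignore_warnings | Comma-separated warning names defined in |
| HELION_ALLOW_WARP_SPECIALIZE | allow_warp_specialize | Permit warp-specialized code generation for |
| | | Inject dtype assertions after each lowering step. |
| HELION_INTERPRET | ref_mode | Run kernels through the reference interpreter when set to |
| HELION_AUTOTUNER | autotuner_fn | Select which autotuner implementation to instantiate ( |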
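When debugging a surprising setting, it can help to list which of these variables are actually present in the process environment. The sketch below is a standard-library helper with a hypothetical name. It filters on the HELION_ prefix plus TRITON_F32_DEFAULT, the one variable named on this page that does not use the prefix.

```python
import os


def helion_env(env=None):
    """Return the Helion-related variables found in env (defaults to
    the current process environment), sorted by name."""
    env = os.environ if env is None else env
    return {
        key: value
        for key, value in sorted(env.items())
        if key.startswith("HELION_") or key == "TRITON_F32_DEFAULT"
    }


sample = {
    "PATH": "/usr/bin",
    "HELION_AUTOTUNE_EFFORT": "quick",
    "TRITON_F32_DEFAULT": "ieee",
}
found = helion_env(sample)
print(found)
```

Calling `helion_env()` with no argument inspects the live environment, which is most useful right before the module that defines the kernels is imported.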
See Also#
Config - Kernel optimization parameters
Exceptions - Exception handling and debugging
Autotuner Module - Autotuning configuration