helion.Settings#
- class helion.Settings(**settings)[source]#
Settings can be passed to hl.kernel as kwargs and control the behavior of the compilation process. Unlike a Config, settings are not auto-tuned and are set directly by the user.
- Parameters:
settings (object)
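For example, settings can be passed as keyword arguments to helion.kernel. A minimal sketch; the kernel body here is illustrative:

```python
import logging

import torch
import helion
import helion.language as hl

# Settings are given as keyword arguments to helion.kernel; they shape
# compilation behavior and are not searched by the autotuner.
@helion.kernel(static_shapes=True, autotune_log_level=logging.DEBUG)
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out
```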
Methods
- __init__(**settings): Initialize the Settings object with the provided dictionary of settings.
- get_rebenchmark_threshold(): Get the effective rebenchmark threshold.
- to_dict(): Convert the Settings object to a dictionary.
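A Settings object can also be constructed directly; a quick sketch using to_dict():

```python
import helion

# Build Settings directly and inspect the resulting values.
s = helion.Settings(dot_precision="tf32", static_shapes=True)
print(s.to_dict())  # plain dict of setting names to values
```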
Attributes
- ignore_warnings: Subtypes of exc.BaseWarning to ignore when compiling.
- index_dtype: The dtype to use for index variables.
- dot_precision: Precision for dot products, see triton.language.dot.
- static_shapes: If True, use static shapes for all tensors.
- persistent_reserved_sms: Number of streaming multiprocessors to reserve when launching persistent kernels.
- autotune_force_persistent: If True, restrict pid_type choices to persistent kernels only during config selection.
- autotune_log_level: Log level for autotuning using Python logging levels.
- autotune_log: Base filename for autotune logs.
- autotune_compile_timeout: Timeout for Triton compilation in seconds used for autotuning.
- autotune_precompile: Autotuner precompile mode: 'fork', 'spawn', or falsy/None to disable.
- autotune_precompile_jobs: Maximum number of concurrent Triton precompile processes; defaults to the CPU count.
- autotune_random_seed: Seed used for autotuner random number generation.
- autotune_accuracy_check: If True, validate candidate configs against the baseline kernel output before accepting them during autotuning.
- autotune_rebenchmark_threshold: If a config is within threshold*best_perf, re-benchmark it to avoid outliers.
- autotune_progress_bar: If True, show a progress bar during autotuning.
- autotune_max_generations: Override the maximum number of generations for the Pattern Search and Differential Evolution Search autotuning algorithms.
- autotune_ignore_errors: If True, skip logging and raising autotune errors.
- print_output_code: If True, print the output code of the kernel to stderr.
- print_repro: If True, print Helion kernel code, config, and caller code to stderr as a standalone repro script.
- output_origin_lines: If True, annotate generated Triton code with source-origin comments.
- force_autotune: If True, force autotuning even if a config is provided.
- autotune_config_overrides: Dictionary of config key/value pairs forced during autotuning.
- allow_warp_specialize: If True, allow warp specialization for tl.range calls on CUDA devices.
- debug_dtype_asserts: If True, emit tl.static_assert checks for dtype after each device node.
- ref_mode: Reference mode for kernel execution.
- autotuner_fn: Function to create an autotuner.
- autotune_effort: Autotuning effort preset.
- autotune_baseline_fn: Custom baseline function for computing baseline output during autotuning.
- autotune_baseline_atol: Absolute tolerance for baseline output comparison during autotuning accuracy checks.
- autotune_baseline_rtol: Relative tolerance for baseline output comparison during autotuning accuracy checks.
- autotune_cache: The name of the autotuner cache class to use.
- autotune_benchmark_fn: Custom benchmark function for rebenchmarking during autotuning.
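Most settings can also be driven by the HELION_* environment variables noted in each entry below; they are read when settings defaults are constructed, so set them early in the process. A hypothetical session setup:

```python
import os

# Configure Helion via environment variables before compiling kernels
# (variable names are taken from the attribute docs below).
os.environ["HELION_AUTOTUNE_LOG_LEVEL"] = "0"  # disable autotune output
os.environ["HELION_STATIC_SHAPES"] = "0"       # turn off static shapes
```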
- __init__(**settings)[source]#
Initialize the Settings object with the provided dictionary of settings.
- Parameters:
settings (object)
- get_rebenchmark_threshold()[source]#
Get the effective rebenchmark threshold. Uses the explicit setting if provided, otherwise falls back to the effort profile default.
- Returns:
The rebenchmark threshold value.
- Return type:
float
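A sketch of the fallback behavior:

```python
import helion

# The explicit setting wins; with no explicit value, the effort
# profile's default threshold is returned instead.
s = helion.Settings(autotune_rebenchmark_threshold=1.5)
assert s.get_rebenchmark_threshold() == 1.5
```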
- ignore_warnings: list[type[BaseWarning]]# Subtypes of exc.BaseWarning to ignore when compiling. Set HELION_IGNORE_WARNINGS=WarningA,WarningB (names from helion.exc) to configure via env.
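For example (the warning class name here is illustrative; see helion.exc for the actual subtypes):

```python
import helion
from helion import exc

# Suppress a specific compile-time warning class.
settings = helion.Settings(ignore_warnings=[exc.TensorOperationInWrapper])
```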
- index_dtype: dtype | None# The dtype to use for index variables. Default auto-selects torch.int32 or torch.int64 based on input sizes. Override with HELION_INDEX_DTYPE=<dtype> (or set to 'auto').
- dot_precision: Literal['tf32', 'tf32x3', 'ieee']# Precision for dot products, see triton.language.dot. Can be 'tf32', 'tf32x3', or 'ieee'.
- static_shapes: bool# If True, use static shapes for all tensors. This is a performance optimization. Set HELION_STATIC_SHAPES=0 to disable.
- persistent_reserved_sms: int# Number of streaming multiprocessors to reserve when launching persistent kernels. Set HELION_PERSISTENT_RESERVED_SMS=N (default 0) or pass persistent_reserved_sms=N to helion.kernel.
- autotune_force_persistent: bool# If True, restrict pid_type choices to persistent kernels only during config selection. Set HELION_AUTOTUNE_FORCE_PERSISTENT=1 to force persistent kernel autotuning globally.
- autotune_log_level: int# Log level for autotuning using Python logging levels. Default is logging.INFO. Use HELION_AUTOTUNE_LOG_LEVEL to override or set 0 to disable output.
- autotune_log: str | None# Base filename for autotune logs. Set HELION_AUTOTUNE_LOG=/tmp/run to write /tmp/run.csv and /tmp/run.log with per-config metrics and debug logs.
- autotune_compile_timeout: int# Timeout for Triton compilation in seconds used for autotuning. Default is 60 seconds.
- autotune_precompile: Optional[Literal['spawn', 'fork']]# Autotuner precompile mode: 'fork', 'spawn', or falsy/None to disable. Defaults to 'fork' on non-Windows platforms.
- autotune_precompile_jobs: int | None# Maximum number of concurrent Triton precompile processes; defaults to the CPU count.
- autotune_random_seed: int# Seed used for autotuner random number generation. Defaults to HELION_AUTOTUNE_RANDOM_SEED or a time-based seed.
- autotune_accuracy_check: bool# If True, validate candidate configs against the baseline kernel output before accepting them during autotuning.
- autotune_rebenchmark_threshold: float | None# If a config is within threshold*best_perf, re-benchmark it to avoid outliers. Defaults to the effort profile value. Set HELION_REBENCHMARK_THRESHOLD to override.
- autotune_progress_bar: bool# If True, show a progress bar during autotuning. Default is True. Set HELION_AUTOTUNE_PROGRESS_BAR=0 to disable.
- autotune_max_generations: int | None# Override the maximum number of generations for the Pattern Search and Differential Evolution Search autotuning algorithms with HELION_AUTOTUNE_MAX_GENERATIONS=N or @helion.kernel(autotune_max_generations=N).
- autotune_ignore_errors: bool# If True, skip logging and raising autotune errors. Set HELION_AUTOTUNE_IGNORE_ERRORS=1 to enable globally.
- print_output_code: bool# If True, print the output code of the kernel to stderr.
- print_repro: bool# If True, print Helion kernel code, config, and caller code to stderr as a standalone repro script.
- output_origin_lines: bool# If True, annotate generated Triton code with source-origin comments. Set HELION_OUTPUT_ORIGIN_LINES=0 to disable.
- force_autotune: bool# If True, force autotuning even if a config is provided.
- autotune_config_overrides: dict[str, object]# Dictionary of config key/value pairs forced during autotuning. Accepts HELION_AUTOTUNE_CONFIG_OVERRIDES='{"num_warps": 4}'.
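Equivalently as a kwarg (a minimal sketch):

```python
import helion

# Pin num_warps for every candidate config during autotuning; other
# config knobs are still searched. Mirrors the env var form above.
settings = helion.Settings(autotune_config_overrides={"num_warps": 4})
```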
- allow_warp_specialize: bool# If True, allow warp specialization for tl.range calls on CUDA devices.
- debug_dtype_asserts: bool# If True, emit tl.static_assert checks for dtype after each device node.
- ref_mode: RefMode# Reference mode for kernel execution. Can be RefMode.OFF or RefMode.EAGER.
- autotuner_fn: AutotunerFunction# Function to create an autotuner. Override by passing a callable to @helion.kernel(..., autotuner_fn=...).
- autotune_effort: Literal['none', 'quick', 'full']# Autotuning effort preset. One of 'none', 'quick', 'full'.
- autotune_baseline_fn: Optional[Callable[..., object]]# Custom baseline function for computing baseline output during autotuning. If provided, this function will be called instead of running the default config. Should have the same signature as the kernel function. Pass as @helion.kernel(..., autotune_baseline_fn=my_baseline_fn).
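A hypothetical sketch, where the baseline mirrors the kernel's signature and computes the reference output in plain PyTorch:

```python
import torch
import helion
import helion.language as hl

def add_baseline(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Same signature as the kernel; used to produce the reference
    # output for accuracy checks instead of running the default config.
    return x + y

@helion.kernel(autotune_baseline_fn=add_baseline)
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out
```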
- autotune_baseline_atol: float | None# Absolute tolerance for baseline output comparison during autotuning accuracy checks. Defaults to 1e-2, or 0.0 for fp8 dtypes (automatic bitwise comparison). Pass as @helion.kernel(..., autotune_baseline_atol=1e-3).
- autotune_baseline_rtol: float | None# Relative tolerance for baseline output comparison during autotuning accuracy checks. Defaults to 1e-2, or 0.0 for fp8 dtypes (automatic bitwise comparison). Pass as @helion.kernel(..., autotune_baseline_rtol=1e-3).
- autotune_cache: str# The name of the autotuner cache class to use. Set HELION_AUTOTUNE_CACHE=StrictLocalAutotuneCache to enable strict caching. Defaults to 'LocalAutotuneCache'.
- autotune_benchmark_fn: Optional[Callable[..., list[float]]]# Custom benchmark function for rebenchmarking during autotuning. Should have the following signature: (fns: list[Callable[[], object]], *, repeat: int, desc: str | None = None) -> list[float]. If None (default), uses the built-in benchmark function.
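A hypothetical implementation matching that signature, assuming a CUDA device; wall-clock timing with explicit synchronization is a simple stand-in for the built-in benchmark function:

```python
import statistics
import time
from typing import Callable

import torch

def my_benchmark(
    fns: list[Callable[[], object]],
    *,
    repeat: int,
    desc: str | None = None,  # optional progress description, unused here
) -> list[float]:
    # Return one timing (in seconds) per candidate: run each callable
    # `repeat` times and take the median to damp outliers.
    timings = []
    for fn in fns:
        samples = []
        for _ in range(repeat):
            torch.cuda.synchronize()
            start = time.perf_counter()
            fn()
            torch.cuda.synchronize()
            samples.append(time.perf_counter() - start)
        timings.append(statistics.median(samples))
    return timings
```

Pass it as @helion.kernel(..., autotune_benchmark_fn=my_benchmark).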