Autotuner Module#
The helion.autotuner module provides automatic optimization of kernel configurations.
Autotuning effort can be adjusted via helion.Settings.autotune_effort, which configures how much each algorithm explores ("none" disables autotuning, "quick" runs a smaller search, "full" uses the full search budget). Users may still override individual autotuning parameters if they need finer control.
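For example, effort can be dialed down per kernel. This is a minimal sketch, assuming helion.kernel forwards Settings fields such as autotune_effort as keyword arguments:

```python
import torch
import helion
import helion.language as hl

# Sketch: request the smaller "quick" search budget for this kernel
# (assumes Settings fields can be passed through helion.kernel).
@helion.kernel(autotune_effort="quick")
def double(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] * 2
    return out
```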
Configuration Classes#
Config#
- class helion.runtime.config.Config(*, block_sizes=None, num_threads=None, loop_orders=None, flatten_loops=None, l2_groupings=None, reduction_loops=None, range_unroll_factors=None, range_warp_specializes=None, range_num_stages=None, range_multi_buffers=None, range_flattens=None, static_ranges=None, load_eviction_policies=None, num_warps=None, num_stages=None, pid_type=None, num_sm_multiplier=None, maxnreg=None, indexing=None, atomic_indexing=None, advanced_controls_file=None, epilogue_subtile=None, **kwargs)[source]#
- Parameters:
  - load_eviction_policies (list[Literal['', 'first', 'last']] | None) –
  - pid_type (Optional[Literal['flat', 'xyz', 'persistent_blocked', 'persistent_interleaved']]) –
  - indexing (Union[Literal['pointer', 'tensor_descriptor', 'block_ptr'], list[Literal['pointer', 'tensor_descriptor', 'block_ptr']], None]) –
  - atomic_indexing (Union[Literal['pointer', 'tensor_descriptor', 'block_ptr'], list[Literal['pointer', 'tensor_descriptor', 'block_ptr']], None]) –
  - kwargs (object) –
- __init__(*, block_sizes=None, num_threads=None, loop_orders=None, flatten_loops=None, l2_groupings=None, reduction_loops=None, range_unroll_factors=None, range_warp_specializes=None, range_num_stages=None, range_multi_buffers=None, range_flattens=None, static_ranges=None, load_eviction_policies=None, num_warps=None, num_stages=None, pid_type=None, num_sm_multiplier=None, maxnreg=None, indexing=None, atomic_indexing=None, advanced_controls_file=None, epilogue_subtile=None, **kwargs)[source]#
Initialize a Config object.
- Parameters:
  - block_sizes (list[int] | None) – Controls tile sizes for hl.tile invocations.
  - num_threads (list[int] | int | None) – Target thread count per axis (backend-specific).
  - loop_orders (list[list[int]] | None) – Permutes iteration order of tiles.
  - l2_groupings (list[int] | None) – Reorders program IDs for L2 cache locality.
  - reduction_loops (list[int | None] | None) – Configures reduction loop behavior.
  - range_unroll_factors (list[int] | None) – Loop unroll factors for tl.range calls.
  - range_warp_specializes (list[bool | None] | None) – Warp specialization for tl.range calls.
  - range_num_stages (list[int] | None) – Number of stages for tl.range calls.
  - range_multi_buffers (list[bool | None] | None) – Controls disallow_acc_multi_buffer for tl.range calls.
  - range_flattens (list[bool | None] | None) – Controls the flatten parameter for tl.range calls.
  - static_ranges (list[bool] | None) – Whether to use tl.static_range instead of tl.range.
  - load_eviction_policies (list[Literal['', 'first', 'last']] | None) – Eviction policies for load operations ("", "first", "last").
  - num_stages (int | None) – Number of stages for software pipelining.
  - pid_type (Optional[Literal['flat', 'xyz', 'persistent_blocked', 'persistent_interleaved']]) – Program ID type strategy ("flat", "xyz", "persistent_blocked", "persistent_interleaved").
  - num_sm_multiplier (Optional[Literal[1, 2, 4, 8]]) – Multiplier for the number of SMs in persistent kernels (1, 2, 4, 8). Controls multi-occupancy by launching N * num_sms thread blocks instead of just num_sms.
  - maxnreg (Optional[Literal[32, 64, 128, 256]]) – Maximum number of registers per thread (None, 32, 64, 128, 256). Lower values allow higher occupancy but may hurt performance. Used with persistent kernels to ensure multi-occupancy can be achieved.
  - indexing (Union[Literal['pointer', 'tensor_descriptor', 'block_ptr'], list[Literal['pointer', 'tensor_descriptor', 'block_ptr']], None]) – Indexing strategy for load and store operations. Can be a single strategy string applied to all loads/stores (e.g. indexing="block_ptr", backward compatible); a list of strategies, one per load/store operation, which must specify all of them (e.g. indexing=["pointer", "block_ptr", "tensor_descriptor"]); or empty/omitted, in which case all loads/stores default to "pointer". Valid strategies: "pointer", "tensor_descriptor", "block_ptr".
  - atomic_indexing (Union[Literal['pointer', 'tensor_descriptor', 'block_ptr'], list[Literal['pointer', 'tensor_descriptor', 'block_ptr']], None]) – Indexing strategy for atomic operations (e.g., hl.atomic_add). Same format as indexing (a single string or a list per atomic op). Defaults to "pointer" when omitted.
  - advanced_controls_file (str | None) – Path to a PTXAS control file applied during compilation, or empty string for none.
  - epilogue_subtile (int | None) – Split factor for the epilogue (post-matmul pointwise + store) along the N dimension. None = disabled (default); valid values are 2 and 4.
  - **kwargs (object) – Additional user-defined configuration parameters.
- minimize(config_spec)[source]#
Return a new Config with values matching effective defaults removed.
This produces a minimal config representation by removing any values that match what the config_spec would use as defaults.
- property indexing: Literal['pointer', 'tensor_descriptor', 'block_ptr'] | list[Literal['pointer', 'tensor_descriptor', 'block_ptr']]#
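A config is usually produced by the autotuner, but it can also be written by hand. This is a minimal sketch with an illustrative elementwise kernel; passing config= to helion.kernel to skip autotuning follows Helion's usual API, but verify the keyword against your version:

```python
import torch
import helion
import helion.language as hl

# Hand-written config: names match the Config parameters above.
config = helion.Config(
    block_sizes=[64, 64],  # tile sizes for the hl.tile loops
    num_warps=8,
    num_stages=3,
    indexing="block_ptr",  # one strategy applied to all loads/stores
)

@helion.kernel(config=config)  # explicit config: autotuning is skipped
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out
```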
Search Algorithms#
The autotuner supports multiple search strategies:
Pattern Search#
- class helion.autotuner.pattern_search.InitialPopulationStrategy(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Strategy for generating the initial population for search algorithms.
- FROM_RANDOM = 'from_random'#
Generate a random population of configurations.
- FROM_BEST_AVAILABLE = 'from_best_available'#
Start from default config plus up to 20 best matching cached configs from previous runs.
- class helion.autotuner.pattern_search.PatternSearch(kernel, args, *, initial_population=100, copies=5, max_generations=20, min_improvement_delta=0.001, initial_population_strategy=None, best_available_pad_random=True, num_neighbors_cap=-1, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
Search that explores single-parameter perturbations around the current best.
- Parameters:
  - kernel (_AutotunableKernel) –
  - initial_population (int) –
  - copies (int) –
  - max_generations (int) –
  - min_improvement_delta (float) –
  - initial_population_strategy (InitialPopulationStrategy | None) –
  - best_available_pad_random (bool) –
  - num_neighbors_cap (int) –
  - finishing_rounds (int) –
  - compile_timeout_lower_bound (float) –
  - compile_timeout_quantile (float) –
- __init__(kernel, args, *, initial_population=100, copies=5, max_generations=20, min_improvement_delta=0.001, initial_population_strategy=None, best_available_pad_random=True, num_neighbors_cap=-1, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
Create a PatternSearch autotuner.
- Parameters:
  - kernel (_AutotunableKernel) – The kernel to be autotuned.
  - args (Sequence[object]) – The arguments to be passed to the kernel.
  - initial_population (int) – The number of random configurations to generate for the initial population.
  - copies (int) – Count of top Configs to run pattern search on.
  - max_generations (int) – The maximum number of generations to run.
  - min_improvement_delta (float) – Relative stop threshold; stop if abs(best/current - 1) < this.
  - initial_population_strategy (InitialPopulationStrategy | None) – Strategy for generating the initial population. FROM_RANDOM generates initial_population random configs. FROM_BEST_AVAILABLE uses cached configs from prior runs, and fills the remainder with random configs when best_available_pad_random is True. Can be overridden by the HELION_AUTOTUNER_INITIAL_POPULATION env var (handled in default_autotuner_fn). If None is passed, defaults to FROM_RANDOM.
  - best_available_pad_random (bool) – When True and using FROM_BEST_AVAILABLE, pad the cached configs with random configs to reach initial_population size. When False, use only the default and cached configs (no random padding).
  - num_neighbors_cap (int) – Maximum number of neighbors to explore per generation. -1 means no cap. Set HELION_CAP_AUTOTUNE_NUM_NEIGHBORS=N to override.
  - finishing_rounds (int) – Number of finishing rounds to run after the main search.
  - compile_timeout_lower_bound (float) – Lower bound for adaptive compile timeout in seconds.
  - compile_timeout_quantile (float) – Quantile of compile times to use for adaptive timeout.
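A sketch of driving the search directly; bound_kernel and args are assumed to come from an existing @helion.kernel invocation, and the autotune() entry point is assumed from the common base-search interface:

```python
from helion.autotuner.pattern_search import (
    InitialPopulationStrategy,
    PatternSearch,
)

# Pattern search seeded from previously cached configs when available.
search = PatternSearch(
    bound_kernel,           # assumed: a bound Helion kernel
    args,                   # assumed: example inputs for benchmarking
    initial_population=50,  # smaller seed population than the default 100
    copies=3,               # refine the top 3 configs
    initial_population_strategy=InitialPopulationStrategy.FROM_BEST_AVAILABLE,
)
best_config = search.autotune()  # assumed base-search entry point
```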
LFBO Pattern Search#
- class helion.autotuner.surrogate_pattern_search.LFBOPatternSearch(kernel, args, *, initial_population=100, copies=5, max_generations=20, min_improvement_delta=0.001, frac_selected=0.1, num_neighbors=300, radius=2, quantile=0.1, patience=1, similarity_penalty=1.0, initial_population_strategy=None, best_available_pad_random=True, num_neighbors_cap=-1, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
Batch Likelihood-Free Bayesian Optimization (LFBO) Pattern Search.
This algorithm enhances PatternSearch by using a Random Forest classifier as a surrogate model to select which configurations to benchmark, reducing the number of kernel compilations and runs needed to find optimal configurations. It imposes a similarity penalty to encourage diverse config selection.
- Algorithm Overview:
  1. Generate an initial population (random or default) and benchmark all configurations.
  2. Fit a Random Forest classifier to predict "good" vs "bad" configurations:
     - Configs with performance < quantile threshold are labeled "good" (class 1).
     - Configs with performance >= quantile threshold are labeled "bad" (class 0).
     - Weighted classification emphasizes configs that are much better than the threshold.
  3. For each generation:
     - Generate random neighbors around the current best configurations.
     - Score all neighbors using the classifier's predicted probability of being "good".
     - Penalize points that are similar to previously selected points.
     - Select points to benchmark via sequential greedy optimization.
     - Retrain the classifier on all observed data (not incrementally).
     - Update search trajectories based on new results.
The weighted classification model learns to identify which configs maximize expected improvement over the current best config. Unlike fitting a surrogate to the raw config performances, this classification-based approach can also learn from configs that time out or have unacceptable accuracy.
References: - Song, J., et al. (2022). "A General Recipe for Likelihood-free Bayesian Optimization."
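The labeling step (step 2 above) can be sketched as follows. This is illustrative only: perf holds measured runtimes (lower is better), inf marks failed configs, and the weighting scheme is an assumption, not the module's exact formula:

```python
import numpy as np

# Illustrative LFBO-style labeling: configs faster than the performance
# quantile become class 1 ("good"); weights grow with the margin below
# the threshold. Failed configs (perf = inf) stay as negatives.
def label_and_weight(perf: np.ndarray, quantile: float = 0.1):
    tau = np.quantile(perf[np.isfinite(perf)], quantile)
    labels = (perf < tau).astype(int)  # 1 = good, 0 = bad
    margin = np.where(np.isfinite(perf), tau - perf, 0.0)
    weights = np.where(labels == 1, margin, 1.0)
    return labels, weights
```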
- Parameters:
  - kernel (_AutotunableKernel) – The kernel to be autotuned.
  - args (Sequence[object]) – The arguments to be passed to the kernel during benchmarking.
  - initial_population (int) – Number of random configurations in the initial population. Default from PATTERN_SEARCH_DEFAULTS. Ignored when using the DEFAULT strategy.
  - copies (int) – Number of top configurations to run pattern search from. Default from PATTERN_SEARCH_DEFAULTS.
  - max_generations (int) – Maximum number of search iterations per copy. Default from PATTERN_SEARCH_DEFAULTS.
  - min_improvement_delta (float) – Early stopping threshold. Search stops if the relative improvement abs(best/current - 1) < min_improvement_delta. Default: 0.001 (0.1% improvement threshold).
  - frac_selected (float) – Fraction of generated neighbors to actually benchmark, after filtering by classifier score. Range: (0, 1]. Lower values reduce benchmarking cost but may miss good configurations. Default: 0.1.
  - num_neighbors (int) – Number of random neighbor configurations to generate around each search point per generation. Default: 300.
  - radius (int) – Maximum perturbation distance in configuration space. For power-of-two parameters, this is the max change in log2 space. For other parameters, this limits how many parameters can be changed. Default: 2.
  - quantile (float) – Threshold for labeling configs as "good" (class 1) vs "bad" (class 0). Configs with performance below this quantile are labeled good. Range: (0, 1). Lower values create a more selective definition of "good". Default: 0.1 (the top 10% are considered good).
  - patience (int) – Number of generations without improvement before stopping the search copy. Default: 1.
  - similarity_penalty (float) – Penalty for selecting points that are similar to points already selected in the batch. Default: 1.0.
  - initial_population_strategy (InitialPopulationStrategy | None) – Strategy for generating the initial population. FROM_RANDOM generates initial_population random configs. FROM_BEST_AVAILABLE uses cached configs from prior runs, and fills the remainder with random configs when best_available_pad_random is True. Can be overridden by the HELION_AUTOTUNER_INITIAL_POPULATION env var.
  - best_available_pad_random (bool) –
  - num_neighbors_cap (int) –
  - finishing_rounds (int) –
  - compile_timeout_lower_bound (float) –
  - compile_timeout_quantile (float) –
- __init__(kernel, args, *, initial_population=100, copies=5, max_generations=20, min_improvement_delta=0.001, frac_selected=0.1, num_neighbors=300, radius=2, quantile=0.1, patience=1, similarity_penalty=1.0, initial_population_strategy=None, best_available_pad_random=True, num_neighbors_cap=-1, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
Create a PatternSearch autotuner.
- Parameters:
  - kernel (_AutotunableKernel) – The kernel to be autotuned.
  - args (Sequence[object]) – The arguments to be passed to the kernel.
  - initial_population (int) – The number of random configurations to generate for the initial population.
  - copies (int) – Count of top Configs to run pattern search on.
  - max_generations (int) – The maximum number of generations to run.
  - min_improvement_delta (float) – Relative stop threshold; stop if abs(best/current - 1) < this.
  - initial_population_strategy (InitialPopulationStrategy | None) – Strategy for generating the initial population. FROM_RANDOM generates initial_population random configs. FROM_BEST_AVAILABLE uses cached configs from prior runs, and fills the remainder with random configs when best_available_pad_random is True. Can be overridden by the HELION_AUTOTUNER_INITIAL_POPULATION env var (handled in default_autotuner_fn). If None is passed, defaults to FROM_RANDOM.
  - best_available_pad_random (bool) – When True and using FROM_BEST_AVAILABLE, pad the cached configs with random configs to reach initial_population size. When False, use only the default and cached configs (no random padding).
  - num_neighbors_cap (int) – Maximum number of neighbors to explore per generation. -1 means no cap. Set HELION_CAP_AUTOTUNE_NUM_NEIGHBORS=N to override.
  - finishing_rounds (int) – Number of finishing rounds to run after the main search.
  - compile_timeout_lower_bound (float) – Lower bound for adaptive compile timeout in seconds.
  - compile_timeout_quantile (float) – Quantile of compile times to use for adaptive timeout.
  - frac_selected (float) –
  - num_neighbors (int) –
  - radius (int) –
  - quantile (float) –
  - patience (int) –
  - similarity_penalty (float) –
- classmethod get_kwargs_from_profile(profile, settings)[source]#
Retrieve extra kwargs from the effort profile for the autotuner.
- seed_training_data(results)[source]#
Pre-populate the surrogate’s training set with externally-benchmarked configs.
Useful when an outer loop (e.g. a hybrid LLM+LFBO search) has already benchmarked configs and wants the LFBO surrogate to learn from them rather than starting from scratch. Failed configs (perf=inf) are kept since the surrogate’s binary classifier learns from negatives too.
- compute_leaf_similarity(surrogate, X_test)[source]#
Compute a pairwise similarity matrix using leaf-node co-occurrence.
For a RandomForest, two samples are similar if they land in the same leaf nodes across trees. This is the Jaccard similarity of their leaf assignments.
- Parameters:
  - surrogate (RandomForestClassifier) – Fitted RandomForestClassifier.
  - X_test (ndarray) – Test samples (n_samples, n_features).
- Returns:
  similarity_matrix – An (n_samples, n_samples) matrix where entry [i, j] is the fraction of trees in which samples i and j land in the same leaf.
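The underlying computation can be sketched with scikit-learn's apply API; the function and variable names here are illustrative, not the module's actual code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative leaf co-occurrence similarity for a fitted forest:
# forest.apply(X) returns, per sample, the leaf index it reaches in
# each tree; entry [i, j] below is the fraction of trees where
# samples i and j share a leaf.
def leaf_similarity(forest: RandomForestClassifier, X: np.ndarray) -> np.ndarray:
    leaves = forest.apply(X)  # shape: (n_samples, n_trees)
    return (leaves[:, None, :] == leaves[None, :, :]).mean(axis=-1)
```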
LFBO Tree Search (Default)#
LFBOTreeSearch is the default autotuner.
It extends LFBO Pattern Search with tree-guided neighbor generation, using greedy decision tree
traversal to focus search on parameters the surrogate model has identified as important.
- class helion.autotuner.surrogate_pattern_search.LFBOTreeSearch(kernel, args, *, num_neighbors=200, frac_selected=0.1, radius=2, initial_population=100, copies=5, max_generations=20, min_improvement_delta=0.001, quantile=0.1, patience=1, similarity_penalty=1.0, initial_population_strategy=None, best_available_pad_random=True, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
Bases: LFBOPatternSearch
LFBO Tree Search: Likelihood-Free Bayesian Optimization with tree-guided neighbor generation.
This algorithm uses a Random Forest classifier as a surrogate model to both select which configurations to benchmark and to guide the generation of new candidate configurations via greedy decision tree traversal.
- Algorithm Overview:
  1. Generate an initial population (random or default) and benchmark all configurations.
  2. Fit a Random Forest classifier to predict "good" vs "bad" configurations:
     - Configs with performance < quantile threshold are labeled "good" (class 1).
     - Configs with performance >= quantile threshold are labeled "bad" (class 0).
     - Weighted classification emphasizes configs that are much better than the threshold.
  3. For the first generation, generate neighbors via random perturbation, since the surrogate is not yet fitted.
  4. For subsequent generations, generate neighbors via greedy tree traversal. For each of num_neighbors trials:
     - Pick a random decision tree from the Random Forest.
     - Trace the decision path for the current best config through that tree.
     - Extract the configuration parameters used in the tree's split decisions.
     - For each parameter on the path, greedily optimize it: generate pattern neighbors within the configured radius, score candidates using the single tree's predicted probability, accept the best value (ties broken randomly), and incrementally update the encoded representation.
     - Keep the result only if it differs from the base configuration.
  5. Score candidates using the full ensemble's predicted probability with a diversity-aware similarity penalty, then select the top candidates.
  6. Benchmark the selected candidates and retrain the classifier on all observed data.
The tree-guided traversal focuses search on parameters the surrogate has identified as important (those used in tree splits). Using a single tree per trial (rather than the full ensemble) introduces diversity, since different trees may emphasize different parameters.
References: - Song, J., et al. (2022). "A General Recipe for Likelihood-free Bayesian Optimization." - Mišić, Velibor V. (2020). "Optimization of Tree Ensembles." Operations Research 68(5): 1605-1624.
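The per-trial path tracing can be sketched against scikit-learn's tree internals; this is illustrative, not the module's actual implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative greedy-traversal helper: walk one tree's decision path
# for an encoded config x and collect the feature (parameter) indices
# used at each split. These are the parameters the tree deems important.
def path_features(forest: RandomForestClassifier, x: np.ndarray,
                  rng: np.random.Generator) -> list[int]:
    tree = forest.estimators_[rng.integers(len(forest.estimators_))].tree_
    node, feats = 0, []
    while tree.children_left[node] != tree.children_right[node]:  # internal node
        feats.append(int(tree.feature[node]))
        if x[tree.feature[node]] <= tree.threshold[node]:
            node = tree.children_left[node]
        else:
            node = tree.children_right[node]
    return feats
```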
- Parameters:
  - kernel (_AutotunableKernel) – The kernel to be autotuned.
  - args (Sequence[object]) – The arguments to be passed to the kernel during benchmarking.
  - initial_population (int) – Number of random configurations in the initial population. Default from PATTERN_SEARCH_DEFAULTS. Ignored when using the DEFAULT strategy.
  - copies (int) – Number of top configurations to run pattern search from. Default from PATTERN_SEARCH_DEFAULTS.
  - max_generations (int) – Maximum number of search iterations per copy. Default from PATTERN_SEARCH_DEFAULTS.
  - min_improvement_delta (float) – Early stopping threshold. Search stops if the relative improvement abs(best/current - 1) < min_improvement_delta. Default: 0.001 (0.1% improvement threshold).
  - frac_selected (float) – Fraction of generated neighbors to actually benchmark, after filtering by classifier score. Range: (0, 1]. Lower values reduce benchmarking cost but may miss good configurations. Default: 0.1.
  - num_neighbors (int) – Number of greedy tree-traversal trials to run per generation. Each trial picks a random tree, traces its decision path, and greedily optimizes parameters along that path. Default: 200.
  - radius (int) – Maximum perturbation distance when generating pattern neighbors for each parameter during tree traversal. For power-of-two parameters, this is the max change in log2 space. For other parameters, this limits the neighborhood size. Default: 2.
  - quantile (float) – Threshold for labeling configs as "good" (class 1) vs "bad" (class 0). Configs with performance below this quantile are labeled good. Range: (0, 1). Lower values create a more selective definition of "good". Default: 0.1 (the top 10% are considered good).
  - patience (int) – Number of generations without improvement before stopping the search copy. Default: 1.
  - similarity_penalty (float) – Penalty for selecting points that are similar to points already selected in the batch. Default: 1.0.
  - initial_population_strategy (InitialPopulationStrategy | None) – Strategy for generating the initial population. FROM_RANDOM generates initial_population random configs. FROM_BEST_AVAILABLE uses cached configs from prior runs, and fills the remainder with random configs when best_available_pad_random is True. Can be overridden by the HELION_AUTOTUNER_INITIAL_POPULATION env var.
  - best_available_pad_random (bool) –
  - finishing_rounds (int) –
  - compile_timeout_lower_bound (float) –
  - compile_timeout_quantile (float) –
- __init__(kernel, args, *, num_neighbors=200, frac_selected=0.1, radius=2, initial_population=100, copies=5, max_generations=20, min_improvement_delta=0.001, quantile=0.1, patience=1, similarity_penalty=1.0, initial_population_strategy=None, best_available_pad_random=True, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
Create a PatternSearch autotuner.
- Parameters:
  - kernel (_AutotunableKernel) – The kernel to be autotuned.
  - args (Sequence[object]) – The arguments to be passed to the kernel.
  - initial_population (int) – The number of random configurations to generate for the initial population.
  - copies (int) – Count of top Configs to run pattern search on.
  - max_generations (int) – The maximum number of generations to run.
  - min_improvement_delta (float) – Relative stop threshold; stop if abs(best/current - 1) < this.
  - initial_population_strategy (InitialPopulationStrategy | None) – Strategy for generating the initial population. FROM_RANDOM generates initial_population random configs. FROM_BEST_AVAILABLE uses cached configs from prior runs, and fills the remainder with random configs when best_available_pad_random is True. Can be overridden by the HELION_AUTOTUNER_INITIAL_POPULATION env var (handled in default_autotuner_fn). If None is passed, defaults to FROM_RANDOM.
  - best_available_pad_random (bool) – When True and using FROM_BEST_AVAILABLE, pad the cached configs with random configs to reach initial_population size. When False, use only the default and cached configs (no random padding).
  - finishing_rounds (int) – Number of finishing rounds to run after the main search.
  - compile_timeout_lower_bound (float) – Lower bound for adaptive compile timeout in seconds.
  - compile_timeout_quantile (float) – Quantile of compile times to use for adaptive timeout.
  - num_neighbors (int) –
  - frac_selected (float) –
  - radius (int) –
  - quantile (float) –
  - patience (int) –
  - similarity_penalty (float) –
LLM-Guided Search#
LLMGuidedSearch uses a large language model to
iteratively propose kernel configurations. It sends the kernel source, config space, GPU
hardware info, and benchmark results to the LLM, which suggests promising configurations
across multiple refinement rounds.
Search for autotune configs by iteratively querying an LLM.
High-level flow:
1. Initialize the prompt context from the kernel, config space, and default config, so the first LLM call sees both the workload description and the available tuning knobs.
2. Round 0 launches the first LLM call immediately, then benchmarks the default config plus a few random seed configs while that request is in flight.
3. When the round-0 LLM response arrives, the search benchmarks its new unique configs and folds those results into the running set of top configs.
4. The top configs are then rebenchmarked before the next prompt is built, so each later LLM round sees the latest stabilized timings instead of only one-shot measurements.
5. Later rounds repeat a synchronous cycle: build a prompt from the latest search state, query the LLM, benchmark new configs, then rebenchmark the strongest configs.
6. The final returned config comes from the best rebenchmarked config, not from an unverified one-shot LLM suggestion.
The implementation keeps config parsing, workload analysis, prompting, transport, and search orchestration separate:
- configs.py parses and validates sparse configs from LLM responses.
- workload.py analyzes the kernel and hardware for prompt context.
- feedback.py summarizes benchmark results for prompts.
- prompting.py builds the actual prompt text.
- transport.py handles provider I/O.
- This file owns the round-by-round search state machine.
- helion.autotuner.llm_search.guided_search_kwargs_from_config(config, settings)[source]#
Merge LLM config defaults with the supported HELION_LLM_* overrides.
- helion.autotuner.llm_search.guided_search_kwargs_from_profile(profile, settings)[source]#
Merge effort-profile defaults with the supported HELION_LLM_* overrides.
- class helion.autotuner.llm_search.LLMGuidedSearch(kernel, args, *, provider=None, model='gpt-5-2', configs_per_round=15, max_rounds=4, initial_random_configs=10, finishing_rounds=0, min_improvement_delta=0.005, api_base=None, api_key=None, request_timeout_s=120.0, compile_timeout_s=None)[source]#
LLM-Guided autotuner that uses a language model to suggest kernel configurations.
Instead of random or evolutionary search, this strategy uses an LLM to propose configurations based on:
- The kernel's source code and structure
- The configuration space (parameter types, ranges)
- GPU hardware information
- Benchmark results from previous rounds (iterative refinement)
The search overlaps only the initial round-0 request with seed benchmarking. After that, refinement rounds are synchronous: each round asks the LLM for a batch of configs, benchmarks them, rebenchmarks the strongest configs, and only then builds the next prompt.
Common providers (OpenAI Responses, Anthropic Messages, and compatible proxies) work via direct HTTP without extra dependencies.
- Parameters:
  - kernel (_AutotunableKernel) – The kernel to be autotuned.
  - args (Sequence[object]) – Arguments passed to the kernel during benchmarking.
  - provider (str | None) – Optional explicit provider override. Use this when a proxy serves a model family behind a different API shape than its name implies.
  - model (str) – LLM model name (e.g. "gpt-5-2", "claude-haiku-4.5", "claude-3-5-haiku-latest"). Can also be set via HELION_LLM_MODEL.
  - configs_per_round (int) – Number of configs to request from the LLM per round.
  - max_rounds (int) – Total number of LLM query rounds, including the initial suggestion round. max_rounds=1 means one LLM call total.
  - initial_random_configs (int) – Number of random configs to add alongside LLM suggestions in the first round, for diversity.
  - finishing_rounds (int) – Number of finishing rounds to simplify the best config.
  - api_base (str | None) – Optional custom API base URL for the LLM provider.
  - api_key (str | None) – Optional API key. Defaults to the provider's env var (e.g. OPENAI_API_KEY).
  - compile_timeout_s (int | None) – Optional compile-time cap applied only while the LLM search benchmarks its exploratory configs.
  - min_improvement_delta (float) –
  - request_timeout_s (float) –
- __init__(kernel, args, *, provider=None, model='gpt-5-2', configs_per_round=15, max_rounds=4, initial_random_configs=10, finishing_rounds=0, min_improvement_delta=0.005, api_base=None, api_key=None, request_timeout_s=120.0, compile_timeout_s=None)[source]#
Initialize the PopulationBasedSearch object.
- Parameters:
  - kernel (_AutotunableKernel) – The kernel to be tuned.
  - args (Sequence[object]) – The arguments to be passed to the kernel.
  - finishing_rounds (int) – Number of finishing rounds to run after the main search.
  - model (str) –
  - configs_per_round (int) –
  - max_rounds (int) –
  - initial_random_configs (int) –
  - min_improvement_delta (float) –
  - request_timeout_s (float) –
LLM Environment Variables#
| Variable | Default | Description |
|---|---|---|
| `HELION_LLM_PROVIDER` | (inferred from model name) | LLM provider (e.g. `anthropic`) |
| `HELION_LLM_MODEL` | `gpt-5-2` | Model name (e.g. `claude-haiku-4.5`) |
| `HELION_LLM_API_KEY` | provider env var (e.g. `OPENAI_API_KEY`) | API key |

Additional `HELION_LLM_*` variables configure a custom API base URL, the compile timeout (seconds) for LLM-proposed configs, a custom CA bundle path (for corporate proxies that do TLS inspection), a client certificate path, and a client key path (for proxies requiring mutual TLS).
The proxy/TLS variables are only needed in corporate environments where a proxy intercepts HTTPS traffic. Most users connecting directly to the LLM API can ignore them.
LLM-Seeded Search (Hybrid)#
LLMSeededSearch is a two-stage hybrid approach:
- Stage 1 (LLM): Run LLM-guided search for a configurable number of rounds to find good initial configs.
- Stage 2 (Surrogate): Run a non-LLM search algorithm (default: LFBOTreeSearch), seeded with the best LLM config and trained on all LLM benchmark results.
This combines the LLM's ability to make informed initial guesses with the surrogate model's efficient local search. LLMSeededLFBOTreeSearch is a convenience subclass that locks stage 2 to LFBOTreeSearch.
Run a two-stage hybrid autotuner that seeds a local search with an LLM pass.
High-level flow:
1. Run LLMGuidedSearch for llm_max_rounds rounds and keep its best config. The hybrid defaults to 1 LLM round.
2. Run a second-stage non-LLM search, LFBOTreeSearch by default.
3. If the second stage supports best-available seeding, force FROM_BEST_AVAILABLE and inject the LLM best config so stage 2 can refine it instead of starting cold.
4. Report per-stage timing and config-count metrics, plus aggregated hybrid totals.
Setting llm_max_rounds=0 skips the LLM stage and runs only the second stage.
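A sketch of constructing the hybrid directly; bound_kernel and args are assumed inputs, and autotune() is assumed from the base-search interface:

```python
from helion.autotuner.llm_seeded_lfbo import LLMSeededLFBOTreeSearch

# One LLM seeding round, then LFBOTreeSearch refinement; stage-2
# options travel through second_stage_kwargs as documented below.
search = LLMSeededLFBOTreeSearch(
    bound_kernel,                       # assumed: a bound Helion kernel
    args,                               # assumed: benchmarking inputs
    llm_max_rounds=1,
    second_stage_kwargs={"copies": 3},  # forwarded to LFBOTreeSearch
)
best_config = search.autotune()         # assumed entry point
```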
- class helion.autotuner.llm_seeded_lfbo.LLMSeededSearch(kernel, args, *, second_stage_algorithm=None, second_stage_kwargs=None, best_available_pad_random=False, llm_provider=None, llm_model='gpt-5-2', llm_configs_per_round=15, llm_max_rounds=1, llm_initial_random_configs=10, llm_compile_timeout_s=15, llm_api_base=None, llm_api_key=None, llm_request_timeout_s=120.0)[source]#
Generic hybrid autotuner that seeds a second-stage search with LLM proposals.
The algorithm runs in two stages:
1. Run LLMGuidedSearch for llm_max_rounds rounds and capture its best config in memory.
2. Run the configured second-stage search algorithm. If the algorithm supports best-available seeding, it is switched to FROM_BEST_AVAILABLE so it can start from the LLM seed config.
Setting llm_max_rounds=0 disables the seed stage and runs only the second-stage search.
- Parameters:
- default_second_stage_algorithm = 'LFBOTreeSearch'#
- allow_second_stage_env_override = True#
- __init__(kernel, args, *, second_stage_algorithm=None, second_stage_kwargs=None, best_available_pad_random=False, llm_provider=None, llm_model='gpt-5-2', llm_configs_per_round=15, llm_max_rounds=1, llm_initial_random_configs=10, llm_compile_timeout_s=15, llm_api_base=None, llm_api_key=None, llm_request_timeout_s=120.0)[source]#
Initialize the BaseSearch object.
- Parameters:
- hybrid_stage_breakdown: dict[str, object] | None#
- class helion.autotuner.llm_seeded_lfbo.LLMSeededLFBOTreeSearch(kernel, args, *, second_stage_kwargs=None, best_available_pad_random=False, llm_provider=None, llm_model='gpt-5-2', llm_configs_per_round=15, llm_max_rounds=1, llm_initial_random_configs=10, llm_compile_timeout_s=15, llm_api_base=None, llm_api_key=None, llm_request_timeout_s=120.0)[source]#
Convenience wrapper for the common LLM-seeded LFBO tree search pipeline.
LFBO-specific stage-2 settings should be passed through second_stage_kwargs.
- Parameters:
- allow_second_stage_env_override = False#
- classmethod get_kwargs_from_profile(profile, settings)[source]#
Drop the explicit stage-2 algorithm knob from the LFBO convenience API.
- __init__(kernel, args, *, second_stage_kwargs=None, best_available_pad_random=False, llm_provider=None, llm_model='gpt-5-2', llm_configs_per_round=15, llm_max_rounds=1, llm_initial_random_configs=10, llm_compile_timeout_s=15, llm_api_base=None, llm_api_key=None, llm_request_timeout_s=120.0)[source]#
Initialize the BaseSearch object.
- Parameters:
Hybrid Environment Variables#
Two additional environment variables control the hybrid: one overrides the second-stage search algorithm (default: LFBOTreeSearch), and one overrides the number of LLM rounds in stage 1 (the default is effort-dependent).
To use the LLM-guided autotuner, set the HELION_AUTOTUNER environment variable:

```bash
# Pure LLM-guided search
export HELION_AUTOTUNER=LLMGuidedSearch

# LLM-seeded hybrid (recommended)
export HELION_AUTOTUNER=LLMSeededLFBOTreeSearch
```
Example: Using Claude as the LLM provider#
```bash
export HELION_AUTOTUNER=LLMSeededLFBOTreeSearch
export HELION_LLM_PROVIDER=anthropic
export HELION_LLM_MODEL=claude-opus-4-7
export HELION_LLM_API_KEY=your-key-here
```
Then run your kernel as usual — the autotuner will use Claude to propose initial configs before handing off to the surrogate-based search:
```python
out = matmul(torch.randn([2048, 2048], device="cuda"),
             torch.randn([2048, 2048], device="cuda"))
```
DE Surrogate Hybrid#
Differential Evolution with Surrogate-Assisted Selection (DE-SAS).
This hybrid approach combines the robust exploration of Differential Evolution with the sample efficiency of surrogate models. It’s designed to beat standard DE by making smarter decisions about which candidates to evaluate.
Key idea:
- Use DE's mutation/crossover to generate candidates (good exploration).
- Use a Random Forest surrogate to predict which candidates are promising.
- Only evaluate the most promising candidates (sample efficiency).
- Periodically re-fit the surrogate model.
This is inspired by recent work on surrogate-assisted evolutionary algorithms, which have shown 2-5× speedups over standard EAs on expensive optimization problems.
References: - Jin, Y. (2011). “Surrogate-assisted evolutionary computation: Recent advances and future challenges.” - Sun, C., et al. (2019). “A surrogate-assisted DE with an adaptive local search”
Author: Francisco Geiman Thiesen Date: 2025-11-05
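The surrogate-assisted selection step can be sketched as follows; the names are illustrative, and the surrogate here is assumed to be a regressor over encoded configs:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative DE-SAS selection: generate candidate_ratio times as many
# DE candidates as there are population slots, then keep only those the
# surrogate predicts to be fastest (lower predicted runtime is better).
def select_promising(candidates: np.ndarray,
                     surrogate: RandomForestRegressor,
                     n_slots: int) -> np.ndarray:
    predicted = surrogate.predict(candidates)
    keep = np.argsort(predicted)[:n_slots]
    return candidates[keep]
```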
- class helion.autotuner.de_surrogate_hybrid.DESurrogateHybrid(kernel, args, population_size=40, max_generations=40, crossover_rate=0.8, surrogate_threshold=100, candidate_ratio=3, refit_frequency=5, n_estimators=50, min_improvement_delta=0.001, patience=3, initial_population_strategy=None, best_available_pad_random=True, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
Hybrid Differential Evolution with Surrogate-Assisted Selection.
This algorithm uses DE for exploration but adds a surrogate model to intelligently select which candidates to actually evaluate, avoiding wasting evaluations on poor candidates.
- Parameters:
  - kernel (_AutotunableKernel) – The bound kernel to tune.
  - population_size (int) – Size of the DE population (default: 40).
  - max_generations (int) – Maximum number of generations (default: 40).
  - crossover_rate (float) – Crossover probability (default: 0.8).
  - surrogate_threshold (int) – Use the surrogate after this many evaluations (default: 100).
  - candidate_ratio (int) – Number of candidates to generate per population slot (default: 3).
  - refit_frequency (int) – Refit the surrogate every N generations (default: 5).
  - n_estimators (int) – Number of trees in the Random Forest (default: 50).
  - min_improvement_delta (float) – Relative improvement threshold for early stopping. Default: 0.001 (0.1%). Early stopping is enabled by default.
  - patience (int) – Number of generations without improvement before stopping. Default: 3. Early stopping is enabled by default.
  - initial_population_strategy (InitialPopulationStrategy | None) – Strategy for generating the initial population. FROM_RANDOM generates a random population. FROM_BEST_AVAILABLE uses cached configs from prior runs, and fills the remainder with random configs when best_available_pad_random is True. Can be overridden by the HELION_AUTOTUNER_INITIAL_POPULATION env var. If not set via env var and None is passed, defaults to FROM_RANDOM.
  - best_available_pad_random (bool) –
  - finishing_rounds (int) –
  - compile_timeout_lower_bound (float) –
  - compile_timeout_quantile (float) –
- __init__(kernel, args, population_size=40, max_generations=40, crossover_rate=0.8, surrogate_threshold=100, candidate_ratio=3, refit_frequency=5, n_estimators=50, min_improvement_delta=0.001, patience=3, initial_population_strategy=None, best_available_pad_random=True, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
Create a DifferentialEvolutionSearch autotuner.
- Parameters:
  - kernel (_AutotunableKernel) – The kernel to be autotuned.
  - args (Sequence[object]) – The arguments to be passed to the kernel.
  - population_size (int) – The size of the population.
  - max_generations (int) – The maximum number of generations to run.
  - crossover_rate (float) – The crossover rate for mutation.
  - immediate_update – Whether to update the population immediately after each evaluation.
  - min_improvement_delta (float) – Relative improvement threshold for early stopping. If None (default), early stopping is disabled.
  - patience (int) – Number of generations without improvement before stopping. If None (default), early stopping is disabled.
  - initial_population_strategy (InitialPopulationStrategy | None) – Strategy for generating the initial population. FROM_RANDOM generates a random population. FROM_BEST_AVAILABLE uses cached configs from prior runs, and fills the remainder with random configs when best_available_pad_random is True. Can be overridden by the HELION_AUTOTUNER_INITIAL_POPULATION env var (handled in default_autotuner_fn). If None is passed, defaults to FROM_RANDOM.
  - best_available_pad_random (bool) – When True and using FROM_BEST_AVAILABLE, pad the cached configs with random configs to reach 2x population size. When False, use only the default and cached configs (no random padding).
  - finishing_rounds (int) – Number of finishing rounds to run after the main search.
  - compile_timeout_lower_bound (float) – Lower bound for adaptive compile timeout in seconds.
  - compile_timeout_quantile (float) – Quantile of compile times to use for adaptive timeout.
  - surrogate_threshold (int) –
  - candidate_ratio (int) –
  - refit_frequency (int) –
  - n_estimators (int) –
Differential Evolution#
- class helion.autotuner.differential_evolution.DifferentialEvolutionSearch(kernel, args, population_size=40, max_generations=40, crossover_rate=0.8, immediate_update=None, min_improvement_delta=None, patience=None, initial_population_strategy=None, best_available_pad_random=True, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
A search strategy that uses differential evolution to find the best config.
- Parameters:
  - kernel (_AutotunableKernel) –
  - population_size (int) –
  - max_generations (int) –
  - crossover_rate (float) –
  - initial_population_strategy (InitialPopulationStrategy | None) –
  - best_available_pad_random (bool) –
  - finishing_rounds (int) –
  - compile_timeout_lower_bound (float) –
  - compile_timeout_quantile (float) –
- __init__(kernel, args, population_size=40, max_generations=40, crossover_rate=0.8, immediate_update=None, min_improvement_delta=None, patience=None, initial_population_strategy=None, best_available_pad_random=True, finishing_rounds=0, compile_timeout_lower_bound=30.0, compile_timeout_quantile=0.9)[source]#
Create a DifferentialEvolutionSearch autotuner.
- Parameters:
  - kernel (_AutotunableKernel) – The kernel to be autotuned.
  - args (Sequence[object]) – The arguments to be passed to the kernel.
  - population_size (int) – The size of the population.
  - max_generations (int) – The maximum number of generations to run.
  - crossover_rate (float) – The crossover rate for mutation.
  - immediate_update (bool | None) – Whether to update the population immediately after each evaluation.
  - min_improvement_delta (float | None) – Relative improvement threshold for early stopping. If None (default), early stopping is disabled.
  - patience (int | None) – Number of generations without improvement before stopping. If None (default), early stopping is disabled.
  - initial_population_strategy (InitialPopulationStrategy | None) – Strategy for generating the initial population. FROM_RANDOM generates a random population. FROM_BEST_AVAILABLE uses cached configs from prior runs, and fills the remainder with random configs when best_available_pad_random is True. Can be overridden by the HELION_AUTOTUNER_INITIAL_POPULATION env var (handled in default_autotuner_fn). If None is passed, defaults to FROM_RANDOM.
  - best_available_pad_random (bool) – When True and using FROM_BEST_AVAILABLE, pad the cached configs with random configs to reach 2x population size. When False, use only the default and cached configs (no random padding).
  - finishing_rounds (int) – Number of finishing rounds to run after the main search.
  - compile_timeout_lower_bound (float) – Lower bound for adaptive compile timeout in seconds.
  - compile_timeout_quantile (float) – Quantile of compile times to use for adaptive timeout.
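Assuming the same HELION_AUTOTUNER mechanism shown in the LLM section above, differential evolution can be selected directly:

```bash
export HELION_AUTOTUNER=DifferentialEvolutionSearch
```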
Random Search#
- class helion.autotuner.random_search.RandomSearch(kernel, args, count=1000)[source]#
Implements a random search algorithm for kernel autotuning.
This class generates a specified number of random configurations for a given kernel and evaluates their performance.
- Inherits from:
FiniteSearch: A base class for finite configuration searches.
- kernel#
The kernel to be tuned (any _AutotunableKernel).
- args#
The arguments to be passed to the kernel.
- count#
The number of random configurations to generate.
Finite Search#
- class helion.autotuner.finite_search.FiniteSearch(kernel, args, configs=None)[source]#
Search over a given list of configs, returning the best one.
This strategy is similar to triton.autotune, and is the default if you specify helion.kernel(configs=[…]).
- Parameters:
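For example, a hand-picked config list can be benchmarked directly; this is a minimal sketch with an illustrative kernel body:

```python
import torch
import helion
import helion.language as hl

# Supplying explicit configs triggers FiniteSearch, which benchmarks
# each config on first use and keeps the fastest.
@helion.kernel(configs=[
    helion.Config(block_sizes=[64, 64], num_warps=4),
    helion.Config(block_sizes=[128, 64], num_warps=8),
])
def scale(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] * 2.0
    return out
```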
Local Cache#
- helion.autotuner.local_cache.get_helion_cache_dir()[source]#
Return the root directory for all Helion caches.
- Return type:
- helion.autotuner.local_cache.helion_triton_cache_dir(device_index)[source]#
Return per-device Triton cache directory under Helion’s cache root.
- class helion.autotuner.local_cache.SavedBestConfig(hardware, specialization_key, config, config_spec_hash, flat_config)[source]#
A parsed cache entry from a .best_config file.
- Parameters:
- helion.autotuner.local_cache.iter_cache_entries(cache_path, *, max_scan=None)[source]#
Yield parsed cache entries from cache_path, newest first.
Corrupt or unparsable files are skipped with a warning.
- Parameters:
- Return type:
- class helion.autotuner.local_cache.LocalAutotuneCache(autotuner)[source]#
This class implements the local autotune cache, storing the best-config artifact on the local file system, by default under torch's cache directory or at a user-specified HELION_CACHE_DIR directory. It uses the LooseAutotuneCacheKey implementation for the cache key, which takes into account device and source-code properties but does not account for library-level code changes in Triton, Helion, or PyTorch. Use StrictLocalAutotuneCache to also consider these properties.
- Parameters:
autotuner (
BaseSearch) –
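For example, the cache location can be redirected to a shared directory:

```bash
export HELION_CACHE_DIR=/path/to/shared/cache
```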