API Reference

Complete API documentation for Helion.

Kernel Creation and Control

Everything you need to create and configure Helion kernels with the helion.kernel() decorator.

Language Constructs

The helion.language module contains the DSL constructs used to author kernels.

Debugging and Utilities

Advanced Topics

Quick Reference

Main Functions

kernel

Decorator to create a Kernel object from a Python function.

set_default_settings

Set the default settings for the current thread and return a context manager that restores the previous settings upon exit.
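The behavior described above (new defaults take effect immediately and the old ones come back when the with block exits) can be modeled in plain Python. This is a minimal sketch, not Helion's implementation: settings are a plain dict, and get_default_settings, _RestoreOnExit and the key hypothetical_flag are hypothetical names introduced for illustration.

```python
import threading

# Hypothetical per-thread storage; Helion's real Settings class is not modeled.
_local = threading.local()


def get_default_settings() -> dict:
    """Return the calling thread's current defaults ({} if none were set)."""
    return getattr(_local, "settings", {})


class _RestoreOnExit:
    """Context manager that reinstalls the previous defaults on exit."""

    def __init__(self, previous: dict) -> None:
        self._previous = previous

    def __enter__(self) -> "_RestoreOnExit":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        _local.settings = self._previous
        return False  # do not swallow exceptions from the with block


def set_default_settings(settings: dict) -> _RestoreOnExit:
    """Install `settings` for this thread now; restore the old ones on exit."""
    previous = get_default_settings()
    _local.settings = settings
    return _RestoreOnExit(previous)


with set_default_settings({"hypothetical_flag": True}):
    print(get_default_settings())  # {'hypothetical_flag': True}
print(get_default_settings())      # {}
```

Because the storage is thread-local in this model, another thread started inside the with block still sees the empty defaults.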

Config

Settings

Settings can be passed to helion.kernel as keyword arguments and control the behavior of the compilation process.

Language Functions

tile

Break up an iteration space defined by a size or sequence of sizes into tiles.

grid

Iterate over individual indices of the given iteration space.
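For contrast with tile, the following CPU sketch enumerates every individual index of an iteration space, one point per loop iteration. The helper grid_indices is hypothetical; it only illustrates which indices are visited, not the order or the hardware mapping Helion uses.

```python
from itertools import product


def grid_indices(sizes: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Hypothetical helper: list every individual index of an N-D space.

    Each entry is a single point, the same set of points that tiling with a
    block size of 1 in every dimension would visit.
    """
    return list(product(*(range(n) for n in sizes)))


print(grid_indices((2, 3)))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```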

static_range

Create a range that gets unrolled at compile time by iterating over constant integer values.
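Unrolling means the loop disappears from the generated code and its body is repeated once per constant value. The hypothetical helper unroll below shows the idea by emitting the repeated statements as Python source and executing them; it is a conceptual sketch, not how Helion generates code. It also shows why the bounds must be constants: the number of copies has to be known when the code is emitted.

```python
def unroll(body: str, count: int) -> str:
    """Hypothetical helper: emit `body` once per constant index 0..count-1.

    Each "{i}" placeholder is replaced by a literal integer, so the emitted
    source contains no loop at all.
    """
    return "\n".join(body.format(i=i) for i in range(count))


source = unroll("acc += x[{i}]", 3)
print(source)
# acc += x[0]
# acc += x[1]
# acc += x[2]

# Run the unrolled statements against a small namespace.
namespace = {"acc": 0, "x": [5, 6, 7]}
exec(source, namespace)
print(namespace["acc"])  # 18
```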

load

Load a value from a tensor using a list of indices.

store

Store a value to a tensor using a list of indices.

atomic_add

Atomically add a value to a target tensor.

atomic_and

Atomically apply bitwise AND with value to target[index].

atomic_or

Atomically apply bitwise OR with value to target[index].

atomic_xor

Atomically apply bitwise XOR with value to target[index].

atomic_xchg

Atomically exchange (set) a value at target[index].

atomic_max

Atomically update target[index] with the maximum of its current value and value.

atomic_min

Atomically update target[index] with the minimum of its current value and value.

atomic_cas

Atomically compare-and-swap a value at target[index].
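The read-modify-write semantics shared by these atomics can be modeled on the CPU with a lock. AtomicArray and its methods are hypothetical and cover only the add, max and compare-and-swap cases. The model returns the value held before each update, which is the usual convention for GPU atomics; that return value is an assumption here, not something this page states.

```python
import threading
from typing import Callable


class AtomicArray:
    """Hypothetical CPU model of per-element atomic read-modify-write.

    One lock serializes every update, so no two updates to target[index]
    can interleave. Assumption: each method returns the value held before
    the update, the usual convention for GPU atomics.
    """

    def __init__(self, values: list[int]) -> None:
        self.values = list(values)
        self._lock = threading.Lock()

    def _update(self, index: int, fn: Callable[[int], int]) -> int:
        with self._lock:
            old = self.values[index]
            self.values[index] = fn(old)
            return old

    def add(self, index: int, value: int) -> int:  # cf. atomic_add
        return self._update(index, lambda old: old + value)

    def maximum(self, index: int, value: int) -> int:  # cf. atomic_max
        return self._update(index, lambda old: max(old, value))

    def cas(self, index: int, expected: int, value: int) -> int:  # cf. atomic_cas
        # Swap in `value` only if the current value equals `expected`.
        return self._update(index, lambda old: value if old == expected else old)


counter = AtomicArray([0])


def worker() -> None:
    for _ in range(1000):
        counter.add(0, 1)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.values[0])  # 8000
```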

device_print

Print values from device code.

signal

Set global memory barriers.

wait

Wait for global memory barriers.

stacktensor_like

Create a StackTensor from a tensor of data pointers (dev_ptrs), each pointing to a similar tensor at a different memory location.

zeros

Return a device-tensor filled with zeros.

full

Create a device-tensor filled with a specified value.

arange

Same as torch.arange(), but defaults to the same device as the current kernel.

subscript

Equivalent to tensor[index] where tensor is a kernel-tensor (not a host-tensor).

reduce

Applies a reduction operation along a specified dimension or all dimensions.
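A reduction collapses one dimension (or all of them) with a binary combine function. The hypothetical reduce_2d helper below models this for a 2-D nested list using functools.reduce. Because parallel hardware may combine elements in a tree order rather than strictly left to right, the combine function should be associative.

```python
import operator
from functools import reduce
from typing import Callable, Optional

Combine = Callable[[float, float], float]


def reduce_2d(matrix: list[list[float]], combine: Combine, dim: Optional[int] = None):
    """Hypothetical helper: reduce a 2-D nested list with `combine`.

    dim=None reduces every element to one value, dim=0 collapses the rows
    (one result per column), and dim=1 collapses the columns (one per row).
    """
    if dim is None:
        return reduce(combine, (v for row in matrix for v in row))
    if dim == 0:
        return [reduce(combine, column) for column in zip(*matrix)]
    if dim == 1:
        return [reduce(combine, row) for row in matrix]
    raise ValueError("dim must be None, 0 or 1")


m = [[1, 2, 3], [4, 5, 6]]
print(reduce_2d(m, operator.add, dim=0))  # [5, 7, 9]
print(reduce_2d(m, operator.add, dim=1))  # [6, 15]
print(reduce_2d(m, max))                  # 6
```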

associative_scan

Applies an associative scan operation along a specified dimension.

cumsum

Compute the cumulative sum along a specified dimension.

cumprod

Compute the cumulative product along a specified dimension.
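cumsum and cumprod are inclusive scans with + and * as the combine function. The sketch below contrasts a sequential scan (itertools.accumulate) with a Hillis-Steele scan, a classic parallel formulation that finishes in about log2(n) rounds and is correct only when the combine function is associative. Both helper names are hypothetical, and nothing here claims that Helion uses the Hillis-Steele scheme internally.

```python
import operator
from itertools import accumulate
from typing import Callable, TypeVar

T = TypeVar("T")


def sequential_scan(values: list[T], combine: Callable[[T, T], T]) -> list[T]:
    """Inclusive scan, left to right: out[i] = values[0] op ... op values[i]."""
    return list(accumulate(values, combine))


def hillis_steele_scan(values: list[T], combine: Callable[[T, T], T]) -> list[T]:
    """Inclusive scan in rounds; every element of a round could run in parallel.

    In each round, element i absorbs the partial result `step` positions to
    its left, and `step` doubles. Correct only for associative `combine`.
    """
    out = list(values)
    step = 1
    while step < len(out):
        out = [combine(out[i - step], out[i]) if i >= step else out[i]
               for i in range(len(out))]
        step *= 2
    return out


print(sequential_scan([1, 2, 3, 4], operator.add))     # [1, 3, 6, 10]  (cumsum)
print(hillis_steele_scan([1, 2, 3, 4], operator.mul))  # [1, 2, 6, 24]  (cumprod)
```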

dot

Perform a matrix multiplication of tensors, with support for multiple dtypes.

inline_asm_elementwise

Execute inline assembly over a tensor.

register_block_size

Explicitly register a block size that should be autotuned and can be used for allocations and inside hl.tile(..., block_size=...).

register_reduction_dim

Explicitly register a reduction dimension that should be used for reduction operations.

register_tunable

Register a tunable parameter for autotuning.

constexpr

alias of ConstExpr

specialize

Turn a dynamic shape into a compile-time constant.

Language Classes

These classes are defined in the helion.language module:

Tile

This class should not be instantiated directly; it is the result of hl.tile(...) and represents a single tile of the iteration space.

StackTensor

This class should not be instantiated directly; it is created by hl.stacktensor_like(...).

Tile Helpers

tile_index

Retrieve the index (a 1D tensor containing offsets) of the given tile.

tile_begin

Retrieve the start offset of the given tile.

tile_end

Retrieve the end offset of the given tile.

tile_block_size

Retrieve the block size of a given tile, usually set by the autotuner.

tile_id

Retrieve the tile_id of a given tile or list of tiles.
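The relationships among the tile helpers can be modeled for a 1-D iteration space. In this hypothetical sketch, end is exclusive and is clamped to the iteration-space size for the last tile, index lists only the in-range offsets, and id is begin // block_size. These conventions are assumptions for illustration; on device, a partial tile's index may also carry masked-off lanes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileInfo:
    """Hypothetical CPU stand-in for the values the tile helpers return."""

    begin: int       # first offset covered (cf. tile_begin)
    end: int         # exclusive end offset, clamped to the size (cf. tile_end)
    block_size: int  # nominal tile length (cf. tile_block_size)

    @property
    def id(self) -> int:
        # Assumption: tiles are numbered by how many blocks precede them.
        return self.begin // self.block_size

    @property
    def index(self) -> list[int]:
        # In-range offsets only (cf. tile_index); masked lanes are not modeled.
        return list(range(self.begin, self.end))


def tiles_1d(size: int, block_size: int) -> list[TileInfo]:
    """Cut range(size) into consecutive tiles of nominal length block_size."""
    return [TileInfo(b, min(b + block_size, size), block_size)
            for b in range(0, size, block_size)]


for t in tiles_1d(10, 4):
    print(t.id, t.begin, t.end, t.block_size, t.index)
# 0 0 4 4 [0, 1, 2, 3]
# 1 4 8 4 [4, 5, 6, 7]
# 2 8 10 4 [8, 9]
```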