API Reference#

Complete API documentation for Helion.

Kernel Creation and Control#

Everything you need to create and configure Helion kernels using the helion.kernel() decorator.

Language Constructs#

The helion.language module contains the DSL constructs for authoring kernels.

Debugging and Utilities#

Advanced Topics#

Quick Reference#

Main Functions#

kernel

Decorator to create a Kernel object from a Python function.

Config

Container for a kernel configuration; its parameters include block_sizes.

Settings

Settings can be passed to helion.kernel as keyword arguments and control the behavior of the compilation process.

Language Functions#

tile

Break up an iteration space defined by a size or sequence of sizes into tiles.

grid

Iterate over individual indices of the given iteration space.
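The contrast between tile and grid can be sketched in plain Python, without Helion or a GPU. The helpers `tile_ranges` and `grid_indices` below are hypothetical names introduced only for illustration. They model how a one-dimensional iteration space of size `n` is covered either by blocks or by single indices, and they are not Helion's implementation.

```python
def tile_ranges(n, block_size):
    """Model of tiling: cover range(n) with half-open [begin, end) blocks."""
    # The last block is clipped to n, so it may be shorter than block_size.
    return [(begin, min(begin + block_size, n)) for begin in range(0, n, block_size)]


def grid_indices(n):
    """Model of a grid: visit every index of the iteration space on its own."""
    return list(range(n))


tiles = tile_ranges(10, 4)
print(tiles)            # [(0, 4), (4, 8), (8, 10)]
print(grid_indices(3))  # [0, 1, 2]
```

In this model a grid is simply a tiling with a block size of one, which is a convenient way to think about when each construct applies.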

static_range

Create a range that gets unrolled at compile time by iterating over constant integer values.

load

Load a value from a tensor using a list of indices.

store

Store a value to a tensor using a list of indices.

atomic_add

Atomically add a value to a target tensor.

atomic_and

Atomically apply bitwise AND with value to target[index].

atomic_or

Atomically apply bitwise OR with value to target[index].

atomic_xor

Atomically apply bitwise XOR with value to target[index].

atomic_xchg

Atomically exchange (set) a value at target[index].

atomic_max

Atomically update target[index] with the maximum of current value and value.

atomic_min

Atomically update target[index] with the minimum of current value and value.

atomic_cas

Atomically compare-and-swap a value at target[index].
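The atomic entries above share a read-modify-write shape, which the following plain-Python sketch models on a list. `atomic_update` and `atomic_cas_model` are hypothetical helpers, and the convention that each operation returns the previous value is an assumption borrowed from common GPU atomics rather than something this page states. The model is sequential, so it shows only the value semantics and none of the concurrency guarantees.

```python
import operator


def atomic_update(target, index, value, op):
    """Apply op(old, value) at target[index]; return the old value (assumed convention)."""
    old = target[index]
    target[index] = op(old, value)
    return old


def atomic_cas_model(target, index, expected, value):
    """Store value only if target[index] == expected; return the old value either way."""
    old = target[index]
    if old == expected:
        target[index] = value
    return old


buf = [5, 0b1100]
atomic_update(buf, 0, 3, operator.add)        # add: buf[0] becomes 8
atomic_update(buf, 0, 10, max)                # max: buf[0] becomes 10
atomic_update(buf, 1, 0b1010, operator.and_)  # and: buf[1] becomes 0b1000
atomic_cas_model(buf, 0, 10, 42)              # matches, so buf[0] becomes 42
atomic_cas_model(buf, 0, 10, 7)               # no match, buf[0] stays 42
print(buf)  # [42, 8]
```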

device_print

Print values from device code.

signal

Set global memory barriers.

wait

Wait for global memory barriers.

stacktensor_like

Create a StackTensor from a tensor of data pointers (dev_ptrs) that point to similar tensors residing at different memory locations.

zeros

Return a device-tensor filled with zeros.

full

Create a device-tensor filled with a specified value.

arange

Same as torch.arange(), but defaults to the same device as the current kernel.

subscript

Equivalent to tensor[index] where tensor is a kernel-tensor (not a host-tensor).

split

Split the last dimension of a tensor with size two into two separate tensors.

join

Join two tensors along a new minor dimension.
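As a rough illustration of how split and join relate, the sketch below models a tensor of shape (N, 2) as a nested Python list. `split_last` and `join_last` are hypothetical stand-ins that work on lists only; they mirror the descriptions above and are not the Helion functions themselves.

```python
def split_last(rows):
    """Split a list of [a, b] pairs (last dimension of size two) into two lists."""
    firsts = [pair[0] for pair in rows]
    seconds = [pair[1] for pair in rows]
    return firsts, seconds


def join_last(a, b):
    """Join two equal-length lists along a new minor (last) dimension of size two."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [[x, y] for x, y in zip(a, b)]


pairs = [[1, 10], [2, 20], [3, 30]]
lo, hi = split_last(pairs)
print(lo, hi)                      # [1, 2, 3] [10, 20, 30]
print(join_last(lo, hi) == pairs)  # True
```

In this model the two operations are inverses of each other: joining the halves produced by a split returns the original data.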

reduce

Applies a reduction operation along a specified dimension or all dimensions.
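The two modes mentioned above (one dimension versus all dimensions) can be modelled on a nested list. This is a minimal sketch under the assumption that the reduction takes a binary combine function; `reduce_rows` and `reduce_all` are hypothetical names, not part of Helion.

```python
from functools import reduce as fold


def reduce_rows(combine, matrix):
    """Reduce along the last dimension: one result per (non-empty) row."""
    return [fold(combine, row) for row in matrix]


def reduce_all(combine, matrix):
    """Reduce over all dimensions: flatten first, then fold to a single value."""
    return fold(combine, [value for row in matrix for value in row])


data = [[1, 2, 3], [4, 5, 6]]
print(reduce_rows(lambda a, b: a + b, data))  # [6, 15]
print(reduce_all(max, data))                  # 6
```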

associative_scan

Applies an associative scan operation along a specified dimension.

cumsum

Compute the cumulative sum along a specified dimension.

cumprod

Compute the cumulative product along a specified dimension.
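Cumulative sum and cumulative product are both instances of a scan with an associative combine function. The plain-Python sketch below makes that relationship concrete; `scan` is a hypothetical helper, and the choice of an inclusive scan (each output includes the element at its own position, as in torch.cumsum) is an assumption about the entries above.

```python
from itertools import accumulate
import operator


def scan(combine, values):
    """Inclusive scan: out[i] = combine(values[0], ..., values[i])."""
    return list(accumulate(values, combine))


xs = [1, 2, 3, 4]
print(scan(operator.add, xs))      # [1, 3, 6, 10]   (cumulative sum)
print(scan(operator.mul, xs))      # [1, 2, 6, 24]   (cumulative product)
print(scan(max, [3, 1, 4, 1, 5]))  # [3, 3, 4, 4, 5] (running maximum)
```

Associativity of the combine function is what allows a scan to be evaluated in parallel; the sequential loop here only defines the expected result.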

dot

Performs a matrix multiplication of tensors with support for multiple dtypes.

inline_asm_elementwise

Execute inline assembly over a tensor.

inline_triton

Inline a raw Triton snippet inside a Helion kernel.

register_block_size

Explicitly register a block size that should be autotuned and can be used for allocations and inside hl.tile(..., block_size=...).

register_tunable

Register a tunable parameter for autotuning.

constexpr

alias of ConstExpr

specialize

Turn dynamic shapes into compile-time constants.

Language Classes#

Tile

This class should not be instantiated directly. It is the result of hl.tile(...) and represents a single tile of the iteration space.

StackTensor

This class should not be instantiated directly.

Tile Helpers#

tile_index

Retrieve the index (a 1D tensor containing offsets) of the given tile.

tile_begin

Retrieve the start offset of the given tile.

tile_end

Retrieve the end offset of the given tile.

tile_block_size

Retrieve the block size of a given tile, usually set by the autotuner.

tile_id

Retrieve the tile_id of a given tile or list of tiles.
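To show how the helper values relate to one another, here is a small plain-Python model of a single one-dimensional tile. `TileModel` is a hypothetical stand-in, not Helion's Tile class. Two points are assumptions made for the sketch: the index lists only in-bounds offsets, and the tile id equals the begin offset divided by the block size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileModel:
    """Hypothetical stand-in for one 1-D tile; not Helion's Tile class."""

    begin: int       # start offset of the tile
    end: int         # end offset (exclusive), clipped to the iteration space
    block_size: int  # nominal block size, usually picked by the autotuner

    @property
    def index(self):
        # Offsets covered by this tile (in-bounds entries only in this model).
        return list(range(self.begin, self.end))

    @property
    def tile_id(self):
        # Assumed relation: position of the tile within its dimension.
        return self.begin // self.block_size


last = TileModel(begin=8, end=10, block_size=4)
print(last.index, last.tile_id)  # [8, 9] 2
```

The example uses the final tile of a size-10 space with block size 4, where the end offset is clipped and the tile is shorter than the nominal block size.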