Helion Examples#

This directory contains examples demonstrating how to use Helion for high-performance tensor operations. The examples are organized into the following categories:

Pretuned Kernels (run as-is, no autotuning)#

The pretuned_kernels/ directory holds runnable kernels that ship with checked-in AOT heuristic files (currently tuned for NVIDIA B200 / sm100). Helion picks the checked-in config at startup, so these kernels run immediately without any online autotuning. They are useful as copy/paste recipes for common patterns, or as a quick way to try Helion on a supported GPU. Each kernel module has a main() that benchmarks the kernel against the PyTorch eager baseline.

  • vector_add — element-wise addition.

  • softmax — softmax with a long-context shape sweep.

  • layer_norm — layer normalization across realistic hidden sizes.

  • rms_norm — RMS normalization with NPOT and LLM-shaped inputs.

  • cross_entropy — cross-entropy across LLM vocabulary sizes.

To pretune one of these kernels for a different GPU (or to ship a heuristic for your own kernel), see the AOT Heuristic Tuning section of the deployment guide.

Basic Operations#

  • add.py: Element-wise addition with broadcasting support

  • exp.py: Element-wise exponential function

  • sum.py: Sum reduction along the last dimension

  • long_sum.py: Efficient sum reduction along a long dimension

  • softmax.py: Different implementations of the softmax function

  • batch_softmax.py: Batched (3D) softmax with arithmetic broadcasting

  • concatenate.py: Tensor concatenation along a dimension

  • low_mem_dropout.py: Memory-efficient dropout implementation
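As a point of reference for the softmax entries above, the following is a minimal plain-Python sketch of a numerically stable softmax over one row. It is not the Helion kernel and does not use the Helion API; the function name `softmax_reference` is hypothetical and chosen only for illustration.

```python
import math


def softmax_reference(row):
    """Numerically stable softmax over a single row (hypothetical helper)."""
    # Subtracting the row maximum keeps exp() from overflowing on large inputs
    # and does not change the result, because the factor cancels in the ratio.
    row_max = max(row)
    exps = [math.exp(v - row_max) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(softmax_reference([1.0, 2.0, 3.0]))
```

A GPU kernel typically performs the same three steps per row (max reduction, exponentiate, sum reduction and divide); how those steps are tiled is what the example files explore.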

Matrix Multiplication Operations#

Attention Operations#

Normalization#

Activation Functions#

  • geglu.py: GELU-gated linear unit (GEGLU) activation

  • swiglu.py: SwiGLU activation function
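For orientation, here is a plain-Python sketch of what the two gated activations compute element-wise. It rests on assumptions: the gate is applied to the first input `a`, GELU uses the exact erf form (some implementations use a tanh approximation instead), and all function names are hypothetical. It does not use the Helion API and may differ in detail from the example files.

```python
import math


def sigmoid(x):
    # Branch on sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x):
    # SiLU (also called swish): x * sigmoid(x).
    return x * sigmoid(x)


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def swiglu_reference(a, b):
    # Assumption: `a` is the gate input and `b` is the value input.
    return [silu(ai) * bi for ai, bi in zip(a, b)]


def geglu_reference(a, b):
    # Same gating structure as SwiGLU, with GELU in place of SiLU.
    return [gelu(ai) * bi for ai, bi in zip(a, b)]


if __name__ == "__main__":
    print(swiglu_reference([1.0, -1.0], [2.0, 2.0]))
    print(geglu_reference([1.0, -1.0], [2.0, 2.0]))
```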

Loss Functions#

Sparse and Jagged Tensors#

Sequence Models#

Statistics#

  • welford.py: Welford’s online algorithm for computing variance
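Welford's algorithm can be sketched in a few lines of plain Python. This is a scalar, single-pass illustration of the update rule only, not the Helion kernel; the function name `welford_variance` is hypothetical, and returning the population variance (dividing by n) is an assumption made here.

```python
def welford_variance(values):
    """Single-pass mean and population variance (hypothetical helper)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        count += 1
        delta = v - mean          # deviation from the old mean
        mean += delta / count     # update the running mean
        m2 += delta * (v - mean)  # old deviation times new deviation
    if count == 0:
        raise ValueError("welford_variance requires at least one value")
    return mean, m2 / count


if __name__ == "__main__":
    print(welford_variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

The single-pass form avoids the cancellation error of the naive "mean of squares minus square of mean" formula, which is why it is a common building block for normalization layers.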

Neural Network Components#

Advanced Usage#

  • aot_example.py: Ahead-of-time (AOT) autotuning workflow with batch-aware heuristics

  • acfs/softmax_acf.py: Using Advanced Controls Files (ACFs) with kernel configurations and autotuning

Distributed Operations#