# Helion Examples

This directory contains examples demonstrating how to use Helion for high-performance tensor operations. The examples are organized into the following categories:

## Basic Operations

- `add.py`: Element-wise addition with broadcasting support (see the sketch after this list)
- `exp.py`: Element-wise exponential function
- `sum.py`: Sum reduction along the last dimension
- `long_sum.py`: Efficient sum reduction along a long dimension
- `softmax.py`: Different implementations of the softmax function
- `concatenate.py`: Tensor concatenation along a dimension
- `low_mem_dropout.py`: Memory-efficient dropout implementation
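
For a taste of what these kernels look like, here is a minimal element-wise add in Helion's tile-based style, close in spirit to `add.py`. This is a sketch assuming the `@helion.kernel` decorator and `hl.tile` loop API; see the example file for the full version.

```python
import torch
import helion
import helion.language as hl


@helion.kernel()
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Broadcast the inputs to a common shape on the host side.
    x, y = torch.broadcast_tensors(x, y)
    out = torch.empty_like(x)
    # Each iteration of this loop becomes a tile of the generated GPU kernel;
    # Helion's autotuner chooses the tile sizes.
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out
```

The first call compiles and autotunes the kernel; subsequent calls reuse the tuned configuration.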

## Matrix Multiplication Operations

- `matmul.py`: Basic matrix multiplication (see the sketch after this list)
- `bmm.py`: Batch matrix multiplication
- `matmul_split_k.py`: Matrix multiplication using a split-K algorithm for better parallelism
- `matmul_layernorm.py`: Fused matrix multiplication and layer normalization
- `fp8_gemm.py`: Matrix multiplication in FP8 precision
- `bf16xint16_gemm.py`: BF16 x INT16 matrix multiplication
- `int4_gemm.py`: INT4 quantized matrix multiplication
- `grouped_gemm.py`: Grouped matrix multiplication
- `gather_gemv.py`: Gather-based matrix-vector multiplication
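
The GEMM variants above share the same tiled structure. A minimal sketch in the spirit of `matmul.py`, assuming `hl.tile` over the output grid with an inner reduction loop over K and `hl.zeros` for the accumulator:

```python
import torch
import helion
import helion.language as hl


@helion.kernel()
def matmul(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    m, k = x.size()
    k2, n = y.size()
    assert k == k2, "size mismatch"
    out = torch.empty([m, n], dtype=x.dtype, device=x.device)
    # One output tile per program instance.
    for tile_m, tile_n in hl.tile([m, n]):
        acc = hl.zeros([tile_m, tile_n], dtype=torch.float32)
        # Reduce over K in tiles, accumulating in fp32.
        for tile_k in hl.tile(k):
            acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n])
        out[tile_m, tile_n] = acc.to(x.dtype)
    return out
```

`matmul_split_k.py` additionally splits the K reduction across programs, which improves occupancy when M and N are small relative to K.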

## Attention Operations

- `attention.py`: Scaled dot-product attention mechanism
- `fp8_attention.py`: Attention in FP8 precision
- `blackwell_attention.py`: Attention optimized for the NVIDIA Blackwell architecture
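
These kernels compute variants of the operation behind PyTorch's built-in scaled dot-product attention, which serves as a convenient dense reference (shapes below are illustrative):

```python
import torch
import torch.nn.functional as F

# [batch, heads, seq_len, head_dim]
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# softmax(q @ k^T / sqrt(head_dim)) @ v
ref = F.scaled_dot_product_attention(q, k, v)
```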

## Normalization

- `rms_norm.py`: Root Mean Square (RMS) normalization (see the sketch after this list)
- `layer_norm.py`: Layer normalization
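
RMS normalization scales each row by the reciprocal root of its mean square. A minimal Helion-style sketch of the forward pass; the actual `rms_norm.py` is more careful about dtypes and shapes:

```python
import torch
import helion
import helion.language as hl


@helion.kernel()
def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    m, n = x.size()
    out = torch.empty_like(x)
    # One tile of rows per program; the feature dimension is processed whole.
    for tile_m in hl.tile(m):
        row = x[tile_m, :].to(torch.float32)
        # y = x / sqrt(mean(x^2) + eps) * weight
        inv_rms = torch.rsqrt(torch.mean(row * row, dim=-1, keepdim=True) + eps)
        out[tile_m, :] = (row * inv_rms * weight[:].to(torch.float32)).to(x.dtype)
    return out
```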

## Loss Functions

- `cross_entropy.py`: Cross-entropy loss function
- `grpo_loss.py`: Group Relative Policy Optimization (GRPO) loss function
- `jsd.py`: Jensen-Shannon divergence (JSD) loss (see the reference sketch after this list)
- `fused_linear_jsd.py`: Fused linear layer with JSD loss
- `kl_div.py`: Kullback-Leibler divergence
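
For reference, the Jensen-Shannon divergence computed by `jsd.py` is the symmetrized KL divergence against the mixture distribution M = (P + Q) / 2. A dense PyTorch sketch of the math (illustrative only, not the kernel's API):

```python
import math

import torch
import torch.nn.functional as F


def jsd_reference(log_p: torch.Tensor, log_q: torch.Tensor) -> torch.Tensor:
    """JSD(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2."""
    log_m = torch.logaddexp(log_p, log_q) - math.log(2.0)
    kl_pm = F.kl_div(log_m, log_p, reduction="batchmean", log_target=True)
    kl_qm = F.kl_div(log_m, log_q, reduction="batchmean", log_target=True)
    return 0.5 * (kl_pm + kl_qm)
```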

## Sparse and Jagged Tensors

- `jagged_dense_add.py`: Addition between a jagged tensor and a dense tensor
- `jagged_dense_bmm.py`: Batch matrix multiplication with jagged tensors
- `jagged_mean.py`: Computing the mean of each row of a jagged tensor
- `jagged_sum.py`: Sum reduction for jagged tensors (see the layout sketch after this list)
- `jagged_softmax.py`: Softmax for jagged tensors
- `jagged_layer_norm.py`: Layer normalization for jagged tensors
- `jagged_hstu_attn.py`: HSTU attention for jagged tensors
- `segment_reduction.py`: Segmented reduction operation
- `moe_matmul_ogs.py`: Mixture-of-Experts matrix multiplication using Outer-Gather-Scatter
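
The jagged examples share a common layout: variable-length rows are packed into one flat `values` tensor, with an `offsets` vector marking row boundaries. A small illustration of that layout, with a dense reference for what `jagged_sum.py` computes (names here are illustrative):

```python
import torch

# Three rows with lengths 2, 0, and 4; feature dimension 8.
lengths = torch.tensor([2, 0, 4])
offsets = torch.zeros(lengths.numel() + 1, dtype=torch.int64)
offsets[1:] = torch.cumsum(lengths, dim=0)        # tensor([0, 2, 2, 6])
values = torch.randn(int(offsets[-1]), 8)         # flat [total_len, D] storage

# Row i of the jagged tensor is values[offsets[i]:offsets[i + 1]].
row_sums = torch.stack([
    values[offsets[i]:offsets[i + 1]].sum(dim=0)  # empty rows sum to zeros
    for i in range(lengths.numel())
])
```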

## Sequence Models

- `mamba2_chunk_scan.py`: Mamba2 chunk scan operation
- `mamba2_chunk_state.py`: Mamba2 chunk state operation

## Statistics

- `welford.py`: Welford's online algorithm for computing variance (see the sketch after this list)
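
Welford's algorithm produces the mean and variance in a single pass while avoiding the catastrophic cancellation of the naive sum-of-squares formula. A scalar reference implementation of the recurrence:

```python
def welford(xs):
    """Single-pass mean and population variance via Welford's recurrence."""
    mean, m2, count = 0.0, 0.0, 0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count      # running mean
        m2 += delta * (x - mean)   # running sum of squared deviations
    return mean, (m2 / count if count else float("nan"))
```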

## Neural Network Components

- `embedding.py`: Embedding lookup operation (see the sketch after this list)
- `squeeze_and_excitation_net.py`: Squeeze-and-Excitation network
- `gdn_fwd_h.py`: Gated DeltaNet (GDN) hidden-state forward pass
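
Embedding lookup is a row gather from the weight table. A minimal sketch in the spirit of `embedding.py`, assuming Helion permits indexing one tensor with a tile of another:

```python
import torch
import helion
import helion.language as hl


@helion.kernel()
def embedding(indices: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    flat = indices.reshape(-1)
    _, dim = weight.size()
    out = torch.empty([flat.size(0), dim], dtype=weight.dtype, device=weight.device)
    for tile_b, tile_d in hl.tile([flat.size(0), dim]):
        # Gather the rows of the table selected by this tile of indices.
        out[tile_b, tile_d] = weight[flat[tile_b], tile_d]
    return out.view(*indices.size(), dim)
```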

## Distributed Operations

- `distributed/all_gather_matmul.py`: All-gather operation followed by matrix multiplication
- `distributed/all_reduce.py`: All-reduce operation (one-shot)
- `distributed/matmul_reduce_scatter.py`: Fused matmul with reduce-scatter
- `distributed/one_shot_allreduce_bias_rmsnorm.py`: Fused all-reduce, bias add, and RMS normalization