helion.language.inline_asm_elementwise

helion.language.inline_asm_elementwise(asm, constraints, args, dtype, is_pure, pack)

Execute inline assembly over a tensor. Essentially, this is a map where the mapped function is inline assembly.

The input tensors in args are implicitly broadcast to a common shape. dtype can be a tuple of types, in which case the output is a tuple of tensors, one per type.
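The following pure-Python model restates these semantics for the 1-D case. It is not Helion code, and the helper names (broadcast_1d, elementwise_map_model) are hypothetical. The asm block is treated as an ordinary function applied at each broadcast position, and requesting several outputs yields a tuple of result lists. Register packing and the pack parameter are deliberately ignored here.

```python
from typing import Callable, Sequence, Tuple, Union

Number = Union[int, float]


def broadcast_1d(args: Sequence[Sequence[Number]]) -> list:
    """Broadcast 1-D inputs: each length must equal the max length or be 1."""
    n = max(len(a) for a in args)
    out = []
    for a in args:
        if len(a) == n:
            out.append(list(a))
        elif len(a) == 1:
            out.append(list(a) * n)  # repeat the single element n times
        else:
            raise ValueError(f"cannot broadcast length {len(a)} to {n}")
    return out


def elementwise_map_model(
    fn: Callable[..., Union[Number, Tuple[Number, ...]]],
    args: Sequence[Sequence[Number]],
    num_outputs: int = 1,
):
    """Apply fn at every broadcast position (stand-in for the asm block)."""
    cols = broadcast_1d(args)
    results = [fn(*vals) for vals in zip(*cols)]
    if num_outputs == 1:
        return results  # single dtype: one output "tensor"
    # Tuple of dtypes: split per-position tuples into one list per output.
    return tuple([r[k] for r in results] for k in range(num_outputs))
```

For example, elementwise_map_model(lambda a, b: (a - b, b - a), [[1, 2, 3], [10]], num_outputs=2) broadcasts the length-1 input and returns ([-9, -8, -7], [9, 8, 7]).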

Each invocation of the inline asm processes pack elements at a time. Which input elements a given invocation receives is unspecified. Input elements smaller than 4 bytes are packed into 4-byte registers.
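To make the packing concrete, here is a CPU-side model of a PTX snippet that shifts 8-bit values left by 3 with pack=4, so four bytes share one 32-bit register: "and.b32 $0, $1, 0x1F1F1F1F; shl.b32 $0, $0, 3;". The helper names are hypothetical, and placing element 0 in the least-significant byte is an assumption (little-endian layout); the text above only states that such elements are packed into 4-byte registers. The model groups consecutive elements, but because the grouping is unspecified, correct asm must not depend on it. This snippet qualifies because it treats each byte independently.

```python
import struct
from typing import List


def pack_u8x4(vals: List[int]) -> int:
    """Pack four uint8 values into one 32-bit word (assumed: element 0 = low byte)."""
    return struct.unpack("<I", bytes(vals))[0]


def unpack_u8x4(word: int) -> List[int]:
    """Split a 32-bit word back into four uint8 values."""
    return list(struct.pack("<I", word))


def shl3_packed(word: int) -> int:
    """Model of: and.b32 $0, $1, 0x1F1F1F1F; shl.b32 $0, $0, 3;

    Keeping only the low 5 bits of each byte before the 32-bit shift stops
    bits from carrying into the neighbouring packed element.
    """
    return ((word & 0x1F1F1F1F) << 3) & 0xFFFFFFFF


def shl3_elementwise(xs: List[int]) -> List[int]:
    """Run one modelled 'asm invocation' per group of pack=4 elements."""
    if len(xs) % 4 != 0:
        raise ValueError("this model assumes a multiple of pack=4 elements")
    out: List[int] = []
    for i in range(0, len(xs), 4):
        out.extend(unpack_u8x4(shl3_packed(pack_u8x4(xs[i:i + 4]))))
    return out
```

Without the mask, bits would leak between elements: for the packed word 0x000000FF, a plain shift by 3 gives 0x000007F8, which turns the neighbouring element from 0 into 7.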

This op does not support an empty dtype: the inline asm must return at least one tensor, even if you don't need it. You can work around this by returning a dummy tensor of arbitrary type; it shouldn't cost you anything if you don't use it.

Parameters:
  • asm (str) – assembly to run. Must match the target's assembly format (for example, PTX on NVIDIA GPUs).

  • constraints (str) – asm constraints in LLVM inline-assembly constraint format, with output operands listed before input operands

  • args (Sequence[Tensor]) – the input tensors, whose values are passed to the asm block

  • dtype (Union[dtype, Sequence[dtype]]) – the element type(s) of the returned tensor(s)

  • is_pure (bool) – if True, the compiler assumes the asm block has no side effects

  • pack (int) – the number of elements to be processed by one instance of inline assembly
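Because constraints must list one entry per register operand, pack and the element sizes determine how long the constraint string is. The hypothetical helper below sketches that bookkeeping under stated assumptions: every operand uses the generic 32-bit "r" register constraint, elements narrower than 32 bits are packed into whole 32-bit registers, and 64-bit types are not modelled. Outputs come first, so in "=r,r" the output is $0 and the input is $1. Check the resulting string against the constraints your target actually accepts.

```python
from typing import Sequence


def regs_per_operand(elem_bits: int, pack: int) -> int:
    """32-bit registers one operand occupies in one asm invocation (assumed rule)."""
    total_bits = elem_bits * pack
    if elem_bits > 32 or total_bits % 32 != 0:
        raise ValueError("model only covers groups that fill whole 32-bit registers")
    return total_bits // 32


def build_constraints(out_bits: Sequence[int], in_bits: Sequence[int], pack: int) -> str:
    """LLVM-style constraint string: all outputs ('=r') first, then inputs ('r')."""
    outs = ["=r"] * sum(regs_per_operand(b, pack) for b in out_bits)
    ins = ["r"] * sum(regs_per_operand(b, pack) for b in in_bits)
    return ",".join(outs + ins)
```

Under these assumptions, one 8-bit input and one 8-bit output at pack=4 give "=r,r". Two 32-bit outputs with one 8-bit and one 32-bit input at pack=4 need 8 output registers and 5 input registers.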

Return type:

Tensor | tuple[Tensor, ...]

Returns:

one tensor or a tuple of tensors of the given dtypes