helion.language.inline_asm_elementwise
- helion.language.inline_asm_elementwise(asm, constraints, args, dtype, is_pure, pack)
Execute inline assembly over a tensor. Essentially, this is a map operation in which the mapped function is inline assembly.
The input tensors in args are implicitly broadcast to the same shape. dtype can be a tuple of types, in which case the output is a tuple of tensors.
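The map, broadcast, and tuple-output behaviour described above can be modelled in plain Python. This is only a rough one-dimensional sketch, not Helion code: `broadcast_1d` and `map_reference` are hypothetical helpers, lists stand in for tensors, and an ordinary Python function stands in for the asm block.

```python
def broadcast_1d(args):
    """Expand length-1 lists so that all inputs share one length (1-D model only)."""
    n = max(len(a) for a in args)
    expanded = []
    for a in args:
        if len(a) == n:
            expanded.append(list(a))
        elif len(a) == 1:
            expanded.append(list(a) * n)
        else:
            raise ValueError("inputs cannot be broadcast to a common length")
    return expanded


def map_reference(fn, args, num_outputs):
    """Apply fn to each group of corresponding elements after broadcasting.

    fn stands in for the asm block. With num_outputs > 1 the result is a
    tuple of lists, mirroring a tuple of dtypes producing a tuple of tensors.
    """
    expanded = broadcast_1d(args)
    outputs = tuple([] for _ in range(num_outputs))
    for elems in zip(*expanded):
        results = fn(*elems)
        if num_outputs == 1:
            results = (results,)
        for out, value in zip(outputs, results):
            out.append(value)
    return outputs[0] if num_outputs == 1 else outputs


single = map_reference(lambda a, b: a + b, [[1, 2, 3], [10]], num_outputs=1)
pair = map_reference(lambda a, b: (a + b, a * b), [[1, 2, 3], [10]], num_outputs=2)
```

Here the second input has a single element and is stretched to match the first, and the two-output call returns one list per output.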
Each invocation of the inline asm processes pack elements at a time. Exactly which set of inputs a block receives is unspecified. Input elements of size less than 4 bytes are packed into 4-byte registers.
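To reason about how many input operands one invocation receives, it can help to count registers. The sketch below uses a hypothetical helper, `input_registers`, and assumes that sub-4-byte elements are packed densely into 4-byte registers; the text above states only that packing happens, so treat the exact layout as an assumption to check against the generated code. Elements wider than 4 bytes are not modelled.

```python
import math


def input_registers(pack: int, itemsize: int) -> int:
    """Estimate the 4-byte registers one asm invocation uses for one input.

    pack:     elements processed per invocation
    itemsize: element size in bytes (1, 2 or 4)

    Assumption: elements smaller than 4 bytes are packed densely, so four
    8-bit elements or two 16-bit elements share a single register.
    """
    if pack < 1:
        raise ValueError("pack must be a positive integer")
    if itemsize not in (1, 2, 4):
        raise ValueError("only 1-, 2- and 4-byte elements are modelled")
    return math.ceil(pack * itemsize / 4)


regs_u8 = input_registers(pack=4, itemsize=1)   # four 8-bit elements
regs_f16 = input_registers(pack=2, itemsize=2)  # two 16-bit elements
regs_f32 = input_registers(pack=4, itemsize=4)  # four 32-bit elements
```

Under this assumption, an asm block with pack=4 would see one register for an 8-bit input but four registers for a 32-bit input, and the constraints string would need one entry per register.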
This op does not support empty dtype – the inline asm must return at least one tensor, even if you don’t need it. You can work around this by returning a dummy tensor of arbitrary type; it shouldn’t cost you anything if you don’t use it.
- Parameters:
asm (str) – assembly to run. Must match target’s assembly format.
constraints (str) – asm constraints in LLVM format
args (Sequence[Tensor]) – the input tensors, whose values are passed to the asm block
dtype (Union[dtype, Sequence[dtype]]) – the element type(s) of the returned tensor(s)
is_pure (bool) – if true, the compiler assumes the asm block has no side-effects
pack (int) – the number of elements to be processed by one instance of inline assembly
- Return type:
- Returns:
one tensor or a tuple of tensors of the given dtypes