Crate safe_arch
A crate that safely exposes arch intrinsics via #[cfg()].
safe_arch lets you safely use CPU intrinsics. Those things in the core::arch modules. It works purely via #[cfg()] and compile time CPU feature declaration. If you want to check for a feature at runtime and then call an intrinsic or use a fallback path based on that, then this crate is sadly not for you.
SIMD register types are "newtype'd" so that better trait impls can be given to them, but the inner value is a pub field, so feel free to just grab it out if you need to. Trait impls of the newtypes include: Default (zeroed), From/Into of appropriate data types, and appropriate operator overloading.
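As a rough illustration of that newtype pattern, here is a simplified sketch (this is not safe_arch's actual source; the real types wrap core::arch register types rather than a plain array):

```rust
// Simplified sketch of the newtype pattern described above.
#[allow(non_camel_case_types)]
#[derive(Default, Clone, Copy, Debug, PartialEq)]
pub struct m128(pub [f32; 4]); // Default derives to all-zero lanes

impl From<[f32; 4]> for m128 {
  fn from(arr: [f32; 4]) -> Self {
    Self(arr)
  }
}

impl From<m128> for [f32; 4] {
  fn from(m: m128) -> Self {
    m.0 // the inner value is a pub field, so this is just a move
  }
}

fn main() {
  let a = m128::from([1.0, 2.0, 3.0, 4.0]);
  let zero = m128::default();
  // Grab the inner value directly through the pub field.
  let lanes: [f32; 4] = a.0;
  println!("{:?} {:?} {:?}", a, zero, lanes);
}
```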
- Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
- Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the safe_arch function takes a reference of an appropriate type to uphold safety.
- Try the bytemuck crate (and turn on the bytemuck feature of this crate) if you want help safely casting between reference types.
- Some intrinsics are not safe unless you're very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren't exposed here.
- Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn't really support you messing with that within a high level language, so those operations aren't exposed here. Use assembly or something if you want to do that.
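To make the "1:1 with the actual intrinsic" point concrete, here is a hedged sketch of the wrapping style (not safe_arch's actual source, and it assumes an x86_64 target): a core::arch intrinsic that is safe whenever its CPU feature is enabled at compile time gets a thin safe wrapper behind a cfg gate.

```rust
// Sketch only: a safe wrapper over an sse2 intrinsic, gated by cfg so it
// simply doesn't exist when the feature isn't enabled at compile time.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
fn add_i32_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
  use core::arch::x86_64::{__m128i, _mm_add_epi32, _mm_loadu_si128, _mm_storeu_si128};
  let mut out = [0i32; 4];
  // SAFETY: sse2 is statically enabled (checked by the cfg above), and we
  // only do unaligned loads/stores on properly sized local arrays.
  unsafe {
    let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
    let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
    let vc = _mm_add_epi32(va, vb);
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, vc);
  }
  out
}

fn main() {
  #[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
  println!("{:?}", add_i32_lanes([1, 2, 3, 4], [10, 20, 30, 40]));
}
```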
Naming Conventions
The safe_arch crate does not simply use the "official" names for each intrinsic, because the official names are generally poor. Instead, the operations have been given better names that will hopefully make things easier to understand when you're reading the code.
For a full explanation of the naming used, see the Naming Conventions page.
Current Support
x86 / x86_64 (Intel, AMD, etc)
- 128-bit: sse, sse2, sse3, ssse3, sse4.1, sse4.2
- 256-bit: avx, avx2
- Other: adx, aes, bmi1, bmi2, fma, lzcnt, pclmulqdq, popcnt, rdrand, rdseed
Compile Time CPU Target Features
At the time of me writing this, Rust enables the sse and sse2 CPU features by default for all i686 (x86) and x86_64 builds. Those CPU features are built into the design of x86_64, and you'd need a super old x86 CPU for it to not support at least sse and sse2, so they're a safe bet for the language to enable all the time. In fact, because the standard library is compiled with them enabled, simply trying to disable those features would actually cause ABI issues and fill your program with UB (link).
If you want additional CPU features available at compile time you'll have to enable them with an additional arg to rustc. For a feature named name you pass -C target-feature=+name, such as -C target-feature=+sse3 for sse3.
You can alternately enable all target features of the current CPU with -C target-cpu=native. This is primarily of use if you're building a program you'll only run on your own system.
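For example, the flags above can be passed to cargo via the RUSTFLAGS environment variable (hypothetical invocations, shown only to illustrate the flag syntax):

```shell
# Enable specific target features for a build:
RUSTFLAGS="-C target-feature=+sse3,+ssse3,+sse4.1,+sse4.2" cargo build --release

# Or enable everything your own CPU supports
# (the resulting binary may not run on other machines):
RUSTFLAGS="-C target-cpu=native" cargo build --release
```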
It's sometimes hard to know if your target platform will support a given feature set, but the Steam Hardware Survey is generally taken as a guide to what you can expect people to have available. If you click "Other Settings" it'll expand into a list of CPU target features and how common they are. These days, it seems that sse3 can be safely assumed, and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The stuff above 128-bit isn't as common yet; give it another few years.
Please note that executing a program on a CPU that doesn't support the target features it was compiled for is Undefined Behavior.
Currently, Rust doesn't actually support an easy way for you to check that a feature enabled at compile time is actually available at runtime. There is the "feature_detected" family of macros, but if you enable a feature they will evaluate to a constant true instead of actually deferring the check for the feature to runtime. This means that, if you did want a check at the start of your program, to confirm that all the assumed features are present and error out when the assumptions don't hold, you can't use that macro. You gotta use CPUID and check manually. rip.
Hopefully we can make that process easier in a future version of this crate.
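A manual CPUID check can look roughly like the following sketch (assuming an x86_64 target; leaf 1, ECX bit 20 is sse4.2 — see the CPUID documentation for the other feature bits):

```rust
// Sketch of a manual CPUID feature check, assuming x86_64.
#[cfg(target_arch = "x86_64")]
fn sse42_really_available() -> bool {
  use std::arch::x86_64::__cpuid;
  // SAFETY: __cpuid can execute on any x86_64 CPU, which the cfg above
  // guarantees we are compiling for.
  let leaf1 = unsafe { __cpuid(1) };
  (leaf1.ecx >> 20) & 1 == 1
}

fn main() {
  #[cfg(target_arch = "x86_64")]
  {
    if sse42_really_available() {
      println!("sse4.2 present");
    } else {
      eprintln!("sse4.2 missing; refusing to continue");
      std::process::exit(1);
    }
  }
}
```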
A Note On Working With Cfg
There are two main ways to use cfg:
- Via an attribute placed on an item, block, or expression:
#[cfg(debug_assertions)] println!("hello");
- Via a macro used within an expression position:
if cfg!(debug_assertions) { println!("hello"); }
The difference might seem small but it's actually very important:
- The attribute form decides whether to include the code before checking that everything the code names really exists. This means that code configured via attribute can safely name things that don't always exist, as long as those things do exist whenever that code is configured into the build.
- The macro form will include the configured code no matter what, and then the macro resolves to a constant true or false and the compiler uses dead code elimination to cut out the path not taken.
This crate uses cfg via the attribute, so the functions it exposes don't exist at all when the appropriate CPU target features aren't enabled. Accordingly, if you plan to call this crate or not depending on what features are enabled in the build, you'll also need to control your use of this crate via the cfg attribute, not the cfg macro.
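The difference can be seen in a small self-contained example using debug_assertions, like the snippets above (the function and message names here are made up for illustration):

```rust
// Attribute form: this function only exists at all in debug builds.
#[cfg(debug_assertions)]
fn debug_only_greeting() -> &'static str {
  "hello from a debug build"
}

fn main() {
  // Attribute form on an expression: the call isn't compiled at all in
  // release mode, so it may freely name the conditionally-existing item.
  #[cfg(debug_assertions)]
  println!("{}", debug_only_greeting());

  // Macro form: both branches are always compiled and must always
  // type-check; dead code elimination removes the untaken one. A call to
  // debug_only_greeting() here would fail to compile in release mode.
  if cfg!(debug_assertions) {
    println!("branch taken in debug");
  } else {
    println!("branch taken in release");
  }
}
```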
Modules
naming_conventions | An explanation of the crate's naming conventions. |
Macros
aes_key_gen_assist_m128i | aes ? |
blend_i32_m128i | avx2 Blends the |
blend_imm_i16_m128i | Blends the |
blend_imm_i16_m256i | avx2 Blends the |
blend_imm_i32_m256i | avx2 Blends the |
blend_imm_m128d | Blends the lanes according to the immediate mask. |
blend_imm_m128 | Blends the lanes according to the immediate mask. |
blend_imm_m256d | avx Blends the |
blend_imm_m256 | avx Blends the |
byte_shl_imm_u128_m128i | Shifts all bits in the entire register left by a number of bytes. |
byte_shl_imm_u128_m256i | avx2 Shifts each |
byte_shr_imm_u128_m128i | Shifts all bits in the entire register right by a number of bytes. |
byte_shr_imm_u128_m256i | avx2 Shifts each |
cmp_op_mask_m128 | avx Compare |
cmp_op_mask_m128_s | avx Compare |
cmp_op_mask_m128d | avx Compare |
cmp_op_mask_m128d_s | avx Compare |
cmp_op_mask_m256 | avx Compare |
cmp_op_mask_m256d | avx Compare |
combined_byte_shr_imm_m128i | Counts |
combined_byte_shr_imm_m256i | Works like |
comparison_operator_translation | avx Turns a comparison operator token to the correct constant value. |
dot_product_m128d | Performs a dot product of two |
dot_product_m128 | Performs a dot product of two |
dot_product_m256 | avx This works like |
extract_f32_as_i32_bits_imm_m128 | Gets the |
extract_i16_as_i32_m128i | Gets an |
extract_i16_as_i32_m256i | avx2 Gets an |
extract_i32_from_m256i | avx Extracts an |
extract_i32_imm_m128i | Gets the |
extract_i64_from_m256i | avx Extracts an |
extract_i64_imm_m128i | Gets the |
extract_i8_as_i32_imm_m128i | Gets the |
extract_i8_as_i32_m256i | avx2 Gets an |
extract_m128_from_m256 | avx Extracts an |
extract_m128d_from_m256d | avx Extracts an |
extract_m128i_from_m256i | avx Extracts an |
extract_m128i_m256i | avx2 Gets an |
insert_f32_imm_m128 | Inserts a lane from |
insert_i16_from_i32_m128i | Inserts the low 16 bits of an |
insert_i16_to_m256i | avx Inserts an |
insert_i32_imm_m128i | Inserts a new value for the |
insert_i32_to_m256i | avx Inserts an |
insert_i64_imm_m128i | Inserts a new value for the |
insert_i64_to_m256i | avx Inserts an |
insert_i8_imm_m128i | Inserts a new value for the |
insert_i8_to_m256i | avx Inserts an |
insert_m128_to_m256 | avx Inserts an |
insert_m128d_to_m256d | avx Inserts an |
insert_m128i_to_m256i_slow_avx | avx Slowly inserts an |
insert_m128i_to_m256i | avx Inserts an |
mul_i64_carryless_m128i | pclmulqdq Performs a "carryless" multiplication of two |
multi_packed_sum_abs_diff_u8_m128i | Computes eight |
multi_packed_sum_abs_diff_u8_m256i | avx2 Computes eight |
round_m128d | Rounds each lane in the style specified. |
round_m128d_s | Rounds |
round_m128 | Rounds each lane in the style specified. |
round_m128_s | Rounds |
round_m256d | avx Rounds each lane in the style specified. |
round_m256 | avx Rounds each lane in the style specified. |
shl_imm_u16_m128i | Shifts all |
shl_imm_u16_m256i | avx2 Shifts all |
shl_imm_u32_m128i | Shifts all |
shl_imm_u32_m256i | avx2 Shifts all |
shl_imm_u64_m128i | Shifts both |
shl_imm_u64_m256i | avx2 Shifts all |
shr_imm_i16_m128i | Shifts all |
shr_imm_i16_m256i | avx2 Shifts all |
shr_imm_i32_m128i | Shifts all |
shr_imm_i32_m256i | avx2 Shifts all |
shr_imm_u16_m128i | Shifts all |
shr_imm_u16_m256i | avx2 Shifts all |
shr_imm_u32_m128i | Shifts all |
shr_imm_u32_m256i | avx2 Shifts all |
shr_imm_u64_m128i | Shifts both |
shr_imm_u64_m256i | avx2 Shifts all |
shuffle_abi_f128z_all_m256d | avx Shuffle 128 bits of floating point data at a time from |
shuffle_abi_f128z_all_m256 | avx Shuffle 128 bits of floating point data at a time from |
shuffle_abi_f128z_all_m256i | avx Slowly swizzle 128 bits of integer data from |
shuffle_abi_f32_all_m128 | sse Shuffle the |
shuffle_abi_f32_half_m256 | avx Shuffle the |
shuffle_abi_f64_all_m128d | Shuffle the |
shuffle_abi_f64_half_m256d | avx Shuffle the |
shuffle_abi_i128z_all_m256i | avx Shuffle 128 bits of integer data from |
shuffle_ai_f32_all_m128i | Shuffle the |
shuffle_ai_f32_all_m128 | avx Shuffle the |
shuffle_ai_f32_half_m256 | avx Shuffle the |
shuffle_ai_f64_all_m128d | avx Shuffle the |
shuffle_ai_f64_all_m256d | avx2 Shuffle the |
shuffle_ai_f64_half_m256d | avx Shuffle the |
shuffle_ai_i16_h64all_m128i | Shuffle the high |
shuffle_ai_i16_h64half_m256i | avx2 Shuffle the high |
shuffle_ai_i16_l64all_m128i | Shuffle the low |
shuffle_ai_i16_l64half_m256i | avx2 Shuffle the low |
shuffle_ai_i32_half_m256i | avx2 Shuffle the |
shuffle_ai_i64_all_m256i | avx2 Shuffle the |
string_search_for_index | sse4.2 Looks for |
string_search_for_mask | sse4.2 Looks for |
Structs
m128 | The data for a 128-bit SSE register of four |
m128d | The data for a 128-bit SSE register of two |
m128i | The data for a 128-bit SSE register of integer data. |
m256 | The data for a 256-bit AVX register of eight |
m256d | The data for a 256-bit AVX register of four |
m256i | The data for a 256-bit AVX register of integer data. |
Functions
abs_i16_m128i | ssse3 Lanewise absolute value with lanes as |
abs_i16_m256i | avx2 Absolute value of |
abs_i32_m128i | ssse3 Lanewise absolute value with lanes as |
abs_i32_m256i | avx2 Absolute value of |
abs_i8_m128i | ssse3 Lanewise absolute value with lanes as |
abs_i8_m256i | avx2 Absolute value of |
add_carry_u32 | adx Add two |
add_carry_u64 | adx Add two |
add_horizontal_i16_m128i | ssse3 Add horizontal pairs of |
add_horizontal_i16_m256i | avx2 Horizontal |
add_horizontal_i32_m128i | ssse3 Add horizontal pairs of |
add_horizontal_i32_m256i | avx2 Horizontal |
add_horizontal_m128d | sse3 Add each lane horizontally, pack the outputs as |
add_horizontal_m128 | sse3 Add each lane horizontally, pack the outputs as |
add_horizontal_m256d | avx Add adjacent |
add_horizontal_m256 | avx Add adjacent |
add_horizontal_saturating_i16_m128i | ssse3 Add horizontal pairs of |
add_horizontal_saturating_i16_m256i | avx2 Horizontal saturating |
add_i16_m128i | sse2 Lanewise |
add_i16_m256i | avx2 Lanewise |
add_i32_m128i | sse2 Lanewise |
add_i32_m256i | avx2 Lanewise |
add_i64_m128i | sse2 Lanewise |
add_i64_m256i | avx2 Lanewise |
add_i8_m128i | sse2 Lanewise |
add_i8_m256i | avx2 Lanewise |
add_m128 | sse Lanewise |
add_m128_s | sse Low lane |
add_m128d | sse2 Lanewise |
add_m128d_s | sse2 Lowest lane |
add_m256d | avx Lanewise |
add_m256 | avx Lanewise |
add_saturating_i16_m128i | sse2 Lanewise saturating |
add_saturating_i16_m256i | avx2 Lanewise saturating |
add_saturating_i8_m128i | sse2 Lanewise saturating |
add_saturating_i8_m256i | avx2 Lanewise saturating |
add_saturating_u16_m128i | sse2 Lanewise saturating |
add_saturating_u16_m256i | avx2 Lanewise saturating |
add_saturating_u8_m128i | sse2 Lanewise saturating |
add_saturating_u8_m256i | avx2 Lanewise saturating |
addsub_m128d | sse3 Add the high lane and subtract the low lane. |
addsub_m128 | sse3 Alternately, from the top, add a lane and then subtract a lane. |
addsub_m256d | avx Alternately, from the top, add |
addsub_m256 | avx Alternately, from the top, add |
aes_decrypt_last_m128i | aes "Perform the last round of AES decryption flow on |
aes_decrypt_m128i | aes "Perform one round of AES decryption flow on |
aes_encrypt_last_m128i | aes "Perform the last round of AES encryption flow on |
aes_encrypt_m128i | aes "Perform one round of AES encryption flow on |
aes_inv_mix_columns_m128i | aes "Perform the InvMixColumns transform on |
average_u16_m128i | sse2 Lanewise average of the |
average_u16_m256i | avx2 Average |
average_u8_m128i | sse2 Lanewise average of the |
average_u8_m256i | avx2 Average |
bit_extract2_u32 | bmi1 Extract a span of bits from the |
bit_extract2_u64 | bmi1 Extract a span of bits from the |
bit_extract_u32 | bmi1 Extract a span of bits from the |
bit_extract_u64 | bmi1 Extract a span of bits from the |
bit_lowest_set_mask_u32 | bmi1 Gets the mask of all bits up to and including the lowest set bit in a |
bit_lowest_set_mask_u64 | bmi1 Gets the mask of all bits up to and including the lowest set bit in a |
bit_lowest_set_reset_u32 | bmi1 Resets (clears) the lowest set bit. |
bit_lowest_set_reset_u64 | bmi1 Resets (clears) the lowest set bit. |
bit_lowest_set_value_u32 | bmi1 Gets the value of the lowest set bit in a |
bit_lowest_set_value_u64 | bmi1 Gets the value of the lowest set bit in a |
bit_zero_high_index_u32 | bmi2 Zero out all high bits in a |
bit_zero_high_index_u64 | bmi2 Zero out all high bits in a |
bitand_m128 | sse Bitwise |
bitand_m128d | sse2 Bitwise |
bitand_m128i | sse2 Bitwise |
bitand_m256d | avx Bitwise |
bitand_m256 | avx Bitwise |
bitand_m256i | avx2 Bitwise |
bitandnot_m128 | sse Bitwise |
bitandnot_m128d | sse2 Bitwise |
bitandnot_m128i | sse2 Bitwise |
bitandnot_m256d | avx Bitwise |
bitandnot_m256 | avx Bitwise |
bitandnot_m256i | avx2 Bitwise |
bitandnot_u32 | bmi1 Bitwise |
bitandnot_u64 | bmi1 Bitwise |
bitor_m128 | sse Bitwise |
bitor_m128d | sse2 Bitwise |
bitor_m128i | sse2 Bitwise |
bitor_m256d | avx Bitwise |
bitor_m256 | avx Bitwise |
bitor_m256i | avx2 Bitwise |
bitxor_m128 | sse Bitwise |
bitxor_m128d | sse2 Bitwise |
bitxor_m128i | sse2 Bitwise |
bitxor_m256d | avx Bitwise |
bitxor_m256 | avx Bitwise |
bitxor_m256i | avx2 Bitwise |
blend_varying_i8_m128i | sse4.1 Blend the |
blend_varying_i8_m256i | avx2 Blend |
blend_varying_m128d | sse4.1 Blend the lanes according to a runtime varying mask. |
blend_varying_m128 | sse4.1 Blend the lanes according to a runtime varying mask. |
blend_varying_m256d | avx Blend the lanes according to a runtime varying mask. |
blend_varying_m256 | avx Blend the lanes according to a runtime varying mask. |
byte_swap_i32 | Swap the bytes of the given 32-bit value. |
byte_swap_i64 | Swap the bytes of the given 64-bit value. |
cast_to_m128_from_m128d | sse2 Bit-preserving cast to |
cast_to_m128_from_m128i | sse2 Bit-preserving cast to |
cast_to_m128_from_m256 | avx Bit-preserving cast to |
cast_to_m128d_from_m128 | sse2 Bit-preserving cast to |
cast_to_m128d_from_m128i | sse2 Bit-preserving cast to |
cast_to_m128d_from_m256d | avx Bit-preserving cast to |
cast_to_m128i_from_m128d | sse2 Bit-preserving cast to |
cast_to_m128i_from_m128 | sse2 Bit-preserving cast to |
cast_to_m128i_from_m256i | avx Bit-preserving cast to |
cast_to_m256_from_m256d | avx Bit-preserving cast to |
cast_to_m256_from_m256i | avx Bit-preserving cast to |
cast_to_m256d_from_m256 | avx Bit-preserving cast to |
cast_to_m256d_from_m256i | avx Bit-preserving cast to |
cast_to_m256i_from_m256d | avx Bit-preserving cast to |
cast_to_m256i_from_m256 | avx Bit-preserving cast to |
ceil_m128d | sse4.1 Round each lane to a whole number, towards positive infinity |
ceil_m128 | sse4.1 Round each lane to a whole number, towards positive infinity |
ceil_m128d_s | sse4.1 Round the low lane of |
ceil_m128_s | sse4.1 Round the low lane of |
ceil_m256d | avx Round |
ceil_m256 | avx Round |
cmp_eq_i32_m128_s | sse Low lane equality. |
cmp_eq_i32_m128d_s | sse2 Low lane |
cmp_eq_mask_i16_m128i | sse2 Lanewise |
cmp_eq_mask_i16_m256i | avx2 Compare |
cmp_eq_mask_i32_m128i | sse2 Lanewise |
cmp_eq_mask_i32_m256i | avx2 Compare |
cmp_eq_mask_i64_m128i | sse4.1 Lanewise |
cmp_eq_mask_i64_m256i | avx2 Compare |
cmp_eq_mask_i8_m128i | sse2 Lanewise |
cmp_eq_mask_i8_m256i | avx2 Compare |
cmp_eq_mask_m128 | sse Lanewise |
cmp_eq_mask_m128_s | sse Low lane |
cmp_eq_mask_m128d | sse2 Lanewise |
cmp_eq_mask_m128d_s | sse2 Low lane |
cmp_ge_i32_m128_s | sse Low lane greater than or equal to. |
cmp_ge_i32_m128d_s | sse2 Low lane |
cmp_ge_mask_m128 | sse Lanewise |
cmp_ge_mask_m128_s | sse Low lane |
cmp_ge_mask_m128d | sse2 Lanewise |
cmp_ge_mask_m128d_s | sse2 Low lane |
cmp_gt_i32_m128_s | sse Low lane greater than. |
cmp_gt_i32_m128d_s | sse2 Low lane |
cmp_gt_mask_i16_m128i | sse2 Lanewise |
cmp_gt_mask_i16_m256i | avx2 Compare |
cmp_gt_mask_i32_m128i | sse2 Lanewise |
cmp_gt_mask_i32_m256i | avx2 Compare |
cmp_gt_mask_i64_m128i | sse4.2 Lanewise |
cmp_gt_mask_i64_m256i | avx2 Compare |
cmp_gt_mask_i8_m128i | sse2 Lanewise |
cmp_gt_mask_i8_m256i | avx2 Compare |
cmp_gt_mask_m128 | sse Lanewise |
cmp_gt_mask_m128_s | sse Low lane |
cmp_gt_mask_m128d | sse2 Lanewise |
cmp_gt_mask_m128d_s | sse2 Low lane |
cmp_le_i32_m128_s | sse Low lane less than or equal to. |
cmp_le_i32_m128d_s | sse2 Low lane |
cmp_le_mask_m128 | sse Lanewise |
cmp_le_mask_m128_s | sse Low lane |
cmp_le_mask_m128d | sse2 Lanewise |
cmp_le_mask_m128d_s | sse2 Low lane |
cmp_lt_i32_m128_s | sse Low lane less than. |
cmp_lt_i32_m128d_s | sse2 Low lane |
cmp_lt_mask_i16_m128i | sse2 Lanewise |
cmp_lt_mask_i32_m128i | sse2 Lanewise |
cmp_lt_mask_i8_m128i | sse2 Lanewise |
cmp_lt_mask_m128 | sse Lanewise |
cmp_lt_mask_m128_s | sse Low lane |
cmp_lt_mask_m128d | sse2 Lanewise |
cmp_lt_mask_m128d_s | sse2 Low lane |
cmp_neq_i32_m128_s | sse Low lane not equal to. |
cmp_neq_i32_m128d_s | sse2 Low lane |
cmp_neq_mask_m128 | sse Lanewise |
cmp_neq_mask_m128_s | sse Low lane |
cmp_neq_mask_m128d | sse2 Lanewise |
cmp_neq_mask_m128d_s | sse2 Low lane |
cmp_nge_mask_m128 | sse Lanewise |
cmp_nge_mask_m128_s | sse Low lane |
cmp_nge_mask_m128d | sse2 Lanewise |
cmp_nge_mask_m128d_s | sse2 Low lane |
cmp_ngt_mask_m128 | sse Lanewise |
cmp_ngt_mask_m128_s | sse Low lane |
cmp_ngt_mask_m128d | sse2 Lanewise |
cmp_ngt_mask_m128d_s | sse2 Low lane |
cmp_nle_mask_m128 | sse Lanewise |
cmp_nle_mask_m128_s | sse Low lane |
cmp_nle_mask_m128d | sse2 Lanewise |
cmp_nle_mask_m128d_s | sse2 Low lane |
cmp_nlt_mask_m128 | sse Lanewise |
cmp_nlt_mask_m128_s | sse Low lane |
cmp_nlt_mask_m128d | sse2 Lanewise |
cmp_nlt_mask_m128d_s | sse2 Low lane |
cmp_ordinary_mask_m128 | sse Lanewise |
cmp_ordinary_mask_m128_s | sse Low lane |
cmp_ordinary_mask_m128d | sse2 Lanewise |
cmp_ordinary_mask_m128d_s | sse2 Low lane |
cmp_unord_mask_m128 | sse Lanewise |
cmp_unord_mask_m128_s | sse Low lane |
cmp_unord_mask_m128d | sse2 Lanewise |
cmp_unord_mask_m128d_s | sse2 Low lane |
convert_i32_replace_m128_s | sse Convert |
convert_i32_replace_m128d_s | sse2 Convert |
convert_i64_replace_m128d_s | sse2 Convert |
convert_m128_s_replace_m128d_s | sse2 Converts the lower |
convert_m128d_s_replace_m128_s | sse2 Converts the low |
convert_to_f32_from_m256_s | avx Convert the lowest |
convert_to_f64_from_m256d_s | avx Convert the lowest |
convert_to_i16_m128i_from_lower2_i16_m128i | sse4.1 Convert the lower two |
convert_to_i16_m128i_from_lower8_i8_m128i | sse4.1 Convert the lower eight |
convert_to_i16_m256i_from_i8_m128i | avx2 Convert |
convert_to_i16_m256i_from_lower4_u8_m128i | avx2 Convert lower 4 |
convert_to_i16_m256i_from_lower8_u8_m128i | avx2 Convert lower 8 |
convert_to_i16_m256i_from_u8_m128i | avx2 Convert |
convert_to_i32_from_m256i_s | avx Convert the lowest |
convert_to_i32_m128i_from_lower4_i16_m128i | sse4.1 Convert the lower four |
convert_to_i32_m128i_from_lower4_i8_m128i | sse4.1 Convert the lower four |
convert_to_i32_m128i_from_m128d | sse2 Rounds the two |
convert_to_i32_m128i_from_m128 | sse2 Rounds the |
convert_to_i32_m128i_from_m256d | avx Convert |
convert_to_i32_m256i_from_i16_m128i | avx2 Convert |
convert_to_i32_m256i_from_lower8_i8_m128i | avx2 Convert the lower 8 |
convert_to_i32_m256i_from_m256 | avx Convert |
convert_to_i32_m256i_from_u16_m128i | avx2 Convert |
convert_to_i64_m128i_from_lower2_i32_m128i | sse4.1 Convert the lower two |
convert_to_i64_m128i_from_lower2_i8_m128i | sse4.1 Convert the lower two |
convert_to_i64_m256i_from_i32_m128i | avx2 Convert |
convert_to_i64_m256i_from_lower4_i16_m128i | avx2 Convert |
convert_to_i64_m256i_from_lower4_i8_m128i | avx2 Convert the lower 4 |
convert_to_i64_m256i_from_lower4_u16_m128i | avx2 Convert |
convert_to_i64_m256i_from_u32_m128i | avx2 Convert |
convert_to_m128_from_i32_m128i | sse2 Rounds the four |
convert_to_m128_from_m128d | sse2 Rounds the two |
convert_to_m128_from_m256d | avx Convert |
convert_to_m128d_from_lower2_i32_m128i | sse2 Rounds the lower two |
convert_to_m128d_from_lower2_m128 | sse2 Rounds the two |
convert_to_m256_from_i32_m256i | avx Convert |
convert_to_m256d_from_i32_m128i | avx Convert |
convert_to_m256d_from_m128 | avx Convert |
convert_to_u16_m128i_from_lower8_u8_m128i | sse4.1 Convert the lower eight |
convert_to_u32_m128i_from_lower4_u16_m128i | sse4.1 Convert the lower four |
convert_to_u32_m128i_from_lower4_u8_m128i | sse4.1 Convert the lower four |
convert_to_u64_m128i_from_lower2_u16_m128i | sse4.1 Convert the lower two |
convert_to_u64_m128i_from_lower2_u32_m128i | sse4.1 Convert the lower two |
convert_to_u64_m128i_from_lower2_u8_m128i | sse4.1 Convert the lower two |
convert_truncate_to_i32_m128i_from_m256d | avx Convert |
convert_truncate_to_i32_m256i_from_m256 | avx Convert |
copy_i64_m128i_s | sse2 Copy the low |
copy_replace_low_f64_m128d | sse2 Copies the |
crc32_u8 | sse4.2 Accumulates the |
crc32_u16 | sse4.2 Accumulates the |
crc32_u32 | sse4.2 Accumulates the |
crc32_u64 | sse4.2 Accumulates the |
div_m128 | sse Lanewise |
div_m128_s | sse Low lane |
div_m128d | sse2 Lanewise |
div_m128d_s | sse2 Lowest lane |
div_m256d | avx Lanewise |
div_m256 | avx Lanewise |
duplicate_even_lanes_m128 | sse3 Duplicate the odd lanes to the even lanes. |
duplicate_even_lanes_m256 | avx Duplicate the even-indexed lanes to the odd lanes. |
duplicate_low_lane_m128d_s | sse3 Copy the low lane of the input to both lanes of the output. |
duplicate_odd_lanes_m128 | sse3 Duplicate the odd lanes to the even lanes. |
duplicate_odd_lanes_m256d | avx Duplicate the odd-indexed lanes to the even lanes. |
duplicate_odd_lanes_m256 | avx Duplicate the odd-indexed lanes to the even lanes. |
floor_m128d | sse4.1 Round each lane to a whole number, towards negative infinity |
floor_m128 | sse4.1 Round each lane to a whole number, towards negative infinity |
floor_m128d_s | sse4.1 Round the low lane of |
floor_m128_s | sse4.1 Round the low lane of |
floor_m256d | avx Round |
floor_m256 | avx Round |
fused_mul_add_m128 | fma Lanewise fused |
fused_mul_add_m128_s | fma Low lane fused |
fused_mul_add_m128d | fma Lanewise fused |
fused_mul_add_m128d_s | fma Low lane fused |
fused_mul_add_m256 | fma Lanewise fused |
fused_mul_add_m256d | fma Lanewise fused |
fused_mul_addsub_m128 | fma Lanewise fused |
fused_mul_addsub_m128d | fma Lanewise fused |
fused_mul_addsub_m256 | fma Lanewise fused |
fused_mul_addsub_m256d | fma Lanewise fused |
fused_mul_neg_add_m128 | fma Lanewise fused |
fused_mul_neg_add_m128_s | fma Low lane |
fused_mul_neg_add_m128d | fma Lanewise fused |
fused_mul_neg_add_m128d_s | fma Low lane |
fused_mul_neg_add_m256 | fma Lanewise fused |
fused_mul_neg_add_m256d | fma Lanewise fused |
fused_mul_neg_sub_m128 | fma Lanewise fused |
fused_mul_neg_sub_m128_s | fma Low lane fused |
fused_mul_neg_sub_m128d | fma Lanewise fused |
fused_mul_neg_sub_m128d_s | fma Low lane fused |
fused_mul_neg_sub_m256 | fma Lanewise fused |
fused_mul_neg_sub_m256d | fma Lanewise fused |
fused_mul_sub_m128 | fma Lanewise fused |
fused_mul_sub_m128_s | fma Low lane fused |
fused_mul_sub_m128d | fma Lanewise fused |
fused_mul_sub_m128d_s | fma Low lane fused |
fused_mul_sub_m256 | fma Lanewise fused |
fused_mul_sub_m256d | fma Lanewise fused |
fused_mul_subadd_m128 | fma Lanewise fused |
fused_mul_subadd_m128d | fma Lanewise fused |
fused_mul_subadd_m256 | fma Lanewise fused |
fused_mul_subadd_m256d | fma Lanewise fused |
get_f32_from_m128_s | sse Gets the low lane as an individual |
get_f64_from_m128d_s | sse2 Gets the lower lane as an |
get_i32_from_m128_s | sse Converts the low lane to |
get_i32_from_m128d_s | sse2 Converts the lower lane to an |
get_i32_from_m128i_s | sse2 Converts the lower lane to an |
get_i64_from_m128d_s | sse2 Converts the lower lane to an |
get_i64_from_m128i_s | sse2 Converts the lower lane to an |
leading_zero_count_u32 | lzcnt Count the leading zeroes in a |
leading_zero_count_u64 | lzcnt Count the leading zeroes in a |
load_f32_m128_s | sse Loads the |
load_f32_splat_m128 | sse Loads the |
load_f32_splat_m256 | avx Load an |
load_f64_m128d_s | sse2 Loads the reference into the low lane of the register. |
load_f64_splat_m128d | sse2 Loads the |
load_f64_splat_m256d | avx Load an |
load_i64_m128i_s | sse2 Loads the low |
load_m128 | sse Loads the reference into a register. |
load_m128d | sse2 Loads the reference into a register. |
load_m128i | sse2 Loads the reference into a register. |
load_m256d | avx Load data from memory into a register. |
load_m256 | avx Load data from memory into a register. |
load_m256i | avx Load data from memory into a register. |
load_m128_splat_m256 | avx Load an |
load_m128d_splat_m256d | avx Load an |
load_masked_i32_m128i | avx2 Loads the reference given and zeroes any |
load_masked_i32_m256i | avx2 Loads the reference given and zeroes any |
load_masked_i64_m128i | avx2 Loads the reference given and zeroes any |
load_masked_i64_m256i | avx2 Loads the reference given and zeroes any |
load_masked_m128d | avx Load data from memory into a register according to a mask. |
load_masked_m128 | avx Load data from memory into a register according to a mask. |
load_masked_m256d | avx Load data from memory into a register according to a mask. |
load_masked_m256 | avx Load data from memory into a register according to a mask. |
load_replace_high_m128d | sse2 Loads the reference into a register, replacing the high lane. |
load_replace_low_m128d | sse2 Loads the reference into a register, replacing the low lane. |
load_reverse_m128 | sse Loads the reference into a register with reversed order. |
load_reverse_m128d | sse2 Loads the reference into a register with reversed order. |
load_unaligned_hi_lo_m256d | avx Load data from memory into a register. |
load_unaligned_hi_lo_m256 | avx Load data from memory into a register. |
load_unaligned_hi_lo_m256i | avx Load data from memory into a register. |
load_unaligned_m128 | sse Loads the reference into a register. |
load_unaligned_m128d | sse2 Loads the reference into a register. |
load_unaligned_m128i | sse2 Loads the reference into a register. |
load_unaligned_m256d | avx Load data from memory into a register. |
load_unaligned_m256 | avx Load data from memory into a register. |
load_unaligned_m256i | avx Load data from memory into a register. |
max_i16_m128i | sse2 Lanewise |
max_i16_m256i | avx2 Lanewise |
max_i32_m128i | sse4.1 Lanewise |
max_i32_m256i | avx2 Lanewise |
max_i8_m128i | sse4.1 Lanewise |
max_i8_m256i | avx2 Lanewise |
max_m128 | sse Lanewise |
max_m128_s | sse Low lane |
max_m128d | sse2 Lanewise |
max_m128d_s | sse2 Low lane |
max_m256d | avx Lanewise |
max_m256 | avx Lanewise |
max_u16_m128i | sse4.1 Lanewise |
max_u16_m256i | avx2 Lanewise |
max_u32_m128i | sse4.1 Lanewise |
max_u32_m256i | avx2 Lanewise |
max_u8_m128i | sse2 Lanewise |
max_u8_m256i | avx2 Lanewise |
min_i16_m128i | sse2 Lanewise |
min_i16_m256i | avx2 Lanewise |
min_i32_m128i | sse4.1 Lanewise |
min_i32_m256i | avx2 Lanewise |
min_i8_m128i | sse4.1 Lanewise |
min_i8_m256i | avx2 Lanewise |
min_m128 | sse Lanewise |
min_m128_s | sse Low lane |
min_m128d | sse2 Lanewise |
min_m128d_s | sse2 Low lane |
min_m256d | avx Lanewise |
min_m256 | avx Lanewise |
min_position_u16_m128i | sse4.1 Min |
min_u16_m128i | sse4.1 Lanewise |
min_u16_m256i | avx2 Lanewise |
min_u32_m128i | sse4.1 Lanewise |
min_u32_m256i | avx2 Lanewise |
min_u8_m128i | sse2 Lanewise |
min_u8_m256i | avx2 Lanewise |
move_high_low_m128 | sse Move the high lanes of |
move_low_high_m128 | sse Move the low lanes of |
move_m128_s | sse Move the low lane of |
move_mask_i8_m128i | sse2 Gathers the |
move_mask_m128 | sse Gathers the sign bit of each lane. |
move_mask_m128d | sse2 Gathers the sign bit of each lane. |
move_mask_m256d | avx Collects the sign bit of each lane into a 4-bit value. |
move_mask_m256 | avx Collects the sign bit of each lane into a 4-bit value. |
move_mask_m256i | avx2 Create an |
mul_extended_u32 | bmi2 Multiply two |
mul_extended_u64 | bmi2 Multiply two |
mul_i16_horizontal_add_m128i | sse2 Multiply |
mul_i16_horizontal_add_m256i | avx2 Multiply |
mul_i16_keep_high_m128i | sse2 Lanewise |
mul_i16_keep_high_m256i | avx2 Multiply the |
mul_i16_keep_low_m128i | sse2 Lanewise |
mul_i16_keep_low_m256i | avx2 Multiply the |
mul_i16_scale_round_m128i | ssse3 Multiply |
mul_i16_scale_round_m256i | avx2 Multiply |
mul_i32_keep_low_m128i | sse4.1 Lanewise |
mul_i32_keep_low_m256i | avx2 Multiply the |
mul_i64_low_bits_m256i | avx2 Multiply the lower |
mul_m128 | sse Lanewise |
mul_m128_s | sse Low lane |
mul_m128d | sse2 Lanewise |
mul_m128d_s | sse2 Lowest lane |
mul_m256d | avx Lanewise |
mul_m256 | avx Lanewise |
mul_u16_keep_high_m128i | sse2 Lanewise |
mul_u16_keep_high_m256i | avx2 Multiply the |
mul_u64_low_bits_m256i | avx2 Multiply the lower |
mul_u8i8_add_horizontal_saturating_m128i | ssse3 This is dumb and weird. |
mul_u8i8_add_horizontal_saturating_m256i | avx2 This is dumb and weird. |
mul_widen_i32_odd_m128i | sse4.1 Multiplies the odd |
mul_widen_u32_odd_m128i | sse2 Multiplies the odd |
pack_i16_to_i8_m128i | sse2 Saturating convert |
pack_i16_to_i8_m256i | avx2 Saturating convert |
pack_i16_to_u8_m128i | sse2 Saturating convert |
pack_i16_to_u8_m256i | avx2 Saturating convert |
pack_i32_to_i16_m128i | sse2 Saturating convert |
pack_i32_to_i16_m256i | avx2 Saturating convert |
pack_i32_to_u16_m128i | sse4.1 Saturating convert |
pack_i32_to_u16_m256i | avx2 Saturating convert |
population_count_i32 | popcnt Count the number of bits set within an |
population_count_i64 | popcnt Count the number of bits set within an |
population_deposit_u32 | bmi2 Deposit contiguous low bits from a |
population_deposit_u64 | bmi2 Deposit contiguous low bits from a |
population_extract_u32 | bmi2 Extract bits from a |
population_extract_u64 | bmi2 Extract bits from a |
rdrand_u16 | rdrand Try to obtain a random |
rdrand_u32 | rdrand Try to obtain a random |
rdrand_u64 | rdrand Try to obtain a random |
rdseed_u16 | rdseed Try to obtain a random |
rdseed_u32 | rdseed Try to obtain a random |
rdseed_u64 | rdseed Try to obtain a random |
read_timestamp_counter | Reads the CPU's timestamp counter value. |
read_timestamp_counter_p | Reads the CPU's timestamp counter value and store the processor signature. |
reciprocal_m128 | sse Lanewise |
reciprocal_m128_s | sse Low lane |
reciprocal_m256 | avx Reciprocal of |
reciprocal_sqrt_m128 | sse Lanewise |
reciprocal_sqrt_m128_s | sse Low lane |
reciprocal_sqrt_m256 | avx Reciprocal of |
set_i16_m128i | sse2 Sets the args into an |
set_i16_m256i | avx Set |
set_i32_m128i_s | sse2 Set an |
set_i32_m128i | sse2 Sets the args into an |
set_i32_m256i | avx Set |
set_i64_m128i_s | sse2 Set an |
set_i64_m128i | sse2 Sets the args into an |
set_i8_m128i | sse2 Sets the args into an |
set_i8_m256i | avx Set |
set_m128 | sse Sets the args into an |
set_m128_s | sse Sets the args into an |
set_m128d | sse2 Sets the args into an |
set_m128d_s | sse2 Sets the args into the low lane of a |
set_m256d | avx Set |
set_m256 | avx Set |
set_m128d_m256d | avx Set |
set_m128i_m256i | avx Set |
set_reversed_i16_m128i | sse2 Sets the args into an |
set_reversed_i16_m256i | avx Set |
set_reversed_i32_m128i | sse2 Sets the args into an |
set_reversed_i32_m256i | avx Set |
set_reversed_i8_m128i | sse2 Sets the args into an |
set_reversed_i8_m256i | avx Set |
set_reversed_m128 | sse Sets the args into an |
set_reversed_m128d | sse2 Sets the args into an |
set_reversed_m256d | avx Set |
set_reversed_m256 | avx Set |
set_reversed_m128d_m256d | avx Set |
set_reversed_m128i_m256i | avx Set |
set_splat_i16_m128i | sse2 Splats the |
set_splat_i16_m256i | avx Splat an |
set_splat_i16_m128i_s_m256i | avx2 Sets the lowest |
set_splat_i32_m128i | sse2 Splats the |
set_splat_i32_m256i | avx Splat an |
set_splat_i32_m128i_s_m256i | avx2 Sets the lowest |
set_splat_i64_m128i | sse2 Splats the |
set_splat_i64_m128i_s_m256i | avx2 Sets the lowest |
set_splat_i8_m128i | sse2 Splats the |
set_splat_i8_m256i | avx Splat an |
set_splat_i8_m128i_s_m256i | avx2 Sets the lowest |
set_splat_m128 | sse Splats the value to all lanes. |
set_splat_m128d | sse2 Splats the args into both lanes of the |
set_splat_m256d | avx Splat an |
set_splat_m256 | avx Splat an |
set_splat_m128_s_m256 | avx2 Sets the lowest lane of an |
set_splat_m128d_s_m256d | avx2 Sets the lowest lane of an |
shl_all_u16_m128i | sse2 Shift all |
shl_all_u16_m256i | avx2 Lanewise |
shl_all_u32_m128i | sse2 Shift all |
shl_all_u32_m256i | avx2 Shift all |
shl_all_u64_m128i | sse2 Shift all |
shl_all_u64_m256i | avx2 Shift all |
shl_each_u32_m128i | avx2 Shift |
shl_each_u32_m256i | avx2 Lanewise |
shl_each_u64_m128i | avx2 Shift |
shl_each_u64_m256i | avx2 Lanewise |
shr_all_i16_m128i | sse2 Shift each |
shr_all_i16_m256i | avx2 Lanewise |
shr_all_i32_m128i | sse2 Shift each |
shr_all_i32_m256i | avx2 Lanewise |
shr_all_u16_m128i | sse2 Shift each |
shr_all_u16_m256i | avx2 Lanewise |
shr_all_u32_m128i | sse2 Shift each |
shr_all_u32_m256i | avx2 Lanewise |
shr_all_u64_m128i | sse2 Shift each |
shr_all_u64_m256i | avx2 Lanewise |
shr_each_i32_m128i | avx2 Shift |
shr_each_i32_m256i | avx2 Lanewise |
shr_each_u32_m128i | avx2 Shift |
shr_each_u32_m256i | avx2 Lanewise |
shr_each_u64_m128i | avx2 Shift |
shr_each_u64_m256i | avx2 Lanewise |
shuffle_av_f32_all_m128 | avx Shuffle |
shuffle_av_f32_half_m256 | avx Shuffle |
shuffle_av_f64_all_m128d | avx Shuffle |
shuffle_av_f64_half_m256d | avx Shuffle |
shuffle_av_i32_all_m256i | avx2 Shuffle |
shuffle_av_i32_all_m256 | avx2 Shuffle |
shuffle_av_i8z_all_m128i | ssse3 Shuffle |
shuffle_av_i8z_half_m256i | avx2 Shuffle |
sign_apply_i16_m128i | ssse3 Applies the sign of |
sign_apply_i16_m256i | avx2 Lanewise |
sign_apply_i32_m128i | ssse3 Applies the sign of |
sign_apply_i32_m256i | avx2 Lanewise |
sign_apply_i8_m128i | ssse3 Applies the sign of |
sign_apply_i8_m256i | avx2 Lanewise |
splat_i16_m128i_s_m128i | avx2 Splat the lowest 16-bit lane across the entire 128 bits. |
splat_i32_m128i_s_m128i | avx2 Splat the lowest 32-bit lane across the entire 128 bits. |
splat_i64_m128i_s_m128i | avx2 Splat the lowest 64-bit lane across the entire 128 bits. |
splat_i8_m128i_s_m128i | avx2 Splat the lowest 8-bit lane across the entire 128 bits. |
splat_m128_s_m128 | avx2 Splat the lowest |
splat_m128d_s_m128d | avx2 Splat the lower |
splat_m128i_m256i | avx2 Splat the 128-bits across 256-bits. |
sqrt_m128 | sse Lanewise sqrt(a). |
sqrt_m128_s | sse Low lane sqrt(a), other lanes unchanged. |
sqrt_m128d | sse2 Lanewise sqrt(a). |
sqrt_m128d_s | sse2 Low lane sqrt(a), other lane unchanged. |
sqrt_m256d | avx Lanewise sqrt(a). |
sqrt_m256 | avx Lanewise sqrt(a). |
store_high_m128d_s | sse2 Stores the high lane value to the reference given. |
store_i64_m128i_s | sse2 Stores the value to the reference given. |
store_m128 | sse Stores the value to the reference given. |
store_m128_s | sse Stores the low lane value to the reference given. |
store_m128d | sse2 Stores the value to the reference given. |
store_m128d_s | sse2 Stores the low lane value to the reference given. |
store_m128i | sse2 Stores the value to the reference given. |
store_m256d | avx Store data from a register into memory. |
store_m256 | avx Store data from a register into memory. |
store_m256i | avx Store data from a register into memory. |
store_masked_i32_m128i | avx2 Stores the |
store_masked_i32_m256i | avx2 Stores the |
store_masked_i64_m128i | avx2 Stores the |
store_masked_i64_m256i | avx2 Stores the |
store_masked_m128d | avx Store data from a register into memory according to a mask. |
store_masked_m128 | avx Store data from a register into memory according to a mask. |
store_masked_m256d | avx Store data from a register into memory according to a mask. |
store_masked_m256 | avx Store data from a register into memory according to a mask. |
store_reverse_m128 | sse Stores the value to the reference given in reverse order. |
store_reversed_m128d | sse2 Stores the value to the reference given in reverse order. |
store_splat_m128 | sse Stores the low lane value to all lanes of the reference given. |
store_splat_m128d | sse2 Stores the low lane value to all lanes of the reference given. |
store_unaligned_hi_lo_m256d | avx Store data from a register into memory. |
store_unaligned_hi_lo_m256 | avx Store data from a register into memory. |
store_unaligned_hi_lo_m256i | avx Store data from a register into memory. |
store_unaligned_m128 | sse Stores the value to the reference given. |
store_unaligned_m128d | sse2 Stores the value to the reference given. |
store_unaligned_m128i | sse2 Stores the value to the reference given. |
store_unaligned_m256d | avx Store data from a register into memory. |
store_unaligned_m256 | avx Store data from a register into memory. |
store_unaligned_m256i | avx Store data from a register into memory. |
sub_horizontal_i16_m128i | ssse3 Subtract horizontal pairs of |
sub_horizontal_i16_m256i | avx2 Horizontal |
sub_horizontal_i32_m128i | ssse3 Subtract horizontal pairs of |
sub_horizontal_i32_m256i | avx2 Horizontal |
sub_horizontal_m128d | sse3 Subtract each lane horizontally, pack the outputs as |
sub_horizontal_m128 | sse3 Subtract each lane horizontally, pack the outputs as |
sub_horizontal_m256d | avx Subtract adjacent |
sub_horizontal_m256 | avx Subtract adjacent |
sub_horizontal_saturating_i16_m128i | ssse3 Subtract horizontal pairs of |
sub_horizontal_saturating_i16_m256i | avx2 Horizontal saturating |
sub_i16_m128i | sse2 Lanewise a - b with lanes of i16. |
sub_i16_m256i | avx2 Lanewise a - b with lanes of i16. |
sub_i32_m128i | sse2 Lanewise a - b with lanes of i32. |
sub_i32_m256i | avx2 Lanewise a - b with lanes of i32. |
sub_i64_m128i | sse2 Lanewise a - b with lanes of i64. |
sub_i64_m256i | avx2 Lanewise a - b with lanes of i64. |
sub_i8_m128i | sse2 Lanewise a - b with lanes of i8. |
sub_i8_m256i | avx2 Lanewise a - b with lanes of i8. |
sub_m128 | sse Lanewise a - b. |
sub_m128_s | sse Low lane a - b, other lanes unchanged. |
sub_m128d | sse2 Lanewise a - b. |
sub_m128d_s | sse2 Lowest lane a - b, high lane unchanged. |
sub_m256d | avx Lanewise a - b with f64 lanes. |
sub_m256 | avx Lanewise a - b with f32 lanes. |
sub_saturating_i16_m128i | sse2 Lanewise saturating |
sub_saturating_i16_m256i | avx2 Lanewise saturating |
sub_saturating_i8_m128i | sse2 Lanewise saturating |
sub_saturating_i8_m256i | avx2 Lanewise saturating |
sub_saturating_u16_m128i | sse2 Lanewise saturating |
sub_saturating_u16_m256i | avx2 Lanewise saturating |
sub_saturating_u8_m128i | sse2 Lanewise saturating |
sub_saturating_u8_m256i | avx2 Lanewise saturating |
sum_of_u8_abs_diff_m128i | sse2 Compute "sum of |
sum_of_u8_abs_diff_m256i | avx2 Compute "sum of |
test_all_ones_m128i | sse4.1 Tests if all bits are 1. |
test_all_zeroes_m128i | sse4.1 Returns if all masked bits are 0, |
test_mixed_ones_and_zeroes_m128i | sse4.1 Returns if, among the masked bits, there's both 0s and 1s |
trailing_zero_count_u32 | bmi1 Counts the number of trailing zero bits in a u32. |
trailing_zero_count_u64 | bmi1 Counts the number of trailing zero bits in a u64. |
transpose_four_m128 | sse Transpose four |
truncate_m128_to_m128i | sse2 Truncate the |
truncate_m128d_to_m128i | sse2 Truncate the |
truncate_to_i32_m128d_s | sse2 Truncate the lower lane into an |
truncate_to_i64_m128d_s | sse2 Truncate the lower lane into an |
unpack_hi_m256d | avx Unpack and interleave the high lanes. |
unpack_hi_m256 | avx Unpack and interleave the high lanes. |
unpack_high_i16_m128i | sse2 Unpack and interleave high |
unpack_high_i16_m256i | avx2 Unpack and interleave high |
unpack_high_i32_m128i | sse2 Unpack and interleave high |
unpack_high_i32_m256i | avx2 Unpack and interleave high |
unpack_high_i64_m128i | sse2 Unpack and interleave high |
unpack_high_i64_m256i | avx2 Unpack and interleave high |
unpack_high_i8_m128i | sse2 Unpack and interleave high |
unpack_high_i8_m256i | avx2 Unpack and interleave high |
unpack_high_m128 | sse Unpack and interleave high lanes of |
unpack_high_m128d | sse2 Unpack and interleave high lanes of |
unpack_lo_m256d | avx Unpack and interleave the low lanes. |
unpack_lo_m256 | avx Unpack and interleave the low lanes. |
unpack_low_i16_m128i | sse2 Unpack and interleave low |
unpack_low_i16_m256i | avx2 Unpack and interleave low |
unpack_low_i32_m128i | sse2 Unpack and interleave low |
unpack_low_i32_m256i | avx2 Unpack and interleave low |
unpack_low_i64_m128i | sse2 Unpack and interleave low |
unpack_low_i64_m256i | avx2 Unpack and interleave low |
unpack_low_i8_m128i | sse2 Unpack and interleave low |
unpack_low_i8_m256i | avx2 Unpack and interleave low |
unpack_low_m128 | sse Unpack and interleave low lanes of |
unpack_low_m128d | sse2 Unpack and interleave low lanes of |
zero_extend_m128d | avx Zero extend an m128d to m256d. |
zero_extend_m128 | avx Zero extend an m128 to m256. |
zero_extend_m128i | avx Zero extend an m128i to m256i. |
zeroed_m128 | sse All lanes zero. |
zeroed_m128i | sse2 All lanes zero. |
zeroed_m128d | sse2 Both lanes zero. |
zeroed_m256d | avx A zeroed m256d. |
zeroed_m256 | avx A zeroed m256. |
zeroed_m256i | avx A zeroed m256i. |
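As a reference point for the pack_* rows above: a "saturating convert" clamps each lane into the destination range before narrowing, rather than wrapping. This scalar sketch shows the per-lane semantics behind pack_i16_to_u8_m128i (the intrinsic does this for all sixteen lanes at once); the helper name here is ours, not the crate's.

```rust
/// Scalar sketch of saturating i16 -> u8 packing: each lane is
/// clamped into 0..=255, then narrowed to a byte.
fn pack_i16_to_u8(lanes: &[i16]) -> Vec<u8> {
    lanes.iter().map(|&x| x.clamp(0, 255) as u8).collect()
}

fn main() {
    // Negative lanes saturate to 0, lanes above 255 saturate to 255.
    assert_eq!(pack_i16_to_u8(&[-5, 0, 200, 300]), vec![0, 0, 200, 255]);
}
```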
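The "sum of absolute differences" rows (sum_of_u8_abs_diff_m128i and the m256i variant) come from the psadbw family: for each group of eight u8 lanes, the hardware sums |a - b|. A scalar sketch of that per-group computation, with a function name of our choosing:

```rust
/// Sum of absolute differences over one group of eight u8 lanes,
/// as psadbw computes per 64-bit chunk.
fn sad_u8(a: &[u8; 8], b: &[u8; 8]) -> u16 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x as i16 - y as i16).unsigned_abs())
        .sum()
}

fn main() {
    // |1-8| + |2-7| + ... + |8-1| = 7+5+3+1+1+3+5+7 = 32
    assert_eq!(sad_u8(&[1, 2, 3, 4, 5, 6, 7, 8], &[8, 7, 6, 5, 4, 3, 2, 1]), 32);
}
```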
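"Unpack and interleave low" in the unpack_low_* rows means the low halves of the two inputs are zipped together lane by lane: a0, b0, a1, b1, and so on. A scalar sketch of the i32 case (this mirrors the punpckldq behavior behind unpack_low_i32_m128i; the plain function here is an illustration, not the crate API):

```rust
/// Interleave the low halves of two 4-lane vectors: [a0, b0, a1, b1].
fn unpack_low_i32(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    [a[0], b[0], a[1], b[1]]
}

fn main() {
    assert_eq!(unpack_low_i32([1, 2, 3, 4], [5, 6, 7, 8]), [1, 5, 2, 6]);
}
```

The unpack_high_* rows do the same with the high halves: [a2, b2, a3, b3].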
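The scalar bit-count rows (popcnt and bmi1) have stable counterparts in core's integer methods, which can serve as a portable fallback when the CPU feature isn't declared at compile time:

```rust
fn main() {
    // count_ones is the scalar analogue of population_count_i32 (popcnt),
    // trailing_zeros of trailing_zero_count_u32 (bmi1 tzcnt).
    assert_eq!(0b1011_u32.count_ones(), 3);
    assert_eq!(0b1000_u32.trailing_zeros(), 3);
    // Like tzcnt, trailing_zeros of zero is the full bit width.
    assert_eq!(0_u32.trailing_zeros(), 32);
}
```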