A crate that safely exposes arch intrinsics via #[cfg()].
safe_arch lets you safely use CPU intrinsics: the functions found in the core::arch modules. It works purely via #[cfg()] and compile-time CPU feature declaration. If you want to check for a feature at runtime and then call an intrinsic or use a fallback path based on that, then this crate is sadly not for you.
SIMD register types are “newtype’d” so that better trait impls can be given to them, but the inner value is a pub field so feel free to just grab it out if you need to. Trait impls of the newtypes include: Default (zeroed), From/Into of appropriate data types, and appropriate operator overloading.
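As a rough illustration of the newtype idea described above, here is a minimal std-only sketch. The names `M128` and `add_lanes` are hypothetical and are not the crate's real API; the sketch only assumes the `core::arch::x86_64` intrinsics `_mm_setzero_ps`, `_mm_loadu_ps`, `_mm_storeu_ps`, and `_mm_add_ps`, plus a scalar fallback so it builds on other targets.

```rust
// Hypothetical sketch of the newtype pattern; `M128` and `add_lanes` are
// illustrative names, not safe_arch's actual types or functions.
#[cfg(target_arch = "x86_64")]
mod sse_sketch {
    // Newer toolchains treat some register-only intrinsics as safe to call.
    #![allow(unused_unsafe)]

    use core::arch::x86_64::{__m128, _mm_add_ps, _mm_loadu_ps, _mm_setzero_ps, _mm_storeu_ps};
    use core::ops::Add;

    /// Wrapper around the raw register type; the inner value is a pub field.
    #[derive(Clone, Copy)]
    #[repr(transparent)]
    pub struct M128(pub __m128);

    impl Default for M128 {
        fn default() -> Self {
            // SAFETY: `sse` is part of the x86_64 baseline.
            M128(unsafe { _mm_setzero_ps() })
        }
    }

    impl From<[f32; 4]> for M128 {
        fn from(arr: [f32; 4]) -> Self {
            // SAFETY: the array is valid for reading four f32 values, and the
            // unaligned load has no alignment requirement.
            M128(unsafe { _mm_loadu_ps(arr.as_ptr()) })
        }
    }

    impl From<M128> for [f32; 4] {
        fn from(m: M128) -> Self {
            let mut out = [0.0_f32; 4];
            // SAFETY: `out` is valid for writing four f32 values.
            unsafe { _mm_storeu_ps(out.as_mut_ptr(), m.0) };
            out
        }
    }

    impl Add for M128 {
        type Output = Self;
        fn add(self, rhs: Self) -> Self {
            // SAFETY: `sse` is part of the x86_64 baseline.
            M128(unsafe { _mm_add_ps(self.0, rhs.0) })
        }
    }
}

/// Lanewise `a + b`, going through the newtype on x86_64.
#[cfg(target_arch = "x86_64")]
pub fn add_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    (sse_sketch::M128::from(a) + sse_sketch::M128::from(b)).into()
}

/// Scalar fallback so the sketch also builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling `add_lanes([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])` adds the four lanes pairwise. Because the field is pub, code that needs the raw `__m128` can still reach it via `.0`.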
- Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
- Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the safe_arch function takes a reference of an appropriate type to uphold safety.
  - Try the bytemuck crate (and turn on the bytemuck feature of this crate) if you want help safely casting between reference types.
- Some intrinsics are not safe unless you’re very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren’t exposed here.
- Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn’t really support you messing with that within a high level language, so those operations aren’t exposed here. Use assembly or something if you want to do that.
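To make the "reference instead of pointer" point from the list above concrete, here is a small sketch under stated assumptions. `Aligned16` and `double_lanes` are hypothetical names invented for this example (the crate's own load functions take references to its own types); the only real APIs used are the `core::arch::x86_64` intrinsics `_mm_load_ps`, `_mm_store_ps`, and `_mm_add_ps`.

```rust
/// Hypothetical wrapper whose type carries the 16-byte alignment that the
/// aligned load/store intrinsics assume.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct Aligned16(pub [f32; 4]);

/// Doubles each lane. The reference guarantees validity and alignment, so the
/// function itself can be safe even though the intrinsics take raw pointers.
#[cfg(target_arch = "x86_64")]
pub fn double_lanes(src: &Aligned16) -> Aligned16 {
    use core::arch::x86_64::{_mm_add_ps, _mm_load_ps, _mm_store_ps};

    let mut out = Aligned16([0.0; 4]);
    // SAFETY: `src` and `out` are live, 16-byte-aligned storage for four f32
    // values, and `sse` is part of the x86_64 baseline.
    unsafe {
        let reg = _mm_load_ps(src.0.as_ptr());
        _mm_store_ps(out.0.as_mut_ptr(), _mm_add_ps(reg, reg));
    }
    out
}

/// Scalar fallback so the sketch also builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn double_lanes(src: &Aligned16) -> Aligned16 {
    Aligned16([src.0[0] * 2.0, src.0[1] * 2.0, src.0[2] * 2.0, src.0[3] * 2.0])
}
```

The design choice here is that the unsafe preconditions (non-null, aligned, readable for 16 bytes) are all implied by the `&Aligned16` type, which is the same idea the crate describes for its pointer-taking intrinsics.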
§Naming Conventions
The safe_arch crate does not simply use the “official” names for each intrinsic, because the official names are generally poor. Instead, the operations have been given better names that hopefully make things easier to understand when you’re reading the code.
For a full explanation of the naming used, see the Naming Conventions page.
§Current Support
x86 / x86_64 (Intel, AMD, etc)
- 128-bit: sse, sse2, sse3, ssse3, sse4.1, sse4.2
- 256-bit: avx, avx2
- Other: adx, aes, bmi1, bmi2, fma, lzcnt, pclmulqdq, popcnt, rdrand, rdseed
§Compile Time CPU Target Features
At the time of me writing this, Rust enables the sse and sse2 CPU features by default for all i686 (x86) and x86_64 builds. Those CPU features are built into the design of x86_64, and you’d need a super old x86 CPU for it to not support at least sse and sse2, so they’re a safe bet for the language to enable all the time. In fact, because the standard library is compiled with them enabled, simply trying to disable those features would actually cause ABI issues and fill your program with UB (link).
If you want additional CPU features available at compile time you’ll have to enable them with an additional arg to rustc. For a feature named name you pass -C target-feature=+name, such as -C target-feature=+sse3 for sse3.
You can alternately enable all target features of the current CPU with -C target-cpu=native. This is primarily of use if you’re building a program you’ll only run on your own system.
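One way to see what a given set of rustc flags actually enabled is to ask `cfg!` at compile time. The sketch below is only illustrative: `compile_time_features` is a hypothetical helper name, and the list of features it checks is an arbitrary subset chosen for the example.

```rust
/// Reports which of a few x86 target features were enabled when this code was
/// compiled (for example via `-C target-feature=+sse3` or
/// `-C target-cpu=native`). Hypothetical helper, not part of any crate.
pub fn compile_time_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // `cfg!` needs a literal feature name, so each check is written out.
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "sse3") {
        enabled.push("sse3");
    }
    if cfg!(target_feature = "sse4.2") {
        enabled.push("sse4.2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    enabled
}
```

Printing the result with `println!("{:?}", compile_time_features())` before and after adding a `-C target-feature` flag shows the difference. The values are fixed when the program is compiled and say nothing about the CPU the binary later runs on.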
It’s sometimes hard to know if your target platform will support a given feature set, but the Steam Hardware Survey is generally taken as a guide to what you can expect people to have available. If you click “Other Settings” it’ll expand into a list of CPU target features and how common they are. These days, it seems that sse3 can be safely assumed, and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The stuff above 128-bit isn’t as common yet; give it another few years.
Please note that executing a program on a CPU that doesn’t support the target features it was compiled for is Undefined Behavior.
Currently, Rust doesn’t actually support an easy way for you to check that a feature enabled at compile time is actually available at runtime. There is the “feature_detected” family of macros, but if you enable a feature they will evaluate to a constant true instead of actually deferring the check for the feature to runtime. This means that, if you did want a check at the start of your program, to confirm that all the assumed features are present and error out when the assumptions don’t hold, you can’t use that macro. You gotta use CPUID and check manually. rip. Hopefully we can make that process easier in a future version of this crate.
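For what the manual CPUID check mentioned above might look like, here is a minimal sketch, not a complete solution. `missing_compile_time_features` is a hypothetical helper name. It uses the `__cpuid` intrinsic from `core::arch` and the standard leaf 1 feature bits for the SSE family. It assumes the CPU supports the CPUID instruction at all, and it leaves out AVX and later features, which also need an OS-support check.

```rust
/// Returns the names of SSE-family features that were enabled at compile time
/// but that CPUID does not report on the CPU actually running the program.
/// Hypothetical helper for illustration only.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[allow(unused_unsafe)] // newer toolchains expose `__cpuid` as a safe function
pub fn missing_compile_time_features() -> Vec<&'static str> {
    #[cfg(target_arch = "x86")]
    use core::arch::x86::__cpuid;
    #[cfg(target_arch = "x86_64")]
    use core::arch::x86_64::__cpuid;

    // CPUID leaf 1 reports the basic feature flags in EDX and ECX.
    // SAFETY: assumes the CPUID instruction itself is available.
    let leaf1 = unsafe { __cpuid(1) };

    // (name, enabled at compile time, reported present at runtime)
    let checks = [
        ("sse", cfg!(target_feature = "sse"), leaf1.edx & (1 << 25) != 0),
        ("sse2", cfg!(target_feature = "sse2"), leaf1.edx & (1 << 26) != 0),
        ("sse3", cfg!(target_feature = "sse3"), leaf1.ecx & (1 << 0) != 0),
        ("ssse3", cfg!(target_feature = "ssse3"), leaf1.ecx & (1 << 9) != 0),
        ("sse4.1", cfg!(target_feature = "sse4.1"), leaf1.ecx & (1 << 19) != 0),
        ("sse4.2", cfg!(target_feature = "sse4.2"), leaf1.ecx & (1 << 20) != 0),
    ];

    let mut missing = Vec::new();
    for (name, compiled_in, present) in checks {
        if compiled_in && !present {
            missing.push(name);
        }
    }
    missing
}

/// On other architectures there is nothing to check in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn missing_compile_time_features() -> Vec<&'static str> {
    Vec::new()
}
```

A program could call this first thing in `main` and exit with an error message if the returned list is not empty. Strictly speaking, any code compiled with the assumed features could already have misbehaved by then, so this is a best-effort guard rather than a guarantee.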
§A Note On Working With Cfg
There are two main ways to use cfg:
- Via an attribute placed on an item, block, or expression: #[cfg(debug_assertions)] println!("hello");
- Via a macro used within an expression position: if cfg!(debug_assertions) { println!("hello"); }
The difference might seem small but it’s actually very important:
- The attribute form will include code or not before deciding if all the items named and so forth really exist or not. This means that code that is configured via attribute can safely name things that don’t always exist as long as the things they name do exist whenever that code is configured into the build.
- The macro form will include the configured code no matter what, and then the macro resolves to a constant true or false and the compiler uses dead code elimination to cut out the path not taken.
This crate uses cfg via the attribute, so the functions it exposes don’t exist at all when the appropriate CPU target features aren’t enabled. Accordingly, if you plan to call this crate or not depending on what features are enabled in the build you’ll also need to control your use of this crate via cfg attribute, not cfg macro.
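The difference between the two forms can be sketched with a function that only exists in some builds. All names below (`sse41_only`, `chosen_path`) are hypothetical stand-ins: `sse41_only` plays the role of a crate function gated on a target feature.

```rust
/// Stand-in for a function that only exists when `sse4.1` is enabled at
/// compile time, the way this crate's functions are gated.
#[cfg(target_feature = "sse4.1")]
fn sse41_only() -> &'static str {
    "sse4.1 path"
}

/// Attribute form: this definition is removed entirely when `sse4.1` is off,
/// so its call to `sse41_only` is never name-resolved in that build.
#[cfg(target_feature = "sse4.1")]
pub fn chosen_path() -> &'static str {
    sse41_only()
}

/// Fallback definition used when `sse4.1` is not enabled.
#[cfg(not(target_feature = "sse4.1"))]
pub fn chosen_path() -> &'static str {
    "fallback path"
}

// Macro form, shown commented out: both branches are always compiled, so in a
// build without `sse4.1` the call below fails because `sse41_only` does not
// exist.
//
// pub fn broken_path() -> &'static str {
//     if cfg!(target_feature = "sse4.1") { sse41_only() } else { "fallback path" }
// }
```

Callers just use `chosen_path()`. Which body they get is decided at compile time by the enabled target features.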
Modules§
- An explanation of the crate’s naming conventions.
Macros§
- cmp_op (avx): Turns a comparison operator token to the correct constant value.
- round_op (avx): Turns a round operator token to the correct constant value.
Structs§
- The data for a 128-bit SSE register of four f32 lanes.
- The data for a 128-bit SSE register of two f64 values.
- The data for a 128-bit SSE register of integer data.
- The data for a 256-bit AVX register of eight f32 lanes.
- The data for a 256-bit AVX register of four f64 values.
- The data for a 256-bit AVX register of integer data.
Constants§
- Return the bitwise mask of matches.
- Matches when any haystack character equals any needle character, regardless of position.
- Matches when a character position in the needle is equal to the character at the same position in the haystack.
- Matches when the complete needle string is a substring somewhere in the haystack.
- Return the index of the first match found.
- string segment elements are i8 values
- string segment elements are i16 values
- Return the index of the last match found.
- Interprets consecutive pairs of characters in the needle as (low..=high) ranges to compare each haystack character to.
- string segment elements are u8 values
- string segment elements are u16 values
- Return the lanewise mask of matches.
Functions§
- abs_i8_m128i (ssse3): Lanewise absolute value with lanes as i8.
- abs_i8_m256i (avx2): Absolute value of i8 lanes.
- abs_i16_m128i (ssse3): Lanewise absolute value with lanes as i16.
- abs_i16_m256i (avx2): Absolute value of i16 lanes.
- abs_i32_m128i (ssse3): Lanewise absolute value with lanes as i32.
- abs_i32_m256i (avx2): Absolute value of i32 lanes.
- Add two u32 with a carry value.
- Add two u64 with a carry value.
- Add horizontal pairs of i16 values, pack the outputs as a then b.
- Horizontal a + b with lanes as i16.
- Add horizontal pairs of i32 values, pack the outputs as a then b.
- Horizontal a + b with lanes as i32.
- Add each lane horizontally, pack the outputs as a then b.
- Add each lane horizontally, pack the outputs as a then b.
- Add adjacent f32 lanes.
- Add adjacent f64 lanes.
- Add horizontal pairs of i16 values, saturating, pack the outputs as a then b.
- Horizontal saturating a + b with lanes as i16.
- add_i8_m128i (sse2): Lanewise a + b with lanes as i8.
- add_i8_m256i (avx2): Lanewise a + b with lanes as i8.
- add_i16_m128i (sse2): Lanewise a + b with lanes as i16.
- add_i16_m256i (avx2): Lanewise a + b with lanes as i16.
- add_i32_m128i (sse2): Lanewise a + b with lanes as i32.
- add_i32_m256i (avx2): Lanewise a + b with lanes as i32.
- add_i64_m128i (sse2): Lanewise a + b with lanes as i64.
- add_i64_m256i (avx2): Lanewise a + b with lanes as i64.
- add_m128 (sse): Lanewise a + b.
- add_m128_s (sse): Low lane a + b, other lanes unchanged.
- add_m128d (sse2): Lanewise a + b.
- add_m128d_s (sse2): Lowest lane a + b, high lane unchanged.
- add_m256 (avx): Lanewise a + b with f32 lanes.
- add_m256d (avx): Lanewise a + b with f64 lanes.
- Lanewise saturating a + b with lanes as i8.
- Lanewise saturating a + b with lanes as i8.
- Lanewise saturating a + b with lanes as i16.
- Lanewise saturating a + b with lanes as i16.
- Lanewise saturating a + b with lanes as u8.
- Lanewise saturating a + b with lanes as u8.
- Lanewise saturating a + b with lanes as u16.
- Lanewise saturating a + b with lanes as u16.
- addsub_m128 (sse3): Alternately, from the top, add a lane and then subtract a lane.
- addsub_m128d (sse3): Add the high lane and subtract the low lane.
- addsub_m256 (avx): Alternately, from the top, add f32 then sub f32.
- addsub_m256d (avx): Alternately, from the top, add f64 then sub f64.
- Perform the last round of an AES decryption flow on a using the round_key.
- Perform one round of an AES decryption flow on a using the round_key.
- Perform the last round of an AES encryption flow on a using the round_key.
- Perform one round of an AES encryption flow on a using the round_key.
- Perform the InvMixColumns transform on a.
- Assist in expanding an AES cipher key.
- average_u8_m128i (sse2): Lanewise average of the u8 values.
- average_u8_m256i (avx2): Average u8 lanes.
- Lanewise average of the u16 values.
- Average u16 lanes.
- bit_extract2_u32 (bmi1): Extract a span of bits from the u32, control value style.
- bit_extract2_u64 (bmi1): Extract a span of bits from the u64, control value style.
- bit_extract_u32 (bmi1): Extract a span of bits from the u32, start and len style.
- bit_extract_u64 (bmi1): Extract a span of bits from the u64, start and len style.
- Gets the mask of all bits up to and including the lowest set bit in a u32.
- Gets the mask of all bits up to and including the lowest set bit in a u64.
- Resets (clears) the lowest set bit.
- Resets (clears) the lowest set bit.
- Gets the value of the lowest set bit in a u32.
- Gets the value of the lowest set bit in a u64.
- Zero out all high bits in a u32 starting at the index given.
- Zero out all high bits in a u64 starting at the index given.
- bitand_m128 (sse): Bitwise a & b.
- bitand_m128d (sse2): Bitwise a & b.
- bitand_m128i (sse2): Bitwise a & b.
- bitand_m256 (avx): Bitwise a & b.
- bitand_m256d (avx): Bitwise a & b.
- bitand_m256i (avx2): Bitwise a & b.
- Bitwise (!a) & b.
- bitandnot_m128d (sse2): Bitwise (!a) & b.
- bitandnot_m128i (sse2): Bitwise (!a) & b.
- Bitwise (!a) & b.
- Bitwise (!a) & b.
- bitandnot_m256i (avx2): Bitwise (!a) & b.
- bitandnot_u32 (bmi1): Bitwise (!a) & b for u32
- bitandnot_u64 (bmi1): Bitwise (!a) & b for u64
- bitor_m128 (sse): Bitwise a | b.
- bitor_m128d (sse2): Bitwise a | b.
- bitor_m128i (sse2): Bitwise a | b.
- bitor_m256 (avx): Bitwise a | b.
- bitor_m256d (avx): Bitwise a | b.
- bitor_m256i (avx2): Bitwise a | b
- bitxor_m128 (sse): Bitwise a ^ b.
- bitxor_m128d (sse2): Bitwise a ^ b.
- bitxor_m128i (sse2): Bitwise a ^ b.
- bitxor_m256 (avx): Bitwise a ^ b.
- bitxor_m256d (avx): Bitwise a ^ b.
- bitxor_m256i (avx2): Bitwise a ^ b.
- Blends the i16 lanes according to the immediate mask.
- Blends the i16 lanes according to the immediate value.
- Blends the i32 lanes in a and b into a single value.
- Blends the i32 lanes according to the immediate value.
- Blends the lanes according to the immediate mask.
- Blends the i16 lanes according to the immediate mask.
- Blends the f32 lanes according to the immediate mask.
- Blends the f64 lanes according to the immediate mask.
- blend_varying_i8_m128i (sse4.1): Blend the i8 lanes according to a runtime varying mask.
- Blend i8 lanes according to a runtime varying mask.
- blend_varying_m128 (sse4.1): Blend the lanes according to a runtime varying mask.
- blend_varying_m128d (sse4.1): Blend the lanes according to a runtime varying mask.
- Blend the lanes according to a runtime varying mask.
- Blend the lanes according to a runtime varying mask.
- Shifts all bits in the entire register left by a number of bytes.
- Shifts each u128 lane left by a number of bytes.
- Shifts all bits in the entire register right by a number of bytes.
- Shifts each u128 lane right by a number of bytes.
- Swap the bytes of the given 32-bit value.
- Swap the bytes of the given 64-bit value.
- Bit-preserving cast to m128 from m128d
- Bit-preserving cast to m128 from m128i
- Bit-preserving cast to m128 from m256.
- Bit-preserving cast to m128d from m128
- Bit-preserving cast to m128d from m128i
- Bit-preserving cast to m128d from m256d.
- Bit-preserving cast to m128i from m128
- Bit-preserving cast to m128i from m128d
- Bit-preserving cast to m128i from m256i.
- Bit-preserving cast to m256 from m256d.
- Bit-preserving cast to m256 from m256i.
- Bit-preserving cast to m256i from m256.
- Bit-preserving cast to m256d from m256i.
- Bit-preserving cast to m256i from m256.
- Bit-preserving cast to m256i from m256d.
- ceil_m128 (sse4.1): Round each lane to a whole number, towards positive infinity.
- ceil_m128_s (sse4.1): Round the low lane of b toward positive infinity, other lanes a.
- ceil_m128d (sse4.1): Round each lane to a whole number, towards positive infinity.
- ceil_m128d_s (sse4.1): Round the low lane of b toward positive infinity, high lane is a.
- ceil_m256 (avx): Round f32 lanes towards positive infinity.
- ceil_m256d (avx): Round f64 lanes towards positive infinity.
- Low lane equality.
- Low lane f64 equal to.
- Lanewise a == b with lanes as i8.
- Compare i8 lanes for equality, mask output.
- Lanewise a == b with lanes as i16.
- Compare i16 lanes for equality, mask output.
- Lanewise a == b with lanes as i32.
- Compare i32 lanes for equality, mask output.
- cmp_eq_mask_i64_m128i (sse4.1): Lanewise a == b with lanes as i64.
- Compare i64 lanes for equality, mask output.
- Lanewise a == b.
- Low lane a == b, other lanes unchanged.
- Lanewise a == b, mask output.
- Low lane a == b, other lanes unchanged.
- Low lane greater than or equal to.
- Low lane f64 greater than or equal to.
- Lanewise a >= b.
- Low lane a >= b, other lanes unchanged.
- Lanewise a >= b.
- Low lane a >= b, other lanes unchanged.
- Low lane greater than.
- Low lane f64 greater than.
- Lanewise a > b with lanes as i8.
- Compare i8 lanes for a > b, mask output.
- Lanewise a > b with lanes as i16.
- Compare i16 lanes for a > b, mask output.
- Lanewise a > b with lanes as i32.
- Compare i32 lanes for a > b, mask output.
- cmp_gt_mask_i64_m128i (sse4.2): Lanewise a > b with lanes as i64.
- Compare i64 lanes for a > b, mask output.
- Lanewise a > b.
- Low lane a > b, other lanes unchanged.
- Lanewise a > b.
- Low lane a > b, other lanes unchanged.
- Low lane less than or equal to.
- Low lane f64 less than or equal to.
- Lanewise a <= b.
- Low lane a <= b, other lanes unchanged.
- Lanewise a <= b.
- Low lane a <= b, other lanes unchanged.
- Low lane less than.
- Low lane f64 less than.
- Lanewise a < b with lanes as i8.
- Lanewise a < b with lanes as i16.
- Lanewise a < b with lanes as i32.
- Lanewise a < b.
- Low lane a < b, other lanes unchanged.
- Lanewise a < b.
- Low lane a < b, other lane unchanged.
- Low lane not equal to.
- Low lane f64 not equal to.
- Lanewise a != b.
- Low lane a != b, other lanes unchanged.
- Lanewise a != b.
- Low lane a != b, other lane unchanged.
- Lanewise !(a >= b).
- Low lane !(a >= b), other lanes unchanged.
- Lanewise !(a >= b).
- Low lane !(a >= b), other lane unchanged.
- Lanewise !(a > b).
- Low lane !(a > b), other lanes unchanged.
- Lanewise !(a > b).
- Low lane !(a > b), other lane unchanged.
- Lanewise !(a <= b).
- Low lane !(a <= b), other lanes unchanged.
- Lanewise !(a <= b).
- Low lane !(a <= b), other lane unchanged.
- Lanewise !(a < b).
- Low lane !(a < b), other lanes unchanged.
- Lanewise !(a < b).
- Low lane !(a < b), other lane unchanged.
- Compare f32 lanes according to the operation specified, mask output.
- Compare f32 lanes according to the operation specified, mask output.
- Compare f64 lanes according to the operation specified, mask output.
- Compare f64 lanes according to the operation specified, mask output.
- Compare f32 lanes according to the operation specified, mask output.
- Compare f64 lanes according to the operation specified, mask output.
- Lanewise (!a.is_nan()) & (!b.is_nan()).
- Low lane (!a.is_nan()) & (!b.is_nan()), other lanes unchanged.
- Lanewise (!a.is_nan()) & (!b.is_nan()).
- Low lane (!a.is_nan()) & (!b.is_nan()), other lane unchanged.
- Lanewise a.is_nan() | b.is_nan().
- Low lane a.is_nan() | b.is_nan(), other lanes unchanged.
- Lanewise a.is_nan() | b.is_nan().
- Low lane a.is_nan() | b.is_nan(), other lane unchanged.
- Counts $a as the high bytes and $b as the low bytes then performs a byte shift to the right by the immediate value.
- Works like combined_byte_shr_imm_m128i, but twice as wide.
- Convert i32 to f32 and replace the low lane of the input.
- Convert i32 to f64 and replace the low lane of the input.
- Convert i64 to f64 and replace the low lane of the input.
- Converts the lower f32 to f64 and replace the low lane of the input
- Converts the low f64 to f32 and replaces the low lane of the input.
- Convert the lowest f32 lane to a single f32.
- Convert the lowest f64 lane to a single f64.
- Convert the lower two i64 lanes to two i32 lanes.
- Convert the lower eight i8 lanes to eight i16 lanes.
- Convert i8 values to i16 values.
- Convert lower 4 u8 values to i16 values.
- Convert lower 8 u8 values to i16 values.
- Convert u8 values to i16 values.
- Convert the lowest i32 lane to a single i32.
- Convert the lower four i8 lanes to four i32 lanes.
- Convert the lower four i16 lanes to four i32 lanes.
- Rounds the f32 lanes to i32 lanes.
- Rounds the two f64 lanes to the low two i32 lanes.
- Convert f64 lanes to be i32 lanes.
- Convert i16 values to i32 values.
- Convert the lower 8 i8 values to i32 values.
- Convert f32 lanes to be i32 lanes.
- Convert u16 values to i32 values.
- Convert the lower two i8 lanes to two i64 lanes.
- Convert the lower two i32 lanes to two i64 lanes.
- Convert i32 values to i64 values.
- Convert the lower 4 i8 values to i64 values.
- Convert i16 values to i64 values.
- Convert u16 values to i64 values.
- Convert u32 values to i64 values.
- Rounds the four i32 lanes to four f32 lanes.
- Rounds the two f64 lanes to the low two f32 lanes.
- Convert f64 lanes to be f32 lanes.
- Rounds the lower two i32 lanes to two f64 lanes.
- Rounds the two f64 lanes to the low two f32 lanes.
- Convert i32 lanes to be f32 lanes.
- Convert i32 lanes to be f64 lanes.
- Convert f32 lanes to be f64 lanes.
- Convert the lower eight u8 lanes to eight u16 lanes.
- Convert the lower four u8 lanes to four u32 lanes.
- Convert the lower four u16 lanes to four u32 lanes.
- Convert the lower two u8 lanes to two u64 lanes.
- Convert the lower two u16 lanes to two u64 lanes.
- Convert the lower two u32 lanes to two u64 lanes.
- Convert f64 lanes to i32 lanes with truncation.
- Convert f32 lanes to i32 lanes with truncation.
- copy_i64_m128i_s (sse2): Copy the low i64 lane to a new register, upper bits 0.
- Copies the a value and replaces the low lane with the low b value.
- crc32_u8 (sse4.2): Accumulates the u8 into a running CRC32 value.
- crc32_u16 (sse4.2): Accumulates the u16 into a running CRC32 value.
- crc32_u32 (sse4.2): Accumulates the u32 into a running CRC32 value.
- crc32_u64 (sse4.2): Accumulates the u64 into a running CRC32 value.
- div_m128 (sse): Lanewise a / b.
- div_m128_s (sse): Low lane a / b, other lanes unchanged.
- div_m128d (sse2): Lanewise a / b.
- div_m128d_s (sse2): Lowest lane a / b, high lane unchanged.
- div_m256 (avx): Lanewise a / b with f32.
- div_m256d (avx): Lanewise a / b with f64.
- dot_product_m128 (sse4.1): Performs a dot product of two m128 registers.
- dot_product_m128d (sse4.1): Performs a dot product of two m128d registers.
- This works like dot_product_m128, but twice as wide.
- Duplicate the odd lanes to the even lanes.
- Duplicate the even-indexed lanes to the odd lanes.
- Copy the low lane of the input to both lanes of the output.
- Duplicate the odd lanes to the even lanes.
- Duplicate the odd-indexed lanes to the even lanes.
- Duplicate the odd-indexed lanes to the even lanes.
- Gets the f32 lane requested. Returns as an i32 bit pattern.
- Gets the i8 lane requested. Only the lowest 4 bits are considered.
- Gets an i8 value out of an m256i, returns as i32.
- Gets an i16 value out of an m128i, returns as i32.
- Gets an i16 value out of an m256i, returns as i32.
- Extracts an i32 lane from m256i
- extract_i32_imm_m128i (sse4.1): Gets the i32 lane requested. Only the lowest 2 bits are considered.
- Extracts an i64 lane from m256i
- extract_i64_imm_m128i (sse4.1): Gets the i64 lane requested. Only the lowest bit is considered.
- Extracts an m128 from m256
- Extracts an m128d from m256d
- Extracts an m128i from m256i
- Gets an m128i value out of an m256i.
- floor_m128 (sse4.1): Round each lane to a whole number, towards negative infinity
- floor_m128_s (sse4.1): Round the low lane of b toward negative infinity, other lanes a.
- floor_m128d (sse4.1): Round each lane to a whole number, towards negative infinity
- floor_m128d_s (sse4.1): Round the low lane of b toward negative infinity, high lane is a.
- floor_m256 (avx): Round f32 lanes towards negative infinity.
- floor_m256d (avx): Round f64 lanes towards negative infinity.
- Lanewise fused (a * b) + c
- Low lane fused (a * b) + c, other lanes unchanged
- Lanewise fused (a * b) + c
- Low lane fused (a * b) + c, other lanes unchanged
- Lanewise fused (a * b) + c
- Lanewise fused (a * b) + c
- Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes)
- Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes)
- Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes)
- Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes)
- Lanewise fused -(a * b) + c
- Low lane -(a * b) + c, other lanes unchanged.
- Lanewise fused -(a * b) + c
- Low lane -(a * b) + c, other lanes unchanged.
- Lanewise fused -(a * b) + c
- Lanewise fused -(a * b) + c
- Lanewise fused -(a * b) - c
- Low lane fused -(a * b) - c, other lanes unchanged.
- Lanewise fused -(a * b) - c
- Low lane fused -(a * b) - c, other lanes unchanged.
- Lanewise fused -(a * b) - c
- Lanewise fused -(a * b) - c
- Lanewise fused (a * b) - c
- Low lane fused (a * b) - c, other lanes unchanged.
- Lanewise fused (a * b) - c
- Low lane fused (a * b) - c, other lanes unchanged.
- Lanewise fused (a * b) - c
- Lanewise fused (a * b) - c
- Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes)
- Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes)
- Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes)
- Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes)
- Gets the low lane as an individual f32 value.
- Gets the lower lane as an f64 value.
- Converts the low lane to i32 and extracts as an individual value.
- Converts the lower lane to an i32 value.
- Converts the lower lane to an i32 value.
- Converts the lower lane to an i64 value.
- Converts the lower lane to an i64 value.
- insert_f32_imm_m128 (sse4.1): Inserts a lane from $b into $a, optionally at a new position.
- insert_i8_imm_m128i (sse4.1): Inserts a new value for the i8 lane specified.
- Inserts an i8 to m256i
- Inserts the low 16 bits of an i32 value into an m128i.
- Inserts an i16 to m256i
- insert_i32_imm_m128i (sse4.1): Inserts a new value for the i32 lane specified.
- Inserts an i32 to m256i
- insert_i64_imm_m128i (sse4.1): Inserts a new value for the i64 lane specified.
- Inserts an i64 to m256i
- Inserts an m128 to m256
- Inserts an m128d to m256d
- Inserts an m128i to an m256i at the high or low position.
- Slowly inserts an m128i to m256i.
- Count the leading zeroes in a u32.
- Count the leading zeroes in a u64.
- Loads the f32 reference into the low lane of the register.
- Loads the f32 reference into all lanes of a register.
- Load an f32 and splat it to all lanes of an m256
- load_f64_m128d_s (sse2): Loads the reference into the low lane of the register.
- Loads the f64 reference into all lanes of a register.
- Load an f64 and splat it to all lanes of an m256d
- load_i64_m128i_s (sse2): Loads the low i64 into a register.
- load_m128 (sse): Loads the reference into a register.
- Load an m128 and splat it to the lower and upper half of an m256
- load_m128d (sse2): Loads the reference into a register.
- Load an m128d and splat it to the lower and upper half of an m256d
- load_m128i (sse2): Loads the reference into a register.
- load_m256 (avx): Load data from memory into a register.
- load_m256d (avx): Load data from memory into a register.
- load_m256i (avx): Load data from memory into a register.
- Loads the reference given and zeroes any i32 lanes not in the mask.
- Loads the reference given and zeroes any i32 lanes not in the mask.
- Loads the reference given and zeroes any i64 lanes not in the mask.
- Loads the reference given and zeroes any i64 lanes not in the mask.
- Load data from memory into a register according to a mask.
- Load data from memory into a register according to a mask.
- Load data from memory into a register according to a mask.
- Load data from memory into a register according to a mask.
- Loads the reference into a register, replacing the high lane.
- Loads the reference into a register, replacing the low lane.
- Loads the reference into a register with reversed order.
- Loads the reference into a register with reversed order.
- Load data from memory into a register.
- Load data from memory into a register.
- Load data from memory into a register.
- Loads the reference into a register.
- Loads the reference into a register.
- Loads the reference into a register.
- Load data from memory into a register.
- Load data from memory into a register.
- Load data from memory into a register.
- max_i8_m128i (sse4.1): Lanewise max(a, b) with lanes as i8.
- max_i8_m256i (avx2): Lanewise max(a, b) with lanes as i8.
- max_i16_m128i (sse2): Lanewise max(a, b) with lanes as i16.
- max_i16_m256i (avx2): Lanewise max(a, b) with lanes as i16.
- max_i32_m128i (sse4.1): Lanewise max(a, b) with lanes as i32.
- max_i32_m256i (avx2): Lanewise max(a, b) with lanes as i32.
- max_m128 (sse): Lanewise max(a, b).
- max_m128_s (sse): Low lane max(a, b), other lanes unchanged.
- max_m128d (sse2): Lanewise max(a, b).
- max_m128d_s (sse2): Low lane max(a, b), other lanes unchanged.
- max_m256 (avx): Lanewise max(a, b).
- max_m256d (avx): Lanewise max(a, b).
- max_u8_m128i (sse2): Lanewise max(a, b) with lanes as u8.
- max_u8_m256i (avx2): Lanewise max(a, b) with lanes as u8.
- max_u16_m128i (sse4.1): Lanewise max(a, b) with lanes as u16.
- max_u16_m256i (avx2): Lanewise max(a, b) with lanes as u16.
- max_u32_m128i (sse4.1): Lanewise max(a, b) with lanes as u32.
- max_u32_m256i (avx2): Lanewise max(a, b) with lanes as u32.
- min_i8_m128i (sse4.1): Lanewise min(a, b) with lanes as i8.
- min_i8_m256i (avx2): Lanewise min(a, b) with lanes as i8.
- min_i16_m128i (sse2): Lanewise min(a, b) with lanes as i16.
- min_i16_m256i (avx2): Lanewise min(a, b) with lanes as i16.
- min_i32_m128i (sse4.1): Lanewise min(a, b) with lanes as i32.
- min_i32_m256i (avx2): Lanewise min(a, b) with lanes as i32.
- min_m128 (sse): Lanewise min(a, b).
- min_m128_s (sse): Low lane min(a, b), other lanes unchanged.
- min_m128d (sse2): Lanewise min(a, b).
- min_m128d_s (sse2): Low lane min(a, b), other lanes unchanged.
- min_m256 (avx): Lanewise min(a, b).
- min_m256d (avx): Lanewise min(a, b).
- min_position_u16_m128i (sse4.1): Min u16 value, position, and other lanes zeroed.
- min_u8_m128i (sse2): Lanewise min(a, b) with lanes as u8.
- min_u8_m256i (avx2): Lanewise min(a, b) with lanes as u8.
- min_u16_m128i (sse4.1): Lanewise min(a, b) with lanes as u16.
- min_u16_m256i (avx2): Lanewise min(a, b) with lanes as u16.
- min_u32_m128i (sse4.1): Lanewise min(a, b) with lanes as u32.
- min_u32_m256i (avx2): Lanewise min(a, b) with lanes as u32.
- Move the high lanes of b to the low lanes of a, other lanes unchanged.
- Move the low lanes of b to the high lanes of a, other lanes unchanged.
- move_m128_s (sse): Move the low lane of b to a, other lanes unchanged.
- Gathers the i8 sign bit of each lane.
- Create an i32 mask of each sign bit in the i8 lanes.
- Gathers the sign bit of each lane.
- move_mask_m128d (sse2): Gathers the sign bit of each lane.
- Collects the sign bit of each lane into a 4-bit value.
- Collects the sign bit of each lane into a 4-bit value.
- mul_32_m128i (sse4.1): Lanewise a * b with 32-bit lanes.
- mul_extended_u32 (bmi2): Multiply two u32, outputting the low bits and storing the high bits in the reference.
- mul_extended_u64 (bmi2): Multiply two u64, outputting the low bits and storing the high bits in the reference.
- Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.
- Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.
- Lanewise a * b with lanes as i16, keep the high bits of the i32 intermediates.
- Multiply the i16 lanes and keep the high half of each 32-bit output.
- Lanewise a * b with lanes as i16, keep the low bits of the i32 intermediates.
- Multiply the i16 lanes and keep the low half of each 32-bit output.
- Multiply i16 lanes into i32 intermediates, keep the high 18 bits, round by adding 1, right shift by 1.
- Multiply i16 lanes into i32 intermediates, keep the high 18 bits, round by adding 1, right shift by 1.
- Multiply the i32 lanes and keep the low half of each 64-bit output.
- Performs a “carryless” multiplication of two i64 values.
- Multiply the lower i32 within each i64 lane, i64 output.
- mul_m128 (sse): Lanewise a * b.
- mul_m128_s (sse): Low lane a * b, other lanes unchanged.
- mul_m128d (sse2): Lanewise a * b.
- mul_m128d_s (sse2): Lowest lane a * b, high lane unchanged.
- mul_m256 (avx): Lanewise a * b with f32 lanes.
- mul_m256d (avx): Lanewise a * b with f64 lanes.
- This is dumb and weird.
- This is dumb and weird.
- Lanewise a * b with lanes as u16, keep the high bits of the u32 intermediates.
- Multiply the u16 lanes and keep the high half of each 32-bit output.
- Multiply the lower u32 within each u64 lane, u64 output.
- mul_widen_i32_odd_m128i (sse4.1): Multiplies the odd i32 lanes and gives the widened (i64) results.
- Multiplies the odd u32 lanes and gives the widened (u64) results.
- Computes eight u16 “sum of absolute difference” values according to the bytes selected.
- Computes eight u16 “sum of absolute difference” values according to the bytes selected.
- Saturating convert i16 to i8, and pack the values.
- Saturating convert i16 to i8, and pack the values.
- Saturating convert i16 to u8, and pack the values.
- Saturating convert i16 to u8, and pack the values.
- Saturating convert i32 to i16, and pack the values.
- Saturating convert i32 to i16, and pack the values.
- pack_i32_to_u16_m128i (sse4.1): Saturating convert i32 to u16, and pack the values.
- Saturating convert i32 to u16, and pack the values.
- Shuffle 128 bits of floating point data at a time from $a and $b using an immediate control value.
- Shuffle 128 bits of floating point data at a time from a and b using an immediate control value.
- Slowly swizzle 128 bits of integer data from a and b using an immediate control value.
- permute_m128 (avx): Shuffle the f32 lanes from a using an immediate control value.
- Shuffle the f64 lanes in a using an immediate control value.
- permute_m256 (avx): Shuffle the f32 lanes in a using an immediate control value.
- Shuffle the f64 lanes from a together using an immediate control value.
- population_count_i32 (popcnt): Count the number of bits set within an i32
- population_count_i64 (popcnt): Count the number of bits set within an i64
- Deposit contiguous low bits from a u32 according to a mask.
- Deposit contiguous low bits from a u64 according to a mask.
- Extract bits from a u32 according to a mask.
- Extract bits from a u64 according to a mask.
- prefetch_et0 (sse): Fetches the cache line containing addr into all levels of the cache hierarchy, anticipating write.
- prefetch_et1 (sse): Fetches into L2 and higher, anticipating write.
- prefetch_nta (sse): Fetch data using the non-temporal access (NTA) hint. It may be a place closer than main memory but outside of the cache hierarchy. This is used to reduce access latency without polluting the cache.
- prefetch_t0 (sse): Fetches the cache line containing addr into all levels of the cache hierarchy.
- prefetch_t1 (sse): Fetches into L2 and higher.
- prefetch_t2 (sse): Fetches into L3 and higher or an implementation-specific choice (e.g., L2 if there is no L3).
- rdrand_u16 (rdrand): Try to obtain a random u16 from the hardware RNG.
- rdrand_u32 (rdrand): Try to obtain a random u32 from the hardware RNG.
- rdrand_u64 (rdrand): Try to obtain a random u64 from the hardware RNG.
- rdseed_u16 (rdseed): Try to obtain a random u16 from the hardware RNG.
- rdseed_u32 (rdseed): Try to obtain a random u32 from the hardware RNG.
- rdseed_u64 (rdseed): Try to obtain a random u64 from the hardware RNG.
- Reads the CPU’s timestamp counter value.
- Reads the CPU’s timestamp counter value and stores the processor signature.
- Lanewise 1.0 / a approximation.
- Low lane 1.0 / a approximation, other lanes unchanged.
- Reciprocal of f32 lanes.
- Lanewise 1.0 / sqrt(a) approximation.
- Low lane 1.0 / sqrt(a) approximation, other lanes unchanged.
- Reciprocal of f32 lanes.
- round_m128 (sse4.1): Rounds each lane in the style specified.
- round_m128_s (sse4.1): Rounds $b low as specified, other lanes use $a.
- round_m128d (sse4.1): Rounds each lane in the style specified.
- round_m128d_s (sse4.1): Rounds $b low as specified, keeps $a high.
- round_m256 (avx): Rounds each lane in the style specified.
- round_m256d (avx): Rounds each lane in the style specified.
- Search for needle in haystack, with explicit string length.
- Search for needle in haystack, with explicit string length.
- Search for needle in haystack, with implicit string length.
- Search for needle in haystack, with implicit string length.
- set_i8_m128i (sse2): Sets the args into an m128i, first arg is the high lane.
- set_i8_m256i (avx): Set i8 args into an m256i lane.
- set_i16_m128i (sse2): Sets the args into an m128i, first arg is the high lane.
- Set i16 args into an m256i lane.
- set_i32_m128i (sse2): Sets the args into an m128i, first arg is the high lane.
- set_i32_m128i_s (sse2): Set an i32 as the low 32-bit lane of an m128i, other lanes blank.
- Set i32 args into an m256i lane.
- set_i64_m128i (sse2): Sets the args into an m128i, first arg is the high lane.
- set_i64_m128i_s (sse2): Set an i64 as the low 64-bit lane of an m128i, other lanes blank.
- Set i64 args into an m256i lane.
- set_m128 (sse): Sets the args into an m128, first arg is the high lane.
- Set m128 args into an m256.
- set_m128_s (sse): Sets the args into an m128, first arg is the high lane.
- set_m128d (sse2): Sets the args into an m128d, first arg is the high lane.
- Set m128d args into an m256d.
- set_m128d_s (sse2): Sets the args into the low lane of an m128d.
- Set m128i args into an m256i.
- set_m256 (avx): Set f32 args into an m256 lane.
- set_m256d (avx): Set f64 args into an m256d lane.
- Sets the args into an m128i, first arg is the low lane.
- Set i8 args into an m256i lane.
- Sets the args into an m128i, first arg is the low lane.
- Set i16 args into an m256i lane.
- Sets the args into an m128i, first arg is the low lane.
- Set i32 args into an m256i lane.
- Set i64 args into an m256i lane.
- Sets the args into an m128, first arg is the low lane.
- Set m128 args into an m256.
- Sets the args into an m128d, first arg is the low lane.
- Set m128d args into an m256d.
- Set m128i args into an m256i.
- Set f32 args into an m256 lane.
- Set f64 args into an m256d lane.
- Splats the i8 to all lanes of the m128i.
- Sets the lowest i8 lane of an m128i as all lanes of an m256i.
- Splat an i8 arg into an m256i lane.
- Splats the i16 to all lanes of the m128i.
- Sets the lowest i16 lane of an m128i as all lanes of an m256i.
- Splat an i16 arg into an m256i lane.
- Splats the i32 to all lanes of the m128i.
- Sets the lowest i32 lane of an m128i as all lanes of an m256i.
- Splat an i32 arg into an m256i lane.
- Splats the i64 to both lanes of the m128i.
- Sets the lowest i64 lane of an m128i as all lanes of an m256i.
- Splat an i64 arg into an m256i lane.
- Splats the value to all lanes.
- Sets the lowest lane of an m128 as all lanes of an m256.
- set_splat_m128d (sse2): Splats the args into both lanes of the m128d.
- Sets the lowest lane of an m128d as all lanes of an m256d.
- Splat an f32 arg into an m256 lane.
- Splat an f64 arg into an m256d lane.
- Shift all u16 lanes to the left by the count in the lower u64 lane.
- Lanewise u16 shift left by the lower u64 lane of count.
- Shift all u32 lanes to the left by the count in the lower u64 lane.
- Shift all u32 lanes left by the lower u64 lane of count.
- Shift all u64 lanes to the left by the count in the lower u64 lane.
- Shift all u64 lanes left by the lower u64 lane of count.
- Shift u32 values to the left by count bits.
- Lanewise u32 shift left by the matching i32 lane in count.
- Shift u64 values to the left by count bits.
- Lanewise u64 shift left by the matching u64 lane in count.
- Shifts all u16 lanes left by an immediate.
- Shifts all u16 lanes left by an immediate.
- Shifts all u32 lanes left by an immediate.
- Shifts all u32 lanes left by an immediate.
- Shifts both u64 lanes left by an immediate.
- Shifts all u64 lanes left by an immediate.
- Shift each i16 lane to the right by the count in the lower i64 lane.
- Lanewise i16 shift right by the lower i64 lane of count.
- Shift each i32 lane to the right by the count in the lower i64 lane.
- Lanewise i32 shift right by the lower i64 lane of count.
- Shift each u16 lane to the right by the count in the lower u64 lane.
- Lanewise u16 shift right by the lower u64 lane of count.
- Shift each u32 lane to the right by the count in the lower u64 lane.
- Lanewise u32 shift right by the lower u64 lane of count.
- Shift each u64 lane to the right by the count in the lower u64 lane.
- Lanewise u64 shift right by the lower u64 lane of count.
- Shift i32 values to the right by count bits.
- Lanewise i32 shift right by the matching i32 lane in count.
- Shift u32 values to the right by count bits.
- Lanewise u32 shift right by the matching u32 lane in count.
- Shift u64 values to the right by count bits.
- Lanewise u64 shift right by the matching i64 lane in count.
- Shifts all i16 lanes right by an immediate.
- Shifts all i16 lanes right by an immediate.
- Shifts all i32 lanes right by an immediate.
- Shifts all i32 lanes right by an immediate.
- Shifts all u16 lanes right by an immediate.
- Shifts all u16 lanes right by an immediate.
- Shifts all u32 lanes right by an immediate.
- Shifts all u32 lanes right by an immediate.
- Shifts both u64 lanes right by an immediate.
- Shifts all u64 lanes right by an immediate.
- Shuffle the f32 lanes from $a and $b together using an immediate control value.
- Shuffle the f64 lanes from $a and $b together using an immediate control value.
- Shuffle 128 bits of integer data from $a and $b using an immediate control value.
- Shuffle the i32 lanes in $a using an immediate control value.
- Shuffle the f64 lanes from $a using an immediate control value.
- Shuffle the high i16 lanes in $a using an immediate control value.
- Shuffle the high i16 lanes in $a using an immediate control value.
- Shuffle the low i16 lanes in $a using an immediate control value.
- Shuffle the low i16 lanes in $a using an immediate control value.
- Shuffle the i32 lanes in a using an immediate control value.
- Shuffle the f64 lanes in $a using an immediate control value.
- Shuffle f32 values in a using i32 values in v.
- Shuffle f32 values in a using i32 values in v.
- Shuffle f64 lanes in a using bit 1 of the i64 lanes in v.
- Shuffle f64 lanes in a using bit 1 of the i64 lanes in v.
- Shuffle i8 lanes in a using i8 values in v.
- Shuffle i8 lanes in a using i8 values in v.
- Shuffle f32 lanes in a using i32 values in v.
- Shuffle i32 lanes in a using i32 values in v.
- shuffle_m256 (avx): Shuffle the f32 lanes from a and b together using an immediate control value.
- Shuffle the f64 lanes from a and b together using an immediate control value.
- sign_apply_i8_m128i (ssse3): Applies the sign of i8 values in b to the values in a.
- Lanewise a * signum(b) with lanes as i8.
- sign_apply_i16_m128i (ssse3): Applies the sign of i16 values in b to the values in a.
- Lanewise a * signum(b) with lanes as i16.
- sign_apply_i32_m128i (ssse3): Applies the sign of i32 values in b to the values in a.
- Lanewise a * signum(b) with lanes as i32.
- Splat the lowest 8-bit lane across the entire 128 bits.
- Splat the lowest 16-bit lane across the entire 128 bits.
- Splat the lowest 32-bit lane across the entire 128 bits.
- Splat the lowest 64-bit lane across the entire 128 bits.
- Splat the lowest f32 across all four lanes.
- Splat the lower f64 across both lanes of m128d.
- Splat the 128 bits across 256 bits.
- sqrt_m128 (sse): Lanewise sqrt(a).
- sqrt_m128_s (sse): Low lane sqrt(a), other lanes unchanged.
- sqrt_m128d (sse2): Lanewise sqrt(a).
- sqrt_m128d_s (sse2): Low lane sqrt(b), upper lane is unchanged from a.
- sqrt_m256 (avx): Lanewise sqrt on f32 lanes.
- sqrt_m256d (avx): Lanewise sqrt on f64 lanes.
- Stores the high lane value to the reference given.
- Stores the value to the reference given.
- store_m128 (sse): Stores the value to the reference given.
- store_m128_s (sse): Stores the low lane value to the reference given.
- store_m128d (sse2): Stores the value to the reference given.
- store_m128d_s (sse2): Stores the low lane value to the reference given.
- store_m128i (sse2): Stores the value to the reference given.
- store_m256 (avx): Store data from a register into memory.
- store_m256d (avx): Store data from a register into memory.
- store_m256i (avx): Store data from a register into memory.
- Stores the i32 masked lanes given to the reference.
- Stores the i32 masked lanes given to the reference.
- Stores the i32 masked lanes given to the reference.
- Stores the i32 masked lanes given to the reference.
- Store data from a register into memory according to a mask.
- Store data from a register into memory according to a mask.
- Store data from a register into memory according to a mask.
- Store data from a register into memory according to a mask.
- Stores the value to the reference given in reverse order.
- Stores the value to the reference given.
- Stores the low lane value to all lanes of the reference given.
- Stores the low lane value to all lanes of the reference given.
- Store data from a register into memory.
- Store data from a register into memory.
- Store data from a register into memory.
- Stores the value to the reference given.
- Stores the value to the reference given.
- Stores the value to the reference given.
- Store data from a register into memory.
- Store data from a register into memory.
- Store data from a register into memory.
- Subtract horizontal pairs of i16 values, pack the outputs as a then b.
- Horizontal a - b with lanes as i16.
- Subtract horizontal pairs of i32 values, pack the outputs as a then b.
- Horizontal a - b with lanes as i32.
- Subtract each lane horizontally, pack the outputs as a then b.
- Subtract each lane horizontally, pack the outputs as a then b.
- Subtract adjacent f32 lanes.
- Subtract adjacent f64 lanes.
- Subtract horizontal pairs of i16 values, saturating, pack the outputs as a then b.
- Horizontal saturating a - b with lanes as i16.
- sub_i8_m128i (sse2): Lanewise a - b with lanes as i8.
- sub_i8_m256i (avx2): Lanewise a - b with lanes as i8.
- sub_i16_m128i (sse2): Lanewise a - b with lanes as i16.
- sub_i16_m256i (avx2): Lanewise a - b with lanes as i16.
- sub_i32_m128i (sse2): Lanewise a - b with lanes as i32.
- sub_i32_m256i (avx2): Lanewise a - b with lanes as i32.
- sub_i64_m128i (sse2): Lanewise a - b with lanes as i64.
- sub_i64_m256i (avx2): Lanewise a - b with lanes as i64.
- sub_m128 (sse): Lanewise a - b.
- sub_m128_s (sse): Low lane a - b, other lanes unchanged.
- sub_m128d (sse2): Lanewise a - b.
- sub_m128d_s (sse2): Lowest lane a - b, high lane unchanged.
- sub_m256 (avx): Lanewise a - b with f32 lanes.
- sub_m256d (avx): Lanewise a - b with f64 lanes.
- Lanewise saturating a - b with lanes as i8.
- Lanewise saturating a - b with lanes as i8.
- Lanewise saturating a - b with lanes as i16.
- Lanewise saturating a - b with lanes as i16.
- Lanewise saturating a - b with lanes as u8.
- Lanewise saturating a - b with lanes as u8.
- Lanewise saturating a - b with lanes as u16.
- Lanewise saturating a - b with lanes as u16.
- Compute “sum of u8 absolute differences”.
- Compute “sum of u8 absolute differences”.
- test_all_ones_m128i (sse4.1): Tests if all bits are 1.
- test_all_zeroes_m128i (sse4.1): Returns if all masked bits are 0, (a & mask) as u128 == 0.
- Returns if, among the masked bits, there’s both 0s and 1s
- testc_m128 (avx): Compute the bitwise NOT of the sign bits of a and then AND with the sign bits of b, returns 1 if the result is zero, otherwise 0.
- testc_m128d (avx): Compute the bitwise NOT of the sign bits of a and then AND with the sign bits of b, returns 1 if the result is zero, otherwise 0.
- testc_m128i (sse4.1): Compute the bitwise NOT of a and then AND with b, returns 1 if the result is zero, otherwise 0.
- testc_m256 (avx): Compute the bitwise NOT of the sign bits of a and then AND with the sign bits of b, returns 1 if the result is zero, otherwise 0.
- testc_m256d (avx): Compute the bitwise NOT of the sign bits of a and then AND with the sign bits of b, returns 1 if the result is zero, otherwise 0.
- testc_m256i (avx): Compute the bitwise NOT of a and then AND with b, returns 1 if the result is zero, otherwise 0.
- testz_m128 (avx): Computes the bitwise AND of the sign bits in a and b, returns 1 if the result is zero, otherwise 0.
- testz_m128d (avx): Computes the bitwise AND of the sign bits in a and b, returns 1 if the result is zero, otherwise 0.
- testz_m128i (sse4.1): Computes the bitwise AND of 128 bits in a and b, returns 1 if the result is zero, otherwise 0.
- testz_m256 (avx): Computes the bitwise AND of the sign bits in a and b, returns 1 if the result is zero, otherwise 0.
- testz_m256d (avx): Computes the bitwise AND of the sign bits in a and b, returns 1 if the result is zero, otherwise 0.
- testz_m256i (avx): Computes the bitwise AND of 256 bits in a and b, returns 1 if the result is zero, otherwise 0.
- Counts the number of trailing zero bits in a u32.
- Counts the number of trailing zero bits in a u64.
- Transpose four m128 as if they were a 4x4 matrix.
- Truncate the f32 lanes to i32 lanes.
- Truncate the f64 lanes to the lower i32 lanes (upper i32 lanes 0).
- Truncate the lower lane into an i32.
- Truncate the lower lane into an i64.
- Unpack and interleave the high lanes.
- Unpack and interleave the high lanes.
- Unpack and interleave high i8 lanes of a and b.
- Unpack and interleave high i8 lanes of a and b.
- Unpack and interleave high i16 lanes of a and b.
- Unpack and interleave high i16 lanes of a and b.
- Unpack and interleave high i32 lanes of a and b.
- Unpack and interleave high i32 lanes of a and b.
- Unpack and interleave high i64 lanes of a and b.
- Unpack and interleave high i64 lanes of a and b.
- Unpack and interleave high lanes of a and b.
- Unpack and interleave high lanes of a and b.
- Unpack and interleave the high lanes.
- Unpack and interleave the high lanes.
- Unpack and interleave low i8 lanes of a and b.
- Unpack and interleave low i8 lanes of a and b.
- Unpack and interleave low i16 lanes of a and b.
- Unpack and interleave low i16 lanes of a and b.
- Unpack and interleave low i32 lanes of a and b.
- Unpack and interleave low i32 lanes of a and b.
- Unpack and interleave low i64 lanes of a and b.
- Unpack and interleave low i64 lanes of a and b.
- Unpack and interleave low lanes of a and b.
- unpack_low_m128d (sse2): Unpack and interleave low lanes of a and b.
- Zero extend an m128 to m256.
- Zero extend an m128d to m256d.
- Zero extend an m128i to m256i.
- zeroed_m128 (sse): All lanes zero.
- zeroed_m128d (sse2): Both lanes zero.
- zeroed_m128i (sse2): All lanes zero.
- zeroed_m256 (avx): A zeroed m256.
- zeroed_m256d (avx): A zeroed m256d.
- zeroed_m256i (avx): A zeroed m256i.
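
The one-line summaries for the saturating pack, sign apply, and bit deposit/extract entries in this list are terse, so here is a minimal scalar sketch of what they describe. It is plain portable Rust, not the intrinsics and not safe_arch code. Every function name below is hypothetical. Two details are assumptions based on the underlying x86 instructions rather than on anything stated in this list: the pack model puts the lanes of a in the low half and the lanes of b in the high half, and the sign model wraps when negating the minimum value.

```rust
// Scalar models for illustration only; none of these names exist in safe_arch.

/// Saturating i16 -> i8 conversion for a single lane.
fn saturate_i16_to_i8(x: i16) -> i8 {
    x.clamp(i8::MIN as i16, i8::MAX as i16) as i8
}

/// Model of "saturating convert i16 to i8, and pack the values".
/// Assumption: lanes of `a` land in the low half, lanes of `b` in the high half.
fn pack_i16_to_i8_model(a: [i16; 8], b: [i16; 8]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..8 {
        out[i] = saturate_i16_to_i8(a[i]);
        out[i + 8] = saturate_i16_to_i8(b[i]);
    }
    out
}

/// Model of "lanewise a * signum(b)" with i8 lanes.
/// Assumption: negating i8::MIN wraps back to i8::MIN instead of saturating.
fn sign_apply_i8_model(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        out[i] = match b[i].signum() {
            1 => a[i],
            -1 => a[i].wrapping_neg(),
            _ => 0,
        };
    }
    out
}

/// Model of "deposit contiguous low bits from a u32 according to a mask":
/// the k-th low bit of `src` is written to the position of the k-th set bit of `mask`.
fn deposit_bits_u32_model(src: u32, mask: u32) -> u32 {
    let mut out = 0u32;
    let mut remaining = mask;
    let mut k = 0u32; // index of the next low bit of `src` to place
    while remaining != 0 {
        let lowest = remaining & remaining.wrapping_neg(); // lowest set mask bit
        if (src >> k) & 1 == 1 {
            out |= lowest;
        }
        remaining &= remaining - 1; // clear that mask bit
        k += 1;
    }
    out
}

/// Model of "extract bits from a u32 according to a mask":
/// the bits of `src` selected by `mask` are gathered into the low bits of the output.
fn extract_bits_u32_model(src: u32, mask: u32) -> u32 {
    let mut out = 0u32;
    let mut remaining = mask;
    let mut k = 0u32; // index of the next low output bit to fill
    while remaining != 0 {
        let lowest = remaining & remaining.wrapping_neg();
        if src & lowest != 0 {
            out |= 1u32 << k;
        }
        remaining &= remaining - 1;
        k += 1;
    }
    out
}
```

With these models, depositing 0b101 under the mask 0b11010 gives 0b10010, and extracting from that result with the same mask gives 0b101 back, which is the inverse relationship the paired "deposit" and "extract" entries suggest.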