no_std_compat::arch::x86_64

Function _mm_sfence

1.27.0
pub unsafe fn _mm_sfence()
Available on x86-64 only.

Performs a serializing operation on all non-temporal (“streaming”) store instructions that were issued by the current thread prior to this instruction.

Guarantees that every non-temporal store instruction that precedes this fence, in program order, is ordered before any load or store instruction which follows the fence in synchronization order.

See Intel’s documentation, but note that Intel documents only the hardware-level concerns related to this instruction; it does not take into account the extra concerns that arise because the Rust memory model differs from the x86 memory model.

§Safety of non-temporal stores

After using any non-temporal store intrinsic, but before any other access to the memory that the intrinsic mutates, a call to _mm_sfence must be performed on the thread that used the intrinsic.

Non-temporal stores behave very differently from regular stores. For the purpose of the Rust memory model, these stores happen asynchronously, as if in a background thread. This means a non-temporal store can cause data races with other accesses, even other accesses on the same thread. It also means that cross-thread synchronization does not work as expected: suppose the intrinsic is called on thread T1, and T1 then synchronizes with some other thread T2. The non-temporal store acts as if it happened not in T1 but in a different thread T3, and T2 has not synchronized with T3! Calling _mm_sfence makes the current thread wait for and synchronize with all the non-temporal stores previously started on this thread, which means in particular that subsequent synchronization with other threads will then work as intended again.
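
The cross-thread case described above can be sketched as follows. This is a minimal illustration, not part of the official documentation: the names `produce` and `hand_off_sum` are hypothetical, and the channel stands in for whatever synchronization T1 and T2 actually use. The point it tries to show is that T1 calls _mm_sfence after its last non-temporal store and before it synchronizes with T2.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical producer: writes 0, 1, 2, ... into a fresh buffer using
/// non-temporal stores, then fences before the buffer is used in any way.
#[cfg(target_arch = "x86_64")]
fn produce(len: usize) -> Vec<i64> {
    use std::arch::x86_64::{_mm_sfence, _mm_stream_si64};

    let mut buf = vec![0i64; len];
    // SAFETY: every `slot` is a valid, aligned, exclusive pointer to an i64,
    // and SSE/SSE2 are baseline features on x86-64. The fence runs before any
    // other access to the memory written by the streaming stores.
    unsafe {
        for (i, slot) in buf.iter_mut().enumerate() {
            _mm_stream_si64(slot, i as i64);
        }
        _mm_sfence();
    }
    buf
}

/// Fallback for other targets (an assumption of this sketch): plain stores.
#[cfg(not(target_arch = "x86_64"))]
fn produce(len: usize) -> Vec<i64> {
    (0..len as i64).collect()
}

/// T1 (the caller) produces the data and hands it to T2 over a channel.
/// Because `produce` fenced first, the channel send/receive synchronizes
/// T2 with the streaming stores as well.
pub fn hand_off_sum(len: usize) -> i64 {
    let (tx, rx) = mpsc::channel::<Vec<i64>>();
    let consumer = thread::spawn(move || {
        let data = rx.recv().expect("producer hung up");
        data.iter().sum::<i64>()
    });
    tx.send(produce(len)).expect("consumer hung up");
    consumer.join().expect("consumer panicked")
}
```

Without the fence in `produce`, the send would synchronize T2 only with T1 itself, not with the notional thread T3 that performs the streaming stores.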

The general pattern for using non-temporal stores correctly is to call _mm_sfence before your code returns control to code outside your library. This ensures all stores inside your function are synchronized-before the return, and thus transitively synchronized-before everything the caller does after your function returns.
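
As a rough sketch of this pattern, a library function might keep the non-temporal stores and the fence together behind a safe interface. The function name `fill_streaming` is hypothetical and the non-x86-64 fallback is an assumption added only so the example builds elsewhere.

```rust
/// Hypothetical library function: fills `dst` with `value` using
/// non-temporal stores and fences before returning to the caller.
#[cfg(target_arch = "x86_64")]
pub fn fill_streaming(dst: &mut [i32], value: i32) {
    use std::arch::x86_64::{_mm_sfence, _mm_stream_si32};

    // SAFETY: each `slot` is a valid, aligned, exclusive pointer to an i32,
    // and SSE/SSE2 are baseline features on x86-64. No other access to `dst`
    // happens between the streaming stores and the fence.
    unsafe {
        for slot in dst.iter_mut() {
            _mm_stream_si32(slot, value);
        }
        // Fence before control leaves this function, so the caller never
        // observes a store that has not yet been synchronized.
        _mm_sfence();
    }
}

/// Fallback for other targets (an assumption of this sketch): plain stores.
#[cfg(not(target_arch = "x86_64"))]
pub fn fill_streaming(dst: &mut [i32], value: i32) {
    dst.fill(value);
}
```

With the fence inside the function, callers can treat `fill_streaming` like any ordinary write to the slice and need not know that non-temporal stores were used.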