seahash/
lib.rs

1//! SeaHash: A blazingly fast, portable hash function with proven statistical guarantees.
2//!
3//! SeaHash is a hash function with performance better than (around 3-20% improvement) xxHash and
4//! MetroHash. Furthermore, SeaHash has mathematically provable statistical guarantees.
5//!
6//! SeaHash is a portable hash function, meaning that the output is not dependent on the hosting
7//! architecture, and makes no assumptions on endianness or the alike. This stable layout allows it
8//! to be used for on-disk/permanent storage (e.g. checksums).
9//!
10//! # Design, advantages, and features
11//!
12//! - **High quality**: It beats most other general purpose hash functions because it provides full
13//!   avalanche inbetween state updates.
14//! - **Performance**: SeaHash beats every high-quality (grading 10/10 in smhasher) hash function
15//!    that I know of.
16//! - **Provable quality guarantees**: Contrary to most other non-cryptographic hash function,
17//!   SeaHash can be proved to satisfy the avalanche criterion as well as BIC.
18//! - **Parallelizable**: Consists of multiple, independent states to take advantage of ILP and/or
19//!   software threads.
20//! - **Bulk reads**: Reads 8 or 4 bytes a time.
21//! - **Stable and portable**: Does not depend on the target architecture, and produces a stable
22//!   value, which is only changed in major version bumps.
23//! - **Keyed**: Designed to not leak the seed/key. Note that it has not gone through
24//!   cryptoanalysis yet, so the keyed version shouldn't be relied on when security is needed.
25//! - **Hardware accelerateable**: SeaHash is designed such that ASICs can implement it with really
26//!   high performance.
27//!
28//! # A word of warning!
29//!
30//! This is **not** a cryptographic function, and it certainly should not be used as one. If you
31//! want a good cryptographic hash function, you should use SHA-3 (Keccak) or BLAKE2.
32//!
33//! It is not secure, nor does it aim to be. It aims to have high quality pseudorandom output and
34//! few collisions, as well as being fast.
35//!
36//! # Benchmark
37//!
38//! On normal hardware, it is expected to run with a rate around 5.9-6.7 GB/S on a 2.5 GHz CPU.
39//! Further improvement can be seen when hashing very big buffers in parallel.
40//!
41//! | Function    | Quality       | Cycles per byte (lower is better) | Author
42//! |-------------|---------------|-----------------------------------|-------------------
43//! | **SeaHash** | **Excellent** | **0.24**                          | **Ticki**
44//! | xxHash      | Excellent     | 0.31                              | Collet
45//! | MetroHash   | Excellent     | 0.35                              | Rogers
46//! | Murmur      | Excellent     | 0.64                              | Appleby
47//! | Rabin       | Medium        | 1.51                              | Rabin
48//! | CityHash    | Excellent     | 1.62                              | Pike, Alakuijala
49//! | LoseLose    | Terrible      | 2.01                              | Kernighan, Ritchie
50//! | FNV         | Poor          | 3.12                              | Fowler, Noll, Vo
51//! | SipHash     | Pseudorandom  | 3.21                              | Aumasson, Bernstein
52//! | CRC         | Good          | 3.91                              | Peterson
53//! | DJB2        | Poor          | 4.13                              | Bernstein
54//!
55//! ## Ideal architecture
56//!
57//! SeaHash is designed and optimized for the most common architecture in use:
58//!
59//! - Little-endian
60//! - 64-bit
61//! - 64 or more bytes cache lines
62//! - 4 or more instruction pipelines
63//! - 4 or more 64-bit registers
64//!
65//! Anything that does not hold the above requirements will perform worse by up to 30-40%. Note that
66//! this means it is still faster than CityHash (~1 GB/S), MurMurHash (~2.6 GB/S), FNV (~0.5 GB/S),
67//! etc.
68//!
69//! # Achieving the performance
70//!
71//! Like any good general-purpose hash function, SeaHash reads 8 bytes at once effectively reducing
72//! the running time by an order of ~5.
73//!
74//! Secondly, SeaHash achieves the performance by heavily exploiting Instruction-Level Parallelism.
75//! In particular, it fetches 4 integers in every round and independently diffuses them. This
76//! yields four different states, which are finally combined.
77//!
78//! # Statistical guarantees
79//!
80//! SeaHash comes with certain proven guarantees about the statistical properties of the output:
81//!
82//! 1. Pick some _n_-byte sequence, _s_. The number of _n_-byte sequence colliding with _s_ is
83//!    independent of the choice of _s_ (all equivalence class have equal size).
84//! 2. If you flip any bit in the input, the probability for any bit in the output to be flipped is
85//!    0.5.
86//! 3. The hash value of a sequence of uniformly distributed bytes is itself uniformly distributed.
87//!
88//! The first guarantee can be derived through deduction, by proving that the diffusion function is
89//! bijective (reverse the XORs and find the congruence inverses to the primes).
90//!
91//! The second guarantee requires more complex calculations: Construct a matrix of probabilities
92//! and set one to certain (1), then apply transformations through the respective operations. The
93//! proof is a bit long, but relatively simple.
94//!
95//! The third guarantee requires proving that the hash value is a tree, such that:
96//! - Leafs represents the input values.
97//! - Single-child nodes reduce to the diffusion of the child.
98//! - Multiple-child nodes reduce to the sum of the children.
99//!
100//! Then simply show that each of these reductions transform uniformly distributed variables to
101//! uniformly distributed variables.
102//!
103//! # Inner workings
104//!
105//! In technical terms, SeaHash follows a alternating 4-state length-padded Merkle–Damgård
106//! construction with an XOR-diffuse compression function (click to enlarge):
107//!
108//! [![A diagram.](http://ticki.github.io/img/seahash_construction_diagram.svg)]
109//! (http://ticki.github.io/img/seahash_construction_diagram.svg)
110//!
111//! It starts with 4 initial states, then it alternates between them (increment, wrap on 4) and
112//! does XOR with the respective block. When a state has been visited the diffusion function (f) is
113//! applied. The very last block is padded with zeros.
114//!
115//! After all the blocks have been gone over, all the states are XOR'd to the number of bytes
116//! written. The sum is then passed through the diffusion function, which produces the final hash
117//! value.
118//!
119//! The diffusion function is drawn below.
120//!
121//! ```notest
122//! x ← px
123//! x ← x ⊕ ((x ≫ 32) ≫ (x ≫ 60))
124//! x ← px
125//! ```
126//!
127//! The advantage of having four completely segregated (note that there is no mix round, so they're
128//! entirely independent) states is that fast parallelism is possible. For example, if I were to
129//! hash 1 TB, I can spawn up four threads which can run independently without _any_
130//! intercommunication or synchronization before the last round.
131//!
132//! If the diffusion function (f) was cryptographically secure, it would pass cryptoanalysis
133//! trivially. This might seem irrelevant, as it clearly isn't cryptographically secure, but it
134//! tells us something about the inner semantics. In particular, any diffusion function with
135//! sufficient statistical quality will make up a good hash function in this construction.
136//!
137//! Read [the blog post](http://ticki.github.io/blog/seahash-explained/) for more details.
138//!
139//! # ASIC version
140//!
141//! SeaHash is specifically designed such that it can be efficiently implemented in the form of
142//! ASIC while only using very few transistors.
143//!
144//! # Specification
145//!
146//! See the [`reference`](./reference) module.
147//!
148//! # Credits
149//!
150//! Aside for myself (@ticki), there are couple of other people who have helped creating this.
151//! Joshua Landau suggested using the [PCG family of diffusions](http://www.pcg-random.org/),
152//! created by Melissa E. O'Neill. Sokolov Yura spotted multiple bugs in SeaHash.
153
154#![warn(missing_docs)]
155#![cfg_attr(all(not(test), not(feature = "use_std")), no_std)]
156#[cfg(all(not(test), not(feature = "use_std")))]
157extern crate core as std;
158
159pub use buffer::{hash, hash_seeded, State};
160pub use stream::SeaHasher;
161
162mod buffer;
163mod helper;
164pub mod reference;
165mod stream;
166
167#[cfg(feature = "use_std")]
168mod impl_std;