pub fn sha256_digest_block(state: &mut [u32; 8], block: &[u8])
Process a block with the SHA-256 algorithm. (See more…)
Internally, this uses functions that resemble the new Intel SHA instruction set extensions, so its data locality properties may improve performance. To benefit the most from this implementation, however, replace these functions with x86 intrinsics for a possible speed boost.
§Implementation
The Sha256 algorithm is implemented with functions that resemble the new Intel SHA instruction set extensions. These instructions fall into two categories: message schedule calculation, and the message block 64-round digest calculation.
The schedule-related instructions allow 4 rounds to be calculated as:
use std::simd::u32x4;
use self::crypto::sha2::{
    sha256msg1,
    sha256msg2,
    sha256load
};

fn schedule4_data(work: &mut [u32x4], w: &[u32]) {
    // this is to illustrate the data order
    work[0] = u32x4(w[3], w[2], w[1], w[0]);
    work[1] = u32x4(w[7], w[6], w[5], w[4]);
    work[2] = u32x4(w[11], w[10], w[9], w[8]);
    work[3] = u32x4(w[15], w[14], w[13], w[12]);
}

fn schedule4_work(work: &mut [u32x4], t: usize) {
    // this is the core expression
    work[t] = sha256msg2(sha256msg1(work[t - 4], work[t - 3]) +
                         sha256load(work[t - 2], work[t - 1]),
                         work[t - 1])
}
instead of 4 rounds of:
fn schedule_work(w: &mut [u32], t: usize) {
    w[t] = sigma1!(w[t - 2]) + w[t - 7] + sigma0!(w[t - 15]) + w[t - 16];
}
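To make the equivalence between the two schedule forms concrete, the following is a minimal sketch that emulates the vector helpers on plain `[u32; 4]` arrays and keeps a scalar expansion alongside for comparison. The names `Lanes`, `add4`, `load`, `msg1`, `msg2`, `schedule_vec`, and `schedule_scalar` are illustrative assumptions, not this crate's API. The lane order follows the `u32x4(w[3], w[2], w[1], w[0])` convention used in `schedule4_data`, and all additions are taken modulo 2^32.

```rust
/// Four schedule words, highest index first: [w[i + 3], w[i + 2], w[i + 1], w[i]].
/// (Hypothetical stand-in for `u32x4`.)
type Lanes = [u32; 4];

fn sigma0(x: u32) -> u32 { x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3) }
fn sigma1(x: u32) -> u32 { x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10) }

/// Lane-wise addition modulo 2^32.
fn add4(a: Lanes, b: Lanes) -> Lanes {
    [a[0].wrapping_add(b[0]), a[1].wrapping_add(b[1]),
     a[2].wrapping_add(b[2]), a[3].wrapping_add(b[3])]
}

/// Unaligned-load helper: the four words that start one word after `lo`.
fn load(lo: Lanes, hi: Lanes) -> Lanes {
    [hi[3], lo[0], lo[1], lo[2]]
}

/// First half of the schedule: w[t - 16] + sigma0(w[t - 15]) for four consecutive t.
fn msg1(v0: Lanes, v1: Lanes) -> Lanes {
    let s = load(v0, v1);
    add4(v0, [sigma0(s[0]), sigma0(s[1]), sigma0(s[2]), sigma0(s[3])])
}

/// Second half: adds sigma1(w[t - 2]). The last two new words depend on the
/// first two, so they are computed in sequence.
fn msg2(x: Lanes, v3: Lanes) -> Lanes {
    let w16 = x[3].wrapping_add(sigma1(v3[1]));
    let w17 = x[2].wrapping_add(sigma1(v3[0]));
    let w18 = x[1].wrapping_add(sigma1(w16));
    let w19 = x[0].wrapping_add(sigma1(w17));
    [w19, w18, w17, w16]
}

/// Vector form: expand 16 message words into 16 vectors of 4 words each.
fn schedule_vec(w: &[u32; 16]) -> [Lanes; 16] {
    let mut work = [[0u32; 4]; 16];
    for i in 0..4 {
        work[i] = [w[4 * i + 3], w[4 * i + 2], w[4 * i + 1], w[4 * i]];
    }
    for t in 4..16 {
        let x = add4(msg1(work[t - 4], work[t - 3]), load(work[t - 2], work[t - 1]));
        work[t] = msg2(x, work[t - 1]);
    }
    work
}

/// Scalar form: expand 16 message words into 64, one word per step.
fn schedule_scalar(first16: &[u32; 16]) -> [u32; 64] {
    let mut w = [0u32; 64];
    w[..16].copy_from_slice(first16);
    for t in 16..64 {
        w[t] = sigma1(w[t - 2])
            .wrapping_add(w[t - 7])
            .wrapping_add(sigma0(w[t - 15]))
            .wrapping_add(w[t - 16]);
    }
    w
}
```

Under these assumptions, scalar word `t` sits in lane `3 - t % 4` of vector `t / 4`, so the two expansions can be compared word by word.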
The digest-related instructions allow 4 rounds to be calculated as:
use std::simd::u32x4;
use self::crypto::sha2::{
    K32X4,
    sha256rnds2,
    sha256swap
};

fn rounds4(state: &mut [u32; 8], work: &mut [u32x4], t: usize) {
    let [a, b, c, d, e, f, g, h]: [u32; 8] = *state;

    // this is to illustrate the data order
    let mut abef = u32x4(a, b, e, f);
    let mut cdgh = u32x4(c, d, g, h);
    let temp = K32X4[t] + work[t];

    // this is the core expression
    cdgh = sha256rnds2(cdgh, abef, temp);
    abef = sha256rnds2(abef, cdgh, sha256swap(temp));

    *state = [abef.0, abef.1, cdgh.0, cdgh.1,
              abef.2, abef.3, cdgh.2, cdgh.3];
}
instead of 4 rounds of:
fn round(state: &mut [u32; 8], w: &mut [u32], t: usize) {
    let [a, b, c, mut d, e, f, g, mut h]: [u32; 8] = *state;

    h += big_sigma1!(e) + choose!(e, f, g) + K32[t] + w[t];
    d += h;
    h += big_sigma0!(a) + majority!(a, b, c);

    *state = [h, a, b, c, d, e, f, g];
}
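The relationship between the packed four-round form and the scalar round can be sketched in plain Rust, again with `[u32; 4]` arrays standing in for `u32x4`. This is an illustrative emulation under stated assumptions, not the crate's actual code: the names `round`, `rnds2`, `swap`, `rounds4`, `compress`, and `H256_INIT` are hypothetical, and the constants are the standard SHA-256 values from FIPS 180-4. Here `rnds2` takes the packed `(c, d, g, h)` and `(a, b, e, f)` halves, runs two scalar rounds, and returns the new `(a, b, e, f)`, while the old `(a, b, e, f)` becomes the new `(c, d, g, h)`.

```rust
// Round constants from the SHA-256 specification (FIPS 180-4).
const K32: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];
// Initial hash value from the same specification.
const H256_INIT: [u32; 8] = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];

fn sigma0(x: u32) -> u32 { x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3) }
fn sigma1(x: u32) -> u32 { x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10) }
fn big_sigma0(x: u32) -> u32 { x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22) }
fn big_sigma1(x: u32) -> u32 { x.rotate_right(6) ^ x.rotate_right(11) ^ x.rotate_right(25) }
fn choose(e: u32, f: u32, g: u32) -> u32 { (e & f) ^ (!e & g) }
fn majority(a: u32, b: u32, c: u32) -> u32 { (a & b) ^ (a & c) ^ (b & c) }

/// One scalar round; `wk` is K32[t] + w[t] (modulo 2^32).
fn round(s: [u32; 8], wk: u32) -> [u32; 8] {
    let [a, b, c, d, e, f, g, h] = s;
    let t1 = h.wrapping_add(big_sigma1(e)).wrapping_add(choose(e, f, g)).wrapping_add(wk);
    let t2 = big_sigma0(a).wrapping_add(majority(a, b, c));
    [t1.wrapping_add(t2), a, b, c, d.wrapping_add(t1), e, f, g]
}

/// Two rounds on packed state. Consumes lane 3 of `wk` first, then lane 2.
fn rnds2(cdgh: [u32; 4], abef: [u32; 4], wk: [u32; 4]) -> [u32; 4] {
    let s = [abef[0], abef[1], cdgh[0], cdgh[1], abef[2], abef[3], cdgh[2], cdgh[3]];
    let s = round(round(s, wk[3]), wk[2]);
    [s[0], s[1], s[4], s[5]]
}

/// Swap the two halves so the remaining two round inputs move into lanes 2 and 3.
fn swap(v: [u32; 4]) -> [u32; 4] { [v[2], v[3], v[0], v[1]] }

/// Four rounds; `wk` = [k3 + w3, k2 + w2, k1 + w1, k0 + w0] for the group.
fn rounds4(state: &mut [u32; 8], wk: [u32; 4]) {
    let [a, b, c, d, e, f, g, h] = *state;
    let mut abef = [a, b, e, f];
    let mut cdgh = [c, d, g, h];
    cdgh = rnds2(cdgh, abef, wk);
    abef = rnds2(abef, cdgh, swap(wk));
    *state = [abef[0], abef[1], cdgh[0], cdgh[1], abef[2], abef[3], cdgh[2], cdgh[3]];
}

/// Compress one 64-byte block into `state`, four rounds at a time.
fn compress(state: &mut [u32; 8], block: &[u8; 64]) {
    let mut w = [0u32; 64];
    for (i, chunk) in block.chunks_exact(4).enumerate() {
        w[i] = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    for t in 16..64 {
        w[t] = sigma1(w[t - 2])
            .wrapping_add(w[t - 7])
            .wrapping_add(sigma0(w[t - 15]))
            .wrapping_add(w[t - 16]);
    }
    let mut s = *state;
    for t in (0..64).step_by(4) {
        let wk = [
            K32[t + 3].wrapping_add(w[t + 3]),
            K32[t + 2].wrapping_add(w[t + 2]),
            K32[t + 1].wrapping_add(w[t + 1]),
            K32[t].wrapping_add(w[t]),
        ];
        rounds4(&mut s, wk);
    }
    // Feed-forward: add the compressed state back into the running hash.
    for i in 0..8 {
        state[i] = state[i].wrapping_add(s[i]);
    }
}
```

One way to check a sketch like this is to compress the single padded block for the message "abc" starting from `H256_INIT` and compare the result with the published digest `ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad`.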
NOTE: At the time of this writing, no CPU implements these instructions, so this library emulates them until they become more common and gain support in LLVM (and GCC, etc.).