reed_solomon_simd

Module algorithm

Algorithm documentation.

As I don’t fully understand the algorithm myself, I’ll just document the parts which I do understand.

§Shard

  • Reed-Solomon GF(2^16) erasure coding works on 16-bit elements (GfElement).
  • A shard is a byte array which is interpreted as an array of GfElement:s.

A naive implementation could, for example, require shards to be a multiple of 2 bytes in length and then interpret each byte pair as the low/high parts of a single GfElement:

[ low_0, high_0, low_1, high_1, ... ]
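
For illustration, the naive layout could be read like this (a sketch, with GfElement as a plain u16 stand-in; not the crate’s internals):

type GfElement = u16;

// Naive layout: each little-endian byte pair is one GfElement.
fn naive_elements(shard: &[u8]) -> Vec<GfElement> {
    assert_eq!(shard.len() % 2, 0);
    shard
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}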

However, that approach isn’t good for SIMD optimizations. Instead, shards are required to be a multiple of 64 bytes in length. In each 64-byte block the first 32 bytes are the low parts of 32 GfElement:s and the last 32 bytes are the high parts of those same 32 GfElement:s.

[ low_0, low_1, ..., low_31, high_0, high_1, ..., high_31 ]

A shard then consists of one or more of these 64-byte blocks:

// -------- first 64-byte block --------- | --------- second 64-byte block ---------- | ...
[ low_0, ..., low_31, high_0, ..., high_31, low_32, ..., low_63, high_32, ..., high_63, ... ]
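
Reading the 32 GfElement:s back out of one 64-byte block then looks like this (again a sketch reusing the GfElement alias above, not the crate’s internals):

fn block_elements(block: &[u8; 64]) -> [GfElement; 32] {
    // First 32 bytes are the low parts, last 32 bytes are the high parts.
    let (lows, highs) = block.split_at(32);
    let mut elements: [GfElement; 32] = [0; 32];
    for i in 0..32 {
        elements[i] = lows[i] as GfElement | ((highs[i] as GfElement) << 8);
    }
    elements
}

The point of the layout is that a SIMD implementation can load all 32 low bytes, or all 32 high bytes, with a single wide load and process them together.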

§Rate

Encoding and decoding both have two variations:

  • High rate refers to having more original shards than recovery shards.
    • High rate must be used when there are over 32768 original shards.
    • High rate encoding uses chunks of recovery_count.next_power_of_two() shards.
  • Low rate refers to having more recovery shards than original shards.
    • Low rate must be used when there are over 32768 recovery shards.
    • Low rate encoding uses chunks of original_count.next_power_of_two() shards.
  • Because of padding, either rate can be used when there are at most 32768 original shards and at most 32768 recovery shards (see the sketch after this list).
    • High rate and low rate are not compatible [1] with each other, i.e. decoding must use the same rate that encoding used.
    • With multiple chunks the “correct” rate is generally faster in encoding and no slower in decoding.
    • With a single chunk the “wrong” rate is generally faster in decoding if original_count and recovery_count differ a lot.
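
The rules above can be condensed into a hypothetical helper (pick_rate is illustrative only and not part of the crate):

fn pick_rate(original_count: usize, recovery_count: usize) -> &'static str {
    if original_count > 32768 {
        "high" // forced: over 32768 original shards
    } else if recovery_count > 32768 {
        "low" // forced: over 32768 recovery shards
    } else if original_count > recovery_count {
        "high" // the "correct" rate: more original shards than recovery shards
    } else {
        "low" // the "correct" rate otherwise (with equal counts either works)
    }
}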

§Benchmarks

  • These benchmarks are from cargo bench rate and use a similar setup to the main benchmarks, except with the maximum possible shard loss.
| original : recovery | Chunks  | HighRateEncoder (MiB/s) | LowRateEncoder (MiB/s) | HighRateDecoder (MiB/s) | LowRateDecoder (MiB/s) |
|---------------------|---------|-------------------------|------------------------|-------------------------|------------------------|
| 1024 : 1024         | 1x 1024 | 175                     | 176                    | 76                      | 75                     |
| 1024 : 1025 (Low)   | 2x 1024 | 140                     | 153                    | 47                      | 59                     |
| 1025 : 1024 (High)  | 2x 1024 | 152                     | 132                    | 60                      | 46                     |
| 1024 : 2048 (Low)   | 2x 1024 | 157                     | 169                    | 70                      | 70                     |
| 2048 : 1024 (High)  | 2x 1024 | 167                     | 151                    | 69                      | 68                     |
| 1025 : 1025         | 1x 2048 | 125                     | 126                    | 44                      | 43                     |
| 1025 : 2048 (Low)   | 1x 2048 | 144                     | 144                    | 65 !!!                  | 53                     |
| 2048 : 1025 (High)  | 1x 2048 | 144                     | 145                    | 53                      | 62 !!!                 |
| 2048 : 2048         | 1x 2048 | 156                     | 157                    | 70                      | 69                     |

The !!! marks the single-chunk cases where the “wrong” rate decodes faster, as noted in §Rate above.

§Encoding

Encoding takes original shards as input and generates recovery shards.
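
For orientation, here is a minimal sketch of encoding through the crate’s reed_solomon_simd::encode convenience function, assuming its simple-API shape and the 64-byte shard requirement from §Shard:

fn main() -> Result<(), reed_solomon_simd::Error> {
    // Three original shards, each a multiple of 64 bytes long.
    let original = [vec![1u8; 64], vec![2u8; 64], vec![3u8; 64]];

    // 3 original shards in, 2 recovery shards out.
    let recovery = reed_solomon_simd::encode(3, 2, &original)?;
    assert_eq!(recovery.len(), 2);
    Ok(())
}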

§High rate encoding

  • Encoding is done in chunks of recovery_count.next_power_of_two() shards.
  • Original shards are split into chunks and the last chunk is padded with zero-filled shards if needed.
    • In theory the original shards are padded to GF_ORDER - chunk_size shards, but since IFFT([0u8; x]) == [0u8; x] and xor with 0 is a no-op, the chunks which contain only 0u8:s can be ignored.
  • Recovery shards fit into a single chunk, which is padded with unused shards if needed.
  • The recovery chunk is generated with the following algorithm:
recovery_chunk = FFT(
    IFFT(original_chunk_0, skew_0) xor
    IFFT(original_chunk_1, skew_1) xor
    ...
)

This is implemented in HighRateEncoder.
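
The chunk arithmetic can also be sketched structurally, with ifft, fft and xor as hypothetical closure parameters standing in for the crate’s real GF(2^16) transforms:

fn high_rate_recovery_chunk<C>(
    original_chunks: &[C],
    ifft: impl Fn(&C, usize) -> C, // (chunk, skew index) -> transformed chunk
    fft: impl Fn(&C) -> C,
    xor: impl Fn(&C, &C) -> C,
) -> Option<C> {
    // xor-accumulate the IFFT of every original chunk at its own skew...
    let sum = original_chunks
        .iter()
        .enumerate()
        .map(|(skew, chunk)| ifft(chunk, skew))
        .reduce(|a, b| xor(&a, &b))?;
    // ...then a single FFT over the sum yields the recovery chunk.
    Some(fft(&sum))
}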

§Low rate encoding

  • Encoding is done in chunks of original_count.next_power_of_two() shards.
  • Original shards fit into a single chunk, which is padded with zero-filled shards if needed.
  • Recovery shards are generated in chunks and the last chunk is padded with unused shards if needed.
    • In theory the recovery shards are padded to GF_ORDER - chunk_size shards, but chunks which contain only unused shards can be ignored.
  • Recovery chunks are generated with the following algorithm:
recovery_chunk_0 = FFT( IFFT(original_chunk), skew_0 )
recovery_chunk_1 = FFT( IFFT(original_chunk), skew_1 )
...

This is implemented in LowRateEncoder.
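
Structurally, the difference from high rate is that the single IFFT is shared while the FFT runs once per recovery chunk (same hypothetical closure style as above):

fn low_rate_recovery_chunks<C>(
    original_chunk: &C,
    recovery_chunk_count: usize,
    ifft: impl Fn(&C) -> C,
    fft: impl Fn(&C, usize) -> C, // (chunk, skew index) -> recovery chunk
) -> Vec<C> {
    // The IFFT of the single original chunk is computed only once...
    let transformed = ifft(original_chunk);
    // ...then each recovery chunk is an FFT of it at its own skew.
    (0..recovery_chunk_count)
        .map(|skew| fft(&transformed, skew))
        .collect()
}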

§Decoding

TODO


  1. They seem to be compatible with a single chunk. However, I don’t quite understand why, and I don’t recommend relying on this.