lance_encoding::decoder

Trait PrimitivePageDecoder

pub trait PrimitivePageDecoder: Send + Sync {
    // Required method
    fn decode(&self, rows_to_skip: u64, num_rows: u64) -> Result<DataBlock>;
}

A decoder for single-column encodings of primitive data (this includes fixed-size lists of primitive data).

Physical decoders are able to decode into existing buffers for zero-copy operation.

Instances should be stateless and Send + Sync because multiple decode tasks may reference the same page. For example, if a page covers rows 0-2000 and the decoder stream has a batch size of 1024, the decoder is needed by both the decode task for batch 0 and the decode task for batch 1.
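The batch scenario above can be sketched in plain Rust. This is a minimal, hypothetical illustration: PageDecoder, PlainU64Decoder, batch_ranges and decode_in_parallel are not part of lance_encoding, and Vec<u64> stands in for DataBlock. It shows how a 2000-row page splits into the (rows_to_skip, num_rows) pairs (0, 1024) and (1024, 976), and how one stateless Send + Sync decoder can be shared through an Arc by both decode tasks.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `PrimitivePageDecoder`; the real trait returns
/// `Result<DataBlock>`, replaced here by a plain `Vec<u64>`.
trait PageDecoder: Send + Sync {
    fn decode(&self, rows_to_skip: u64, num_rows: u64) -> Result<Vec<u64>, String>;
}

/// A stateless decoder: it only reads immutable page data, so concurrent
/// calls through `&self` cannot interfere with each other.
struct PlainU64Decoder {
    values: Vec<u64>,
}

impl PageDecoder for PlainU64Decoder {
    fn decode(&self, rows_to_skip: u64, num_rows: u64) -> Result<Vec<u64>, String> {
        let start = rows_to_skip as usize;
        let end = start.checked_add(num_rows as usize).ok_or("row range overflow")?;
        self.values
            .get(start..end)
            .map(|rows| rows.to_vec())
            .ok_or_else(|| format!("rows {start}..{end} are outside the page"))
    }
}

/// Split a page of `page_rows` rows into (rows_to_skip, num_rows) pairs,
/// one per batch of at most `batch_size` rows. `batch_size` must be non-zero.
fn batch_ranges(page_rows: u64, batch_size: u64) -> Vec<(u64, u64)> {
    (0..page_rows)
        .step_by(batch_size as usize)
        .map(|skip| (skip, batch_size.min(page_rows - skip)))
        .collect()
}

/// Run one decode task per batch on its own thread, all sharing one decoder.
fn decode_in_parallel(decoder: Arc<dyn PageDecoder>, page_rows: u64, batch_size: u64) -> Vec<Vec<u64>> {
    let handles: Vec<_> = batch_ranges(page_rows, batch_size)
        .into_iter()
        .map(|(skip, n)| {
            let decoder = Arc::clone(&decoder);
            thread::spawn(move || decoder.decode(skip, n).unwrap())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because decode takes &self and the decoder holds no mutable state, no lock is needed; each task just receives its own Arc clone.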

See crate::decoder for more information.

Required Methods


fn decode(&self, rows_to_skip: u64, num_rows: u64) -> Result<DataBlock>

Decode data into buffers.

This may be a simple zero-copy from a disk buffer or could involve complex decoding such as decompressing from some compressed representation.

Capacity is stored as a tuple of (num_bytes: u64, is_needed: bool). The is_needed portion only needs to be updated if the encoding has some concept of an “optional” buffer.

Encodings can have any number of input or output buffers. For example, a dictionary decoding converts two buffers (indices + dictionary) into a single buffer.
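A minimal sketch of the dictionary case, assuming u8 indices and a dictionary of i32 values purely for illustration (the function name dictionary_decode is hypothetical and not part of lance_encoding): two input buffers are consumed and one values buffer is produced.

```rust
/// Hypothetical dictionary decoding: two input buffers (indices + dictionary)
/// become a single output buffer of values. Index and value types are
/// illustrative assumptions.
fn dictionary_decode(indices: &[u8], dictionary: &[i32]) -> Result<Vec<i32>, String> {
    indices
        .iter()
        .map(|&idx| {
            // Look up each index; an out-of-range index is a decode error.
            dictionary.get(idx as usize).copied().ok_or_else(|| {
                format!("index {idx} out of bounds for dictionary of {}", dictionary.len())
            })
        })
        .collect()
}
```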

Binary decodings have two output buffers (one for values, one for offsets).

Other decodings could even expand the number of output buffers. For example, decoding fixed-size strings into variable-length strings would go from one input buffer to multiple output buffers.
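The fixed-size to variable-length case can be sketched as follows. This is a hypothetical helper, not lance_encoding code, and it assumes each string occupies exactly width bytes with trailing zero bytes as padding. One input buffer is expanded into the two output buffers of a binary layout: i32 offsets (one more entry than there are strings) and the concatenated values.

```rust
/// Hypothetical sketch: expand one buffer of fixed-width, NUL-padded strings
/// into (offsets, values). Assumes `width > 0`; trailing `0` bytes are treated
/// as padding, and any incomplete final chunk is ignored.
fn fixed_to_variable(input: &[u8], width: usize) -> (Vec<i32>, Vec<u8>) {
    let mut offsets = vec![0i32];
    let mut values = Vec::with_capacity(input.len());
    for chunk in input.chunks_exact(width) {
        // Keep bytes up to and including the last non-zero byte.
        let len = chunk.iter().rposition(|&b| b != 0).map_or(0, |p| p + 1);
        values.extend_from_slice(&chunk[..len]);
        // Each offset marks where the next string starts in `values`.
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}
```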

Each Arrow data type typically has a fixed structure of buffers, and the encoding chain will generally end at one of these structures. However, intermediate structures may exist that do not correspond to any Arrow type at all. For example, a bitpacking encoding deals with buffers whose bits-per-value is not a multiple of 8.
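To make the bitpacking case concrete, here is a minimal sketch of unpacking values stored with a bit width that is not a multiple of 8. The function unpack_bits is hypothetical, and the layout (values packed back to back, least-significant bit first, no per-value byte alignment) is an assumption for illustration, not necessarily the layout Lance uses. Four 3-bit values occupy 12 bits, so the intermediate buffer has no Arrow equivalent until it is widened to u32.

```rust
/// Hypothetical bit-unpacking: `num_values` integers of `bits` bits each
/// (1..=32), packed back to back, least-significant bit first.
fn unpack_bits(packed: &[u8], bits: u32, num_values: usize) -> Vec<u32> {
    assert!((1..=32).contains(&bits), "bit width must be in 1..=32");
    let width = bits as usize;
    (0..num_values)
        .map(|i| {
            let mut value: u32 = 0;
            for b in 0..width {
                // Absolute bit position of bit `b` of value `i`.
                let bit_pos = i * width + b;
                let bit = (packed[bit_pos / 8] >> (bit_pos % 8)) & 1;
                value |= (bit as u32) << b;
            }
            value
        })
        .collect()
}
```

For example, the values 5, 3, 7, 1 packed at 3 bits each fit in the two bytes 0xDD, 0x03.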

The primitive_array_from_buffers method expects a specific buffer layout for each Arrow type (order matters), and encodings that aim to decode into Arrow types should respect this layout.

Arguments
  • rows_to_skip - how many rows to skip (within the page) before decoding
  • num_rows - how many rows to decode
  • all_null - a mutable bool, set to true if the decoder determines that all values are null
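As a sketch of how rows_to_skip and num_rows translate into work for the simplest, zero-copy case, consider a hypothetical fixed-width page. FixedWidthPage and bytes_per_value are assumptions for illustration, and the borrowed &[u8] result stands in for the DataBlock the real method returns. The requested rows form one contiguous byte range, so no bytes need to be copied.

```rust
/// Hypothetical fixed-width page: every row occupies `bytes_per_value` bytes.
struct FixedWidthPage {
    data: Vec<u8>,
    bytes_per_value: u64,
}

impl FixedWidthPage {
    /// Return rows [rows_to_skip, rows_to_skip + num_rows) as a borrowed slice
    /// of the page buffer (zero-copy), with overflow and bounds checks.
    fn decode(&self, rows_to_skip: u64, num_rows: u64) -> Result<&[u8], String> {
        let start = rows_to_skip
            .checked_mul(self.bytes_per_value)
            .ok_or("byte offset overflow")?;
        let len = num_rows
            .checked_mul(self.bytes_per_value)
            .ok_or("byte length overflow")?;
        let end = start.checked_add(len).ok_or("byte offset overflow")?;
        self.data.get(start as usize..end as usize).ok_or_else(|| {
            format!("byte range {start}..{end} exceeds page of {} bytes", self.data.len())
        })
    }
}
```

An encoding that must decompress or unpack would instead allocate a new buffer for the same row range.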

Implementors