Data layouts to represent encoded data in a sub-Arrow format
These DataBlock structures represent physical layouts. They fill a gap somewhere between arrow_data::data::ArrayData (which, as a collection of buffers, is too generic because it doesn’t give us enough information about what those buffers represent) and arrow_array::array::Array (which is too specific, because it cares about the logical data type).
In addition, the layouts represented here are slightly stricter than Arrow’s layout rules. For example, offset buffers MUST start with 0. These additional restrictions impose a slight penalty on encode (to normalize Arrow data into this form) but make the development of encoders and decoders easier (since they can rely on a normalized representation).
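To illustrate the kind of normalization described above, here is a minimal sketch of rebasing an offsets buffer so that it starts at 0. It assumes 32-bit offsets and a plain byte slice for the values; the function name `normalize_offsets` is hypothetical and is not part of this module's API. A sliced Arrow array may legally carry offsets such as `[2, 7, 12]`, and the sketch rewrites them to `[0, 5, 10]` while trimming the values buffer to the bytes actually referenced.

```rust
/// Hypothetical helper (not this crate's API): rebase Arrow-style offsets so
/// the first offset is 0, and copy only the referenced range of `values`.
///
/// Assumes `offsets` is non-decreasing, non-negative, and within `values`.
fn normalize_offsets(offsets: &[i32], values: &[u8]) -> (Vec<i32>, Vec<u8>) {
    let (first, last) = match (offsets.first(), offsets.last()) {
        (Some(&first), Some(&last)) => (first, last),
        // No offsets at all: return the canonical empty form, a single 0.
        _ => return (vec![0], Vec::new()),
    };

    // Subtract the first offset from every entry so the buffer starts at 0.
    let rebased: Vec<i32> = offsets.iter().map(|o| o - first).collect();

    // Keep only the bytes between the first and last offsets.
    let trimmed = values[first as usize..last as usize].to_vec();

    (rebased, trimmed)
}
```

The copy of the values buffer is the "slight penalty on encode" in this sketch; in exchange, a decoder reading the result can treat `offsets[i]` directly as a position in the values buffer without tracking a base offset.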
Structs
- A data block with no buffers where everything is null
- A block representing the same constant value repeated many times
- A data block for dictionary encoded data
- A data block to represent a fixed size list
- A data block for a single buffer of data where each element has a fixed number of bits
- Wraps a data block and adds nullability information to it
- A data block with no regular structure. There is no available spot to attach validity / repdef information and it cannot be converted to Arrow without being decoded
- A data block representing a struct
- A data block for variable-width data (e.g. strings, packed rows, etc.)
Enums
- A DataBlock is a collection of buffers that represents an “array” of data in very generic terms
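As a rough mental model of how such layouts can be combined under one enum, the following sketch defines a toy type with a few of the layouts described above. Everything here (`ToyBlock`, its variants, its fields, and `num_values`) is a hypothetical simplification for illustration only; the real types in this module have different names, fields, and buffer representations.

```rust
/// Hypothetical, simplified stand-in for a physical-layout enum.
/// These are NOT the actual definitions from this module.
#[allow(dead_code)]
#[derive(Debug)]
enum ToyBlock {
    /// No buffers; every value is null.
    AllNull { num_values: u64 },
    /// One buffer where each element has a fixed number of bits.
    FixedWidth { bits_per_value: u64, data: Vec<u8> },
    /// Offsets (starting at 0) into a buffer of variable-width data.
    VariableWidth { offsets: Vec<u32>, data: Vec<u8> },
    /// Wraps another block and adds validity information to it.
    Nullable { validity: Vec<bool>, inner: Box<ToyBlock> },
}

impl ToyBlock {
    /// Number of values, derived purely from the physical layout.
    /// Assumes `bits_per_value` is non-zero.
    fn num_values(&self) -> u64 {
        match self {
            ToyBlock::AllNull { num_values } => *num_values,
            ToyBlock::FixedWidth { bits_per_value, data } => {
                (data.len() as u64 * 8) / bits_per_value
            }
            // N values are delimited by N + 1 offsets.
            ToyBlock::VariableWidth { offsets, .. } => {
                offsets.len().saturating_sub(1) as u64
            }
            ToyBlock::Nullable { inner, .. } => inner.num_values(),
        }
    }
}
```

Note that nothing in the sketch mentions a logical data type: a `FixedWidth` block with 32 bits per value could hold integers, floats, or dates, which is the sense in which these layouts are less specific than an Arrow array but more informative than a bare list of buffers.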