pub trait FieldEncoder: Send {
    // Required methods
    fn maybe_encode(
        &mut self,
        array: ArrayRef,
        external_buffers: &mut OutOfLineBuffers,
        repdef: RepDefBuilder,
        row_number: u64,
    ) -> Result<Vec<EncodeTask>>;
    fn flush(
        &mut self,
        external_buffers: &mut OutOfLineBuffers,
    ) -> Result<Vec<EncodeTask>>;
    fn finish(
        &mut self,
        external_buffers: &mut OutOfLineBuffers,
    ) -> BoxFuture<'_, Result<Vec<EncodedColumn>>>;
    fn num_columns(&self) -> u32;
}
Top-level encoding trait for encoding any Arrow array type into one or more pages.
A field encoder buffers and encodes a single input column, which may map to multiple output columns. For example, a list array or struct array will be encoded into multiple columns.
Also, fields may be encoded at different speeds. For example, given a struct column with three fields (a boolean field, an int32 field, and a 4096-dimension tensor field) the tensor field is likely to emit encoded pages much more frequently than the boolean field.
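To make the difference in page frequency concrete, here is a rough back-of-the-envelope sketch. It is not part of the crate: `rows_per_page` is a hypothetical helper, the 8 MiB page size is an assumed figure chosen only for illustration, and the tensor is assumed to hold 32-bit floats. Compression is ignored.

```rust
/// Hypothetical helper (not part of the crate): how many rows fit in one page,
/// given the uncompressed width of a single row in bits.
fn rows_per_page(page_bytes: u64, bits_per_row: u64) -> u64 {
    assert!(bits_per_row > 0, "a row must occupy at least one bit");
    (page_bytes * 8) / bits_per_row
}

/// Assumed page size for this illustration only (8 MiB).
const ASSUMED_PAGE_BYTES: u64 = 8 * 1024 * 1024;

/// Uncompressed row widths for the three fields in the example above.
const BOOL_BITS: u64 = 1;
const INT32_BITS: u64 = 32;
/// Assumes 32-bit float elements; the text only states the dimension.
const TENSOR_BITS: u64 = 4096 * 32;
```

Under these assumptions the boolean field needs 67,108,864 rows to fill a page, the int32 field needs 2,097,152, and the tensor field needs only 512, so the tensor field would emit a page roughly 131,072 times as often as the boolean field.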
Required Methods
fn maybe_encode(
    &mut self,
    array: ArrayRef,
    external_buffers: &mut OutOfLineBuffers,
    repdef: RepDefBuilder,
    row_number: u64,
) -> Result<Vec<EncodeTask>>
Buffer the data and, if there is enough data in the buffer to form a page, return an encoding task to encode the data.
This may return more than one task because a single column may be mapped to multiple output columns. For example, if encoding a struct column with three children then up to three tasks may be returned from each call to maybe_encode.
It may also return multiple tasks for a single column if the input array is larger than a single disk page.
It could also return an empty Vec if there is not enough data yet to encode any pages.
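The three outcomes described above (no tasks, one task, several tasks) can be sketched with a toy encoder. Everything here is a simplified stand-in: `ToyTask` and `ToyEncoder` are hypothetical names, pages are measured in values rather than bytes, and the real method also takes `external_buffers`, `repdef`, and `row_number`, which this sketch omits.

```rust
/// Hypothetical stand-in for an encode task: the values destined for one page.
#[derive(Debug, PartialEq)]
struct ToyTask {
    values: Vec<i32>,
}

/// Hypothetical single-column encoder that buffers values until a page is full.
struct ToyEncoder {
    /// Number of values per page (an assumption; real pages are sized in bytes).
    page_size: usize,
    buffer: Vec<i32>,
}

impl ToyEncoder {
    fn new(page_size: usize) -> Self {
        assert!(page_size > 0, "page_size must be positive");
        Self { page_size, buffer: Vec::new() }
    }

    /// Buffer `values` and return one task per full page now available.
    /// Returns an empty Vec while the buffer holds less than one page.
    fn maybe_encode(&mut self, values: &[i32]) -> Vec<ToyTask> {
        self.buffer.extend_from_slice(values);
        let mut tasks = Vec::new();
        while self.buffer.len() >= self.page_size {
            // Keep the tail in the buffer and hand the first page to a task.
            let rest = self.buffer.split_off(self.page_size);
            let page = std::mem::replace(&mut self.buffer, rest);
            tasks.push(ToyTask { values: page });
        }
        tasks
    }
}
```

With a page size of 4, passing 2 values yields no tasks, passing 9 more yields two tasks (the input spans more than one page), and the 3 leftover values stay buffered until a later call or a flush.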
fn flush(
    &mut self,
    external_buffers: &mut OutOfLineBuffers,
) -> Result<Vec<EncodeTask>>
Flush any remaining data from the buffers into encoding tasks.
Each encode task produces a single page. The order of these pages will be maintained in the file. Order between columns is not guaranteed, but all pages in the same column should maintain their order.
This may be called intermittently throughout encoding, but it will always be called once at the end of encoding, just before calling finish.
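The call order implied here (repeated maybe_encode, one final flush, then finish) can be sketched with a synchronous toy driver. This is an illustration under loud assumptions: `ToyFieldEncoder`, `PairEncoder`, and `write_column` are hypothetical names, pages are plain `Vec<u32>` values rather than encode tasks, and the real `finish` is asynchronous and returns column metadata rather than a row count.

```rust
/// Hypothetical, synchronous stand-in for the trait documented on this page.
trait ToyFieldEncoder {
    fn maybe_encode(&mut self, batch: &[u32]) -> Vec<Vec<u32>>;
    fn flush(&mut self) -> Vec<Vec<u32>>;
    /// Toy "metadata": the total number of rows seen.
    fn finish(&mut self) -> u64;
}

/// Hypothetical encoder that emits a page for every two buffered values.
struct PairEncoder {
    pending: Vec<u32>,
    rows_seen: u64,
}

impl ToyFieldEncoder for PairEncoder {
    fn maybe_encode(&mut self, batch: &[u32]) -> Vec<Vec<u32>> {
        self.rows_seen += batch.len() as u64;
        self.pending.extend_from_slice(batch);
        let mut pages = Vec::new();
        while self.pending.len() >= 2 {
            pages.push(self.pending.drain(..2).collect());
        }
        pages
    }

    fn flush(&mut self) -> Vec<Vec<u32>> {
        if self.pending.is_empty() {
            Vec::new()
        } else {
            vec![std::mem::take(&mut self.pending)]
        }
    }

    fn finish(&mut self) -> u64 {
        self.rows_seen
    }
}

/// Drive one column: pages are appended in the order they were produced.
fn write_column<E: ToyFieldEncoder>(enc: &mut E, batches: &[Vec<u32>]) -> (Vec<Vec<u32>>, u64) {
    let mut pages = Vec::new();
    for batch in batches {
        pages.extend(enc.maybe_encode(batch));
    }
    pages.extend(enc.flush()); // final flush, just before finish
    let rows = enc.finish(); // called once, after all pages are produced
    (pages, rows)
}
```

Feeding the batches `[1, 2, 3]`, `[4]`, and `[5]` through `write_column` produces the pages `[1, 2]`, `[3, 4]`, and `[5]` in that order, with the short last page coming from the final flush.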
fn finish(
    &mut self,
    external_buffers: &mut OutOfLineBuffers,
) -> BoxFuture<'_, Result<Vec<EncodedColumn>>>
Finish encoding and return column metadata.
This is called only once, after all encode tasks have completed.
This returns a Vec because a single field may have created multiple columns.
fn num_columns(&self) -> u32
The number of output columns this encoder will create.
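As a rough illustration of how a nested field could report its column count, the sketch below uses a hypothetical `ToyField` type. It assumes that every leaf maps to exactly one output column and that a struct contributes only its children's columns. The actual mapping depends on the concrete encoder implementation and may differ.

```rust
/// Hypothetical field shape (not a type from the crate).
enum ToyField {
    /// A primitive field, assumed here to map to exactly one output column.
    Leaf,
    /// A struct field, assumed here to contribute only its children's columns.
    Struct(Vec<ToyField>),
}

impl ToyField {
    /// Count output columns by summing over the leaves of the field tree.
    fn num_columns(&self) -> u32 {
        match self {
            ToyField::Leaf => 1,
            ToyField::Struct(children) => {
                children.iter().map(ToyField::num_columns).sum()
            }
        }
    }
}
```

Under these assumptions, a struct with three leaf children reports three columns, which matches the "up to three tasks" case described for maybe_encode.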