lance_encoding::decoder

Trait FieldScheduler

Source
pub trait FieldScheduler:
    Send
    + Sync
    + Debug {
    // Required methods
    fn initialize<'a>(
        &'a self,
        filter: &'a FilterExpression,
        context: &'a SchedulerContext,
    ) -> BoxFuture<'a, Result<()>>;
    fn schedule_ranges<'a>(
        &'a self,
        ranges: &[Range<u64>],
        filter: &FilterExpression,
    ) -> Result<Box<dyn SchedulingJob + 'a>>;
    fn num_rows(&self) -> u64;
}
Expand description

A scheduler for a field’s worth of data

Each field in a reader’s output schema maps to one field scheduler. This scheduler may map to more than one column. For example, one field of struct data may cover many columns of child data. In fact, the entire file is treated as one top-level struct field.

The scheduler is responsible for calculating the necessary I/O. One schedule_range request could trigger multiple batches of I/O across multiple columns. The scheduler should emit decoders into the sink as quickly as possible.

As soon as the scheduler encounters a batch of data that can decoded then the scheduler should emit a decoder in the “unloaded” state. The decode stream will pull the decoder and start decoding.

The order in which decoders are emitted is important. Pages should be emitted in row-major order allowing decode of complete rows as quickly as possible.

The FieldScheduler should be stateless and Send and Sync. This is because it might need to be shared. For example, a list page has a reference to the field schedulers for its items column. This is shared with the follow-up I/O task created when the offsets are loaded.

See crate::decoder for more information

Required Methods§

Source

fn initialize<'a>( &'a self, filter: &'a FilterExpression, context: &'a SchedulerContext, ) -> BoxFuture<'a, Result<()>>

Called at the beginning of scheduling to initialize the scheduler

Source

fn schedule_ranges<'a>( &'a self, ranges: &[Range<u64>], filter: &FilterExpression, ) -> Result<Box<dyn SchedulingJob + 'a>>

Schedules I/O for the requested portions of the field.

Note: ranges must be ordered and non-overlapping TODO: Support unordered or overlapping ranges in file scheduler

Source

fn num_rows(&self) -> u64

The number of rows in this field

Implementors§