datafusion_physical_plan::coalesce

Struct BatchCoalescer

Source
pub struct BatchCoalescer { /* private fields */ }
Expand description

Concatenate multiple RecordBatches

BatchCoalescer concatenates multiple small RecordBatches, produced by operations such as FilterExec and RepartitionExec, into larger ones for more efficient processing by subsequent operations.

§Background

Generally speaking, larger RecordBatches are more efficient to process than smaller record batches (until the CPU cache is exceeded) because there is fixed processing overhead per batch. DataFusion tries to operate on batches of target_batch_size rows to amortize this overhead

┌────────────────────┐
│    RecordBatch     │
│   num_rows = 23    │
└────────────────────┘                 ┌────────────────────┐
                                       │                    │
┌────────────────────┐     Coalesce    │                    │
│                    │      Batches    │                    │
│    RecordBatch     │                 │                    │
│   num_rows = 50    │  ─ ─ ─ ─ ─ ─ ▶  │                    │
│                    │                 │    RecordBatch     │
│                    │                 │   num_rows = 106   │
└────────────────────┘                 │                    │
                                       │                    │
┌────────────────────┐                 │                    │
│                    │                 │                    │
│    RecordBatch     │                 │                    │
│   num_rows = 33    │                 └────────────────────┘
│                    │
└────────────────────┘

§Notes:

  1. Output rows are produced in the same order as the input rows

  2. The output is a sequence of batches, with all but the last being at least target_batch_size rows.

  3. Eventually this may also be able to handle other optimizations such as a combined filter/coalesce operation.

Implementations§

Source§

impl BatchCoalescer

Source

pub fn new( schema: SchemaRef, target_batch_size: usize, fetch: Option<usize>, ) -> Self

Create a new BatchCoalescer

§Arguments
  • schema - the schema of the output batches
  • target_batch_size - the minimum number of rows for each output batch (until limit reached)
  • fetch - the maximum number of rows to fetch, None means fetch all rows
Source

pub fn schema(&self) -> SchemaRef

Return the schema of the output batches

Source

pub fn push_batch(&mut self, batch: RecordBatch) -> CoalescerState

Push next batch, and returns CoalescerState indicating the current state of the buffer.

Source

pub fn is_empty(&self) -> bool

Return true if the there is no data buffered

Source

pub fn finish_batch(&mut self) -> Result<RecordBatch>

Concatenates and returns all buffered batches, and clears the buffer.

Trait Implementations§

Source§

impl Debug for BatchCoalescer

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> ErasedDestructor for T
where T: 'static,

Source§

impl<T> MaybeSendSync for T