Struct SortMergeJoinExec

pub struct SortMergeJoinExec {
    pub left: Arc<dyn ExecutionPlan>,
    pub right: Arc<dyn ExecutionPlan>,
    pub on: JoinOn,
    pub filter: Option<JoinFilter>,
    pub join_type: JoinType,
    pub sort_options: Vec<SortOptions>,
    pub null_equals_null: bool,
    /* private fields */
}

Join execution plan that evaluates equi-join predicates across multiple partitions using the sort-merge join algorithm and applies an optional filter after the join. It can join arbitrarily large inputs, including cases where one or both inputs do not fit in available memory.

§Join Expressions

Equi-join predicate (e.g. <col1> = <col2>) expressions are represented by Self::on.

Non-equality predicates that cannot be pushed down to the join inputs (e.g. <col1> != <col2>) are known as “filter expressions” and are evaluated after the equi-join predicates. They are optional and are represented by Self::filter.
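
The split between the two kinds of predicate can be illustrated with a minimal, self-contained sketch. It does not use DataFusion types: `Row` and `join_with_filter` are hypothetical names, the join is inner-only, and a plain nested loop stands in for the sort-merge algorithm so that the order of evaluation (key equality first, filter second) stays visible.

```rust
/// A row with a join key and a payload value (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: i32,
    value: i32,
}

/// Pairs rows whose keys are equal (the role played by `on`), then keeps
/// only the pairs accepted by the optional predicate (the role of `filter`).
fn join_with_filter(
    left: &[Row],
    right: &[Row],
    filter: Option<&dyn Fn(&Row, &Row) -> bool>,
) -> Vec<(Row, Row)> {
    let mut out = Vec::new();
    for l in left {
        for r in right {
            // Equi-join predicate: <left key> = <right key>.
            if l.key != r.key {
                continue;
            }
            // Filter expression, evaluated only on key-matched pairs.
            // With no filter, every key-matched pair is kept.
            if filter.map_or(true, |f| f(l, r)) {
                out.push((l.clone(), r.clone()));
            }
        }
    }
    out
}
```

Passing `None` keeps every key-matched pair, while passing something like `Some(&|l: &Row, r: &Row| l.value != r.value)` additionally drops pairs whose payloads are equal.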

§Sorting

This plan assumes that both the left and right inputs to the join are pre-sorted. Sorting the inputs is not the responsibility of this execution plan.

§“Streamed” vs “Buffered”

The number of streamed record batches held in memory at any time depends on the output batch size of the execution plan. There is no spilling support for streamed input. Join-key values from the streamed input are compared against join-key values from the buffered input, and one row in a streamed record batch can match multiple rows in the buffered input batches. The streamed input is managed through the states in StreamedState, and streamed input batches are represented by StreamedBatch.

The buffered side holds all record batches that share the same join-key value. If memory usage exceeds the configured limit and spilling is enabled, buffered batches can be spilled to disk; if spilling is disabled, execution fails under the same conditions. Multiple buffered record batches can reside in memory or on disk at once. How many depends on the number of buffered rows whose join key equals that of the streamed rows currently in memory. Because the inputs are pre-sorted, the algorithm can tell when a buffered batch is no longer needed and releases it from memory or disk. The buffered input is managed through the states in BufferedState, and buffered input batches are represented by BufferedBatch.

Depending on the join type, either the left or the right input is selected as the streamed side, and the other input becomes the buffered side. For example, in a left-outer join the left execution plan is the streamed input, while in a right-outer join the right execution plan is the streamed input.
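
The interplay between the two sides can be sketched over plain in-memory slices. This is an illustration under simplifying assumptions, not the operator's implementation: it handles only an inner join on a single `i32` key sorted ascending, has no record batches, state machines, or spilling, and the name `sort_merge_inner_join` is hypothetical.

```rust
/// Inner sort-merge join over two slices sorted ascending by key.
/// `streamed` is scanned row by row; `buffered` rows sharing the current
/// key form a group, so one streamed row can match many buffered rows.
fn sort_merge_inner_join<'a>(
    streamed: &[(i32, &'a str)],
    buffered: &[(i32, &'a str)],
) -> Vec<(i32, &'a str, &'a str)> {
    let mut out = Vec::new();
    // Index of the first buffered row that can still produce a match.
    let mut group_start = 0;
    for &(key, s_val) in streamed {
        // Buffered rows with smaller keys can never match again because
        // both inputs are sorted, so they are skipped (released) here.
        while group_start < buffered.len() && buffered[group_start].0 < key {
            group_start += 1;
        }
        // Emit every buffered row that shares the streamed row's key.
        let mut i = group_start;
        while i < buffered.len() && buffered[i].0 == key {
            out.push((key, s_val, buffered[i].1));
            i += 1;
        }
        // `group_start` is left in place: the next streamed row may carry
        // the same key and must see the same buffered group again.
    }
    out
}
```

The point the sketch makes is that the buffered group is retained only while streamed rows with the same key may still arrive, which is what allows the real operator to release buffered batches early.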

Reference for the algorithm: https://en.wikipedia.org/wiki/Sort-merge_join.

Helpful short video demonstration: https://www.youtube.com/watch?v=jiWCPJtDE2c.

Fields§

§left: Arc<dyn ExecutionPlan>

Left sorted joining execution plan

§right: Arc<dyn ExecutionPlan>

Right sorted joining execution plan

§on: JoinOn

Set of common columns used to join on

§filter: Option<JoinFilter>

Filters which are applied while finding matching rows

§join_type: JoinType

How the join is performed

§sort_options: Vec<SortOptions>

Sort options of join columns used in sorting left and right execution plans
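
How a per-column sort option can drive key comparison is sketched below. The struct `KeySortOptions` and the function `compare_keys` are hypothetical stand-ins defined only for this example; the two flags are modelled on the `descending` and `nulls_first` fields of Arrow's `SortOptions`, and treating null placement as independent of direction is an assumption of the sketch.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for one join column's sort options.
#[derive(Clone, Copy)]
struct KeySortOptions {
    descending: bool,
    nulls_first: bool,
}

/// Compares two nullable keys under the given options.
/// Null placement is decided by `nulls_first` alone; `descending`
/// reverses only the ordering between two non-null values.
fn compare_keys(a: Option<i32>, b: Option<i32>, opts: KeySortOptions) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if opts.nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if opts.nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => {
            let ord = x.cmp(&y);
            if opts.descending { ord.reverse() } else { ord }
        }
    }
}
```

Both inputs need to be ordered by the same comparator for a merge to work, which is why a single set of options per join column describes the left and right plans together.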

§null_equals_null: bool

If null_equals_null is true, null == null; otherwise null != null
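
The effect of this flag on key matching can be shown with a small sketch. The function `keys_match` is a hypothetical helper over `Option<i32>` keys, where `None` stands for a null join key; it is not part of the API described here.

```rust
/// Decides whether two nullable join keys match.
/// `None` models a null key; `null_equals_null` controls whether two
/// nulls are treated as equal to each other.
fn keys_match(left: Option<i32>, right: Option<i32>, null_equals_null: bool) -> bool {
    match (left, right) {
        // Two non-null keys match when their values are equal.
        (Some(l), Some(r)) => l == r,
        // Two nulls match only when the flag says so.
        (None, None) => null_equals_null,
        // A null never matches a non-null value.
        _ => false,
    }
}
```

With the flag set to false, rows whose keys are null on both sides do not pair up, matching the usual SQL treatment of null in equality comparisons.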

Implementations§

impl SortMergeJoinExec

pub fn try_new( left: Arc<dyn ExecutionPlan>, right: Arc<dyn ExecutionPlan>, on: JoinOn, filter: Option<JoinFilter>, join_type: JoinType, sort_options: Vec<SortOptions>, null_equals_null: bool, ) -> Result<Self>

§Error

pub fn probe_side(join_type: &JoinType) -> JoinSide

pub fn on(&self) -> &[(PhysicalExprRef, PhysicalExprRef)]

pub fn right(&self) -> &Arc<dyn ExecutionPlan>

pub fn join_type(&self) -> JoinType

pub fn left(&self) -> &Arc<dyn ExecutionPlan>

pub fn filter(&self) -> &Option<JoinFilter>

pub fn sort_options(&self) -> &[SortOptions]

pub fn null_equals_null(&self) -> bool

pub fn swap_inputs(&self) -> Result<Arc<dyn ExecutionPlan>>

Trait Implementations§

impl Clone for SortMergeJoinExec

fn clone(&self) -> SortMergeJoinExec

fn clone_from(&mut self, source: &Self)

impl Debug for SortMergeJoinExec

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl DisplayAs for SortMergeJoinExec

fn fmt_as(&self, t: DisplayFormatType, f: &mut Formatter<'_>) -> Result

impl ExecutionPlan for SortMergeJoinExec

fn try_swapping_with_projection( &self, projection: &ProjectionExec, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

fn name(&self) -> &'static str

fn as_any(&self) -> &dyn Any

fn properties(&self) -> &PlanProperties

fn required_input_distribution(&self) -> Vec<Distribution>

fn required_input_ordering(&self) -> Vec<Option<LexRequirement>>

fn maintains_input_order(&self) -> Vec<bool>

fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>

fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>>

fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<SendableRecordBatchStream>

fn metrics(&self) -> Option<MetricsSet>

fn statistics(&self) -> Result<Statistics>

fn static_name() -> &'static str where Self: Sized,

fn schema(&self) -> SchemaRef

fn check_invariants(&self, _check: InvariantLevel) -> Result<()>

fn benefits_from_input_partitioning(&self) -> Vec<bool>

fn repartitioned( &self, _target_partitions: usize, _config: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

fn supports_limit_pushdown(&self) -> bool

fn with_fetch(&self, _limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>

fn fetch(&self) -> Option<usize>

fn cardinality_effect(&self) -> CardinalityEffect

Auto Trait Implementations§

impl Freeze for SortMergeJoinExec

impl !RefUnwindSafe for SortMergeJoinExec

impl Send for SortMergeJoinExec

impl Sync for SortMergeJoinExec

impl Unpin for SortMergeJoinExec

impl !UnwindSafe for SortMergeJoinExec

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

impl<T> ErasedDestructor for T
where T: 'static,