pub struct NestedLoopJoinExec { /* private fields */ }
Expand description
NestedLoopJoinExec is build-probe join operator, whose main task is to
perform joins without any equijoin conditions in ON
clause.
Execution consists of following phases:
§1. Build phase
Collecting build-side data in memory, by polling all available data from build-side input. Due to the absence of equijoin conditions, it’s not possible to partition build-side data across multiple threads of the operator, so build-side is always collected in a single batch shared across all threads. The operator always considers LEFT input as build-side input, so it’s crucial to adjust smaller input to be the LEFT one. Normally this selection is handled by physical optimizer.
§2. Probe phase
Sequentially polling batches from the probe-side input and processing them according to the following logic:
- apply join filter (
ON
clause) to Cartesian product of probe batch and build side data – filter evaluation is executed once per build-side data row - update shared bitmap of joined (“visited”) build-side row indices, if required – allows to produce unmatched build-side data in case of e.g. LEFT/FULL JOIN after probing phase completed
- perform join index alignment is required – depending on
JoinType
- produce output join batch
Probing phase is executed in parallel, according to probe-side input partitioning – one thread per partition. After probe input is exhausted, each thread ATTEMPTS to produce unmatched build-side data.
§3. Producing unmatched build-side data
Producing unmatched build-side data as an output batch, after probe input is exhausted. This step is also executed in parallel (once per probe input partition), and to avoid duplicate output of unmatched data (due to shared nature build-side data), each thread “reports” about probe phase completion (which means that “visited” bitmap won’t be updated anymore), and only the last thread, reporting about completion, will return output.
Note that the Clone
trait is not implemented for this struct due to the
left_fut
[OnceAsync
], which is used to coordinate the loading of the
left side with the processing in each output stream.
Implementations§
Source§impl NestedLoopJoinExec
impl NestedLoopJoinExec
Sourcepub fn try_new(
left: Arc<dyn ExecutionPlan>,
right: Arc<dyn ExecutionPlan>,
filter: Option<JoinFilter>,
join_type: &JoinType,
) -> Result<Self>
pub fn try_new( left: Arc<dyn ExecutionPlan>, right: Arc<dyn ExecutionPlan>, filter: Option<JoinFilter>, join_type: &JoinType, ) -> Result<Self>
Try to create a new NestedLoopJoinExec
Sourcepub fn left(&self) -> &Arc<dyn ExecutionPlan>
pub fn left(&self) -> &Arc<dyn ExecutionPlan>
left side
Sourcepub fn right(&self) -> &Arc<dyn ExecutionPlan>
pub fn right(&self) -> &Arc<dyn ExecutionPlan>
right side
Sourcepub fn filter(&self) -> Option<&JoinFilter>
pub fn filter(&self) -> Option<&JoinFilter>
Filters applied before join output
Trait Implementations§
Source§impl Debug for NestedLoopJoinExec
impl Debug for NestedLoopJoinExec
Source§impl DisplayAs for NestedLoopJoinExec
impl DisplayAs for NestedLoopJoinExec
Source§impl ExecutionPlan for NestedLoopJoinExec
impl ExecutionPlan for NestedLoopJoinExec
Source§fn name(&self) -> &'static str
fn name(&self) -> &'static str
Source§fn as_any(&self) -> &dyn Any
fn as_any(&self) -> &dyn Any
Any
so that it can be
downcast to a specific implementation.Source§fn properties(&self) -> &PlanProperties
fn properties(&self) -> &PlanProperties
ExecutionPlan
, such as output
ordering(s), partitioning information etc. Read moreSource§fn required_input_distribution(&self) -> Vec<Distribution>
fn required_input_distribution(&self) -> Vec<Distribution>
ExecutionPlan
, By default it’s [Distribution::UnspecifiedDistribution] for each child,Source§fn maintains_input_order(&self) -> Vec<bool>
fn maintains_input_order(&self) -> Vec<bool>
false
if this ExecutionPlan
’s implementation may reorder
rows within or between partitions. Read moreSource§fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>
fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>
ExecutionPlan
s that act as inputs to this plan.
The returned list will be empty for leaf nodes such as scans, will contain
a single value for unary nodes, or two values for binary nodes (such as
joins).Source§fn with_new_children(
self: Arc<Self>,
children: Vec<Arc<dyn ExecutionPlan>>,
) -> Result<Arc<dyn ExecutionPlan>>
fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>>
ExecutionPlan
where all existing children were replaced
by the children
, in orderSource§fn execute(
&self,
partition: usize,
context: Arc<TaskContext>,
) -> Result<SendableRecordBatchStream>
fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<SendableRecordBatchStream>
Source§fn metrics(&self) -> Option<MetricsSet>
fn metrics(&self) -> Option<MetricsSet>
Metric
s for this
ExecutionPlan
. If no Metric
s are available, return None. Read moreSource§fn statistics(&self) -> Result<Statistics>
fn statistics(&self) -> Result<Statistics>
ExecutionPlan
node. If statistics are not
available, should return Statistics::new_unknown
(the default), not
an error. Read moreSource§fn static_name() -> &'static strwhere
Self: Sized,
fn static_name() -> &'static strwhere
Self: Sized,
name
but can be called without an instance.Source§fn required_input_ordering(&self) -> Vec<Option<LexRequirement>>
fn required_input_ordering(&self) -> Vec<Option<LexRequirement>>
ExecutionPlan
. Read moreSource§fn benefits_from_input_partitioning(&self) -> Vec<bool>
fn benefits_from_input_partitioning(&self) -> Vec<bool>
ExecutionPlan
benefits from increased
parallelization at its input for each child. Read moreSource§fn repartitioned(
&self,
_target_partitions: usize,
_config: &ConfigOptions,
) -> Result<Option<Arc<dyn ExecutionPlan>>>
fn repartitioned( &self, _target_partitions: usize, _config: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>
ExecutionPlan
to
produce target_partitions
partitions. Read moreSource§fn supports_limit_pushdown(&self) -> bool
fn supports_limit_pushdown(&self) -> bool
Source§fn with_fetch(&self, _limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>
fn with_fetch(&self, _limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>
ExecutionPlan
node, if it supports
fetch limits. Returns None
otherwise.Source§fn fetch(&self) -> Option<usize>
fn fetch(&self) -> Option<usize>
None
means there is no fetch.Source§fn cardinality_effect(&self) -> CardinalityEffect
fn cardinality_effect(&self) -> CardinalityEffect
Auto Trait Implementations§
impl !Freeze for NestedLoopJoinExec
impl !RefUnwindSafe for NestedLoopJoinExec
impl Send for NestedLoopJoinExec
impl Sync for NestedLoopJoinExec
impl Unpin for NestedLoopJoinExec
impl !UnwindSafe for NestedLoopJoinExec
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more