pub struct RepartitionExec { /* private fields */ }
Maps N input partitions to M output partitions based on a Partitioning scheme.
§Background
DataFusion, like most other commercial systems (DuckDB being the notable exception), uses the “Exchange Operator” based approach to parallelism, which works well in practice given sufficient care in implementation.
DataFusion’s planner picks the target number of partitions and then RepartitionExec redistributes RecordBatches to that number of output partitions.
For example, given target_partitions=3 (trying to use 3 cores) but scanning an input with 2 partitions, RepartitionExec can be used to get 3 even streams of RecordBatches:
        ▲                  ▲                  ▲
        │                  │                  │
        │                  │                  │
        │                  │                  │
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│    GroupBy    │  │    GroupBy    │  │    GroupBy    │
│   (Partial)   │  │   (Partial)   │  │   (Partial)   │
└───────────────┘  └───────────────┘  └───────────────┘
        ▲                  ▲                  ▲
        └──────────────────┼──────────────────┘
                           │
              ┌─────────────────────────┐
              │     RepartitionExec     │
              │   (hash/round robin)    │
              └─────────────────────────┘
                         ▲   ▲
             ┌───────────┘   └───────────┐
             │                           │
             │                           │
        .─────────.                 .─────────.
     ,─'           '─.           ,─'           '─.
    ;      Input      :         ;      Input      :
    :   Partition 0   ;         :   Partition 1   ;
     ╲               ╱           ╲               ╱
      '─.         ,─'             '─.         ,─'
         `───────'                   `───────'
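The round-robin case can be illustrated with a minimal, standard-library-only sketch. It is not DataFusion's implementation: `Batch` and `round_robin_repartition` are hypothetical names, a batch is modeled as a plain `Vec<i32>` rather than an Arrow RecordBatch, and the inputs are processed sequentially, whereas the real operator works on concurrent streams.

```rust
/// Stand-in for an Arrow `RecordBatch` (assumption: just a vector of values).
type Batch = Vec<i32>;

/// Hand out whole batches from N input partitions to `m` output partitions
/// in round-robin order. Hypothetical helper, not a DataFusion API.
fn round_robin_repartition(inputs: Vec<Vec<Batch>>, m: usize) -> Vec<Vec<Batch>> {
    assert!(m > 0, "need at least one output partition");
    let mut outputs: Vec<Vec<Batch>> = vec![Vec::new(); m];
    let mut next = 0;
    for partition in inputs {
        for batch in partition {
            // Each batch goes to the next output in turn.
            outputs[next].push(batch);
            next = (next + 1) % m;
        }
    }
    outputs
}
```

Calling `round_robin_repartition(inputs, 3)` with 2 input partitions of 3 batches each yields 3 output partitions of 2 batches each, which mirrors the "2 partitions in, 3 even streams out" picture above.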
§Output Ordering
If more than one stream is being repartitioned, the output will be some arbitrary interleaving (and thus unordered) unless Self::with_preserve_order specifies otherwise.
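To see why preserving order costs more than arbitrary interleaving, consider what an output partition has to do when every input stream is already sorted: it must merge them rather than simply forward whichever batch arrives first. The sketch below shows such a k-way merge over plain integers. `merge_sorted` is a hypothetical helper for illustration only, not DataFusion's code.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge several individually sorted streams into one sorted stream.
/// Hypothetical helper; streams are modeled as `Vec<i32>`.
fn merge_sorted(streams: Vec<Vec<i32>>) -> Vec<i32> {
    // Min-heap of (value, stream index, position within that stream).
    let mut heap = BinaryHeap::new();
    for (s, stream) in streams.iter().enumerate() {
        if let Some(&v) = stream.first() {
            heap.push(Reverse((v, s, 0usize)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, s, i))) = heap.pop() {
        out.push(v);
        // Refill the heap from the stream the smallest value came from.
        if let Some(&next) = streams[s].get(i + 1) {
            heap.push(Reverse((next, s, i + 1)));
        }
    }
    out
}
```

Without the merge step, values from the different streams could appear in any interleaving, so the sortedness of the individual inputs would be lost in the output.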
§Footnote
The “Exchange Operator” was first described in the 1989 paper “Encapsulation of parallelism in the Volcano query processing system”, which uses the term “Exchange” for the concept of repartitioning data across threads.
Implementations§
impl RepartitionExec
pub fn input(&self) -> &Arc<dyn ExecutionPlan>
Input execution plan
pub fn partitioning(&self) -> &Partitioning
Partitioning scheme to use
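For a hash-based scheme, the key property is that rows with equal keys always land in the same output partition. The following sketch illustrates that idea with the standard library only. `hash_partition_index` and `hash_repartition` are hypothetical names, rows are modeled as `(String, i32)` pairs, and the hash function used here is an assumption that differs from whatever DataFusion uses internally.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a key to one of `num_partitions` output partitions.
/// Hypothetical helper, not a DataFusion API.
fn hash_partition_index<K: Hash>(key: &K, num_partitions: usize) -> usize {
    assert!(num_partitions > 0, "need at least one output partition");
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Route each row to the output partition chosen by hashing its key.
fn hash_repartition(rows: Vec<(String, i32)>, m: usize) -> Vec<Vec<(String, i32)>> {
    let mut outputs = vec![Vec::new(); m];
    for row in rows {
        let idx = hash_partition_index(&row.0, m);
        outputs[idx].push(row);
    }
    outputs
}
```

Because equal keys are co-located, a downstream operator such as a grouped aggregation can process each output partition independently.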
pub fn preserve_order(&self) -> bool
Get the preserve_order flag of the RepartitionExec: true means SortPreservingRepartitionExec, false means RepartitionExec.
impl RepartitionExec
pub fn try_new( input: Arc<dyn ExecutionPlan>, partitioning: Partitioning, ) -> Result<Self>
Create a new RepartitionExec that produces output partitioning and does not preserve the order of the input (see Self::with_preserve_order for more details).
pub fn with_preserve_order(self) -> Self
Specify if this repartitioning operation should preserve the order of rows from its input when producing output. Preserving order is more expensive at runtime, so it should only be set if the output of this operator can take advantage of it.
If the input is not ordered, or has only one partition, this is a no-op, and the node remains a RepartitionExec.
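The no-op rule can be sketched as a small builder. `RepartitionSketch` and its fields are hypothetical stand-ins, not the real struct (whose fields are private). The sketch only models the rule stated above: the flag takes effect when the input is ordered and has more than one partition.

```rust
/// Hypothetical, simplified stand-in for the operator's state.
#[derive(Debug, Clone, PartialEq)]
struct RepartitionSketch {
    input_is_ordered: bool,
    input_partitions: usize,
    preserve_order: bool,
}

impl RepartitionSketch {
    /// Like `try_new`, a fresh node does not preserve order.
    fn new(input_is_ordered: bool, input_partitions: usize) -> Self {
        Self { input_is_ordered, input_partitions, preserve_order: false }
    }

    /// Request order preservation; this is a no-op when there is no input
    /// ordering to preserve or only a single input partition.
    fn with_preserve_order(mut self) -> Self {
        self.preserve_order = self.input_is_ordered && self.input_partitions > 1;
        self
    }
}
```

For instance, `RepartitionSketch::new(true, 1).with_preserve_order()` leaves `preserve_order` as `false`, since a single ordered input needs no merging to stay ordered.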
Trait Implementations§
impl Debug for RepartitionExec
impl DisplayAs for RepartitionExec
impl ExecutionPlan for RepartitionExec
fn name(&self) -> &'static str
fn properties(&self) -> &PlanProperties
Returns the properties of the output of this ExecutionPlan, such as output ordering(s), partitioning information etc.
fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>
Returns the child ExecutionPlans that act as inputs to this plan. The returned list will be empty for leaf nodes such as scans, will contain a single value for unary nodes, or two values for binary nodes (such as joins).
fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>>
Returns a new ExecutionPlan where all existing children were replaced by the children, in order.
fn benefits_from_input_partitioning(&self) -> Vec<bool>
Specifies whether the ExecutionPlan benefits from increased parallelization at its input for each child.
fn maintains_input_order(&self) -> Vec<bool>
Returns false if this ExecutionPlan’s implementation may reorder rows within or between partitions.
fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<SendableRecordBatchStream>
fn metrics(&self) -> Option<MetricsSet>
Returns the Metrics for this ExecutionPlan. If no Metrics are available, return None.
fn statistics(&self) -> Result<Statistics>
Returns statistics for this ExecutionPlan node. If statistics are not available, should return Statistics::new_unknown (the default), not an error.
fn static_name() -> &'static str where Self: Sized
Like name but can be called without an instance.
fn required_input_distribution(&self) -> Vec<Distribution>
Specifies the data distribution requirements for the children of this ExecutionPlan. By default it’s Distribution::UnspecifiedDistribution for each child.
fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>>
Specifies the ordering required for the children of this ExecutionPlan.
fn repartitioned( &self, _target_partitions: usize, _config: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>
If supported, attempts to increase the partitioning of this ExecutionPlan to produce target_partitions partitions.
fn supports_limit_pushdown(&self) -> bool
fn with_fetch(&self, _limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>
Returns a fetching variant of this ExecutionPlan node, if it supports fetch limits. Returns None otherwise.
Auto Trait Implementations§
impl Freeze for RepartitionExec
impl !RefUnwindSafe for RepartitionExec
impl Send for RepartitionExec
impl Sync for RepartitionExec
impl Unpin for RepartitionExec
impl !UnwindSafe for RepartitionExec
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.