Struct datafusion::datasource::physical_plan::FileGroupPartitioner
source · pub struct FileGroupPartitioner { /* private fields */ }
Expand description
Repartition input files into target_partitions
partitions, if total file size exceed
repartition_file_min_size
This partitions evenly by file byte range, and does not have any knowledge
of how data is laid out in specific files. The specific FileOpener
are
responsible for the actual partitioning on specific data source type. (e.g.
the CsvOpener
will read lines overlap with byte range as well as
handle boundaries to ensure all lines will be read exactly once)
§Example
For example, if there are two files A
and B
that we wish to read with 4
partitions (with 4 threads) they will be divided as follows:
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
┌─────────────────┐
│ │ │ │
│ File A │
│ │ Range: 0-2MB │ │
│ │
│ └─────────────────┘ │
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
┌─────────────────┐ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│ │ ┌─────────────────┐
│ │ │ │ │ │
│ │ │ File A │
│ │ │ │ Range 2-4MB │ │
│ │ │ │
│ │ │ └─────────────────┘ │
│ File A (7MB) │ ────────▶ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│ │ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│ │ ┌─────────────────┐
│ │ │ │ │ │
│ │ │ File A │
│ │ │ │ Range: 4-6MB │ │
│ │ │ │
│ │ │ └─────────────────┘ │
└─────────────────┘ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
┌─────────────────┐ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│ File B (1MB) │ ┌─────────────────┐
│ │ │ │ File A │ │
└─────────────────┘ │ Range: 6-7MB │
│ └─────────────────┘ │
┌─────────────────┐
│ │ File B (1MB) │ │
│ │
│ └─────────────────┘ │
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
If target_partitions = 4,
divides into 4 groups
§Maintaining Order
Within each group files are read sequentially. Thus, if the overall order of tuples must be preserved, multiple files can not be mixed in the same group.
In this case, the code will split the largest files evenly into any available empty groups, but the overall distribution may not not be as even as as even as if the order did not need to be preserved.
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
┌─────────────────┐
│ │ │ │
│ File A │
│ │ Range: 0-2MB │ │
│ │
┌─────────────────┐ │ └─────────────────┘ │
│ │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│ │ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│ │ ┌─────────────────┐
│ │ │ │ │ │
│ │ │ File A │
│ │ │ │ Range 2-4MB │ │
│ File A (6MB) │ ────────▶ │ │
│ (ordered) │ │ └─────────────────┘ │
│ │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│ │ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│ │ ┌─────────────────┐
│ │ │ │ │ │
│ │ │ File A │
│ │ │ │ Range: 4-6MB │ │
└─────────────────┘ │ │
┌─────────────────┐ │ └─────────────────┘ │
│ File B (1MB) │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│ (ordered) │ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
└─────────────────┘ ┌─────────────────┐
│ │ File B (1MB) │ │
│ │
│ └─────────────────┘ │
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
If target_partitions = 4,
divides into 4 groups
Implementations§
source§impl FileGroupPartitioner
impl FileGroupPartitioner
sourcepub fn new() -> Self
pub fn new() -> Self
Creates a new FileGroupPartitioner
with default values:
target_partitions = 1
repartition_file_min_size = 10MB
preserve_order_within_groups = false
sourcepub fn with_target_partitions(self, target_partitions: usize) -> Self
pub fn with_target_partitions(self, target_partitions: usize) -> Self
Set the target partitions
sourcepub fn with_repartition_file_min_size(
self,
repartition_file_min_size: usize,
) -> Self
pub fn with_repartition_file_min_size( self, repartition_file_min_size: usize, ) -> Self
Set the minimum size at which to repartition a file
sourcepub fn with_preserve_order_within_groups(
self,
preserve_order_within_groups: bool,
) -> Self
pub fn with_preserve_order_within_groups( self, preserve_order_within_groups: bool, ) -> Self
Set whether the order of tuples within a file must be preserved
sourcepub fn repartition_file_groups(
&self,
file_groups: &[Vec<PartitionedFile>],
) -> Option<Vec<Vec<PartitionedFile>>>
pub fn repartition_file_groups( &self, file_groups: &[Vec<PartitionedFile>], ) -> Option<Vec<Vec<PartitionedFile>>>
Repartition input files according to the settings on this FileGroupPartitioner
.
If no repartitioning is needed or possible, return None
.
Trait Implementations§
source§impl Clone for FileGroupPartitioner
impl Clone for FileGroupPartitioner
source§fn clone(&self) -> FileGroupPartitioner
fn clone(&self) -> FileGroupPartitioner
1.0.0 · source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source
. Read moresource§impl Debug for FileGroupPartitioner
impl Debug for FileGroupPartitioner
source§impl Default for FileGroupPartitioner
impl Default for FileGroupPartitioner
impl Copy for FileGroupPartitioner
Auto Trait Implementations§
impl Freeze for FileGroupPartitioner
impl RefUnwindSafe for FileGroupPartitioner
impl Send for FileGroupPartitioner
impl Sync for FileGroupPartitioner
impl Unpin for FileGroupPartitioner
impl UnwindSafe for FileGroupPartitioner
Blanket Implementations§
source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
source§default unsafe fn clone_to_uninit(&self, dst: *mut T)
default unsafe fn clone_to_uninit(&self, dst: *mut T)
clone_to_uninit
)source§impl<T> CloneToUninit for Twhere
T: Copy,
impl<T> CloneToUninit for Twhere
T: Copy,
source§unsafe fn clone_to_uninit(&self, dst: *mut T)
unsafe fn clone_to_uninit(&self, dst: *mut T)
clone_to_uninit
)source§impl<T> IntoEither for T
impl<T> IntoEither for T
source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moresource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more