pub struct TopK { /* private fields */ }
Expand description
Global TopK
§Background
“Top K” is a common query optimization used for queries such as “find the top 3 customers by revenue”. The (simplified) SQL for such a query might be:
SELECT customer_id, revenue FROM 'sales.csv' ORDER BY revenue DESC limit 3;
The simple plan would be:
> explain SELECT customer_id, revenue FROM sales ORDER BY revenue DESC limit 3;
+--------------+----------------------------------------+
| plan_type | plan |
+--------------+----------------------------------------+
| logical_plan | Limit: 3 |
| | Sort: revenue DESC NULLS FIRST |
| | Projection: customer_id, revenue |
| | TableScan: sales |
+--------------+----------------------------------------+
While this plan produces the correct answer, it will fully sorts the input before discarding everything other than the top 3 elements.
The same answer can be produced by simply keeping track of the top K=3 elements, reducing the total amount of required buffer memory.
§Structure
This operator tracks the top K items using a TopKHeap
.
Implementations§
Source§impl TopK
impl TopK
Sourcepub fn try_new(
partition_id: usize,
schema: Arc<Schema>,
expr: LexOrdering,
k: usize,
batch_size: usize,
runtime: Arc<RuntimeEnv>,
metrics: &ExecutionPlanMetricsSet,
partition: usize,
) -> Result<TopK, DataFusionError>
pub fn try_new( partition_id: usize, schema: Arc<Schema>, expr: LexOrdering, k: usize, batch_size: usize, runtime: Arc<RuntimeEnv>, metrics: &ExecutionPlanMetricsSet, partition: usize, ) -> Result<TopK, DataFusionError>
Create a new TopK
that stores the top k
values, as
defined by the sort expressions in expr
.
Sourcepub fn insert_batch(
&mut self,
batch: RecordBatch,
) -> Result<(), DataFusionError>
pub fn insert_batch( &mut self, batch: RecordBatch, ) -> Result<(), DataFusionError>
Insert batch
, remembering if any of its values are among
the top k seen so far.
Sourcepub fn emit(
self,
) -> Result<Pin<Box<dyn RecordBatchStream<Item = Result<RecordBatch, DataFusionError>> + Send>>, DataFusionError>
pub fn emit( self, ) -> Result<Pin<Box<dyn RecordBatchStream<Item = Result<RecordBatch, DataFusionError>> + Send>>, DataFusionError>
Returns the top k results broken into batch_size
RecordBatch
es, consuming the heap
Auto Trait Implementations§
impl Freeze for TopK
impl !RefUnwindSafe for TopK
impl Send for TopK
impl Sync for TopK
impl Unpin for TopK
impl !UnwindSafe for TopK
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more