datafusion_physical_plan::sorts

Module partial_sort

Source
Expand description

Partial Sort deals with input data that partially satisfies the required sort order. Such an input data can be partitioned into segments where each segment already has the required information for lexicographic sorting so sorting can be done without loading the entire dataset.

Consider a sort plan having an input with ordering a ASC, b ASC

+---+---+---+
| a | b | d |
+---+---+---+
| 0 | 0 | 3 |
| 0 | 0 | 2 |
| 0 | 1 | 1 |
| 0 | 2 | 0 |
+---+---+---+

and required ordering for the plan is a ASC, b ASC, d ASC. The first 3 rows(segment) can be sorted as the segment already has the required information for the sort, but the last row requires further information as the input can continue with a batch with a starting row where a and b does not change as below

+---+---+---+
| a | b | d |
+---+---+---+
| 0 | 2 | 4 |
+---+---+---+

The plan concats incoming data with such last rows of previous input and continues partial sorting of the segments.

Structsยง