Function arrow_ord::partition::partition

source ·
pub fn partition(columns: &[ArrayRef]) -> Result<Partitions, ArrowError>
Expand description

Given a list of lexicographically sorted columns, computes the Partitions, where a partition consists of the set of consecutive rows with equal values

Returns an error if no columns are specified or all columns do not have the same number of rows.

Example:

For example, given columns x, y and z, calling lexicographical_partition_ranges(values, (x, y)) will divide the rows into ranges where the values of (x, y) are equal:

┌ ─ ┬───┬ ─ ─┌───┐─ ─ ┬───┬ ─ ─ ┐
    │ 1 │    │ 1 │    │ A │        Range: 0..1 (x=1, y=1)
├ ─ ┼───┼ ─ ─├───┤─ ─ ┼───┼ ─ ─ ┤
    │ 1 │    │ 2 │    │ B │
│   ├───┤    ├───┤    ├───┤     │
    │ 1 │    │ 2 │    │ C │        Range: 1..4 (x=1, y=2)
│   ├───┤    ├───┤    ├───┤     │
    │ 1 │    │ 2 │    │ D │
├ ─ ┼───┼ ─ ─├───┤─ ─ ┼───┼ ─ ─ ┤
    │ 2 │    │ 1 │    │ E │        Range: 4..5 (x=2, y=1)
├ ─ ┼───┼ ─ ─├───┤─ ─ ┼───┼ ─ ─ ┤
    │ 3 │    │ 1 │    │ F │        Range: 5..6 (x=3, y=1)
└ ─ ┴───┴ ─ ─└───┘─ ─ ┴───┴ ─ ─ ┘

      x        y        z     partition(&[x, y])

Example Code

let batch = RecordBatch::try_from_iter(vec![
    ("x", Arc::new(Int64Array::from(vec![1, 1, 1, 1, 2, 3])) as ArrayRef),
    ("y", Arc::new(Int64Array::from(vec![1, 2, 2, 2, 1, 1])) as ArrayRef),
    ("z", Arc::new(StringArray::from(vec!["A", "B", "C", "D", "E", "F"])) as ArrayRef),
]).unwrap();

// Partition on first two columns
let ranges = partition(&batch.columns()[..2]).unwrap().ranges();

let expected = vec![
    (0..1),
    (1..4),
    (4..5),
    (5..6),
];

assert_eq!(ranges, expected);