lance_index::vector::ivf

Module shuffler

Disk-based shuffle of a stream of RecordBatch values into their IVF partitions.

  1. write the entire stream to a file
  2. count the number of rows in each partition
  3. read the data back into memory and shuffle into grouped vectors
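
The three steps above can be illustrated with a minimal, standard-library-only sketch. It is not the module's implementation: the `Row` type, the fixed-width file layout, and the function names `write_rows`, `read_rows`, and `shuffle_from_file` are all hypothetical, and a real shuffler would operate on Arrow `RecordBatch` data rather than `(u32, f32)` pairs.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Hypothetical row: a partition id (the group-by column) and a toy payload.
type Row = (u32, f32);

/// Step 1: write the entire stream to a file as 8-byte little-endian records.
fn write_rows(path: &Path, rows: impl Iterator<Item = Row>) -> io::Result<()> {
    let mut out = BufWriter::new(File::create(path)?);
    for (part, value) in rows {
        out.write_all(&part.to_le_bytes())?;
        out.write_all(&value.to_le_bytes())?;
    }
    out.flush()
}

/// Read every record back from the file, in the order it was written.
fn read_rows(path: &Path) -> io::Result<Vec<Row>> {
    let mut bytes = Vec::new();
    BufReader::new(File::open(path)?).read_to_end(&mut bytes)?;
    Ok(bytes
        .chunks_exact(8)
        .map(|rec| {
            let part = u32::from_le_bytes([rec[0], rec[1], rec[2], rec[3]]);
            let value = f32::from_le_bytes([rec[4], rec[5], rec[6], rec[7]]);
            (part, value)
        })
        .collect())
}

/// Steps 2 and 3: count the rows in each partition, then group the values.
/// Panics if a partition id is >= `num_partitions` (kept simple for the sketch).
fn shuffle_from_file(path: &Path, num_partitions: usize) -> io::Result<Vec<Vec<f32>>> {
    let rows = read_rows(path)?;

    // Step 2: count the number of rows in each partition.
    let mut counts = vec![0usize; num_partitions];
    for &(part, _) in &rows {
        counts[part as usize] += 1;
    }

    // Step 3: pre-size one vector per partition, then scatter rows into them.
    let mut groups: Vec<Vec<f32>> = counts.iter().map(|&n| Vec::with_capacity(n)).collect();
    for (part, value) in rows {
        groups[part as usize].push(value);
    }
    Ok(groups)
}
```

Counting first lets each group be allocated once at its final size. Within a partition, rows keep the order in which they arrived in the stream.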

Problems for the future:

  1. while the group-by column will stay the same, we may want to include extra data columns in the future
  2. shuffling in memory is fast, but we should add a disk buffer to support larger datasets

Structs

Functions