A comparable row-oriented representation of a collection of [`Array`].
[`Row`]s are [normalized for sorting], and can therefore be very efficiently [compared],
using [`memcmp`] under the hood, or used in [non-comparison sorts] such as [radix sort].
This makes the row format ideal for implementing efficient multi-column sorting,
grouping, aggregation, windowing and more, as described in more detail
[in this blog post](https://arrow.apache.org/blog/2022/11/07/multi-column-sorts-in-arrow-rust-part-1/).
For example, given three input [`Array`], [`RowConverter`] creates byte
sequences that [compare] the same as when using [`lexsort`].
```text
┌─────┐ ┌─────┐ ┌─────┐
│ │ │ │ │ │
├─────┤ ┌ ┼─────┼ ─ ┼─────┼ ┐ ┏━━━━━━━━━━━━━┓
│ │ │ │ │ │ ─────────────▶┃ ┃
├─────┤ └ ┼─────┼ ─ ┼─────┼ ┘ ┗━━━━━━━━━━━━━┛
│ │ │ │ │ │
└─────┘ └─────┘ └─────┘
...
┌─────┐ ┌ ┬─────┬ ─ ┬─────┬ ┐ ┏━━━━━━━━┓
│ │ │ │ │ │ ─────────────▶┃ ┃
└─────┘ └ ┴─────┴ ─ ┴─────┴ ┘ ┗━━━━━━━━┛
UInt64 Utf8 F64
Input Arrays Row Format
(Columns)
```
_[`Rows`] must be generated by the same [`RowConverter`] for the comparison
to be meaningful._
# Basic Example
```
# use std::sync::Arc;
# use arrow_row::{RowConverter, SortField};
# use arrow_array::{ArrayRef, Int32Array, StringArray};
# use arrow_array::cast::{AsArray, as_string_array};
# use arrow_array::types::Int32Type;
# use arrow_schema::DataType;
let a1 = Arc::new(Int32Array::from_iter_values([-1, -1, 0, 3, 3])) as ArrayRef;
let a2 = Arc::new(StringArray::from_iter_values(["a", "b", "c", "d", "d"])) as ArrayRef;
let arrays = vec![a1, a2];
// Convert arrays to rows
let converter = RowConverter::new(vec![
SortField::new(DataType::Int32),
SortField::new(DataType::Utf8),
]).unwrap();
let rows = converter.convert_columns(&arrays).unwrap();
// Compare rows
for i in 0..4 {
assert!(rows.row(i) <= rows.row(i + 1));
}
assert_eq!(rows.row(3), rows.row(4));
// Convert rows back to arrays
let converted = converter.convert_rows(&rows).unwrap();
assert_eq!(arrays, converted);
// Compare rows from different arrays
let a1 = Arc::new(Int32Array::from_iter_values([3, 4])) as ArrayRef;
let a2 = Arc::new(StringArray::from_iter_values(["e", "f"])) as ArrayRef;
let arrays = vec![a1, a2];
let rows2 = converter.convert_columns(&arrays).unwrap();
assert!(rows.row(4) < rows2.row(0));
assert!(rows.row(4) < rows2.row(1));
// Convert selection of rows back to arrays
let selection = [rows.row(0), rows2.row(1), rows.row(2), rows2.row(0)];
let converted = converter.convert_rows(selection).unwrap();
let c1 = converted[0].as_primitive::();
assert_eq!(c1.values(), &[-1, 4, 0, 3]);
let c2 = converted[1].as_string::();
let c2_values: Vec<_> = c2.iter().flatten().collect();
assert_eq!(&c2_values, &["a", "f", "c", "e"]);
```
# Lexsort
The row format can also be used to implement a fast multi-column / lexicographic sort
```
# use arrow_row::{RowConverter, SortField};
# use arrow_array::{ArrayRef, UInt32Array};
fn lexsort_to_indices(arrays: &[ArrayRef]) -> UInt32Array {
let fields = arrays
.iter()
.map(|a| SortField::new(a.data_type().clone()))
.collect();
let converter = RowConverter::new(fields).unwrap();
let rows = converter.convert_columns(arrays).unwrap();
let mut sort: Vec<_> = rows.iter().enumerate().collect();
sort.sort_unstable_by(|(_, a), (_, b)| a.cmp(b));
UInt32Array::from_iter_values(sort.iter().map(|(i, _)| *i as u32))
}
```
[non-comparison sorts]: https://en.wikipedia.org/wiki/Sorting_algorithm#Non-comparison_sorts
[radix sort]: https://en.wikipedia.org/wiki/Radix_sort
[normalized for sorting]: http://wwwlgis.informatik.uni-kl.de/archiv/wwwdvs.informatik.uni-kl.de/courses/DBSREAL/SS2005/Vorlesungsunterlagen/Implementing_Sorting.pdf
[`memcmp`]: https://www.man7.org/linux/man-pages/man3/memcmp.3.html
[`lexsort`]: https://docs.rs/arrow-ord/latest/arrow_ord/sort/fn.lexsort.html
[compared]: PartialOrd
[compare]: PartialOrd