Module arrow_array::builder

Defines push-based APIs for constructing arrays.

§Basic Usage

Builders can be used to build simple, non-nested arrays:

use arrow_array::builder::Int32Builder;
use arrow_array::PrimitiveArray;

let mut a = Int32Builder::new();
a.append_value(1);
a.append_null();
a.append_value(2);
let a = a.finish();

assert_eq!(a, PrimitiveArray::from(vec![Some(1), None, Some(2)]));

use arrow_array::builder::StringBuilder;
use arrow_array::StringArray;

let mut a = StringBuilder::new();
a.append_value("foo");
a.append_value("bar");
a.append_null();
let a = a.finish();

assert_eq!(a, StringArray::from_iter([Some("foo"), Some("bar"), None]));
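
To make the push-based model more concrete, here is a toy sketch using only the standard library of what a primitive builder conceptually tracks: a values buffer plus a validity mask. The name `ToyInt32Builder`, its fields, and the `toy_example` helper are hypothetical and exist only for illustration. This is not the arrow_array implementation, which stores values and validity in packed buffers.

```rust
/// Hypothetical toy model of a primitive builder (not the arrow_array type).
#[derive(Debug, Default)]
struct ToyInt32Builder {
    /// One slot per element; null slots hold a placeholder value.
    values: Vec<i32>,
    /// `true` where the slot is valid, `false` where it is null.
    validity: Vec<bool>,
}

impl ToyInt32Builder {
    fn append_value(&mut self, v: i32) {
        self.values.push(v);
        self.validity.push(true);
    }

    fn append_null(&mut self) {
        // A null still occupies a slot so that indices stay aligned.
        self.values.push(0);
        self.validity.push(false);
    }

    /// Returns the accumulated columns and leaves the builder empty for reuse.
    fn finish(&mut self) -> (Vec<i32>, Vec<bool>) {
        (
            std::mem::take(&mut self.values),
            std::mem::take(&mut self.validity),
        )
    }
}

/// Pushes `1, null, 2` and returns the finished (values, validity) pair.
fn toy_example() -> (Vec<i32>, Vec<bool>) {
    let mut b = ToyInt32Builder::default();
    b.append_value(1);
    b.append_null();
    b.append_value(2);
    b.finish()
}
```

The sketch shows two properties: a null consumes a slot in the values buffer, and `finish` hands back the accumulated data while resetting the builder so it can be reused.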

§Nested Usage

Builders can also be used to build more complex nested arrays, such as lists:

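use arrow_array::builder::{Int32Builder, ListBuilder};
use arrow_array::types::Int32Type;
use arrow_array::ListArray;

let mut a = ListBuilder::new(Int32Builder::new());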
// [1, 2]
a.values().append_value(1);
a.values().append_value(2);
a.append(true);
// null
a.append(false);
// []
a.append(true);
// [3, null]
a.values().append_value(3);
a.values().append_null();
a.append(true);

// [[1, 2], null, [], [3, null]]
let a = a.finish();

assert_eq!(a, ListArray::from_iter_primitive::<Int32Type, _, _>([
    Some(vec![Some(1), Some(2)]),
    None,
    Some(vec![]),
    Some(vec![Some(3), None]),
]));
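
The pattern above, where child values are pushed first and `append` then closes the list slot, follows from how list arrays are laid out: one flattened child array plus an offsets buffer. The following is a toy model of that bookkeeping using only the standard library. `ToyListBuilder` and `build_example` are hypothetical names, and `usize` offsets are used for simplicity where Arrow's `List` type uses 32-bit offsets. It is a sketch of the layout, not the arrow_array implementation.

```rust
/// Hypothetical toy model of a list builder over `i32` (not the arrow_array type).
#[derive(Debug)]
struct ToyListBuilder {
    /// Flattened child values shared by all list slots.
    values: Vec<Option<i32>>,
    /// `offsets[i]..offsets[i + 1]` is the child range of list slot `i`.
    offsets: Vec<usize>,
    /// `false` marks a null list slot.
    validity: Vec<bool>,
}

impl ToyListBuilder {
    fn new() -> Self {
        // The offsets buffer always starts with a leading zero.
        Self { values: Vec::new(), offsets: vec![0], validity: Vec::new() }
    }

    /// Access the child values, mirroring `values()` in the example above.
    fn values(&mut self) -> &mut Vec<Option<i32>> {
        &mut self.values
    }

    /// Close the current list slot: everything pushed to the child since
    /// the previous call belongs to this slot.
    fn append(&mut self, is_valid: bool) {
        self.offsets.push(self.values.len());
        self.validity.push(is_valid);
    }
}

/// Rebuilds `[[1, 2], null, [], [3, null]]` and returns (offsets, validity).
fn build_example() -> (Vec<usize>, Vec<bool>) {
    let mut a = ToyListBuilder::new();
    a.values().push(Some(1));
    a.values().push(Some(2));
    a.append(true); // [1, 2]
    a.append(false); // null
    a.append(true); // []
    a.values().push(Some(3));
    a.values().push(None);
    a.append(true); // [3, null]
    (a.offsets, a.validity)
}
```

In this model a null list and an empty list both have a zero-length child range; only the validity mask tells them apart.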

§Custom Builders

It is common to have a collection of statically defined Rust types that you want to convert to Arrow arrays.

An example of doing so is shown below:

use std::sync::Arc;

use arrow_array::builder::{Int32Builder, ListBuilder, StringBuilder};
use arrow_array::{ArrayRef, RecordBatch, StructArray};
use arrow_schema::{DataType, Field};

/// A custom row representation
struct MyRow {
    i32: i32,
    optional_i32: Option<i32>,
    string: Option<String>,
    i32_list: Option<Vec<Option<i32>>>,
}

/// Converts `Vec<MyRow>` into `StructArray`
#[derive(Debug, Default)]
struct MyRowBuilder {
    i32: Int32Builder,
    optional_i32: Int32Builder,
    string: StringBuilder,
    i32_list: ListBuilder<Int32Builder>,
}

impl MyRowBuilder {
    fn append(&mut self, row: &MyRow) {
        self.i32.append_value(row.i32);
        self.optional_i32.append_option(row.optional_i32);
        self.string.append_option(row.string.as_ref());
        self.i32_list.append_option(row.i32_list.as_ref().map(|x| x.iter().copied()));
    }

    /// Note: returns StructArray to allow nesting within another array if desired
    fn finish(&mut self) -> StructArray {
        let i32 = Arc::new(self.i32.finish()) as ArrayRef;
        let i32_field = Arc::new(Field::new("i32", DataType::Int32, false));

        // Nullable, because the source field is an `Option`
        let optional_i32 = Arc::new(self.optional_i32.finish()) as ArrayRef;
        let optional_i32_field = Arc::new(Field::new("optional_i32", DataType::Int32, true));

        // Nullable, because the source field is an `Option`
        let string = Arc::new(self.string.finish()) as ArrayRef;
        let string_field = Arc::new(Field::new("string", DataType::Utf8, true));

        let i32_list = Arc::new(self.i32_list.finish()) as ArrayRef;
        let value_field = Arc::new(Field::new("item", DataType::Int32, true));
        let i32_list_field = Arc::new(Field::new("i32_list", DataType::List(value_field), true));

        StructArray::from(vec![
            (i32_field, i32),
            (optional_i32_field, optional_i32),
            (string_field, string),
            (i32_list_field, i32_list),
        ])
    }
}

impl<'a> Extend<&'a MyRow> for MyRowBuilder {
    fn extend<T: IntoIterator<Item = &'a MyRow>>(&mut self, iter: T) {
        iter.into_iter().for_each(|row| self.append(row));
    }
}

/// Converts a slice of [`MyRow`] to a [`RecordBatch`]
fn rows_to_batch(rows: &[MyRow]) -> RecordBatch {
    let mut builder = MyRowBuilder::default();
    builder.extend(rows);
    RecordBatch::from(&builder.finish())
}

Traits§

  • ArrayBuilder: Trait for dealing with different array builders at runtime

Functions§

  • make_builder: Returns a builder with the given capacity that corresponds to the given DataType. This function is useful for constructing arrays from arbitrary vectors with a known or expected schema.
  • make_view: Creates a view based on the given data, block id, and offset. Note that the implementation was carefully examined against its x86_64 assembly: https://godbolt.org/z/685YPsd5G The goal is to avoid calling into ptr::copy_nonoverlapping, which results in a function call (i.e., it is not inlined) and slows things down.
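
As a rough illustration of what "a view based on the given data, block id and offset" encodes, the sketch below packs a 16-byte view following the Arrow columnar format's layout for view types: values of at most 12 bytes are stored inline after the length, while longer values store a 4-byte prefix plus the block id and offset of the full data. `sketch_view` is a hypothetical name. This is a plain, unoptimized sketch, not the arrow_array function, and it assumes the data length fits in 32 bits.

```rust
/// Hypothetical sketch of the 16-byte view encoding (not the arrow_array function).
/// Assumes `data.len()` fits in a `u32`.
fn sketch_view(data: &[u8], block_id: u32, offset: u32) -> u128 {
    let len = data.len() as u32;
    let mut bytes = [0u8; 16];
    // Bytes 0..4 always hold the length, little-endian.
    bytes[0..4].copy_from_slice(&len.to_le_bytes());
    if data.len() <= 12 {
        // Short value: stored inline after the length, zero padded.
        // The block id and offset are not used in this case.
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        // Long value: 4-byte prefix, then the block id and the offset
        // at which the full data lives in that block.
        bytes[4..8].copy_from_slice(&data[0..4]);
        bytes[8..12].copy_from_slice(&block_id.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}
```

Because short values are fully inline, two views of the same short value compare equal regardless of the block id and offset passed in.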
