Welcome to arrow2’s documentation. Thanks for checking it out!
This is a library for efficient in-memory data operations with the Arrow in-memory format. It is a rewrite from the bottom up of the official arrow crate with soundness and type safety in mind.

Check out the guide for an introduction. Below is an example of some of the things you can do with it:
```rust
use arrow2::array::*;
use arrow2::datatypes::{Field, DataType, Schema};
use arrow2::compute::arithmetics;
use arrow2::error::Result;
use arrow2::io::parquet::write::*;
use arrow2::chunk::Chunk;

fn main() -> Result<()> {
    // declare two nullable Int32 arrays
    let a = Int32Array::from(&[Some(1), None, Some(3)]);
    let b = Int32Array::from(&[Some(2), None, Some(6)]);

    // compute (probably the fastest implementation of a nullable op you can find out there):
    // multiply every slot of `a` by 2; the null slot stays null
    let c = arithmetics::basic::mul_scalar(&a, &2);
    assert_eq!(c, b);

    // declare a schema with one nullable Int32 field per column
    let schema = Schema::from(vec![
        Field::new("c1", DataType::Int32, true),
        Field::new("c2", DataType::Int32, true),
    ]);

    // declare a chunk (a set of equal-length columns) matching the schema
    let chunk = Chunk::new(vec![a.arced(), b.arced()]);

    // write to parquet (probably the fastest implementation of writing to parquet out there)
    let options = WriteOptions {
        write_statistics: true,
        compression: CompressionOptions::Snappy,
        version: Version::V1,
        data_pagesize_limit: None,
    };

    // one list of encodings per field
    let row_groups = RowGroupIterator::try_new(
        vec![Ok(chunk)].into_iter(),
        &schema,
        options,
        vec![vec![Encoding::Plain], vec![Encoding::Plain]],
    )?;

    // anything implementing `std::io::Write` works; here an in-memory buffer
    let file: Vec<u8> = vec![];
    let mut writer = FileWriter::try_new(file, schema, options)?;

    // write each row group, then finish the file
    for group in row_groups {
        writer.write(group?)?;
    }
    let _ = writer.end(None)?;
    Ok(())
}
```
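In the example above, mul_scalar leaves the null slot of a as a null slot in c. One reason nullable kernels can be fast in the Arrow format is that nulls live in a separate validity bitmap, so the arithmetic loop can run over every slot without branching on nulls and the bitmap is carried over unchanged. The sketch below illustrates that idea with a hypothetical NullableI32 type using only the standard library; it is not arrow2's implementation, and its use of wrapping_mul is an assumption of the sketch that says nothing about how arrow2 handles overflow.

```rust
/// Hypothetical, simplified nullable i32 array (not arrow2's type):
/// a values buffer plus a validity bitmap where bit i set means slot i is valid.
#[derive(Debug, Clone, PartialEq)]
struct NullableI32 {
    values: Vec<i32>,
    validity: Vec<u8>, // LSB-first bitmap, one bit per slot
    len: usize,
}

impl NullableI32 {
    fn from_options(items: &[Option<i32>]) -> Self {
        let len = items.len();
        let mut validity = vec![0u8; (len + 7) / 8];
        let mut values = Vec::with_capacity(len);
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    validity[i / 8] |= 1 << (i % 8);
                    values.push(*v);
                }
                // A null slot still occupies a value; its content is never read.
                None => values.push(0),
            }
        }
        Self { values, validity, len }
    }

    fn is_valid(&self, i: usize) -> bool {
        self.validity[i / 8] & (1 << (i % 8)) != 0
    }

    fn to_options(&self) -> Vec<Option<i32>> {
        (0..self.len)
            .map(|i| if self.is_valid(i) { Some(self.values[i]) } else { None })
            .collect()
    }

    /// Multiplies every slot by `rhs` in one branch-free loop over the values;
    /// the validity bitmap is copied unchanged, so nulls stay null.
    /// (wrapping_mul is this sketch's assumption about overflow.)
    fn mul_scalar(&self, rhs: i32) -> Self {
        let values = self.values.iter().map(|v| v.wrapping_mul(rhs)).collect();
        Self { values, validity: self.validity.clone(), len: self.len }
    }
}

fn main() {
    let a = NullableI32::from_options(&[Some(1), None, Some(3)]);
    let c = a.mul_scalar(2);
    assert_eq!(c.to_options(), vec![Some(2), None, Some(6)]);
    println!("{:?}", c.to_options());
}
```

Keeping validity separate from values is what lets the value loop stay simple enough for the compiler to vectorize, at the cost of computing (and discarding) a result for each null slot.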
Cargo features
This crate has a significant number of cargo features to reduce compilation time and number of dependencies. The feature "full" activates most functionality, such as:

- io_ipc: to interact with the Arrow IPC format
- io_ipc_compression: to read and write compressed Arrow IPC (v2)
- io_csv: to read and write CSV
- io_json: to read and write JSON
- io_flight: to read and write to Arrow’s Flight protocol
- io_parquet: to read and write parquet
- io_parquet_compression: to read and write compressed parquet
- io_print: to write batches to formatted ASCII tables
- compute: to operate on arrays (addition, sum, sort, etc.)

The feature simd (not part of full) produces more explicit SIMD instructions via std::simd, but requires the nightly channel.
Modules
- Contains the Array and MutableArray trait objects declaring arrays, as well as concrete arrays (such as Utf8Array and MutableUtf8Array).
- Contains Buffer, an immutable container for all Arrow physical types (e.g. i32, f64).
- Contains a wide range of compute operations (e.g. arithmetics, aggregate, filter, comparison, and sort).
- Defines Error, representing all errors returned by this crate.
- Contains FFI bindings to import and export Array via Arrow’s C Data Interface.
- mmap (requires the io_ipc feature): Memory maps regions defined in the IPC format into Array.
- Contains the declaration of Offset.
- Contains the Scalar trait object representing individual items of Arrays, as well as concrete implementations such as BooleanScalar.
- Conversion methods for dates and times.
- Declares TrustedLen.
- Sealed traits and implementations to handle all physical types used in this crate.
- Misc utilities used in different places in the crate.
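Several of the modules above revolve around one layout idea: immutable, cheaply shareable buffers (as with Buffer), offsets that delimit variable-length items (the role of Offset), and a mutable builder that is turned into an immutable array (as with MutableUtf8Array and Utf8Array). The sketch below models that with the standard library only; StrArray, MutableStrArray, push and freeze are hypothetical names, not arrow2's API, and the choice of i32 offsets is an assumption of the sketch.

```rust
use std::sync::Arc;

/// Hypothetical immutable string array in a variable-size layout:
/// slot i spans values[offsets[i]..offsets[i + 1]].
/// Buffers sit behind Arc, so clones share memory instead of copying it.
#[derive(Debug, Clone)]
struct StrArray {
    offsets: Arc<[i32]>,
    values: Arc<[u8]>,
}

/// Hypothetical mutable counterpart used to build a StrArray incrementally.
#[derive(Debug)]
struct MutableStrArray {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl MutableStrArray {
    fn new() -> Self {
        // Offsets always start with a leading 0, so len == offsets.len() - 1.
        Self { offsets: vec![0], values: Vec::new() }
    }

    fn push(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        // i32 offsets cap the total byte length; this sketch panics past it.
        let end = i32::try_from(self.values.len()).expect("offset overflow");
        self.offsets.push(end);
    }

    /// Converts the growable Vecs into immutable, shareable Arc slices.
    fn freeze(self) -> StrArray {
        StrArray { offsets: self.offsets.into(), values: self.values.into() }
    }
}

impl StrArray {
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn value(&self, i: usize) -> &str {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        // Only whole &str values were pushed, so each slot is valid UTF-8.
        std::str::from_utf8(&self.values[start..end]).unwrap()
    }
}

fn main() {
    let mut builder = MutableStrArray::new();
    for s in ["hello", "", "arrow"] {
        builder.push(s);
    }
    let array = builder.freeze();
    let shared = array.clone(); // bumps reference counts; no buffer copy
    assert_eq!(array.len(), 3);
    assert_eq!(&*shared.offsets, &[0, 5, 5, 10]);
    assert_eq!(shared.value(2), "arrow");
}
```

Splitting mutable and immutable types this way means the immutable side never needs interior locking, which is one plausible reason for the Array/MutableArray pairing described above.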
Macros
- with_match_primitive_without_interval_type (requires the compute_sort feature): Matches PrimitiveType to standard Rust types.
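The macro above maps a runtime PrimitiveType to compile-time Rust types. A minimal, stand-alone sketch of that dispatch pattern follows; the four-variant enum, the macro name with_match_primitive, and its invocation syntax are all hypothetical and do not reproduce arrow2's macro, whose real definition and syntax may differ. Each match arm declares a local type alias and expands the same body, so one generic expression is instantiated once per Rust type.

```rust
/// Hypothetical runtime tag for a few primitive types.
#[derive(Debug, Clone, Copy)]
enum PrimitiveType {
    Int8,
    Int32,
    Int64,
    Float64,
}

/// Hypothetical macro: binds `$T` to the Rust type matching the runtime tag
/// and evaluates `$body` with that binding, once per match arm.
macro_rules! with_match_primitive {
    ($key:expr, $T:ident => $body:expr) => {
        match $key {
            PrimitiveType::Int8 => {
                type $T = i8;
                $body
            }
            PrimitiveType::Int32 => {
                type $T = i32;
                $body
            }
            PrimitiveType::Int64 => {
                type $T = i64;
                $body
            }
            PrimitiveType::Float64 => {
                type $T = f64;
                $body
            }
        }
    };
}

/// Size in bytes of the Rust type behind a runtime tag.
fn byte_width(t: PrimitiveType) -> usize {
    with_match_primitive!(t, T => std::mem::size_of::<T>())
}

/// Debug-formatted default value of the Rust type behind a runtime tag.
fn zero_string(t: PrimitiveType) -> String {
    with_match_primitive!(t, T => format!("{:?}", T::default()))
}

fn main() {
    let all = [
        PrimitiveType::Int8,
        PrimitiveType::Int32,
        PrimitiveType::Int64,
        PrimitiveType::Float64,
    ];
    for t in all {
        println!("{:?}: {} bytes, zero = {}", t, byte_width(t), zero_string(t));
    }
    assert_eq!(byte_width(PrimitiveType::Int32), 4);
    assert_eq!(zero_string(PrimitiveType::Float64), "0.0");
}
```

A macro is used rather than a generic function because the type is only known at runtime; the match turns that runtime value into one monomorphized code path per type.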
Enums
- The enum Either with variants Left and Right is a general-purpose sum type with two cases.
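To illustrate what a general-purpose two-case sum type is for, here is a minimal stand-alone sketch. The Either defined below is a hypothetical local type with the same Left/Right shape, not the type exposed by this crate, and the either helper and normalize function are illustrative names introduced for the sketch.

```rust
/// Hypothetical stand-alone two-case sum type with Left and Right variants
/// (a local illustration, not this crate's Either).
#[derive(Debug, Clone, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Either<L, R> {
    /// Collapses both cases into one value, with one function per case.
    fn either<T>(self, f: impl FnOnce(L) -> T, g: impl FnOnce(R) -> T) -> T {
        match self {
            Either::Left(l) => f(l),
            Either::Right(r) => g(r),
        }
    }
}

/// Hypothetical example: return the input borrowed when it is already
/// lowercase ASCII, otherwise an owned, lowercased copy.
fn normalize(s: &str) -> Either<&str, String> {
    if s.chars().all(|c| c.is_ascii_lowercase()) {
        Either::Left(s)
    } else {
        Either::Right(s.to_ascii_lowercase())
    }
}

fn main() {
    let a = normalize("arrow");
    let b = normalize("Arrow2");
    assert_eq!(a, Either::Left("arrow"));
    assert_eq!(b, Either::Right("arrow2".to_string()));
    // Both cases expose a length, so `either` can fold them into one usize.
    let len = b.either(|s| s.len(), |s| s.len());
    assert_eq!(len, 6);
}
```

Unlike Result, neither variant implies failure, which is what makes such a type useful for "one of two representations" values like the borrowed-or-owned case here.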