APIs to write to the Parquet format.
Arrow/Parquet Interoperability

As of parquet-format v2.9, there are Arrow data types that do not have a Parquet representation. These include, but are not limited to:

- `ArrowDataType::Timestamp(TimeUnit::Second, _)`
- `ArrowDataType::Int64`
- `ArrowDataType::Duration`
- `ArrowDataType::Date64`
- `ArrowDataType::Time32(TimeUnit::Second)`

Using these Arrow types results in no logical type being stored in the Parquet file.
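To make this caveat concrete, here is a minimal, self-contained sketch of the mapping described above. The enums and the `logical_type_for` function are illustrative stand-ins, not this crate's actual `ArrowDataType` or schema machinery: the listed types are written with a physical type only, while types such as `Date32` or millisecond timestamps carry a Parquet logical-type annotation.

```rust
// Illustrative stand-in types: NOT this crate's ArrowDataType/ParquetType.
// They only model the logical-type rule stated above.
#[derive(Debug, PartialEq)]
enum ArrowDtype {
    Int64,
    Date32,
    Date64,
    TimestampSecond,
    TimestampMillisecond,
}

#[derive(Debug, PartialEq)]
enum LogicalType {
    None,            // no logical-type annotation is written
    Date,            // parquet DATE
    TimestampMillis, // parquet TIMESTAMP(MILLIS)
}

// The types from the list above produce no logical type; others do.
fn logical_type_for(dtype: &ArrowDtype) -> LogicalType {
    match dtype {
        // Listed above: stored with a physical type only.
        ArrowDtype::Int64
        | ArrowDtype::Date64
        | ArrowDtype::TimestampSecond => LogicalType::None,
        // These do have a parquet logical annotation.
        ArrowDtype::Date32 => LogicalType::Date,
        ArrowDtype::TimestampMillisecond => LogicalType::TimestampMillis,
    }
}

fn main() {
    assert_eq!(logical_type_for(&ArrowDtype::Int64), LogicalType::None);
    assert_eq!(logical_type_for(&ArrowDtype::Date32), LogicalType::Date);
    println!("Int64 -> {:?}", logical_type_for(&ArrowDtype::Int64));
}
```

A practical consequence: readers of such a file see only the physical type (e.g. `INT64`), so round-tripping through Parquet may not recover the original Arrow type exactly.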
Re-exports
pub use crate::parquet::compression::BrotliLevel;
pub use crate::parquet::compression::CompressionOptions;
pub use crate::parquet::compression::GzipLevel;
pub use crate::parquet::compression::ZstdLevel;
pub use crate::parquet::encoding::Encoding;
pub use crate::parquet::metadata::Descriptor;
pub use crate::parquet::metadata::FileMetadata;
pub use crate::parquet::metadata::SchemaDescriptor;
pub use crate::parquet::page::CompressedDataPage;
pub use crate::parquet::page::CompressedPage;
pub use crate::parquet::page::Page;
pub use crate::parquet::schema::types::FieldInfo;
pub use crate::parquet::schema::types::ParquetType;
pub use crate::parquet::schema::types::PhysicalType as ParquetPhysicalType;
pub use crate::parquet::write::compress;
pub use crate::parquet::write::write_metadata_sidecar;
pub use crate::parquet::write::Compressor;
pub use crate::parquet::write::DynIter;
pub use crate::parquet::write::DynStreamingIterator;
pub use crate::parquet::write::RowGroupIterColumns;
pub use crate::parquet::write::Version;
pub use crate::parquet::fallible_streaming_iterator;
Structs

- `FileWriter` - An interface to write a parquet file to a `Write`
- `KeyValue` - Wrapper struct to store key values
- `RowGroupIterator` - An iterator adapter that converts an iterator over `RecordBatchT` into an iterator of row groups. Use it to create an iterator consumable by the parquet API.
- `StatisticsOptions` - The statistics to write
- `ThriftFileMetadata` - Description for file metadata
- `WriteOptions` - Currently supported options to write to parquet
Enums

- `EncodeNullability` - Options to encode an array
- `Nested` - Descriptor of nested information of a field
Traits

- `FallibleStreamingIterator` - A fallible, streaming iterator.
Functions

- `array_to_columns` - Returns a vector of iterators of `Page`, one per leaf column in the array
- `array_to_page` - Converts an `Array` to a `CompressedPage` based on options, descriptor and `encoding`.
- `array_to_page_simple` - Converts an `Array` to a `CompressedPage` based on options, descriptor and `encoding`.
- `array_to_pages` - Returns an iterator of `Page`.
- `arrays_to_columns`
- `get_max_length` - Gets the length of the `Array` that should be sliced.
- `num_values` - Returns the number of values of the nested structure.
- `row_group_iter` - Maps a `RecordBatchT` and parquet-specific options to a `RowGroupIterColumns` used to write to parquet
- `slice_nested_leaf` - Returns the offset and length to slice the leaf values
- `slice_parquet_array` - Slices the `Array` to `Box<dyn Array>` and `Vec<Nested>`.
- `to_leaves` - Converts an `Array` to `Vec<Box<dyn Array>>` leaves in DFS order.
- `to_nested` - Constructs the necessary `Vec<Vec<Nested>>` to write the rep and def levels of `array` to parquet
- `to_parquet_leaves` - Converts a `ParquetType` to `Vec<ParquetPrimitiveType>` leaves in DFS order.
- `to_parquet_schema` - Creates a parquet `SchemaDescriptor` from an `ArrowSchema`.
- `to_parquet_type` - Creates a `ParquetType` from a `Field`.
- `transverse` - Traverses the `dtype` up to its (parquet) columns and returns a vector of items based on `map`.
- `write_def_levels` - Writes the def levels to a `Vec<u8>` and returns it.
- `write_rep_and_def` - Writes `repetition_levels` and `definition_levels` to buffer.