APIs to write to Parquet format.
§Arrow/Parquet Interoperability
As of parquet-format v2.9 there are Arrow DataTypes which do not have a parquet representation. These include but are not limited to:
- ArrowDataType::Timestamp(TimeUnit::Second, _)
- ArrowDataType::Int64
- ArrowDataType::Duration
- ArrowDataType::Date64
- ArrowDataType::Time32(TimeUnit::Second)
The use of these arrow types will result in no logical type being stored within a parquet file.
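The distinction between types that carry a logical annotation and types that do not can be sketched as a lookup. The following is a minimal illustration, not the crate's conversion logic: the `TimeUnit` and `ArrowDataType` enums are simplified, hypothetical stand-ins, `parquet_logical_type` is an invented helper, and the returned strings are informal names for Parquet logical type annotations as I understand the format.

```rust
// Hypothetical, simplified stand-ins for the crate's Arrow types (assumption,
// not the real definitions).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowDataType {
    Int64,
    Date32,
    Date64,
    Time32(TimeUnit),
    Timestamp(TimeUnit),
    Duration(TimeUnit),
}

/// Returns an informal name for the Parquet logical annotation that matches
/// `dtype`, or `None` when only the physical type would be stored.
/// Illustrative helper; the mapping is an assumption for this sketch.
fn parquet_logical_type(dtype: ArrowDataType) -> Option<&'static str> {
    use ArrowDataType::*;
    match dtype {
        // Plain 64-bit integers need no annotation beyond the physical type.
        Int64 => None,
        // Days since the epoch map onto the DATE annotation.
        Date32 => Some("DATE"),
        // Milliseconds since the epoch have no date annotation.
        Date64 => None,
        // Parquet times and timestamps have no "seconds" unit.
        Time32(TimeUnit::Millisecond) => Some("TIME(MILLIS)"),
        Time32(_) => None,
        Timestamp(TimeUnit::Second) => None,
        Timestamp(TimeUnit::Millisecond) => Some("TIMESTAMP(MILLIS)"),
        Timestamp(TimeUnit::Microsecond) => Some("TIMESTAMP(MICROS)"),
        Timestamp(TimeUnit::Nanosecond) => Some("TIMESTAMP(NANOS)"),
        // There is no duration annotation to map onto.
        Duration(_) => None,
    }
}
```

A reader of such a file sees only the physical type for the `None` cases, so the original Arrow type has to be recovered by other means (for example, from schema metadata) if it matters.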
Re-exports§
pub use crate::parquet::compression::BrotliLevel;
pub use crate::parquet::compression::CompressionOptions;
pub use crate::parquet::compression::GzipLevel;
pub use crate::parquet::compression::ZstdLevel;
pub use crate::parquet::encoding::Encoding;
pub use crate::parquet::metadata::Descriptor;
pub use crate::parquet::metadata::FileMetadata;
pub use crate::parquet::metadata::SchemaDescriptor;
pub use crate::parquet::page::CompressedDataPage;
pub use crate::parquet::page::CompressedPage;
pub use crate::parquet::page::Page;
pub use crate::parquet::schema::types::FieldInfo;
pub use crate::parquet::schema::types::ParquetType;
pub use crate::parquet::schema::types::PhysicalType as ParquetPhysicalType;
pub use crate::parquet::write::compress;
pub use crate::parquet::write::write_metadata_sidecar;
pub use crate::parquet::write::Compressor;
pub use crate::parquet::write::DynIter;
pub use crate::parquet::write::DynStreamingIterator;
pub use crate::parquet::write::RowGroupIterColumns;
pub use crate::parquet::write::Version;
pub use crate::parquet::fallible_streaming_iterator;
Structs§
- An interface to write a parquet to a Write
- Wrapper struct to store key values
- An iterator adapter that converts an iterator over RecordBatchT into an iterator of row groups. Use it to create an iterator consumable by the parquet’s API.
- The statistics to write
- Description for file metadata
- Currently supported options to write to parquet
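The iterator-adapter idea described above (batches in, row groups out) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: `Batch`, `RowGroup`, and `RowGroups` are hypothetical stand-ins and do not reflect the crate's actual types or signatures.

```rust
// Hypothetical stand-in for a record batch: one Vec<i64> per column.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
}

// Hypothetical stand-in for a row group ready to be written.
#[derive(Debug, PartialEq)]
struct RowGroup {
    num_rows: usize,
    columns: Vec<Vec<i64>>,
}

/// Adapter that wraps an iterator of fallible batches and yields one
/// row group per batch (illustrative, not the crate's implementation).
struct RowGroups<I> {
    batches: I,
}

impl<I> Iterator for RowGroups<I>
where
    I: Iterator<Item = Result<Batch, String>>,
{
    type Item = Result<RowGroup, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // Stop when the inner iterator is exhausted.
        let batch = self.batches.next()?;
        // Forward upstream errors; otherwise validate and convert the batch.
        Some(batch.and_then(|b| {
            let num_rows = b.columns.first().map_or(0, |c| c.len());
            if b.columns.iter().any(|c| c.len() != num_rows) {
                return Err("columns have different lengths".to_string());
            }
            Ok(RowGroup {
                num_rows,
                columns: b.columns,
            })
        }))
    }
}
```

Because the adapter is lazy, each batch is converted only when the writer asks for the next row group, so the whole input never has to be materialized at once.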
Enums§
- Options to encode an array
- Descriptor of nested information of a field
Traits§
- A fallible, streaming iterator.
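A fallible, streaming iterator differs from a regular `Iterator` in two ways: advancing may fail, and the yielded item is borrowed from the iterator itself, which allows an internal buffer to be reused. The sketch below is a simplified, self-contained rendition of that idea. The trait name `FallibleStreaming` and the `Chunks` type are hypothetical and are not the re-exported trait.

```rust
/// Simplified sketch of a fallible, streaming iterator (assumed shape).
trait FallibleStreaming {
    type Item: ?Sized;
    type Error;

    /// Moves to the next item; may fail.
    fn advance(&mut self) -> Result<(), Self::Error>;

    /// Borrows the current item, or `None` when there is none.
    fn get(&self) -> Option<&Self::Item>;

    /// Advances, then borrows the current item.
    fn next(&mut self) -> Result<Option<&Self::Item>, Self::Error> {
        self.advance()?;
        Ok(self.get())
    }
}

/// Hypothetical example: copies each input chunk into one reused buffer
/// (a stand-in for work such as compressing a page) and rejects empty chunks.
struct Chunks {
    input: std::vec::IntoIter<Vec<u8>>,
    buffer: Vec<u8>,
    valid: bool,
}

impl FallibleStreaming for Chunks {
    type Item = [u8];
    type Error = String;

    fn advance(&mut self) -> Result<(), String> {
        match self.input.next() {
            Some(chunk) if chunk.is_empty() => {
                self.valid = false;
                Err("empty chunk".to_string())
            }
            Some(chunk) => {
                // Reuse the allocation instead of creating a new Vec per item.
                self.buffer.clear();
                self.buffer.extend_from_slice(&chunk);
                self.valid = true;
                Ok(())
            }
            None => {
                self.valid = false;
                Ok(())
            }
        }
    }

    fn get(&self) -> Option<&[u8]> {
        if self.valid {
            Some(&self.buffer)
        } else {
            None
        }
    }
}
```

The borrow returned by `get` ties each item to the iterator, which is why such iterators cannot be collected into a `Vec` of references and are instead consumed one item at a time.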
Functions§
- Returns a vector of iterators of Page, one per leaf column in the array.
- Returns an iterator of Page.
- Gets the length of the Array that should be sliced.
- Returns the number of values of the nested.
- Maps a RecordBatchT and parquet-specific options to a RowGroupIterColumns used to write to parquet.
- Returns the offset and length to slice the leaf values.
- Converts an Array to a Vec<Box<dyn Array>> of leaves in DFS order.
- Constructs the necessary Vec<Vec<Nested>> to write the rep and def levels of array to parquet.
- Converts a ParquetType to a Vec<ParquetPrimitiveType> of leaves in DFS order.
- Creates a parquet SchemaDescriptor from an ArrowSchema.
- Creates a ParquetType from a Field.
- Traverses the dtype up to its (parquet) columns and returns a vector of items based on map.
- Writes the def levels to a Vec<u8> and returns it.
- Writes repetition_levels and definition_levels to buffer.