Module polars_parquet::arrow::write

APIs to write to Parquet format.

Arrow/Parquet Interoperability

As of parquet-format v2.9 there are Arrow DataTypes which do not have a parquet representation. These include but are not limited to:

  • ArrowDataType::Timestamp(TimeUnit::Second, _)
  • ArrowDataType::Int64
  • ArrowDataType::Duration
  • ArrowDataType::Date64
  • ArrowDataType::Time32(TimeUnit::Second)

The use of these Arrow types will result in no logical type being stored within the parquet file.
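
For example, a field of type ArrowDataType::Timestamp(TimeUnit::Second, _) is written as a plain INT64 primitive with no logical annotation. The following is a minimal sketch of observing this via to_parquet_type (listed under Functions below); it assumes Field and ArrowDataType are exported from polars_arrow::datatypes and that Field::new takes (name, dtype, is_nullable), which may differ across versions:

```rust
// Hedged sketch: inspect the parquet type produced for an Arrow field.
use polars_arrow::datatypes::{ArrowDataType, Field, TimeUnit};
use polars_parquet::arrow::write::to_parquet_type;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed constructor shape: Field::new(name, dtype, is_nullable).
    let field = Field::new(
        "ts".into(),
        ArrowDataType::Timestamp(TimeUnit::Second, None),
        true,
    );
    // Per the note above, `Timestamp(Second, _)` has no parquet logical
    // type, so the resulting primitive type carries only the INT64
    // physical type.
    let parquet_type = to_parquet_type(&field)?;
    println!("{parquet_type:?}");
    Ok(())
}
```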

Re-exports

pub use crate::parquet::compression::BrotliLevel;
pub use crate::parquet::compression::CompressionOptions;
pub use crate::parquet::compression::GzipLevel;
pub use crate::parquet::compression::ZstdLevel;
pub use crate::parquet::encoding::Encoding;
pub use crate::parquet::metadata::Descriptor;
pub use crate::parquet::metadata::FileMetadata;
pub use crate::parquet::metadata::SchemaDescriptor;
pub use crate::parquet::page::CompressedDataPage;
pub use crate::parquet::page::CompressedPage;
pub use crate::parquet::page::Page;
pub use crate::parquet::schema::types::FieldInfo;
pub use crate::parquet::schema::types::ParquetType;
pub use crate::parquet::schema::types::PhysicalType as ParquetPhysicalType;
pub use crate::parquet::write::compress;
pub use crate::parquet::write::write_metadata_sidecar;
pub use crate::parquet::write::Compressor;
pub use crate::parquet::write::DynIter;
pub use crate::parquet::write::DynStreamingIterator;
pub use crate::parquet::write::RowGroupIterColumns;
pub use crate::parquet::write::Version;
pub use crate::parquet::fallible_streaming_iterator;

Structs§

FileWriter
An interface to write a parquet file to a Write
KeyValue
Wrapper struct to store key-value pairs
RowGroupIterator
An iterator adapter that converts an iterator over RecordBatchT into an iterator of row groups. Use it to create an iterator consumable by the parquet API (see the end-to-end sketch after this list).
StatisticsOptions
The statistics to write
ThriftFileMetadata
Description for file metadata
WriteOptions
Currently supported options to write to parquet
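
Putting the structs above together, the following is a minimal end-to-end write sketch: each RecordBatchT becomes one row group via RowGroupIterator, and a FileWriter consumes the row groups and writes the footer. The WriteOptions field names (statistics, data_page_size), StatisticsOptions::full(), and ArrowSchema::iter_values() are assumptions carried over from this crate's arrow2 lineage; verify them against the item docs:

```rust
use std::io::Write;

use polars_arrow::array::Array;
use polars_arrow::datatypes::ArrowSchema;
use polars_arrow::record_batch::RecordBatchT;
use polars_error::PolarsResult;
use polars_parquet::arrow::write::{
    transverse, CompressionOptions, Encoding, FileWriter, RowGroupIterator, StatisticsOptions,
    Version, WriteOptions,
};

/// Writes one batch as a single row group (a hedged sketch, not the
/// crate's canonical API; field and method names are assumed, see above).
fn write_batch<W: Write>(
    sink: W,
    schema: ArrowSchema,
    batch: RecordBatchT<Box<dyn Array>>,
) -> PolarsResult<u64> {
    let options = WriteOptions {
        statistics: StatisticsOptions::full(),
        compression: CompressionOptions::Snappy,
        version: Version::V2,
        data_page_size: None,
    };

    // One `Vec<Encoding>` per field; `transverse` yields one encoding per
    // (parquet) leaf column, so nested fields contribute several entries.
    // Assumes `ArrowSchema` exposes its fields via `iter_values()`.
    let encodings = schema
        .iter_values()
        .map(|field| transverse(&field.dtype, |_| Encoding::Plain))
        .collect();

    let row_groups =
        RowGroupIterator::try_new(std::iter::once(Ok(batch)), &schema, options, encodings)?;

    let mut writer = FileWriter::try_new(sink, schema.clone(), options)?;
    for group in row_groups {
        writer.write(group?)?;
    }
    // `end` writes the parquet footer; `None` adds no extra key-value
    // metadata.
    writer.end(None)
}
```

Note that the same options value is handed to both the row-group iterator (which encodes and compresses pages) and the file writer (which lays out the file).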

Enums

EncodeNullability
Options to encode an array
Nested
Descriptor of nested information of a field

Traits

FallibleStreamingIterator
A fallible, streaming iterator.

Functions

array_to_columns
Returns a vector of iterators of Page, one per leaf column in the array
array_to_page
Converts an Array to a CompressedPage based on options, descriptor and encoding.
array_to_page_simple
Converts a non-nested Array to a CompressedPage based on options, descriptor and encoding.
array_to_pages
Returns an iterator of Page.
arrays_to_columns
get_max_length
Gets the length of the Array that should be sliced.
num_values
Returns the number of values of the nested structure.
row_group_iter
Maps a RecordBatchT and parquet-specific options to a RowGroupIterColumns used to write to parquet
slice_nested_leaf
Returns the offset and length used to slice the leaf values
slice_parquet_array
Slices an Array (as Box<dyn Array>) together with its Vec<Nested>.
to_leaves
Converts an Array to its Vec<Box<dyn Array>> leaves in DFS order.
to_nested
Constructs the necessary Vec<Vec<Nested>> to write the rep and def levels of an array to parquet
to_parquet_leaves
Converts a ParquetType to its Vec<ParquetPrimitiveType> leaves in DFS order.
to_parquet_schema
Creates a parquet SchemaDescriptor from an ArrowSchema.
to_parquet_type
Creates a ParquetType from a Field.
transverse
Traverses the dtype down to its (parquet) leaf columns and returns a vector with one item per column, as produced by map (see the sketch at the end of this list).
write_def_levels
Writes the definition levels to a Vec<u8> and returns it.
write_rep_and_def
Writes repetition_levels and definition_levels to the buffer.
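
To illustrate the leaf ordering used by several of the helpers above, here is a hedged sketch of transverse on a nested dtype; it assumes Field and ArrowDataType come from polars_arrow::datatypes:

```rust
// Hedged sketch: `transverse` maps a (possibly nested) dtype to one item
// per parquet leaf column, in DFS order.
use polars_arrow::datatypes::{ArrowDataType, Field};
use polars_parquet::arrow::write::{transverse, Encoding};

fn main() {
    // A struct with two primitive children has two parquet leaf columns.
    let dtype = ArrowDataType::Struct(vec![
        Field::new("a".into(), ArrowDataType::Int32, true),
        Field::new("b".into(), ArrowDataType::Float64, true),
    ]);

    // Pick an encoding per leaf; here every leaf gets `Plain`.
    let encodings: Vec<Encoding> = transverse(&dtype, |_| Encoding::Plain);
    assert_eq!(encodings.len(), 2);
}
```

This per-leaf Vec<Encoding> is exactly the shape RowGroupIterator expects for each field, which is why nested fields contribute more than one entry.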