Struct arrow_data::ArrayData
pub struct ArrayData { /* private fields */ }
A generic representation of Arrow array data which encapsulates common attributes and
operations for Arrow arrays. Operations specific to particular array types (e.g.,
primitive, list, struct) are implemented in Array.
Memory Layout
ArrayData has references to one or more underlying data buffers
and optional child ArrayData, depending on type, as illustrated
below. Bitmaps are not shown for simplicity, but they are stored
similarly to the buffers.
                       offset
                      points to
┌───────────────────┐ start of  ┌───────┐       Different
│                   │   data    │       │     ArrayData may
│ArrayData {        │           │....   │     also refer to
│  data_type: ...   │   ─ ─ ─ ─▶│1234   │  ┌ ─  the same
│  offset: ... ─ ─ ─│─ ┘        │4372   │      underlying
│  len: ...    ─ ─ ─│─ ┐        │4888   │  │   buffer with different offset/len
│  buffers: [       │           │5882   │◀─
│    ...            │  │        │4323   │
│  ]                │   ─ ─ ─ ─▶│4859   │
│  child_data: [    │           │....   │
│    ...            │           │       │
│  ]                │           └───────┘
│}                  │
│                   │            Shared Buffer uses
│               │   │            bytes::Bytes to hold
└───────────────────┘            actual data values
           ┌ ─ ─ ┘
           ▼
┌───────────────────┐
│ArrayData {        │
│  ...              │
│}                  │
│                   │
└───────────────────┘
Child ArrayData may also have its own buffers and children
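The sharing shown in the diagram can be sketched with the standard library alone. This is a simplified model, not the arrow API: the View type, its fields, and diagram_view are hypothetical names, and Arc<[i32]> stands in for the shared Buffer.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an ArrayData without children:
/// a shared buffer plus the offset/len window this view covers.
#[derive(Clone)]
struct View {
    buffer: Arc<[i32]>, // shared, reference-counted storage
    offset: usize,      // index of the first element in the window
    len: usize,         // number of elements in the window
}

impl View {
    /// The elements this view logically contains.
    fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// A narrower view over the same buffer; no element is copied.
    fn slice(&self, offset: usize, len: usize) -> View {
        assert!(offset + len <= self.len, "slice out of bounds");
        View {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }
}

/// Builds the buffer values from the diagram and a view covering all of them.
fn diagram_view() -> View {
    let buffer: Arc<[i32]> = Arc::from(vec![1234, 4372, 4888, 5882, 4323, 4859]);
    let len = buffer.len();
    View { buffer, offset: 0, len }
}
```

Calling diagram_view().slice(3, 2) yields a second view whose values are 5882 and 4323, backed by the same allocation as the original.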
Implementations
impl ArrayData
pub unsafe fn new_unchecked(
    data_type: DataType,
    len: usize,
    null_count: Option<usize>,
    null_bit_buffer: Option<Buffer>,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>
) -> Self
Creates a new ArrayData instance.
If null_count is not specified, the number of nulls in null_bit_buffer is calculated.
If the number of nulls is 0, then the null_bit_buffer is set to None.
Safety
The input values must form a valid Arrow array for data_type, or undefined behavior can result.
Note: This is a low level API and most users of the arrow
crate should create arrays using the methods in the array
module.
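The null handling described above can be modeled without the arrow crate. The sketch below assumes the Arrow validity-bitmap convention (least-significant bit first, a set bit marks a valid slot); count_nulls and normalize_nulls are hypothetical helper names, and Vec<u8> stands in for Buffer.

```rust
/// Counts null slots in an Arrow-style validity bitmap.
/// Bit i lives in byte i / 8 at position i % 8 (least-significant bit first);
/// a set bit marks a valid slot, a cleared bit marks a null.
fn count_nulls(validity: &[u8], offset: usize, len: usize) -> usize {
    (offset..offset + len)
        .filter(|&i| (validity[i / 8] & (1 << (i % 8))) == 0)
        .count()
}

/// Models the documented normalization: derive the count when it is not
/// supplied, and drop the bitmap entirely when there are no nulls.
fn normalize_nulls(
    null_count: Option<usize>,
    validity: Option<Vec<u8>>,
    offset: usize,
    len: usize,
) -> (usize, Option<Vec<u8>>) {
    let count = match (null_count, &validity) {
        (Some(n), _) => n,
        (None, Some(bits)) => count_nulls(bits, offset, len),
        (None, None) => 0,
    };
    if count == 0 {
        (0, None)
    } else {
        (count, validity)
    }
}
```

Note that the model trusts a caller-supplied count without checking it against the bitmap, which is one reason the real constructor is unsafe.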
pub fn try_new(
    data_type: DataType,
    len: usize,
    null_bit_buffer: Option<Buffer>,
    offset: usize,
    buffers: Vec<Buffer>,
    child_data: Vec<ArrayData>
) -> Result<Self, ArrowError>
Create a new ArrayData, validating that the provided buffers form a valid Arrow array of the specified data type.
If the number of nulls in null_bit_buffer is 0, then the null_bit_buffer is set to None.
Internally this calls through to Self::validate_data.
Note: This is a low level API and most users of the arrow crate should create arrays using the builders found in arrow_array.
pub const fn builder(data_type: DataType) -> ArrayDataBuilder
pub fn child_data(&self) -> &[ArrayData]
Returns a slice of the child ArrayData. This will be non-empty for types such as lists and structs.
pub fn nulls(&self) -> Option<&NullBuffer>
Returns a reference to the null buffer of this ArrayData, if any.
Note: ArrayData::offset does NOT apply to the returned NullBuffer.
pub const fn len(&self) -> usize
Returns the length (i.e., number of elements) of this ArrayData.
pub fn null_count(&self) -> usize
Returns the total number of nulls in this array.
pub fn get_buffer_memory_size(&self) -> usize
Returns the total number of bytes of memory occupied by the buffers owned by this ArrayData and all of its children. (See also diagram on ArrayData.)
Note that this ArrayData may only refer to a subset of the data in the underlying Buffers (due to offset and length), but the size returned includes the entire size of the buffers.
If multiple ArrayDatas refer to the same underlying Buffers, they will each report the same size.
pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError>
Returns the total number of bytes of memory occupied by the buffers of this slice of ArrayData. (See also diagram on ArrayData.)
This is approximately the number of bytes that would be used if a new ArrayData were formed by creating new Buffers with exactly the data needed.
For example, for a DataType::Int64 array with 100 elements, Self::get_slice_memory_size would return 100 * 8 = 800. If the ArrayData was then Self::sliced to refer to its first 20 elements, then Self::get_slice_memory_size on the sliced ArrayData would return 20 * 8 = 160.
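The difference between whole-buffer size and slice size can be worked through with a small std-only model. Int64View and its methods are hypothetical names rather than arrow types, and the model ignores any padding or over-allocation a real Buffer may carry.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical Int64 view: a shared buffer plus an offset/len window.
struct Int64View {
    buffer: Arc<[i64]>,
    offset: usize,
    len: usize,
}

impl Int64View {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        Int64View { buffer: Arc::from(values), offset: 0, len }
    }

    /// Narrows the window without copying the underlying buffer.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int64View {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// Whole-buffer size: ignores offset and len entirely.
    fn buffer_memory_size(&self) -> usize {
        self.buffer.len() * size_of::<i64>()
    }

    /// Size of only the elements inside the window.
    fn slice_memory_size(&self) -> usize {
        self.len * size_of::<i64>()
    }
}
```

With 100 elements the model reports 800 bytes for both measures; after slice(0, 20) the slice size drops to 160 while the buffer size stays at 800, matching the arithmetic in the text.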
pub fn get_array_memory_size(&self) -> usize
Returns the total number of bytes of memory occupied physically by this ArrayData and all its Buffers and children. (See also diagram on ArrayData.)
Equivalent to: size_of_val(self) + Self::get_buffer_memory_size + size_of_val(child) for all children
pub fn buffer<T: ArrowNativeType>(&self, buffer: usize) -> &[T]
Returns the buffer as a slice of type T starting at self.offset.
Panics
This function panics if:
- the buffer is not byte-aligned with type T, or
- the datatype is Boolean (it corresponds to a bit-packed buffer where the offset is not applicable)
pub fn new_null(data_type: &DataType, len: usize) -> Self
Returns a new ArrayData valid for data_type containing len null values.
pub fn new_empty(data_type: &DataType) -> Self
Returns a new empty ArrayData valid for data_type.
pub fn align_buffers(&mut self)
Verifies that the buffers meet the minimum alignment requirements for the data type.
Buffers that are not adequately aligned will be copied to a new aligned allocation.
This can be useful when interacting with data sent over IPC or FFI, which may not meet the minimum alignment requirements.
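The copy-only-if-misaligned idea can be illustrated with the standard library. This is a sketch under assumptions, not how arrow implements it: aligned_i32s is a hypothetical helper, i32 stands in for the element type, and Cow distinguishes the borrowed (already aligned) case from the copied one.

```rust
use std::borrow::Cow;

/// Views `bytes` as i32 values, borrowing when the bytes are already
/// suitably aligned and copying into a fresh, aligned Vec otherwise.
fn aligned_i32s(bytes: &[u8]) -> Cow<'_, [i32]> {
    assert!(bytes.len() % 4 == 0, "length must be a multiple of 4");
    // SAFETY: any four bytes form a valid i32, and align_to only places
    // correctly aligned elements in the middle slice.
    let (prefix, middle, suffix) = unsafe { bytes.align_to::<i32>() };
    if prefix.is_empty() && suffix.is_empty() {
        // Already aligned: reuse the existing memory.
        Cow::Borrowed(middle)
    } else {
        // Misaligned: decode each 4-byte chunk into a new allocation.
        Cow::Owned(
            bytes
                .chunks_exact(4)
                .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
                .collect(),
        )
    }
}
```

Either branch yields the same values; only the ownership of the result differs, which mirrors why aligning is safe to apply to data received from an external source.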
pub fn validate(&self) -> Result<(), ArrowError>
Performs “cheap” validation of an ArrayData. Ensures buffers are sufficiently sized to store len + offset total elements of data_type and performs other inexpensive consistency checks.
This check is “cheap” in the sense that it does not validate the contents of the buffers (e.g. that all offsets for UTF8 arrays are within the bounds of the values buffer).
See ArrayData::validate_data to fully validate the offset contents and the validity of UTF-8 data.
pub fn validate_data(&self) -> Result<(), ArrowError>
Validates that the data contained within this ArrayData is valid:
- Null count is correct
- All offsets are valid
- All String data is valid UTF-8
- All dictionary offsets are valid
Internally this calls:
- Self::validate_nulls
- Self::validate_values
Note: this does not recurse into children; for a recursive variant see Self::validate_full.
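The gap between a size-only check and a content check can be sketched for a string-like layout (an offsets buffer plus a values buffer). The functions validate_cheap and validate_contents are hypothetical models written against plain slices, not the arrow implementation, and they return String errors instead of ArrowError.

```rust
/// "Cheap" check: only buffer sizes are inspected, never their contents.
/// A string array with `len` elements starting at `offset` needs
/// `offset + len + 1` entries in its offsets buffer.
fn validate_cheap(offsets: &[i32], offset: usize, len: usize) -> Result<(), String> {
    let needed = offset + len + 1;
    if offsets.len() < needed {
        return Err(format!("need {} offsets, found {}", needed, offsets.len()));
    }
    Ok(())
}

/// Content check: every offset pair must be ordered and in bounds, and
/// every value it delimits must be valid UTF-8.
fn validate_contents(offsets: &[i32], values: &[u8]) -> Result<(), String> {
    for (i, pair) in offsets.windows(2).enumerate() {
        let (start, end) = (pair[0], pair[1]);
        if start < 0 || end < start || (end as usize) > values.len() {
            return Err(format!("offsets for element {} are out of range", i));
        }
        std::str::from_utf8(&values[start as usize..end as usize])
            .map_err(|e| format!("element {} is not valid UTF-8: {}", i, e))?;
    }
    Ok(())
}
```

An offsets buffer such as [0, 5, 99] over a 10-byte values buffer passes the cheap check, because it has enough entries, yet fails the content check, because 99 points past the end of the values.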
pub fn validate_full(&self) -> Result<(), ArrowError>
Performs a full recursive validation of this ArrayData and all its children.
This is equivalent to calling Self::validate_data on this ArrayData and all its children recursively.
pub fn validate_nulls(&self) -> Result<(), ArrowError>
pub fn validate_values(&self) -> Result<(), ArrowError>
pub fn ptr_eq(&self, other: &Self) -> bool
Returns true if this ArrayData is equal to other, using pointer comparisons to determine buffer equality. This is cheaper than PartialEq::eq but may return false when the arrays are logically equal.
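The trade-off between pointer and logical equality can be shown with reference-counted slices from the standard library. This is an analogy rather than the arrow implementation: same_allocation and same_contents are hypothetical helpers, and Arc<[i32]> stands in for a Buffer.

```rust
use std::sync::Arc;

/// Pointer comparison: true only when both handles share one allocation.
/// Constant time, regardless of how many elements the buffer holds.
fn same_allocation(a: &Arc<[i32]>, b: &Arc<[i32]>) -> bool {
    Arc::ptr_eq(a, b)
}

/// Logical comparison: walks both buffers and compares element by element.
fn same_contents(a: &Arc<[i32]>, b: &Arc<[i32]>) -> bool {
    a[..] == b[..]
}
```

Two separately allocated buffers holding 1, 2, 3 compare equal by contents but not by pointer, which is the false-negative case the description warns about.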
pub fn into_builder(self) -> ArrayDataBuilder
Converts this ArrayData into an ArrayDataBuilder.