Struct arrow_data::ArrayData

pub struct ArrayData { /* private fields */ }

A generic representation of Arrow array data which encapsulates common attributes and operations for Arrow arrays. Specific operations for different array types (e.g., primitive, list, struct) are implemented in Array.

Memory Layout

ArrayData has references to one or more underlying data buffers and optional child ArrayData, depending on the data type, as illustrated below. Bitmaps are not shown for simplicity, but they are stored similarly to the buffers.

                       offset
                      points to
┌───────────────────┐ start of  ┌───────┐       Different
│                   │   data    │       │     ArrayData may
│ArrayData {        │           │....   │     also refer to
│  data_type: ...   │   ─ ─ ─ ─▶│1234   │  ┌ ─  the same
│  offset: ... ─ ─ ─│─ ┘        │4372   │      underlying
│  len: ...    ─ ─ ─│─ ┐        │4888   │  │     buffer with different offset/len
│  buffers: [       │           │5882   │◀─
│    ...            │  │        │4323   │
│  ]                │   ─ ─ ─ ─▶│4859   │
│  child_data: [    │           │....   │
│    ...            │           │       │
│  ]                │           └───────┘
│}                  │
│                   │            Shared Buffer uses
│               │   │            bytes::Bytes to hold
└───────────────────┘            actual data values
          ┌ ─ ─ ┘

          ▼
┌───────────────────┐
│ArrayData {        │
│  ...              │
│}                  │
│                   │
└───────────────────┘

Child ArrayData may also have its own buffers and children
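The sharing shown in the diagram can be modeled with standard-library types alone. The following is a minimal sketch, not the arrow API: ToyArrayData, its fields, values, and two_views are hypothetical names, and an Arc<Vec<i32>> stands in for the shared Buffer.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for ArrayData over i32 values (not the arrow API).
#[derive(Clone, Debug)]
struct ToyArrayData {
    buffer: Arc<Vec<i32>>, // shared, reference-counted storage
    offset: usize,         // index of the first logical element
    len: usize,            // number of logical elements
}

impl ToyArrayData {
    /// The values visible through this (offset, len) window.
    fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}

/// Builds two views with different offset/len over one allocation.
fn two_views() -> (ToyArrayData, ToyArrayData) {
    let shared = Arc::new(vec![1234, 4372, 4888, 5882, 4323, 4859]);
    let a = ToyArrayData { buffer: Arc::clone(&shared), offset: 0, len: 3 };
    let b = ToyArrayData { buffer: shared, offset: 2, len: 4 };
    (a, b)
}
```

In this model, both views returned by two_views read from the same allocation; only offset and len differ, so no element is copied.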

Implementations

impl ArrayData

pub unsafe fn new_unchecked( data_type: DataType, len: usize, null_count: Option<usize>, null_bit_buffer: Option<Buffer>, offset: usize, buffers: Vec<Buffer>, child_data: Vec<ArrayData> ) -> Self

Creates a new ArrayData instance.

If null_count is not specified, the number of nulls in null_bit_buffer is calculated.

If the number of nulls is 0 then the null_bit_buffer is set to None.
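The two rules above can be illustrated on a raw validity bitmap. This sketch assumes the Arrow format convention that bits are numbered least-significant-bit first and that a set bit means "valid"; count_nulls and normalize are hypothetical helper names, not functions of the arrow crates.

```rust
/// Counts nulls among `len` slots starting at bit `offset` of a validity
/// bitmap (assumed convention: LSB-first, 1 = valid, 0 = null).
fn count_nulls(bitmap: &[u8], offset: usize, len: usize) -> usize {
    (offset..offset + len)
        .filter(|&i| (bitmap[i / 8] & (1u8 << (i % 8))) == 0)
        .count()
}

/// Computes the null count and drops the bitmap when it records no nulls.
fn normalize(
    bitmap: Option<Vec<u8>>,
    offset: usize,
    len: usize,
) -> (usize, Option<Vec<u8>>) {
    match bitmap {
        Some(bits) => {
            let nulls = count_nulls(&bits, offset, len);
            if nulls == 0 {
                (0, None) // all slots valid: the bitmap carries no information
            } else {
                (nulls, Some(bits))
            }
        }
        None => (0, None),
    }
}
```

For example, count_nulls(&[0b0000_0101], 0, 4) evaluates to 2, because slots 1 and 3 have their bits cleared.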

Safety

The input values must form a valid Arrow array for data_type, or undefined behavior can result.

Note: This is a low level API and most users of the arrow crate should create arrays using the methods in the array module.

pub fn try_new( data_type: DataType, len: usize, null_bit_buffer: Option<Buffer>, offset: usize, buffers: Vec<Buffer>, child_data: Vec<ArrayData> ) -> Result<Self, ArrowError>

Create a new ArrayData, validating that the provided buffers form a valid Arrow array of the specified data type.

If the number of nulls in null_bit_buffer is 0 then the null_bit_buffer is set to None.

Internally this calls through to Self::validate_data

Note: This is a low level API and most users of the arrow crate should create arrays using the builders found in arrow_array

pub const fn builder(data_type: DataType) -> ArrayDataBuilder

Returns a builder to construct an ArrayData instance with the given DataType

pub const fn data_type(&self) -> &DataType

Returns a reference to the DataType of this ArrayData

pub fn buffers(&self) -> &[Buffer]

Returns the Buffers storing data for this ArrayData

pub fn child_data(&self) -> &[ArrayData]

Returns a slice of child ArrayData. This will be non-empty for types such as lists and structs.

pub fn is_null(&self, i: usize) -> bool

Returns whether the element at index i is null

pub fn nulls(&self) -> Option<&NullBuffer>

Returns a reference to the null buffer of this ArrayData if any

Note: ArrayData::offset does NOT apply to the returned NullBuffer

pub fn is_valid(&self, i: usize) -> bool

Returns whether the element at index i is not null

pub const fn len(&self) -> usize

Returns the length (i.e., number of elements) of this ArrayData.

pub const fn is_empty(&self) -> bool

Returns whether this ArrayData is empty

pub const fn offset(&self) -> usize

Returns the offset of this ArrayData

pub fn null_count(&self) -> usize

Returns the total number of nulls in this array

pub fn get_buffer_memory_size(&self) -> usize

Returns the total number of bytes of memory occupied by the buffers owned by this ArrayData and all of its children. (See also diagram on ArrayData).

Note that this ArrayData may only refer to a subset of the data in the underlying Buffers (due to offset and length), but the size returned includes the entire size of the buffers.

If multiple ArrayDatas refer to the same underlying Buffers, each of them will report the same size.

pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError>

Returns the total number of bytes of memory occupied by the buffers for this slice of ArrayData. (See also diagram on ArrayData).

This is approximately the number of bytes if a new ArrayData was formed by creating new Buffers with exactly the data needed.

For example, for a DataType::Int64 array with 100 elements, Self::get_slice_memory_size would return 100 * 8 = 800. If the ArrayData was then sliced with Self::slice to refer to its first 20 elements, then Self::get_slice_memory_size on the sliced ArrayData would return 20 * 8 = 160.
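The difference between the whole-buffer size and the slice size can be reproduced with a small model. This is a sketch under stated assumptions rather than the library's implementation: ToyInt64Data and its methods are hypothetical, and the model measures the Vec length where the real buffers may account for allocated capacity.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical model of an Int64 array view (not the arrow API).
struct ToyInt64Data {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl ToyInt64Data {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        ToyInt64Data { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: same buffer, narrower window.
    fn slice(&self, offset: usize, length: usize) -> Self {
        assert!(offset + length <= self.len, "slice out of bounds");
        ToyInt64Data {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len: length,
        }
    }

    /// The elements this view refers to.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Bytes in the whole underlying buffer, regardless of offset/len.
    fn buffer_memory_size(&self) -> usize {
        self.buffer.len() * size_of::<i64>()
    }

    /// Bytes needed for just the elements this view refers to.
    fn slice_memory_size(&self) -> usize {
        self.len * size_of::<i64>()
    }
}
```

With 100 elements, slice_memory_size is 800 in this model; after slice(0, 20) it is 160, while buffer_memory_size on the sliced view is still 800 because the underlying buffer is shared rather than copied.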

pub fn get_array_memory_size(&self) -> usize

Returns the total number of bytes of memory occupied physically by this ArrayData and all its Buffers and children. (See also diagram on ArrayData).

Equivalent to: size_of_val(self) + Self::get_buffer_memory_size + size_of_val(child) for all children

pub fn slice(&self, offset: usize, length: usize) -> ArrayData

Creates a zero-copy slice of itself. This creates a new ArrayData pointing at the same underlying Buffers with a different offset and len.

Panics

Panics if offset + length > self.len().
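Callers that cannot rule out an out-of-range request may prefer to test the panic condition first. The sketch below shows one way to do that check; checked_window is a hypothetical helper, and the assumption that a slice's new offset is the parent offset plus the requested offset is part of the model, not something this page states.

```rust
/// Returns the (offset, len) of a sub-window, or None in the cases where
/// the documented condition `offset + length > len` would cause a panic.
/// `parent_offset` and `parent_len` describe the view being sliced.
fn checked_window(
    parent_offset: usize,
    parent_len: usize,
    offset: usize,
    length: usize,
) -> Option<(usize, usize)> {
    // checked_add also rejects requests whose sum overflows usize.
    let end = offset.checked_add(length)?;
    if end > parent_len {
        return None;
    }
    Some((parent_offset.checked_add(offset)?, length))
}
```

For a view of length 10, checked_window(0, 10, 2, 3) yields Some((2, 3)), while checked_window(0, 10, 8, 3) yields None.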

pub fn buffer<T: ArrowNativeType>(&self, buffer: usize) -> &[T]

Returns the buffer at the given index as a slice of type T, starting at self.offset

Panics

This function panics if:

  • the buffer is not byte-aligned with type T, or
  • the datatype is Boolean (it corresponds to a bit-packed buffer where the offset is not applicable)
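The alignment condition in the first bullet amounts to a check on the buffer's starting address. A minimal sketch of that check follows; is_aligned_for is a hypothetical helper written for illustration, not a function of the arrow crates.

```rust
use std::mem::align_of;

/// True if `ptr` satisfies the alignment requirement of `T`, i.e. its
/// address is a multiple of align_of::<T>().
fn is_aligned_for<T>(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}
```

A pointer to the start of a [u64; 2] passes is_aligned_for::<u64>, while the same pointer advanced by one byte does not, although it still passes is_aligned_for::<u8>.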

pub fn new_null(data_type: &DataType, len: usize) -> Self

Returns a new ArrayData valid for data_type containing len null values

pub fn new_empty(data_type: &DataType) -> Self

Returns a new empty ArrayData valid for data_type.

pub fn validate(&self) -> Result<(), ArrowError>

Performs “cheap” validation of an ArrayData. Ensures buffers are sufficiently sized to store len + offset total elements of data_type and performs other inexpensive consistency checks.

This check is “cheap” in the sense that it does not validate the contents of the buffers (e.g. that all offsets for UTF8 arrays are within the bounds of the values buffer).

See ArrayData::validate_data to fully validate the offset contents and the validity of UTF-8 data.
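The size check described above needs only lengths, never buffer contents. The following sketch shows that kind of check for a fixed-width type; check_buffer_size and its parameters are hypothetical, and the actual validation covers more cases than this one rule.

```rust
/// Checks that a fixed-width values buffer of `buffer_bytes` bytes can hold
/// `len + offset` elements of `width` bytes each.
fn check_buffer_size(
    buffer_bytes: usize,
    offset: usize,
    len: usize,
    width: usize,
) -> Result<(), String> {
    let required = offset
        .checked_add(len)
        .and_then(|n| n.checked_mul(width))
        .ok_or_else(|| "len + offset overflows usize".to_string())?;
    if buffer_bytes < required {
        return Err(format!("need {required} bytes, buffer has {buffer_bytes}"));
    }
    Ok(())
}
```

An 800-byte buffer passes for 100 eight-byte elements at offset 0, and fails at offset 1 because 101 elements would need 808 bytes.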

pub fn validate_data(&self) -> Result<(), ArrowError>

Validates that the data contained within this ArrayData is valid:

  1. Null count is correct
  2. All offsets are valid
  3. All String data is valid UTF-8
  4. All dictionary offsets are valid

Internally this calls:

  • Self::validate
  • Self::validate_nulls
  • Self::validate_values

Note: this does not recurse into children; for a recursive variant, see Self::validate_full.
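Checks 2 and 3 in the list above differ from the cheap validation in that they read buffer contents. As a rough illustration for a string array, assuming the usual offsets-plus-values layout with i32 offsets, the sketch below validates each offset pair and each string separately; validate_utf8_array is a hypothetical name and this is not the library's code.

```rust
/// Content check for a toy string array: `offsets` has one more entry than
/// there are strings, and string i is values[offsets[i]..offsets[i + 1]].
fn validate_utf8_array(offsets: &[i32], values: &[u8]) -> Result<(), String> {
    for pair in offsets.windows(2) {
        let (start, end) = (pair[0], pair[1]);
        // Offsets must be non-negative, non-decreasing, and in bounds.
        if start < 0 || end < start || (end as usize) > values.len() {
            return Err(format!("invalid offsets {start}..{end}"));
        }
        // Each string is checked on its own, so an offset that splits a
        // multi-byte character is reported as invalid UTF-8.
        std::str::from_utf8(&values[start as usize..end as usize])
            .map_err(|e| format!("invalid UTF-8 at {start}..{end}: {e}"))?;
    }
    Ok(())
}
```

Checking per string matters: the two bytes of "é" form valid UTF-8 as a whole, but offsets [0, 1, 2] split them into two invalid one-byte strings, which this function rejects.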

pub fn validate_full(&self) -> Result<(), ArrowError>

Performs a full recursive validation of this ArrayData and all its children

This is equivalent to calling Self::validate_data on this ArrayData and all its children recursively
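The relationship between the non-recursive and the recursive check can be sketched on a toy tree. ToyNode is a hypothetical type, and its validate_data checks a single placeholder invariant rather than the real rules.

```rust
/// Hypothetical tree node standing in for an ArrayData and its children.
struct ToyNode {
    len: usize,
    null_count: usize,
    children: Vec<ToyNode>,
}

impl ToyNode {
    /// Checks this node only (a single placeholder invariant).
    fn validate_data(&self) -> Result<(), String> {
        if self.null_count > self.len {
            return Err(format!(
                "null_count {} exceeds len {}",
                self.null_count, self.len
            ));
        }
        Ok(())
    }

    /// Checks this node, then every descendant, stopping at the first error.
    fn validate_full(&self) -> Result<(), String> {
        self.validate_data()?;
        self.children
            .iter()
            .try_for_each(|child| child.validate_full())
    }
}
```

A parent that is itself consistent but has an inconsistent child passes validate_data and fails validate_full in this model.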

pub fn validate_nulls(&self) -> Result<(), ArrowError>

Validates that the null count is correct and that any nullability requirements of its children are correct, without recursing into child ArrayData

Does not (yet) check

  1. Union type_ids are valid, see #85

pub fn validate_values(&self) -> Result<(), ArrowError>

Validates that the values stored within this ArrayData are valid, without recursing into child ArrayData

Does not (yet) check

  1. Union type_ids are valid, see #85

pub fn ptr_eq(&self, other: &Self) -> bool

Returns true if this ArrayData is equal to other, using pointer comparisons to determine buffer equality. This is cheaper than PartialEq::eq but may return false even when the arrays are logically equal.
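The trade-off between pointer equality and logical equality can be seen with reference-counted buffers from the standard library. This is a sketch with hypothetical names (ToyView, logical_eq); it models the idea and is not how the arrow crates implement the comparison.

```rust
use std::sync::Arc;

/// Hypothetical view type (not the arrow API).
#[derive(Clone)]
struct ToyView {
    buffer: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl ToyView {
    fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Cheap check: same allocation and same window. No element is read.
    fn ptr_eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.buffer, &other.buffer)
            && self.offset == other.offset
            && self.len == other.len
    }

    /// Logical check: compares the visible elements one by one.
    fn logical_eq(&self, other: &Self) -> bool {
        self.values() == other.values()
    }
}
```

A clone of a view shares its allocation, so ptr_eq returns true for it. A view built from a separately allocated copy of the same values fails ptr_eq but still passes logical_eq.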

pub fn into_builder(self) -> ArrayDataBuilder

Converts this ArrayData into an ArrayDataBuilder

Trait Implementations

impl Clone for ArrayData

fn clone(&self) -> ArrayData

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for ArrayData

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl From<ArrayData> for ArrayDataBuilder

fn from(d: ArrayData) -> Self

Converts to this type from the input type.

impl PartialEq<ArrayData> for ArrayData

fn eq(&self, other: &Self) -> bool

This method tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Allocation for T where T: RefUnwindSafe + Send + Sync,