Struct arrow_array::array::DictionaryArray
pub struct DictionaryArray<K: ArrowDictionaryKeyType> { /* private fields */ }
An array of dictionary encoded values
This is mostly used to represent strings or a limited set of primitive types as integers, for example when doing NLP analysis or representing chromosomes by name.
A DictionaryArray is represented using a keys array and a values array, which may have different lengths. The keys array stores indexes into the values array, which holds the corresponding logical values, as shown here:
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  ┌─────────────────┐  ┌─────────┐ │     ┌─────────────────┐
│ │        A        │  │    0    │       │        A        │     values[keys[0]]
  ├─────────────────┤  ├─────────┤ │     ├─────────────────┤
│ │        D        │  │    2    │       │        B        │     values[keys[1]]
  ├─────────────────┤  ├─────────┤ │     ├─────────────────┤
│ │        B        │  │    2    │       │        B        │     values[keys[2]]
  └─────────────────┘  ├─────────┤ │     ├─────────────────┤
│                      │    1    │       │        D        │     values[keys[3]]
                       ├─────────┤ │     ├─────────────────┤
│                      │    1    │       │        D        │     values[keys[4]]
                       ├─────────┤ │     ├─────────────────┤
│                      │    0    │       │        A        │     values[keys[5]]
                       └─────────┘ │     └─────────────────┘
│       values            keys
 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
                                            Logical array
                                               Contents
          DictionaryArray
             length = 6
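The values[keys[i]] lookup in the diagram can be modelled in plain Rust. This is only a sketch using standard-library slices, not the arrow_array implementation; the function name decode_logical is hypothetical, and a None key stands in for a null slot.

```rust
/// Plain-Rust model of dictionary decoding (not the arrow_array API).
/// Each key is an index into `values`; a `None` key is a null logical slot.
fn decode_logical<'a>(keys: &[Option<usize>], values: &[&'a str]) -> Vec<Option<&'a str>> {
    keys.iter()
        // Look up values[k] for every non-null key, keep nulls as None.
        .map(|key| key.map(|k| values[k]))
        .collect()
}
```

With values ["A", "D", "B"] and keys [0, 2, 2, 1, 1, 0], as in the diagram, this yields the logical contents A, B, B, D, D, A.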
Example: From Nullable Data
use arrow_array::{types::Int8Type, DictionaryArray, Int8Array};
let test = vec!["a", "a", "b", "c"];
let array: DictionaryArray<Int8Type> = test.iter().map(|&x| if x == "b" { None } else { Some(x) }).collect();
assert_eq!(array.keys(), &Int8Array::from(vec![Some(0), Some(0), None, Some(1)]));
Example: From Non-Nullable Data
use arrow_array::{types::Int8Type, DictionaryArray, Int8Array};
let test = vec!["a", "a", "b", "c"];
let array: DictionaryArray<Int8Type> = test.into_iter().collect();
assert_eq!(array.keys(), &Int8Array::from(vec![0, 0, 1, 2]));
Example: From Existing Arrays
use std::sync::Arc;
use arrow_array::{types::Int8Type, DictionaryArray, Int8Array, StringArray};
// You can form your own DictionaryArray by providing the
// values (dictionary) and keys (indexes into the dictionary):
let values = StringArray::from_iter_values(["a", "b", "c"]);
let keys = Int8Array::from_iter_values([0, 0, 1, 2]);
let array = DictionaryArray::<Int8Type>::try_new(keys, Arc::new(values)).unwrap();
let expected: DictionaryArray<Int8Type> = vec!["a", "a", "b", "c"].into_iter().collect();
assert_eq!(&array, &expected);
Example: Using Builder
use arrow_array::{builder::StringDictionaryBuilder, types::Int32Type, StringArray};
let mut builder = StringDictionaryBuilder::<Int32Type>::new();
builder.append_value("a");
builder.append_null();
builder.append_value("a");
builder.append_value("b");
let array = builder.finish();
// Iterate the logical values through a typed view of the dictionary
let values: Vec<_> = array.downcast_dict::<StringArray>().unwrap().into_iter().collect();
assert_eq!(&values, &[Some("a"), None, Some("a"), Some("b")]);
Implementations
impl<K: ArrowDictionaryKeyType> DictionaryArray<K>
pub fn new(keys: PrimitiveArray<K>, values: ArrayRef) -> Self
Create a new DictionaryArray from the specified keys (indexes into the dictionary) and values (dictionary) arrays.
Panics
Panics if Self::try_new returns an error.
pub fn try_new(keys: PrimitiveArray<K>, values: ArrayRef) -> Result<Self, ArrowError>
Attempt to create a new DictionaryArray from the specified keys (indexes into the dictionary) and values (dictionary) arrays.
Errors
Returns an error if any keys[i] >= values.len() || keys[i] < 0
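The bounds rule above can be illustrated with a standalone check. This is a sketch in plain Rust under simplifying assumptions: keys are a non-nullable i8 slice, and validate_keys and its error message are hypothetical, not part of arrow_array.

```rust
/// Plain-Rust model of the key bounds rule (not the arrow_array API).
/// Fails on the first key that is negative or >= `values_len`.
fn validate_keys(keys: &[i8], values_len: usize) -> Result<(), String> {
    for (i, &k) in keys.iter().enumerate() {
        // The `k < 0` test runs first, so the cast below only sees non-negative keys.
        if k < 0 || k as usize >= values_len {
            return Err(format!(
                "invalid key {k} at index {i} for a dictionary of length {values_len}"
            ));
        }
    }
    Ok(())
}
```

For a dictionary of length 3, the keys [0, 0, 1, 2] pass, while a key of 3 or -1 is rejected.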
pub unsafe fn new_unchecked(keys: PrimitiveArray<K>, values: ArrayRef) -> Self
Create a new DictionaryArray without performing validation
Safety
Safe provided Self::try_new would not return an error
pub fn into_parts(self) -> (PrimitiveArray<K>, ArrayRef)
Deconstruct this array into its constituent parts
pub fn keys(&self) -> &PrimitiveArray<K>
Return an array view of the keys of this dictionary as a PrimitiveArray.
pub fn lookup_key(&self, value: &str) -> Option<K::Native>
If value is present in values (aka the dictionary), returns the corresponding key (index into the values array). Otherwise returns None.
Panics if values is not a StringArray.
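As a rough illustration of the lookup described here, the following plain-Rust sketch searches a slice of dictionary values for a string and returns its index. The helper lookup_key_model is hypothetical and only models the documented behaviour; returning the first match is an assumption of this sketch.

```rust
/// Plain-Rust model of a dictionary key lookup (not the arrow_array API).
/// Returns the index of `value` in `values`, or None if it is absent.
fn lookup_key_model(values: &[&str], value: &str) -> Option<usize> {
    // Assumption of this sketch: the first matching entry wins.
    values.iter().position(|&v| v == value)
}
```

For the dictionary ["a", "b", "c"], looking up "b" gives Some(1) and looking up "z" gives None.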
pub fn value_type(&self) -> DataType
Returns a clone of the value type of this dictionary.
pub fn is_ordered(&self) -> bool
Currently exists for compatibility purposes with Arrow IPC.
pub fn keys_iter(&self) -> impl Iterator<Item = Option<usize>> + '_
Return an iterator over the keys (indexes into the dictionary)
pub fn key(&self, i: usize) -> Option<usize>
Returns the value of keys (the dictionary key) at index i, cast to usize, or None if the value at i is NULL.
pub fn slice(&self, offset: usize, length: usize) -> Self
Returns a zero-copy slice of this array with the indicated offset and length.
pub fn downcast_dict<V: 'static>(&self) -> Option<TypedDictionaryArray<'_, K, V>>
Downcast this dictionary to a TypedDictionaryArray
use arrow_array::{Array, ArrayAccessor, DictionaryArray, StringArray, types::Int32Type};
let orig = [Some("a"), Some("b"), None];
let dictionary = DictionaryArray::<Int32Type>::from_iter(orig);
let typed = dictionary.downcast_dict::<StringArray>().unwrap();
assert_eq!(typed.value(0), "a");
assert_eq!(typed.value(1), "b");
assert!(typed.is_null(2));
pub fn with_values(&self, values: ArrayRef) -> Self
Returns a new dictionary with the same keys as the current instance but with a different set of dictionary values
This can be used to perform an operation on the values of a dictionary
Panics
Panics if values has a length less than the current values.
use std::sync::Arc;
use arrow_array::builder::PrimitiveDictionaryBuilder;
use arrow_array::types::{Int32Type, Int8Type};
use arrow_array::{ArrayAccessor, Int64Array, Int8Array};
// Construct a Dict(Int32, Int8)
let mut builder = PrimitiveDictionaryBuilder::<Int32Type, Int8Type>::with_capacity(2, 200);
for i in 0..100 {
    builder.append(i % 2).unwrap();
}
let dictionary = builder.finish();
// Perform a widening cast of dictionary values
let typed_dictionary = dictionary.downcast_dict::<Int8Array>().unwrap();
let values: Int64Array = typed_dictionary.values().unary(|x| x as i64);
// Create a Dict(Int32, Int64)
let new = dictionary.with_values(Arc::new(values));
// Verify values are as expected
let new_typed = new.downcast_dict::<Int64Array>().unwrap();
for i in 0..100 {
    assert_eq!(new_typed.value(i), (i % 2) as i64)
}
pub fn into_primitive_dict_builder<V>(self) -> Result<PrimitiveDictionaryBuilder<K, V>, Self>
where
    V: ArrowPrimitiveType,
Returns a PrimitiveDictionaryBuilder of this dictionary array for mutating its keys and values if the underlying data buffer is not shared by others.
pub fn unary_mut<F, V>(self, op: F) -> Result<DictionaryArray<K>, DictionaryArray<K>>
where
    V: ArrowPrimitiveType,
    F: Fn(V::Native) -> V::Native,
Applies a unary and infallible function to a mutable dictionary array. A mutable dictionary array is one whose buffers are not shared with other arrays. As a result, this mutates the buffers directly without allocating new buffers.
Implementation
This will apply the function for all dictionary values, including those on null slots. This implies that the operation must be infallible for any value of the corresponding type or this function may panic.
Example
use std::sync::Arc;
use arrow_array::types::{Int32Type, Int8Type};
use arrow_array::{ArrayAccessor, DictionaryArray, Int32Array, Int8Array};
let values = Int32Array::from(vec![Some(10), Some(20), None]);
let keys = Int8Array::from_iter_values([0, 0, 1, 2]);
let dictionary = DictionaryArray::<Int8Type>::try_new(keys, Arc::new(values)).unwrap();
// Add 1 to every dictionary value in place
let c = dictionary.unary_mut::<_, Int32Type>(|x| x + 1).unwrap();
let typed = c.downcast_dict::<Int32Array>().unwrap();
assert_eq!(typed.value(0), 11);
assert_eq!(typed.value(1), 11);
assert_eq!(typed.value(2), 21);
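Because the function is applied to the dictionary values rather than to each logical row, it runs once per dictionary entry regardless of how often that entry is referenced. The plain-Rust sketch below models this with ordinary slices; map_values_in_place and logical_values are hypothetical helpers, not arrow_array functions.

```rust
/// Plain-Rust model (not the arrow_array API): apply `op` to every
/// dictionary value in place, without allocating a new buffer.
fn map_values_in_place(values: &mut [i32], op: impl Fn(i32) -> i32) {
    for v in values.iter_mut() {
        *v = op(*v);
    }
}

/// Resolve each key against the (possibly updated) dictionary values.
fn logical_values(keys: &[usize], values: &[i32]) -> Vec<i32> {
    keys.iter().map(|&k| values[k]).collect()
}
```

Mapping x + 1 over the values [10, 20] touches two entries, yet the keys [0, 0, 1] then resolve to 11, 11, 21.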
pub fn occupancy(&self) -> BooleanBuffer
Computes an occupancy mask for this dictionary’s values
For each value in Self::values the corresponding bit will be set in the returned mask if it is referenced by a key in this DictionaryArray
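The occupancy mask can be pictured with a small plain-Rust model, using a Vec<bool> in place of a BooleanBuffer. This is a sketch of the idea only: occupancy_model is a hypothetical name, and None keys stand in for null slots, which reference no value.

```rust
/// Plain-Rust model of an occupancy mask (not the arrow_array API).
/// mask[j] is true if some non-null key references dictionary value j.
fn occupancy_model(keys: &[Option<usize>], values_len: usize) -> Vec<bool> {
    let mut mask = vec![false; values_len];
    // `flatten` skips the None (null) keys.
    for &k in keys.iter().flatten() {
        mask[k] = true;
    }
    mask
}
```

With four dictionary values and keys [0, null, 2, 0], only positions 0 and 2 of the mask are set.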
Trait Implementations
impl<K: ArrowDictionaryKeyType> AnyDictionaryArray for DictionaryArray<K>
impl<T: ArrowDictionaryKeyType> Array for DictionaryArray<T>
fn slice(&self, offset: usize, length: usize) -> ArrayRef
fn offset(&self) -> usize
fn logical_nulls(&self) -> Option<NullBuffer>
fn is_nullable(&self) -> bool
Returns false if the array is guaranteed to not contain any logical nulls.
fn get_buffer_memory_size(&self) -> usize
fn get_array_memory_size(&self) -> usize
Includes, on top of get_buffer_memory_size(), the overhead of the data structures that contain the pointers to the various buffers.
fn is_null(&self, index: usize) -> bool
Returns whether the element at index is null. When using this function on a slice, the index is relative to the slice.
fn is_valid(&self, index: usize) -> bool
Returns whether the element at index is not null. When using this function on a slice, the index is relative to the slice.
fn null_count(&self) -> usize
impl<K: ArrowDictionaryKeyType> Clone for DictionaryArray<K>
impl<T: ArrowDictionaryKeyType> Debug for DictionaryArray<T>
impl<T: ArrowDictionaryKeyType> From<ArrayData> for DictionaryArray<T>
Constructs a DictionaryArray from an array data reference.
impl<T: ArrowDictionaryKeyType> From<DictionaryArray<T>> for ArrayData
fn from(array: DictionaryArray<T>) -> Self
impl<'a, T: ArrowDictionaryKeyType> FromIterator<&'a str> for DictionaryArray<T>
Constructs a DictionaryArray from an iterator of strings.
Example:
use arrow_array::{DictionaryArray, PrimitiveArray, StringArray, types::Int8Type};
let test = vec!["a", "a", "b", "c"];
let array: DictionaryArray<Int8Type> = test.into_iter().collect();
assert_eq!(
    "DictionaryArray {keys: PrimitiveArray<Int8>\n[\n  0,\n  0,\n  1,\n  2,\n] values: StringArray\n[\n  \"a\",\n  \"b\",\n  \"c\",\n]}\n",
    format!("{:?}", array)
);
impl<'a, T: ArrowDictionaryKeyType> FromIterator<Option<&'a str>> for DictionaryArray<T>
Constructs a DictionaryArray from an iterator of optional strings.
Example:
use arrow_array::{DictionaryArray, PrimitiveArray, StringArray, types::Int8Type};
let test = vec!["a", "a", "b", "c"];
let array: DictionaryArray<Int8Type> = test
    .iter()
    .map(|&x| if x == "b" { None } else { Some(x) })
    .collect();
assert_eq!(
    "DictionaryArray {keys: PrimitiveArray<Int8>\n[\n  0,\n  0,\n  null,\n  1,\n] values: StringArray\n[\n  \"a\",\n  \"c\",\n]}\n",
    format!("{:?}", array)
);