pub struct DataFile {
pub path: String,
pub fields: Vec<i32>,
pub column_indices: Vec<i32>,
pub file_major_version: u32,
pub file_minor_version: u32,
}
Lance Data File
Fields§
path: String
Relative path to the root.
fields: Vec<i32>
The ids of the fields/columns in this file.
-1 is used for “unassigned” while in memory. It is not meant to be written to disk. -2 is used for “tombstoned”, meaning a field that is no longer in use. This is often because the original field id was reassigned to a different data file.
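The two sentinel values can be made explicit when reading a fields list. The following is a minimal sketch, not part of the Lance API: the constant names, the FieldSlot enum, and both helper functions are hypothetical and exist only for illustration.

```rust
/// Sentinel values described above (the constant names are illustrative).
const UNASSIGNED_FIELD_ID: i32 = -1;
const TOMBSTONED_FIELD_ID: i32 = -2;

/// How a single entry of `fields` can be interpreted.
#[derive(Debug, PartialEq)]
enum FieldSlot {
    /// In-memory placeholder; not meant to be written to disk.
    Unassigned,
    /// The field is no longer in use in this file.
    Tombstoned,
    /// Any other value, passed through as a field id.
    /// (This sketch does not validate other negative values.)
    Live(i32),
}

/// Classifies one entry of `fields` by comparing it to the sentinels.
fn classify(field_id: i32) -> FieldSlot {
    match field_id {
        UNASSIGNED_FIELD_ID => FieldSlot::Unassigned,
        TOMBSTONED_FIELD_ID => FieldSlot::Tombstoned,
        id => FieldSlot::Live(id),
    }
}

/// Returns true if the list contains no "unassigned" entries,
/// which is the condition stated above for writing it to disk.
fn is_writable(fields: &[i32]) -> bool {
    !fields.contains(&UNASSIGNED_FIELD_ID)
}
```

Under these assumptions a list such as [1, -2, 3] would be considered writable, while [1, -1] would not.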
In Lance v1, IDs are assigned based on position in the file, offset by the max existing field id in the table (if any). So when a fragment is first created with one file of N columns, the field ids will be 1, 2, …, N. If a second fragment is created with M columns, the field ids will be N+1, N+2, …, N+M.
In Lance v1 there is one field id for each field in the input schema, including nested fields (both struct and list). Fixed size list fields have only a single field id (these are not considered nested fields in Lance v1).
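The v1 numbering rule can be sketched as a small function. This is an illustration of the arithmetic only, not Lance's implementation: the function name is hypothetical, and treating "no existing fields" as an offset of 0 is an assumption chosen to match the 1, 2, …, N example.

```rust
/// Assigns ids to `num_columns` new columns, continuing after
/// `max_existing_id`. `None` means the table has no fields yet,
/// which this sketch treats as an offset of 0 (an assumption).
fn assign_v1_field_ids(max_existing_id: Option<i32>, num_columns: usize) -> Vec<i32> {
    let offset = max_existing_id.unwrap_or(0);
    // Positions are 1-based, so the first new id is offset + 1.
    (1..=num_columns as i32)
        .map(|position| offset + position)
        .collect()
}
```

With N = 3 and M = 2, the first call (no existing ids) yields 1, 2, 3 and the second call (max existing id 3) yields 4, 5.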
This allows column indices to be calculated from field IDs and the input schema.
In Lance v2 the field IDs generally follow the same pattern but there is no way to calculate the column index from the field ID. This is because a given field could be encoded in many different ways, some of which occupy a different number of columns. For example, a struct field could be encoded into N + 1 columns or it could be encoded into a single packed column. To determine column indices the column_indices property should be used instead.
In Lance v1 these ids must be sorted but might not always be contiguous.
column_indices: Vec<i32>
The top-level column indices for each field in the file.
If the data file is version 1 then this property will be empty. Otherwise there must be one entry for each field in fields.
Some fields may not correspond to a top-level column in the file. In these cases the index will be -1.
For example, consider the schema:
- dimension: packed-struct (0):
  - x: u32 (1)
  - y: u32 (2)
- path: list (3)
- embedding: fsl<768> (4)
  - fp64
- borders: fsl<4> (5)
  - simple-struct (6)
    - margin: fp64 (7)
    - padding: fp64 (8)
One possible column indices array could be: [0, -1, -1, 1, 3, 4, 5, 6, 7]
This reflects several phenomena:
- The packed struct is encoded into a single column and there is no top-level column for the x or y fields
- The variable sized list is encoded into two columns
- The embedding is encoded into a single column (common for FSL of primitive) and there is no “FSL column”
- The borders field actually does have an “FSL column”
The column indices table may not have duplicates (other than -1).
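The lookup and the two invariants described above (one entry per field, no duplicates other than -1) can be sketched as follows. Both function names are hypothetical and operate on plain slices rather than on DataFile itself; this is an illustration under those assumptions, not the library's validation logic.

```rust
use std::collections::HashSet;

/// Returns the top-level column index for `field_id`, or `None` if the
/// field is not in this file or has no top-level column (index -1).
fn column_for_field(fields: &[i32], column_indices: &[i32], field_id: i32) -> Option<i32> {
    let pos = fields.iter().position(|&id| id == field_id)?;
    match column_indices.get(pos) {
        Some(&idx) if idx >= 0 => Some(idx),
        _ => None,
    }
}

/// Checks that there is one column index per field and that no
/// index other than -1 appears more than once.
fn validate_column_indices(fields: &[i32], column_indices: &[i32]) -> Result<(), String> {
    if fields.len() != column_indices.len() {
        return Err(format!(
            "expected {} column indices, found {}",
            fields.len(),
            column_indices.len()
        ));
    }
    let mut seen = HashSet::new();
    for &idx in column_indices {
        // `insert` returns false when the value was already present.
        if idx != -1 && !seen.insert(idx) {
            return Err(format!("duplicate column index {}", idx));
        }
    }
    Ok(())
}
```

Applied to the example array, field 3 (path) maps to column 1 and field 4 (embedding) maps to column 3, while fields 1 and 2 (x and y) have no top-level column.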
file_major_version: u32
The major file version used to create the file
file_minor_version: u32
The minor file version used to create the file
If both file_major_version and file_minor_version are set to 0, then this is a version 0.1 or version 0.2 file.
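The rule above amounts to a single check on the two version fields. A minimal sketch, using a hypothetical helper that takes the two values directly rather than a DataFile:

```rust
/// Returns true when both version fields are 0, which the text above
/// says identifies a version 0.1 or version 0.2 file.
/// (Hypothetical helper; not part of the documented API.)
fn is_v0_1_or_v0_2(file_major_version: u32, file_minor_version: u32) -> bool {
    file_major_version == 0 && file_minor_version == 0
}
```

Note that the two fields alone do not distinguish 0.1 from 0.2; this check only identifies that the file is one of the two.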
Trait Implementations§
impl Message for DataFile
fn encoded_len(&self) -> usize
fn encode(&self, buf: &mut impl BufMut) -> Result<(), EncodeError>
where
    Self: Sized,
fn encode_to_vec(&self) -> Vec<u8>
where
    Self: Sized,
fn encode_length_delimited(
    &self,
    buf: &mut impl BufMut,
) -> Result<(), EncodeError>
where
    Self: Sized,
fn encode_length_delimited_to_vec(&self) -> Vec<u8>
where
    Self: Sized,
fn decode(buf: impl Buf) -> Result<Self, DecodeError>
where
    Self: Default,
fn decode_length_delimited(buf: impl Buf) -> Result<Self, DecodeError>
where
    Self: Default,
fn merge(&mut self, buf: impl Buf) -> Result<(), DecodeError>
where
    Self: Sized,
Decodes an instance of the message from a buffer, and merges it into self.
fn merge_length_delimited(&mut self, buf: impl Buf) -> Result<(), DecodeError>
where
    Self: Sized,
Decodes a length-delimited instance of the message from buffer, and merges it into self.
impl Name for DataFile
const NAME: &'static str = "DataFile"
Simple name for this Message. This name is the same as it appears in the source .proto file, e.g. FooBar.
const PACKAGE: &'static str = "lance.table"
Package name this message type is contained in. They are domain-like and delimited by ".", e.g. google.protobuf.
fn full_name() -> String
Fully-qualified unique name for this Message. It’s prefixed with the package name and names of any parent messages, e.g. google.rpc.BadRequest.FieldViolation.
By default, this is the package name followed by the message name. Fully-qualified names must be unique within a domain of Type URLs.
impl StructuralPartialEq for DataFile
Auto Trait Implementations§
impl Freeze for DataFile
impl RefUnwindSafe for DataFile
impl Send for DataFile
impl Sync for DataFile
impl Unpin for DataFile
impl UnwindSafe for DataFile
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.