lance_file::v2::writer

Struct FileWriterOptions

pub struct FileWriterOptions {
    pub data_cache_bytes: Option<u64>,
    pub max_page_bytes: Option<u64>,
    pub keep_original_array: Option<bool>,
    pub encoding_strategy: Option<Arc<dyn FieldEncodingStrategy>>,
    pub format_version: Option<LanceFileVersion>,
}
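Every field is an `Option` and the struct implements `Default`, so a natural pattern is to set only the fields you care about and leave the rest unset. The sketch below illustrates that pattern with a local stand-in struct (`WriterOptionsSketch` and `options_with_page_hint` are hypothetical names, not part of the crate) that mirrors only the three plain-data fields, so it compiles without the crate. The same struct-update syntax should apply to the real `FileWriterOptions`.

```rust
/// Local stand-in mirroring the plain-data fields of `FileWriterOptions`.
/// `encoding_strategy` and `format_version` are omitted because their types
/// live in the crate. This is an illustration, not the crate's definition.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct WriterOptionsSketch {
    pub data_cache_bytes: Option<u64>,
    pub max_page_bytes: Option<u64>,
    pub keep_original_array: Option<bool>,
}

/// Override a single field and leave every other field as `None`.
pub fn options_with_page_hint(max_page_bytes: u64) -> WriterOptionsSketch {
    WriterOptionsSketch {
        max_page_bytes: Some(max_page_bytes),
        // Struct-update syntax: fill the remaining fields from `Default`.
        ..Default::default()
    }
}
```

Leaving a field as `None` presumably lets the writer pick its own default for that setting, as described for `data_cache_bytes` below.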

Fields§

§data_cache_bytes: Option<u64>

How many bytes to use for buffering column data

When data comes in small batches the writer will buffer column data so that larger pages can be created. This value will be divided evenly across all of the columns. Generally you want this to be at least large enough to match your filesystem’s ideal read size per column.

In some cases you might want this value to be even larger, for example if you have highly compressible data. However, if it is too large, the writer could require a lot of memory, and write performance may suffer if the CPU-expensive encoding falls behind and can no longer be interleaved with the I/O-expensive flushing.

If this is not set, the writer uses 8 MiB per column, which should be reasonable for most cases.
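The arithmetic behind the even split can be sketched as follows. This is only a model of the description above, not the writer's actual code; `per_column_budget` and `total_budget` are hypothetical helper names.

```rust
/// One mebibyte, in bytes.
pub const MIB: u64 = 1024 * 1024;

/// Share of the buffer each column receives when the total budget is divided
/// evenly across columns. Returns `None` for a schema with no columns.
pub fn per_column_budget(data_cache_bytes: u64, num_columns: u64) -> Option<u64> {
    if num_columns == 0 {
        None
    } else {
        Some(data_cache_bytes / num_columns)
    }
}

/// Total budget needed so that each of `num_columns` columns gets
/// `per_column` bytes (saturating instead of overflowing).
pub fn total_budget(per_column: u64, num_columns: u64) -> u64 {
    per_column.saturating_mul(num_columns)
}
```

For example, a 100-column schema at 8 MiB per column corresponds to a total of 800 MiB, which shows how quickly the memory requirement grows for wide tables.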

§max_page_bytes: Option<u64>

A hint to indicate the max size of a page

This hint can’t always be respected. A single value could be larger than this limit, and single values are never sliced. In addition, there are some cases where it is difficult to know the size up front, so the writer might not be able to respect this value.
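To illustrate why the limit is only a hint, here is a minimal sketch of greedy page planning in which values are never split. It is an illustrative model and not the writer's real paging logic; `plan_pages` is a hypothetical name, and values are represented only by their sizes in bytes.

```rust
/// Group value sizes (in bytes) into pages, closing the current page before
/// it would exceed `max_page_bytes`. A value is never split, so a single
/// value larger than the hint ends up alone in an oversized page.
pub fn plan_pages(value_sizes: &[u64], max_page_bytes: u64) -> Vec<Vec<u64>> {
    let mut pages = Vec::new();
    let mut current: Vec<u64> = Vec::new();
    let mut current_bytes = 0u64;

    for &size in value_sizes {
        // Close the page only if it already holds something; an empty page
        // must accept the value even when the value alone exceeds the hint.
        if !current.is_empty() && current_bytes.saturating_add(size) > max_page_bytes {
            pages.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current.push(size);
        current_bytes = current_bytes.saturating_add(size);
    }
    if !current.is_empty() {
        pages.push(current);
    }
    pages
}
```

With a 10-byte hint, the sizes `[4, 4, 4, 20, 2]` are grouped as `[[4, 4], [4], [20], [2]]`: the 20-byte value gets a page of its own that exceeds the hint.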

§keep_original_array: Option<bool>

The file writer buffers columns until enough data has arrived to flush a page to disk.

Some columns with small data types may not flush very often, so their arrays can stick around for a long time. These arrays might also be keeping larger data structures alive. By default, the writer makes a deep copy of each buffered array to avoid holding on to that extra memory. Setting this option to true disables the copy for a (probably minor) performance boost; do so only if you are sure that the arrays are not keeping any sibling structures alive (this typically means the array was allocated in the same language / runtime as the writer).

Do not enable this if your data is arriving from the C data interface. Data typically arrives one “batch” at a time (encoded in the C data interface as a struct array), and each array in that batch keeps the entire batch alive. This means a small boolean array (which will be buffered in memory for quite a while) might keep a much larger record batch in memory, even though most of that batch’s data has already been written to disk.
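The retention problem can be modelled with plain reference counting. In this sketch, `ColumnView` is a hypothetical type standing in for a small array that shares a larger batch allocation; it is not an Arrow or Lance type. Buffering the view keeps the whole batch alive, while buffering a deep copy does not.

```rust
use std::sync::Arc;

/// A small column that is a view into a larger, shared batch allocation.
pub struct ColumnView {
    batch: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl ColumnView {
    /// Create a view over `batch[offset..offset + len]`.
    pub fn new(batch: Arc<Vec<u8>>, offset: usize, len: usize) -> Self {
        assert!(offset + len <= batch.len(), "view out of bounds");
        Self { batch, offset, len }
    }

    /// The bytes that belong to this column only.
    pub fn bytes(&self) -> &[u8] {
        &self.batch[self.offset..self.offset + self.len]
    }

    /// Deep copy: the result owns just its own bytes and holds no reference
    /// to the shared batch, so the batch can be freed once the view is dropped.
    pub fn deep_copy(&self) -> Vec<u8> {
        self.bytes().to_vec()
    }
}
```

Holding a `ColumnView` in a buffer pins the entire batch allocation for as long as the view lives, whereas holding the `Vec<u8>` from `deep_copy` pins only `len` bytes.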

§encoding_strategy: Option<Arc<dyn FieldEncodingStrategy>>

§format_version: Option<LanceFileVersion>

The format version to use when writing the file

This controls which encodings will be used when encoding the data. Newer versions may have more efficient encodings. However, newer format versions will require more up-to-date readers to read the data.

Trait Implementations§

impl Clone for FileWriterOptions

fn clone(&self) -> FileWriterOptions

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for FileWriterOptions

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for FileWriterOptions

fn default() -> FileWriterOptions

Returns the “default value” for a type.
