pub struct FileWriterOptions {
    pub data_cache_bytes: Option<u64>,
    pub max_page_bytes: Option<u64>,
    pub keep_original_array: Option<bool>,
    pub encoding_strategy: Option<Arc<dyn FieldEncodingStrategy>>,
    pub format_version: Option<LanceFileVersion>,
}
Fields
data_cache_bytes: Option<u64>
How many bytes to use for buffering column data
When data comes in small batches the writer will buffer column data so that larger pages can be created. This value will be divided evenly across all of the columns. Generally you want this to be at least large enough to match your filesystem’s ideal read size per column.
In some cases you might want this value to be even larger if you have highly compressible data. However, if it is too large, the writer could require a lot of memory, and write performance may suffer if the CPU-expensive encoding falls behind and can’t be interleaved with the I/O-expensive flushing.
The default uses 8 MiB per column, which should be reasonable for most cases.
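The budget rule described above can be sketched as follows. The helper cache_bytes_per_column and the constant DEFAULT_BYTES_PER_COLUMN are hypothetical names written only to illustrate the documented arithmetic (an explicit budget divided evenly across columns, otherwise 8 MiB per column); they are not part of the Lance API, and the real writer may handle rounding or edge cases differently.

```rust
/// Hypothetical constant mirroring the documented default of 8 MiB per column.
const DEFAULT_BYTES_PER_COLUMN: u64 = 8 * 1024 * 1024;

/// Hypothetical helper (not a Lance API): how much buffer space each column gets.
fn cache_bytes_per_column(data_cache_bytes: Option<u64>, num_columns: u64) -> u64 {
    match data_cache_bytes {
        // An explicit budget is split evenly across all columns.
        // `max(1)` guards against dividing by zero for an empty schema.
        Some(total) => total / num_columns.max(1),
        // No budget given: fall back to the per-column default.
        None => DEFAULT_BYTES_PER_COLUMN,
    }
}

fn main() {
    // 64 MiB shared by 4 columns leaves 16 MiB per column.
    println!("{}", cache_bytes_per_column(Some(64 * 1024 * 1024), 4));
    // Unset: each column gets the 8 MiB default.
    println!("{}", cache_bytes_per_column(None, 4));
}
```

Note the asymmetry this implies: an explicit data_cache_bytes is a total shared by all columns, while the default is a per-column amount, so a wide schema uses more memory under the default than under a small explicit budget.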
max_page_bytes: Option<u64>
A hint indicating the maximum size of a page
This hint can’t always be respected. A single value can be larger than this limit, and single values are never sliced. In addition, in some cases it is difficult to know the size up front, so the writer might not be able to respect this value.
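To make the "hint, not a guarantee" behavior concrete, here is a minimal sketch of a greedy page planner that packs value sizes into pages of at most max_page_bytes but never splits a single value. The function plan_pages is hypothetical and is not Lance's page-building logic; it only illustrates why an oversized value produces a page larger than the hint.

```rust
/// Hypothetical greedy planner (not Lance's implementation): groups value
/// sizes in bytes into pages, keeping each page at or under `max_page_bytes`
/// where possible. A single value is never split, so an oversized value
/// ends up alone in a page that exceeds the hint.
fn plan_pages(value_sizes: &[u64], max_page_bytes: u64) -> Vec<Vec<u64>> {
    let mut pages: Vec<Vec<u64>> = Vec::new();
    let mut current: Vec<u64> = Vec::new();
    let mut current_bytes = 0u64;
    for &size in value_sizes {
        // Close the current page if adding this value would overflow the hint.
        if !current.is_empty() && current_bytes + size > max_page_bytes {
            pages.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current.push(size);
        current_bytes += size;
    }
    // Flush whatever is left as the final page.
    if !current.is_empty() {
        pages.push(current);
    }
    pages
}

fn main() {
    // The 300-byte value exceeds the 100-byte hint but is kept whole.
    let pages = plan_pages(&[40, 50, 300, 30, 30], 100);
    println!("{:?}", pages); // [[40, 50], [300], [30, 30]]
}
```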
keep_original_array: Option<bool>
The file writer buffers columns until enough data has arrived to flush a page to disk.
Some columns with small data types may not flush very often, so these arrays can stick around for a long time, and they might also be keeping larger data structures alive. By default, the writer makes a deep copy of each buffered array to avoid any potential memory leaks. This can be disabled for a (probably minor) performance boost if you are sure that the arrays are not keeping any sibling structures alive (this typically means the arrays were allocated in the same language/runtime as the writer).
Do not enable this if your data arrives through the C data interface. Data typically arrives one “batch” at a time (encoded in the C data interface as a struct array), and each array in that batch keeps the entire batch alive. This means a small boolean array, which the writer may buffer in memory for quite a while, can keep a much larger record batch in memory even though most of that batch’s data has already been written to disk.
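The retention problem can be modeled with plain reference counting. In this sketch the View type is hypothetical and stands in for an array that shares a parent buffer (much as a sliced Arrow array shares its parent's memory); it is not the Arrow or Lance API. Buffering the view as-is keeps the whole parent alive, while a deep copy retains only the bytes actually needed.

```rust
use std::sync::Arc;

/// Hypothetical "array view": a window onto a shared buffer.
struct View {
    buffer: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl View {
    fn bytes(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Deep copy: allocate a new buffer holding only the viewed bytes, so
    /// the (possibly much larger) original buffer can be freed.
    fn deep_copy(&self) -> View {
        let owned = self.bytes().to_vec();
        let len = owned.len();
        View { buffer: Arc::new(owned), offset: 0, len }
    }
}

fn main() {
    // A large "batch" buffer, of which we only want to keep 8 bytes.
    let batch = Arc::new(vec![0u8; 1_000_000]);
    let small = View { buffer: Arc::clone(&batch), offset: 0, len: 8 };

    // Buffering `small` as-is would keep all 1,000,000 bytes alive.
    println!("retained without copy: {} bytes", small.buffer.len());

    // Buffering a deep copy retains only the 8 bytes we need.
    let copied = small.deep_copy();
    drop(small);
    println!("retained with copy: {} bytes", copied.buffer.len());
    // Only `batch` itself still references the large buffer.
    println!("batch references left: {}", Arc::strong_count(&batch));
}
```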
encoding_strategy: Option<Arc<dyn FieldEncodingStrategy>>
format_version: Option<LanceFileVersion>
The format version to use when writing the file
This controls which encodings will be used when encoding the data. Newer versions may have more efficient encodings. However, newer format versions will require more up-to-date readers to read the data.
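One way to reason about the trade-off is to pick the newest version you prefer, capped by what your oldest reader understands. The enum FormatVersion and the function choose_version below are hypothetical placeholders, not the variants or methods of Lance's LanceFileVersion; the sketch only shows the ordering logic.

```rust
/// Hypothetical format versions, ordered oldest to newest by declaration
/// order (derived `Ord`). These are placeholders, not Lance's variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum FormatVersion {
    V1,
    V2,
    V3,
}

/// Write with the preferred (possibly newer, more efficient) version unless
/// the oldest reader that must open the file only understands an older one.
fn choose_version(preferred: FormatVersion, oldest_reader_supports: FormatVersion) -> FormatVersion {
    preferred.min(oldest_reader_supports)
}

fn main() {
    println!("{:?}", choose_version(FormatVersion::V3, FormatVersion::V2)); // V2
    println!("{:?}", choose_version(FormatVersion::V1, FormatVersion::V3)); // V1
}
```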
Trait Implementations
impl Clone for FileWriterOptions
fn clone(&self) -> FileWriterOptions
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for FileWriterOptions
impl Default for FileWriterOptions
fn default() -> FileWriterOptions
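Because every field is an Option, the Default implementation combines naturally with struct update syntax: set the fields you care about and leave the rest unset. The sketch below uses a hypothetical local mirror struct (WriterOptions) and a hypothetical EncodingStrategy trait rather than the real Lance types, and omits format_version because it depends on a Lance type. It also shows that Clone on an Option<Arc<dyn ...>> field shares the strategy object instead of duplicating it.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical stand-in for Lance's `FieldEncodingStrategy` trait.
trait EncodingStrategy: Debug + Send + Sync {}

#[derive(Debug)]
struct PlainStrategy;
impl EncodingStrategy for PlainStrategy {}

/// Hypothetical local mirror of the options struct. `Default` yields
/// "all unset", leaving the writer to apply its own defaults.
#[derive(Debug, Clone, Default)]
struct WriterOptions {
    data_cache_bytes: Option<u64>,
    max_page_bytes: Option<u64>,
    keep_original_array: Option<bool>,
    encoding_strategy: Option<Arc<dyn EncodingStrategy>>,
}

fn main() {
    // Override two fields and take the defaults for the rest.
    let opts = WriterOptions {
        data_cache_bytes: Some(64 * 1024 * 1024),
        encoding_strategy: Some(Arc::new(PlainStrategy)),
        ..Default::default()
    };

    // Cloning copies the scalar fields and bumps the Arc's reference count;
    // the strategy object itself is shared, not duplicated.
    let copy = opts.clone();
    println!("{:?}", copy);
    if let Some(strategy) = &opts.encoding_strategy {
        println!("strategy refs: {}", Arc::strong_count(strategy)); // 2
    }
}
```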
Auto Trait Implementations
impl Freeze for FileWriterOptions
impl !RefUnwindSafe for FileWriterOptions
impl Send for FileWriterOptions
impl Sync for FileWriterOptions
impl Unpin for FileWriterOptions
impl !UnwindSafe for FileWriterOptions
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
unsafe fn clone_to_uninit(&self, dst: *mut T)
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.