pub struct FileReader { /* private fields */ }
Implementations§
Source§impl FileReader
impl FileReader
pub fn num_rows(&self) -> u64
pub fn metadata(&self) -> &Arc<CachedFileMetadata>
pub async fn read_global_buffer(&self, index: u32) -> Result<Bytes>
pub async fn read_all_metadata( scheduler: &FileScheduler, ) -> Result<CachedFileMetadata>
Sourcepub async fn try_open(
scheduler: FileScheduler,
base_projection: Option<ReaderProjection>,
decoder_strategy: Arc<DecoderPlugins>,
cache: &FileMetadataCache,
options: FileReaderOptions,
) -> Result<Self>
pub async fn try_open( scheduler: FileScheduler, base_projection: Option<ReaderProjection>, decoder_strategy: Arc<DecoderPlugins>, cache: &FileMetadataCache, options: FileReaderOptions, ) -> Result<Self>
Opens a new file reader without any pre-existing knowledge
This will read the file schema from the file itself and thus requires a bit more I/O
A base_projection
can also be provided. If provided, then the projection will apply
to all reads from the file that do not specify their own projection.
Sourcepub async fn try_open_with_file_metadata(
scheduler: FileScheduler,
base_projection: Option<ReaderProjection>,
decoder_plugins: Arc<DecoderPlugins>,
file_metadata: Arc<CachedFileMetadata>,
cache: &FileMetadataCache,
options: FileReaderOptions,
) -> Result<Self>
pub async fn try_open_with_file_metadata( scheduler: FileScheduler, base_projection: Option<ReaderProjection>, decoder_plugins: Arc<DecoderPlugins>, file_metadata: Arc<CachedFileMetadata>, cache: &FileMetadataCache, options: FileReaderOptions, ) -> Result<Self>
Same as try_open
but with the file metadata already loaded.
Sourcepub fn read_tasks(
&self,
params: ReadBatchParams,
batch_size: u32,
projection: Option<ReaderProjection>,
filter: FilterExpression,
) -> Result<Pin<Box<dyn Stream<Item = ReadBatchTask> + Send>>>
pub fn read_tasks( &self, params: ReadBatchParams, batch_size: u32, projection: Option<ReaderProjection>, filter: FilterExpression, ) -> Result<Pin<Box<dyn Stream<Item = ReadBatchTask> + Send>>>
Creates a stream of “read tasks” to read the data from the file
The arguments are similar to Self::read_stream_projected
but instead of returning a stream
of record batches it returns a stream of “read tasks”.
The tasks should be consumed with some kind of buffered
argument if CPU parallelism is desired.
Note that “read task” is probably a bit imprecise. The tasks are actually “decode tasks”. The reading happens asynchronously in the background. In other words, a single read task may map to multiple I/O operations or a single I/O operation may map to multiple read tasks.
Sourcepub fn read_stream_projected(
&self,
params: ReadBatchParams,
batch_size: u32,
batch_readahead: u32,
projection: ReaderProjection,
filter: FilterExpression,
) -> Result<Pin<Box<dyn RecordBatchStream>>>
pub fn read_stream_projected( &self, params: ReadBatchParams, batch_size: u32, batch_readahead: u32, projection: ReaderProjection, filter: FilterExpression, ) -> Result<Pin<Box<dyn RecordBatchStream>>>
Reads data from the file as a stream of record batches
-
params
- Specifies the range (or indices) of data to read -
batch_size
- The maximum size of a single batch. A batch may be smaller if it is the last batch or if it is not possible to create a batch of the requested size.For example, if the batch size is 1024 and one of the columns is a string column then there may be some ranges of 1024 rows that contain more than 2^31 bytes of string data (which is the maximum size of a string column in Arrow). In this case smaller batches may be emitted.
-
batch_readahead
- The number of batches to read ahead. This controls the amount of CPU parallelism of the read. In other words it controls how many batches will be decoded in parallel. It has no effect on the I/O parallelism of the read (how many I/O requests are in flight at once).This parameter also is also related to backpressure. If the consumer of the stream is slow then the reader will build up RAM.
-
projection
- A projection to apply to the read. This controls which columns are read from the file. The projection is NOT applied on top of the base projection. The projection is applied directly to the file schema.
Sourcepub fn read_stream(
&self,
params: ReadBatchParams,
batch_size: u32,
batch_readahead: u32,
filter: FilterExpression,
) -> Result<Pin<Box<dyn RecordBatchStream>>>
pub fn read_stream( &self, params: ReadBatchParams, batch_size: u32, batch_readahead: u32, filter: FilterExpression, ) -> Result<Pin<Box<dyn RecordBatchStream>>>
Reads data from the file as a stream of record batches
This is similar to Self::read_stream_projected
but uses the base projection
provided when the file was opened (or reads all columns if the file was
opened without a base projection)
pub fn schema(&self) -> &Arc<Schema>
Trait Implementations§
Auto Trait Implementations§
impl Freeze for FileReader
impl !RefUnwindSafe for FileReader
impl Send for FileReader
impl Sync for FileReader
impl Unpin for FileReader
impl !UnwindSafe for FileReader
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> Instrument for T
impl<T> Instrument for T
Source§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
Source§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more