pub struct Decoder { /* private fields */ }
A push-based interface for decoding CSV data from an arbitrary byte stream.
See Reader for a higher-level interface for use with Read.
The push-based interface facilitates integration with sources that yield arbitrarily delimited byte ranges, such as BufRead, or a chunked byte stream received from object storage.
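use std::io::BufRead;

use arrow_array::RecordBatch;
use arrow_csv::ReaderBuilder;
use arrow_schema::{ArrowError, SchemaRef};

fn read_from_csv<R: BufRead>(
    mut reader: R,
    schema: SchemaRef,
    batch_size: usize,
) -> Result<impl Iterator<Item = Result<RecordBatch, ArrowError>>, ArrowError> {
    let mut decoder = ReaderBuilder::new(schema)
        .with_batch_size(batch_size)
        .build_decoder();

    let mut next = move || {
        loop {
            // Borrow the reader's internal buffer; it is empty at end of input
            let buf = reader.fill_buf()?;
            let decoded = decoder.decode(buf)?;
            // Zero bytes decoded: the batch is full or the input is exhausted
            if decoded == 0 {
                break;
            }

            // Consume the number of bytes read
            reader.consume(decoded);
        }
        // Emit the buffered rows, or Ok(None) if there are none
        decoder.flush()
    };
    // Ok(None) from flush ends the iterator; errors are yielded as items
    Ok(std::iter::from_fn(move || next().transpose()))
}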
Implementations
impl Decoder
pub fn decode(&mut self, buf: &[u8]) -> Result<usize, ArrowError>
Decodes records from buf, returning the number of bytes read.
This method returns once batch_size records have been parsed since the last call to Self::flush, or buf is exhausted. Any remaining bytes should be included in the next call to Self::decode.
There is no requirement that buf contains a whole number of records, facilitating integration with arbitrary byte streams, such as those yielded by BufRead or network sources such as object storage.
pub fn flush(&mut self) -> Result<Option<RecordBatch>, ArrowError>
Flushes the currently buffered data to a RecordBatch.
This should only be called after Self::decode has returned Ok(0); otherwise it may return an error if part way through decoding a record.
Returns Ok(None) if there is no buffered data.
pub fn capacity(&self) -> usize
Returns the number of records that can be read before requiring a call to Self::flush.
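The interplay between decode, flush and capacity can be easier to follow with a toy model. The sketch below is not the arrow implementation: ToyDecoder and decode_chunks are hypothetical names, a "record" is assumed to be a single newline-terminated line, and a batch is a plain Vec<String> rather than a RecordBatch. It only illustrates the contract described above: decode reports how many bytes it consumed, stops once the batch is full, and carries a partially received record over to the next call.

```rust
/// Hypothetical stand-in for a push-based decoder; NOT arrow's implementation.
/// A "record" here is simply one newline-terminated line.
struct ToyDecoder {
    batch_size: usize,
    rows: Vec<String>, // completed records awaiting flush
    partial: Vec<u8>,  // bytes of a record whose newline has not arrived yet
}

impl ToyDecoder {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "this sketch assumes a non-zero batch size");
        Self { batch_size, rows: Vec::new(), partial: Vec::new() }
    }

    /// Consumes bytes from `buf` until it is exhausted or the batch is full,
    /// returning the number of bytes consumed.
    fn decode(&mut self, buf: &[u8]) -> usize {
        let mut consumed = 0;
        while consumed < buf.len() && self.capacity() > 0 {
            let byte = buf[consumed];
            consumed += 1;
            if byte == b'\n' {
                // Record complete: move it into the pending batch
                let line = String::from_utf8_lossy(&self.partial).into_owned();
                self.rows.push(line);
                self.partial.clear();
            } else {
                self.partial.push(byte);
            }
        }
        consumed
    }

    /// Returns the buffered records, or `None` if there are none.
    /// Simplification: an unterminated trailing record is left in `partial`.
    fn flush(&mut self) -> Option<Vec<String>> {
        if self.rows.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.rows))
        }
    }

    /// Number of records that can still be decoded before a flush is needed.
    fn capacity(&self) -> usize {
        self.batch_size - self.rows.len()
    }
}

/// Feeds arbitrarily split chunks through the decoder, flushing whenever
/// a batch fills up and once more at the end of input.
fn decode_chunks(chunks: &[&[u8]], batch_size: usize) -> Vec<Vec<String>> {
    let mut decoder = ToyDecoder::new(batch_size);
    let mut batches = Vec::new();
    for chunk in chunks {
        let mut remaining: &[u8] = *chunk;
        while !remaining.is_empty() {
            let consumed = decoder.decode(remaining);
            // Bytes that were not consumed are offered again on the next call
            remaining = &remaining[consumed..];
            if decoder.capacity() == 0 {
                batches.extend(decoder.flush());
            }
        }
    }
    batches.extend(decoder.flush());
    batches
}
```

In this model, calling decode_chunks with the chunks "a,1\nb," and "2\nc,3\n" and a batch size of 2 splits the record "b,2" across two decode calls, yet it still lands intact in the first batch. The real Decoder additionally handles quoting, escaping and type conversion, none of which is modelled here.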