Crate buffered_reader
A BufferedReader is a super-powered Reader.
Like the BufRead trait, the BufferedReader trait has an internal
buffer that is directly exposed to the user. This design enables
two performance optimizations. First, the use of an internal
buffer amortizes system calls. Second, exposing the internal
buffer allows the user to work with data in place, which avoids
another copy.
The BufRead trait, however, has a significant limitation for
parsers: the user of a BufRead object can’t control the amount of
buffering. Such control is essential for conveniently working
with data in place, and for looking ahead without consuming data.
As a result, either the instantiator of the BufRead object has to
size the buffer (assuming the BufRead object provides such a
mechanism), which is a layering violation, or the parser has to
fall back to its own buffering when the internal buffer is too
small, which eliminates most of the advantages of the BufRead
abstraction. The BufferedReader trait addresses this shortcoming
by allowing the user to control the size of the internal buffer.
The BufferedReader trait also provides a generic interface for
working with a stack of BufferedReader objects, which simplifies
using multiple parsers simultaneously. This is helpful when one
parser deals with framing (e.g., something like HTTP’s chunked
transfer encoding), and another decodes the actual objects. It is
also useful when objects are nested.
§Details
Because the BufRead trait doesn’t provide a mechanism for the user
to size the internal buffer, a parser can’t generally be sure that
the internal buffer will be large enough to allow it to work with
all data in place.
Using the standard BufRead implementation, BufReader, the
instantiator can set the size of the internal buffer at creation
time. Unfortunately, this mechanism is ugly, and not always
adequate. First, the parser is typically not the instantiator.
Thus, the instantiator needs to know about the implementation
details of all of the parsers, which turns an implementation
detail into a cross-cutting concern. Second, when working with
dynamically sized data, the maximum amount of data that needs to
be worked with in place may not be known a priori, or the maximum
amount may be significantly larger than the typical amount. This
leads to poorly sized buffers.
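The effect of fixing the capacity at creation time can be seen with the standard library alone. The following is a minimal sketch, not part of this crate; the helper name visible_bytes is made up for illustration. It reports how much data a BufReader exposes through fill_buf for a given capacity.

```rust
use std::io::{BufRead, BufReader};

/// Hypothetical helper: returns how many bytes `fill_buf` exposes when
/// the `BufReader` is created with the given capacity.
fn visible_bytes(capacity: usize, input: &[u8]) -> usize {
    // The capacity is chosen here, by the instantiator, not by the
    // parser that later reads from the object.
    let mut reader = BufReader::with_capacity(capacity, input);

    // `fill_buf` returns the contents of the internal buffer, which
    // never holds more than `capacity` bytes.
    let len = match reader.fill_buf() {
        Ok(buffer) => buffer.len(),
        Err(_) => 0,
    };
    len
}
```

With a ten-byte input, a capacity of 4 exposes only four bytes to the parser, whereas a capacity of 64 exposes all ten.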
Alternatively, the code that uses, but does not instantiate, a
BufRead object can be changed to stream the data, or to fall back
to reading the data into a local buffer if the internal buffer is
too small. Both of these approaches increase code complexity, and
the latter approach is contrary to BufRead’s goal of reducing
unnecessary copying.
The BufferedReader trait solves this problem by allowing the user
to dynamically (i.e., at read time, not open time) ensure that the
internal buffer has a certain amount of data.
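To make the idea of sizing the buffer at read time concrete, here is a simplified sketch built only on std::io::Read. The type GrowableReader and its methods are hypothetical and are not this crate's implementation; they only mirror the idea that the caller states how much data it needs and the buffer grows to satisfy the request.

```rust
use std::io::{self, Read};

/// Hypothetical reader whose internal buffer grows on demand.
struct GrowableReader<R: Read> {
    inner: R,
    /// Data that has been read from `inner`, but not yet consumed.
    buffer: Vec<u8>,
}

impl<R: Read> GrowableReader<R> {
    fn new(inner: R) -> Self {
        GrowableReader { inner, buffer: Vec::new() }
    }

    /// Returns the internal buffer after trying to fill it with at
    /// least `amount` bytes.  Less is returned only at end of input.
    fn data(&mut self, amount: usize) -> io::Result<&[u8]> {
        while self.buffer.len() < amount {
            let mut chunk = [0u8; 4096];
            match self.inner.read(&mut chunk) {
                Ok(0) => break, // End of input.
                Ok(n) => self.buffer.extend_from_slice(&chunk[..n]),
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(&self.buffer)
    }

    /// Marks the first `amount` bytes of the buffer as consumed.
    fn consume(&mut self, amount: usize) {
        let amount = amount.min(self.buffer.len());
        self.buffer.drain(..amount);
    }
}
```

Because getting data and consuming it are separate steps, a parser using such a type can inspect as much data as it needs and then consume only the part it accepts.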
The ability to control the size of the internal buffer is also
essential to straightforward support for speculative lookahead.
Speculative lookahead with a BufRead object is difficult precisely
because it is speculative, i.e., if the parser backtracks, the
data that was read must not be consumed. Using a BufRead object,
this is not possible if the amount of lookahead is larger than the
internal buffer: the parser first has to
std::io::BufRead::consume some data to be able to examine more
data. But, if the parser then decides to backtrack, it has no way
to return the unused data to the BufRead object. This forces the
parser to manage a buffer of read, but unconsumed data, which
significantly complicates the code.
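The workaround described above can be sketched with the standard library as follows. The function lookahead and the pending buffer are hypothetical names used for illustration: to see more data than the BufRead object buffers, the parser has to copy the data into its own buffer and consume it from the reader.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: makes at least `amount` bytes of lookahead
/// available in `pending` (less only at end of input).
fn lookahead<R: BufRead>(
    reader: &mut R,
    pending: &mut Vec<u8>,
    amount: usize,
) -> io::Result<()> {
    while pending.len() < amount {
        let available = reader.fill_buf()?;
        if available.is_empty() {
            break; // End of input.
        }
        let take = available.len().min(amount - pending.len());
        // Copy the data out of the internal buffer...
        pending.extend_from_slice(&available[..take]);
        // ...and consume it, because that is the only way to make
        // the reader expose the data that follows.
        reader.consume(take);
    }
    Ok(())
}
```

After such a call, the data in pending is gone from the reader. If the parser backtracks, every later read has to check pending first, which is the extra bookkeeping this paragraph refers to.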
The BufferedReader trait also simplifies working with a stack of
BufferedReaders in two ways. First, the BufferedReader trait
provides generic methods to access the underlying BufferedReader.
Thus, even when dealing with a trait object, it is still possible
to recover the underlying BufferedReader. Second, the
BufferedReader trait provides a mechanism to associate generic
state with each BufferedReader via a cookie. Although it is
possible to realize this functionality using a custom trait that
extends the BufferedReader trait and wraps existing BufferedReader
implementations, the built-in mechanism eliminates a lot of
error-prone boilerplate code.
§Examples
The following examples not only show how to use a BufferedReader,
but also better illustrate the aforementioned limitations of a
BufRead object.
Consider a file consisting of a sequence of objects, which are laid out as follows. Each object has a two-byte, big-endian header that indicates the object’s size in bytes. The object immediately follows the header. Thus, if we had two objects, “foobar” and “xyzzy”, in that order, the file would look like this:
0 6 f o o b a r 0 5 x y z z y
Here’s how we might parse this type of file using a BufferedReader:
use std::io;

use buffered_reader::BufferedReader;

// Placeholder path; substitute the file to parse.
const FILENAME: &str = "/dev/null";

fn parse_object(content: &[u8]) {
    // Parse the object.
}

fn main() -> io::Result<()> {
    let mut br = buffered_reader::File::open(FILENAME)?;

    // While we haven't reached EOF (i.e., we can read at
    // least one byte).
    while br.data(1)?.len() > 0 {
        // Get the object's length.
        let len = br.read_be_u16()? as usize;
        // Get the object's content.
        let content = br.data_consume_hard(len)?;

        // Parse the actual object using a real parser.  Recall:
        // `data_consume_hard()` may return more than the requested
        // amount (but it will never return less).
        parse_object(&content[..len]);
    }

    Ok(())
}
Note that content is actually a pointer to the BufferedReader’s
internal buffer. Thus, getting some data doesn’t require copying
the data into a local buffer, which is often discarded immediately
after the data is parsed.
Further, BufferedReader::data (and the other related functions)
are guaranteed to return at least the requested amount of data.
There are two exceptions: if an error occurs, or the end of the
file is reached. Thus, only the cases that the user actually needs
to handle are exposed; there is no need to call something like
std::io::Read::read in a loop to ensure the whole object is
available.
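For contrast, here is a sketch of the same framing parser written against std::io::Read only. The function read_objects is a hypothetical name. Read::read_exact hides the read loop, but every object has to be copied into a freshly allocated local buffer, which is the copy that working with the internal buffer avoids.

```rust
use std::io::{self, Read};

/// Hypothetical parser for the length-prefixed format described
/// above, using only the standard library.
fn read_objects<R: Read>(mut reader: R) -> io::Result<Vec<Vec<u8>>> {
    let mut objects = Vec::new();
    loop {
        // Read the two-byte, big-endian length.  For simplicity, this
        // sketch treats a truncated header like the end of the input.
        let mut header = [0u8; 2];
        match reader.read_exact(&mut header) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let len = usize::from(u16::from_be_bytes(header));

        // Copy the object into a local buffer.  A truncated object
        // is reported as an error.
        let mut content = vec![0u8; len];
        reader.read_exact(&mut content)?;
        objects.push(content);
    }
    Ok(objects)
}
```

Applied to the example file shown earlier, this returns the two objects “foobar” and “xyzzy” as separately allocated vectors.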
Because reading is separate from consuming data, it is possible to
get a chunk of data, inspect it, and then consume only what is
needed. As mentioned above, this is only possible with a BufRead
object if the internal buffer happens to be large enough. Using a
BufferedReader, this is always possible, assuming the data fits in
memory.
In our example, we actually have two parsers: one that deals with
the framing, and one for the actual objects. The above code
buffers the objects in their entirety, and then passes a slice
containing the object to the object parser. If the object parser
also worked with a BufferedReader object, then less buffering
would usually be needed, and the two parsers could run
simultaneously. This is particularly useful when the framing is
more complicated, as with HTTP’s chunked transfer encoding. Then,
when the object parser reads data, the frame parser is invoked
lazily. This is done by implementing the BufferedReader trait for
the framing parser, and stacking the BufferedReaders.
For our next example, we rewrite the previous code assuming that
the object parser reads from a BufferedReader object. Since the
framing parser is really just a limit on the object’s size, we
don’t need to implement a special BufferedReader, but can use a
Limitor to impose an upper limit on the amount that it can read.
After the object parser has finished, we drain the object reader.
This pattern is particularly helpful when individual objects that
contain errors should be skipped.
use std::io;

use buffered_reader::BufferedReader;

// Placeholder path; substitute the file to parse.
const FILENAME: &str = "/dev/null";

fn parse_object<R: BufferedReader<()>>(br: &mut R) {
    // Parse the object.
}

fn main() -> io::Result<()> {
    let mut br: Box<dyn BufferedReader<()>>
        = Box::new(buffered_reader::File::open(FILENAME)?);

    // While we haven't reached EOF (i.e., we can read at
    // least one byte).
    while br.data(1)?.len() > 0 {
        // Get the object's length.
        let len = br.read_be_u16()? as u64;

        // Set up a limit.
        br = Box::new(buffered_reader::Limitor::new(br, len));

        // Parse the actual object using a real parser.
        parse_object(&mut br);

        // If the parser didn't consume the whole object, e.g., due to
        // a parse error, drop the rest.
        br.drop_eof()?;

        // Recover the framing parser's `BufferedReader`.
        br = br.into_inner().unwrap();
    }

    Ok(())
}
Of particular note is the generic functionality for dealing with
stacked BufferedReaders: the BufferedReader::into_inner method is
not bound to the implementation, which is often not available due
to type erasure, but is provided by the trait.
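For readers more familiar with the standard library, the limit-then-drain pattern has a rough analogue in std::io::Read::take. The sketch below illustrates the pattern only, not how the Limitor is implemented; parse_object and parse_stream are hypothetical names, and the object parser deliberately reads just the first byte of each object.

```rust
use std::io::{self, Read};

/// Hypothetical object parser: it only looks at the first byte of an
/// object and leaves the rest unread.
fn parse_object<R: Read>(object: &mut R) -> io::Result<Option<u8>> {
    let mut first = [0u8; 1];
    Ok(if object.read(&mut first)? == 1 { Some(first[0]) } else { None })
}

/// Hypothetical framing parser: returns the first byte of each
/// non-empty object in the stream.
fn parse_stream<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut firsts = Vec::new();
    loop {
        // Read the two-byte, big-endian length; stop at end of input.
        let mut header = [0u8; 2];
        match reader.read_exact(&mut header) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let len = u64::from(u16::from_be_bytes(header));

        // Set up a limit: the object parser can't read past the
        // end of the current object.
        let mut object = reader.by_ref().take(len);
        if let Some(byte) = parse_object(&mut object)? {
            firsts.push(byte);
        }

        // Drop whatever the object parser left unread, so that the
        // next header is read from the right position.
        io::copy(&mut object, &mut io::sink())?;
    }
    Ok(firsts)
}
```

Here the borrow of the outer reader ends when the limited reader goes out of scope, so there is no counterpart to into_inner. With boxed trait objects, as in the example above, the trait-provided into_inner is what makes recovering the underlying reader possible.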
In addition to utility BufferedReaders like the Limitor, this
crate also includes a few general-purpose parsers, like the Zlib
decompressor.
Structs§
- Changes the cookie type without introducing any buffering.
- Decompresses the underlying BufferedReader using the bzip2 algorithm.
- Decompresses the underlying BufferedReader using the deflate algorithm.
- Duplicates the underlying BufferedReader without consuming any of the data.
- Always returns EOF.
- Wraps files using mmap().
- Wraps a Reader.
- Limits the amount of data that can be read from a BufferedReader.
- Wraps a memory buffer.
- A Reserve allows a reader to read everything except for the last N bytes (the reserve) from the underlying BufferedReader.
- Decompresses the underlying BufferedReader using the zlib algorithm.
Traits§
- The generic BufferedReader interface.
Functions§
- A generic implementation of std::io::Read::read appropriate for any BufferedReader implementation.