Crate buffered_reader


A BufferedReader is a super-powered Reader.

Like the BufRead trait, the BufferedReader trait has an internal buffer that is directly exposed to the user. This design enables two performance optimizations. First, the use of an internal buffer amortizes system calls. Second, exposing the internal buffer allows the user to work with data in place, which avoids another copy.

The BufRead trait, however, has a significant limitation for parsers: the user of a BufRead object can’t control the amount of buffering. Such control is essential for conveniently working with data in place and for looking ahead without consuming data. The result is that either the sizing has to be handled by the instantiator of the BufRead object (assuming the BufRead object provides such a mechanism), which is a layering violation, or the parser has to fall back to its own buffering if the internal buffer is too small, which eliminates most of the advantages of the BufRead abstraction. The BufferedReader trait addresses this shortcoming by allowing the user to control the size of the internal buffer.

The BufferedReader trait also provides a generic interface for working with a stack of BufferedReader objects, which simplifies using multiple parsers simultaneously. This is helpful when one parser deals with framing (e.g., something like HTTP’s chunked transfer encoding), and another decodes the actual objects. It is also useful when objects are nested.

§Details

Because the BufRead trait doesn’t provide a mechanism for the user to size the internal buffer, a parser can’t generally be sure that the internal buffer will be large enough to allow it to work with all data in place.

Using the standard BufRead implementation, BufReader, the instantiator can set the size of the internal buffer at creation time. Unfortunately, this mechanism is ugly, and not always adequate. First, the parser is typically not the instantiator. Thus, the instantiator needs to know about the implementation details of all of the parsers, which turns an implementation detail into a cross-cutting concern. Second, when working with dynamically sized data, the maximum amount of data that needs to be worked with in place may not be known a priori, or the maximum amount may be significantly larger than the typical amount. This leads to poorly sized buffers.
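The following sketch uses only the standard library to illustrate the point. The helper names `open_with_capacity` and `visible_bytes` are hypothetical. The instantiator picks the capacity, and code that only sees a `BufRead` can observe the result but not change it.

```rust
use std::io::{BufRead, BufReader};

/// Hypothetical instantiator: it, not the parser, chooses the capacity.
fn open_with_capacity(data: &[u8], capacity: usize) -> BufReader<&[u8]> {
    BufReader::with_capacity(capacity, data)
}

/// Hypothetical parser-side helper: reports how many bytes one
/// `fill_buf` call exposes. The parser has no way to ask for more.
fn visible_bytes<R: BufRead>(reader: &mut R) -> std::io::Result<usize> {
    Ok(reader.fill_buf()?.len())
}
```

With ten bytes of input, a reader opened with a capacity of 4 exposes at most 4 bytes at a time, no matter how much the parser would like to inspect in place.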

Alternatively, the code that uses, but does not instantiate, a BufRead object can be changed to stream the data, or to fall back to reading the data into a local buffer if the internal buffer is too small. Both of these approaches increase code complexity, and the latter approach is contrary to BufRead’s goal of reducing unnecessary copying.
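A minimal sketch of the fallback approach, using only the standard library, might look like the following. The function `with_object` is a hypothetical helper, and the returned flag exists only to make the extra copy visible.

```rust
use std::io::{self, BufRead, Read};

/// Hands `len` bytes to `parse`. Returns the parser's result and
/// whether the data had to be copied into a local buffer first.
fn with_object<R: BufRead, T>(
    reader: &mut R,
    len: usize,
    parse: impl FnOnce(&[u8]) -> T,
) -> io::Result<(T, bool)> {
    let buf = reader.fill_buf()?;
    if buf.len() >= len {
        // Fast path: the internal buffer is large enough, parse in place.
        let result = parse(&buf[..len]);
        reader.consume(len);
        return Ok((result, false));
    }
    // Slow path: the internal buffer is too small, so copy the object
    // into a local buffer. This is the copy that BufRead aims to avoid.
    let mut local = vec![0u8; len];
    reader.read_exact(&mut local)?;
    Ok((parse(&local), true))
}
```

Every parser written against `BufRead` needs some variant of this second branch, which is the added complexity the paragraph above refers to.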

The BufferedReader trait solves this problem by allowing the user to dynamically (i.e., at read time, not open time) ensure that the internal buffer has a certain amount of data.
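To make the idea concrete, here is a toy reader built on the standard library. It is a hypothetical illustration of sizing the buffer at read time, not the crate’s implementation. The type `ToyReader` and its methods `data` and `consume` only mimic the semantics described here.

```rust
use std::io::{self, Read};

/// Toy reader (hypothetical): keeps buffered but unconsumed bytes.
struct ToyReader<R: Read> {
    inner: R,
    buffer: Vec<u8>,
}

impl<R: Read> ToyReader<R> {
    fn new(inner: R) -> Self {
        ToyReader { inner, buffer: Vec::new() }
    }

    /// Grows the internal buffer until it holds at least `amount`
    /// bytes or EOF is reached. Nothing is consumed.
    fn data(&mut self, amount: usize) -> io::Result<&[u8]> {
        let mut chunk = [0u8; 4096];
        while self.buffer.len() < amount {
            let n = self.inner.read(&mut chunk)?;
            if n == 0 {
                break; // EOF: return what we have.
            }
            self.buffer.extend_from_slice(&chunk[..n]);
        }
        Ok(&self.buffer)
    }

    /// Discards up to `amount` bytes from the front of the buffer.
    fn consume(&mut self, amount: usize) {
        let amount = amount.min(self.buffer.len());
        self.buffer.drain(..amount);
    }
}
```

The caller decides how much data it needs on each call, so the instantiator no longer has to know anything about the parser’s requirements.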

The ability to control the size of the internal buffer is also essential to straightforward support for speculative lookahead. Speculative lookahead with a BufRead object is difficult precisely because it is speculative: if the parser backtracks, the data that was read must not be consumed. Using a BufRead object, this is not possible if the amount of lookahead is larger than the internal buffer. That is, if the amount of lookahead data is larger than the BufRead’s internal buffer, the parser first has to std::io::BufRead::consume some data to be able to examine more data. But, if the parser then decides to backtrack, it has no way to return the unused data to the BufRead object. This forces the parser to manage a buffer of read but unconsumed data, which significantly complicates the code.
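The limitation can be observed with the standard library’s BufReader. In this sketch (the function name `lookahead_demo` is made up for illustration), a reader with a 4-byte buffer cannot show the parser more than 4 bytes without consuming, and consumed bytes cannot be given back.

```rust
use std::io::{BufRead, BufReader};

/// Returns what three successive peeks see on the input "abcdefgh"
/// when the internal buffer holds only 4 bytes.
fn lookahead_demo() -> std::io::Result<(Vec<u8>, Vec<u8>, Vec<u8>)> {
    let mut reader = BufReader::with_capacity(4, &b"abcdefgh"[..]);

    // The first peek sees at most the buffer's capacity.
    let first = reader.fill_buf()?.to_vec();
    // Peeking again does not reveal more: the buffer is not empty,
    // so BufReader does not refill it.
    let again = reader.fill_buf()?.to_vec();

    // The only way to see further is to consume what we have...
    reader.consume(4);
    // ...after which the earlier bytes cannot be returned to the reader.
    let next = reader.fill_buf()?.to_vec();

    Ok((first, again, next))
}
```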

The BufferedReader trait also simplifies working with a stack of BufferedReaders in two ways. First, the BufferedReader trait provides generic methods to access the underlying BufferedReader. Thus, even when dealing with a trait object, it is still possible to recover the underlying BufferedReader. Second, the BufferedReader provides a mechanism to associate generic state with each BufferedReader via a cookie. Although it is possible to realize this functionality using a custom trait that extends the BufferedReader trait and wraps existing BufferedReader implementations, this approach eliminates a lot of error-prone, boilerplate code.

§Examples

The following examples show not only how to use a BufferedReader, but also better illustrate the aforementioned limitations of a BufReader.

Consider a file consisting of a sequence of objects, which are laid out as follows. Each object has a two byte header that indicates the object’s size in bytes. The object immediately follows the header. Thus, if we had two objects: “foobar” and “xyzzy”, in that order, the file would look like this:

0 6 f o o b a r 0 5 x y z z y
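For experimenting with this layout, a small encoder is handy. The sketch below is not part of the crate, and `encode_objects` is a hypothetical helper. It assumes the header is big-endian, matching the `read_be_u16` call used in the parsing examples.

```rust
use std::convert::TryFrom;

/// Serializes objects as a two-byte big-endian length followed by the
/// object's bytes. Returns `None` if an object does not fit in a
/// 16-bit length.
fn encode_objects(objects: &[&[u8]]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for object in objects {
        let len = u16::try_from(object.len()).ok()?;
        out.extend_from_slice(&len.to_be_bytes());
        out.extend_from_slice(object);
    }
    Some(out)
}
```

Encoding “foobar” followed by “xyzzy” yields the byte sequence shown above.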

Here’s how we might parse this type of file using a BufferedReader:

use buffered_reader;
use buffered_reader::BufferedReader;

fn parse_object(content: &[u8]) {
    // Parse the object.
}

let mut br = buffered_reader::File::open(FILENAME)?;

// While we haven't reached EOF (i.e., we can read at
// least one byte).
while br.data(1)?.len() > 0 {
    // Get the object's length.
    let len = br.read_be_u16()? as usize;
    // Get the object's content.
    let content = br.data_consume_hard(len)?;

    // Parse the actual object using a real parser.  Recall:
    // `data_consume_hard`() may return more than the requested
    // amount (but it will never return less).
    parse_object(&content[..len]);
}

Note that content is actually a pointer to the BufferedReader’s internal buffer. Thus, getting some data doesn’t require copying the data into a local buffer, which is often discarded immediately after the data is parsed.

Further, BufferedReader::data (and the other related functions) are guaranteed to return at least the requested amount of data. There are two exceptions: if an error occurs, or the end of the file is reached. Thus, only the cases that the user actually needs to handle are exposed; there is no need to call something like std::io::Read::read in a loop to ensure the whole object is available.
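For comparison, this is roughly the kind of loop that a plain std::io::Read user ends up writing to cope with short reads. It is a standard-library-only sketch, and `read_fully` is a hypothetical helper name.

```rust
use std::io::{self, Read};

/// Keeps calling `read` until `buf` is full or EOF is reached.
/// Returns the number of bytes actually read.
fn read_fully<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break, // EOF before the buffer was full.
            Ok(n) => filled += n,
            // A read interrupted by a signal is simply retried.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}
```

Besides the loop itself, this approach also requires the caller to own a buffer and accept a copy into it.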

Because reading is separate from consuming data, it is possible to get a chunk of data, inspect it, and then consume only what is needed. As mentioned above, this is only possible with a BufRead object if the internal buffer happens to be large enough. Using a BufferedReader, this is always possible, assuming the data fits in memory.

In our example, we actually have two parsers: one that deals with the framing, and one for the actual objects. The above code buffers the objects in their entirety, and then passes a slice containing the object to the object parser. If the object parser also worked with a BufferedReader object, then less buffering would usually be needed, and the two parsers could run simultaneously. This is particularly useful when the framing is more complicated, like HTTP’s chunked transfer encoding. Then, when the object parser reads data, the frame parser is invoked lazily. This is done by implementing the BufferedReader trait for the framing parser, and stacking the BufferedReaders.

For our next example, we rewrite the previous code assuming that the object parser reads from a BufferedReader object. Since the framing parser is really just a limit on the object’s size, we don’t need to implement a special BufferedReader, but can use a Limitor to impose an upper limit on the amount that it can read. After the object parser has finished, we drain the object reader. This pattern is particularly helpful when individual objects that contain errors should be skipped.

use buffered_reader;
use buffered_reader::BufferedReader;

fn parse_object<R: BufferedReader<()>>(br: &mut R) {
    // Parse the object.
}

let mut br : Box<dyn BufferedReader<()>>
    = Box::new(buffered_reader::File::open(FILENAME)?);

// While we haven't reached EOF (i.e., we can read at
// least one byte).
while br.data(1)?.len() > 0 {
    // Get the object's length.
    let len = br.read_be_u16()? as u64;

    // Set up a limit.
    br = Box::new(buffered_reader::Limitor::new(br, len));

    // Parse the actual object using a real parser.
    parse_object(&mut br);

    // If the parser didn't consume the whole object, e.g., due to
    // a parse error, drop the rest.
    br.drop_eof();

    // Recover the framing parser's `BufferedReader`.
    br = br.into_inner().unwrap();
}

Of particular note is the generic functionality for dealing with stacked BufferedReaders: the BufferedReader::into_inner method is not bound to the implementation, which is often not available due to type erasure, but is provided by the trait.
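As a rough analogy using only the standard library, the same limit, parse, drain, and recover pattern can be sketched with std::io::Read::take. This is not how the crate works internally, it gives up in-place access to the data, and the names `parse_object` and `first_bytes` are hypothetical.

```rust
use std::io::{self, Read};

/// Hypothetical object parser: looks only at the object's first byte.
fn parse_object<R: Read>(reader: &mut R) -> io::Result<Option<u8>> {
    let mut first = [0u8; 1];
    Ok(match reader.read(&mut first)? {
        0 => None,
        _ => Some(first[0]),
    })
}

/// Walks the framed stream and returns each object's first byte.
fn first_bytes<R: Read>(mut reader: R) -> io::Result<Vec<Option<u8>>> {
    let mut results = Vec::new();
    let mut header = [0u8; 2];
    loop {
        // EOF check: no byte available where a header should start.
        if reader.read(&mut header[..1])? == 0 {
            break;
        }
        reader.read_exact(&mut header[1..])?;
        let len = u16::from_be_bytes(header) as u64;

        // Set up a limit; `by_ref` only borrows the underlying reader.
        let mut limited = reader.by_ref().take(len);
        results.push(parse_object(&mut limited)?);

        // Drop whatever the object parser left unread.
        io::copy(&mut limited, &mut io::sink())?;
        // `limited` goes out of scope here, which releases the
        // framing reader for the next iteration.
    }
    Ok(results)
}
```

Here the borrow checker recovers the inner reader because the concrete types are known. With trait objects that is not an option, which is why a trait-level into_inner is valuable.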

In addition to utility BufferedReaders like the Limitor, this crate also includes a few general-purpose parsers, like the Zlib decompressor.

Structs§

  • Changes the cookie type without introducing any buffering.
  • Decompresses the underlying BufferedReader using the bzip2 algorithm.
  • Decompresses the underlying BufferedReader using the deflate algorithm.
  • Duplicates the underlying BufferedReader without consuming any of the data.
  • Always returns EOF.
  • Wraps files using mmap().
  • Wraps a Reader.
  • Limits the amount of data that can be read from a BufferedReader.
  • Wraps a memory buffer.
  • A Reserve allows a reader to read everything except for the last N bytes (the reserve) from the underlying BufferedReader.
  • Decompresses the underlying BufferedReader using the zlib algorithm.

Traits§

Functions§