Struct quick_xml::reader::Reader

pub struct Reader<R> { /* private fields */ }

A low level encoding-agnostic XML event reader.

Consumes bytes and streams XML Events.

This reader does not manage namespace declarations and cannot resolve prefixes. If you need these features, use NsReader.

Examples

use quick_xml::events::Event;
use quick_xml::reader::Reader;

let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag2>Test 2</tag2>
             </tag1>"#;
let mut reader = Reader::from_str(xml);
reader.trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
let mut buf = Vec::new();

// The `Reader` does not implement `Iterator` because it outputs borrowed data (`Cow`s)
loop {
    // NOTE: this is the generic case when we don't know about the input BufRead.
    // when the input is a &str or a &[u8], we don't actually need to use another
    // buffer, we could directly call `reader.read_event()`
    match reader.read_event_into(&mut buf) {
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
        // exits the loop when reaching end of file
        Ok(Event::Eof) => break,

        Ok(Event::Start(e)) => {
            match e.name().as_ref() {
                b"tag1" => println!("attributes values: {:?}",
                                    e.attributes().map(|a| a.unwrap().value)
                                    .collect::<Vec<_>>()),
                b"tag2" => count += 1,
                _ => (),
            }
        }
        Ok(Event::Text(e)) => txt.push(e.unescape().unwrap().into_owned()),

        // There are several other `Event`s we do not consider here
        _ => (),
    }
    // if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
    buf.clear();
}

Implementations

impl<R: AsyncBufRead + Unpin> Reader<R>

pub async fn read_event_into_async<'b>(&mut self, buf: &'b mut Vec<u8>) -> Result<Event<'b>>

Available on crate feature async-tokio only.

An asynchronous version of read_event_into(). Reads the next event into the given buffer.

This is the main entry point for reading XML Events when using an async reader.

See the documentation of read_event_into() for more information.

pub async fn read_to_end_into_async<'n>(&mut self, end: QName<'n>, buf: &mut Vec<u8>) -> Result<Span>

Available on crate feature async-tokio only.

An asynchronous version of read_to_end_into(). Reads asynchronously until the end element is found, using the provided buffer as intermediate storage for event content. Call this function only after you have already read a Start event.

See the documentation of read_to_end_into() for more information.

Examples

This example shows how to skip XML content after reading the start event.

use quick_xml::events::{BytesStart, Event};
use quick_xml::reader::Reader;

let mut reader = Reader::from_reader(r#"
    <outer>
        <inner>
            <inner></inner>
            <inner/>
            <outer></outer>
            <outer/>
        </inner>
    </outer>
"#.as_bytes());
reader.trim_text(true);
let mut buf = Vec::new();

let start = BytesStart::new("outer");
let end   = start.to_end().into_owned();

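// NOTE: the `.await` calls below must run inside an async context,
// for example the body of an `async fn` driven by a Tokio runtime.
// First, we read a start event...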
assert_eq!(reader.read_event_into_async(&mut buf).await.unwrap(), Event::Start(start));

// ...then, we could skip all events to the corresponding end event.
// This call will correctly handle nested <outer> elements.
// Note, however, that this method does not handle namespaces.
reader.read_to_end_into_async(end.name(), &mut buf).await.unwrap();

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(reader.read_event_into_async(&mut buf).await.unwrap(), Event::Eof);

impl<R: BufRead> Reader<R>

This is an implementation for reading from a BufRead as the underlying byte stream.

pub fn read_event_into<'b>(&mut self, buf: &'b mut Vec<u8>) -> Result<Event<'b>>

Reads the next Event.

This is the main entry point for reading XML Events.

Events borrow buf and can be converted to own their data if needed (uses Cow internally).

Being able to control the buffer gives you some additional benefits, such as:

- Reduce the number of allocations by reusing the same buffer. On constrained systems, you can call buf.clear() once you are done processing the event (typically at the end of your loop).
- Reserve buffer capacity up front if you know the file size (using Vec::with_capacity).
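
The buffer-reuse pattern described above can be sketched with the standard library alone. This is a minimal illustration, not quick-xml code: the function name `count_chunks_reusing_buffer` is hypothetical, and reading up to each `>` byte is only a stand-in for "reading one event".

```rust
use std::io::BufRead;

/// Reads `input` in chunks that end at `>` (a stand-in for "one event"),
/// reusing a single buffer for every chunk.
/// Returns the number of chunks read and the final capacity of the buffer.
fn count_chunks_reusing_buffer<R: BufRead>(
    mut input: R,
    capacity_hint: usize,
) -> std::io::Result<(usize, usize)> {
    // Reserve space up front when the input size is known.
    let mut buf: Vec<u8> = Vec::with_capacity(capacity_hint);
    let mut chunks = 0;
    loop {
        // `read_until` appends to `buf`, so `buf` must be empty before each read.
        let n = input.read_until(b'>', &mut buf)?;
        if n == 0 {
            break; // end of input
        }
        chunks += 1;
        // `clear` drops the contents but keeps the allocation for the next read.
        buf.clear();
    }
    Ok((chunks, buf.capacity()))
}
```

Because `Vec::clear` keeps the allocation, the loop allocates at most when a chunk outgrows the current capacity, rather than once per chunk.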

Examples

use quick_xml::events::Event;
use quick_xml::reader::Reader;

let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag2>Test 2</tag2>
             </tag1>"#;
let mut reader = Reader::from_str(xml);
reader.trim_text(true);
let mut count = 0;
let mut buf = Vec::new();
let mut txt = Vec::new();
loop {
    match reader.read_event_into(&mut buf) {
        Ok(Event::Start(_)) => count += 1,
        Ok(Event::Text(e)) => txt.push(e.unescape().unwrap().into_owned()),
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
        Ok(Event::Eof) => break,
        _ => (),
    }
    buf.clear();
}
assert_eq!(count, 3);
assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);

pub fn read_to_end_into(&mut self, end: QName<'_>, buf: &mut Vec<u8>) -> Result<Span>

Reads until the end element is found, using the provided buffer as intermediate storage for event content. Call this function only after you have already read a Start event.

Returns a span that covers the content between the > of the opening tag and the < of the closing tag, or an empty span if expand_empty_elements is set and this method was called after reading an expanded Start event.

Handles nested cases where parent and child elements have literally the same name.
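
One way to picture the handling of same-named nested elements is a depth counter over a simplified event stream. The sketch below is an assumption-laden model, not the crate's implementation: the `Tok` enum and `find_matching_end` are hypothetical names introduced only for illustration.

```rust
/// Simplified stand-in for XML events (hypothetical, not quick-xml's `Event`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok<'a> {
    Start(&'a str),
    End(&'a str),
    Empty(&'a str),
    Text(&'a str),
}

/// Returns the index of the `End` token that closes an already-consumed
/// `Start(name)`, or `None` if the input ends first.
fn find_matching_end(tokens: &[Tok<'_>], name: &str) -> Option<usize> {
    // Number of nested, still-open elements that share the same name.
    let mut depth = 0usize;
    for (i, tok) in tokens.iter().enumerate() {
        match *tok {
            Tok::Start(n) if n == name => depth += 1,
            Tok::End(n) if n == name => {
                if depth == 0 {
                    return Some(i);
                }
                depth -= 1;
            }
            // Other names, self-closing tags, and text do not affect the depth.
            _ => {}
        }
    }
    None
}
```

In this model, running out of tokens before the matching `End` corresponds to the unexpected end-of-input case.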

If the corresponding End event is not found, Error::UnexpectedEof is returned. In particular, that error is returned if you call this method without first consuming the corresponding Start event.

If your reader was created from a string slice or byte slice, prefer the read_to_end() method, because it does not copy bytes into an intermediate buffer.

The provided buf holds the content of only one event at a time; it is cleared before each event is read. If you know the approximate size of each event, you can preallocate the buffer to reduce the number of reallocations.

The end parameter should contain the name of the end element in the reader's encoding. It is good practice to always obtain it via the BytesStart::to_end() method.

The correctness of the skipped events is not checked if you disabled the check_end_names option.

Namespaces

While the Reader does not support namespace resolution, namespaces do not change the algorithm for comparing names. Although the names a:name and b:name are semantically equivalent when both prefixes a and b resolve to the same namespace, </b:name> cannot close <a:name>, because according to the specification:

The end of every element that begins with a start-tag MUST be marked by an end-tag containing a name that echoes the element’s type as given in the start-tag
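
The difference between semantic equivalence and literal matching can be shown with a small std-only model. The helpers `split_qname`, `resolve`, and `can_close` are hypothetical and exist only for this illustration; the Reader itself performs no resolution at all.

```rust
use std::collections::HashMap;

/// Splits a qualified name into (prefix, local name); the prefix is "" when absent.
fn split_qname(qname: &str) -> (&str, &str) {
    match qname.split_once(':') {
        Some((prefix, local)) => (prefix, local),
        None => ("", qname),
    }
}

/// Resolves a qualified name to (namespace, local name) using prefix bindings.
/// Returns `None` when the prefix has no binding.
fn resolve<'a>(
    qname: &'a str,
    bindings: &HashMap<&str, &'a str>,
) -> Option<(&'a str, &'a str)> {
    let (prefix, local) = split_qname(qname);
    bindings.get(prefix).map(|ns| (*ns, local))
}

/// Literal comparison of the raw names: the only thing that decides
/// whether an end tag closes a start tag.
fn can_close(start: &str, end: &str) -> bool {
    start.as_bytes() == end.as_bytes()
}
```

With both `a` and `b` bound to the same namespace, `resolve("a:name", ..)` and `resolve("b:name", ..)` agree, yet `can_close("a:name", "b:name")` is false.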

Examples

This example shows how to skip XML content after reading the start event.

use quick_xml::events::{BytesStart, Event};
use quick_xml::reader::Reader;

let mut reader = Reader::from_str(r#"
    <outer>
        <inner>
            <inner></inner>
            <inner/>
            <outer></outer>
            <outer/>
        </inner>
    </outer>
"#);
reader.trim_text(true);
let mut buf = Vec::new();

let start = BytesStart::new("outer");
let end   = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(reader.read_event_into(&mut buf).unwrap(), Event::Start(start));

// ...then, we could skip all events to the corresponding end event.
// This call will correctly handle nested <outer> elements.
// Note, however, that this method does not handle namespaces.
reader.read_to_end_into(end.name(), &mut buf).unwrap();

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(reader.read_event_into(&mut buf).unwrap(), Event::Eof);

impl Reader<BufReader<File>>

pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self>

Creates an XML reader from a file path.

impl<'a> Reader<&'a [u8]>

This is an implementation for reading from a &[u8] as the underlying byte stream. It does not require an intermediate buffer, because events can borrow directly from the byte slice.

pub fn from_str(s: &'a str) -> Self

Creates an XML reader from a string slice.

pub fn read_event(&mut self) -> Result<Event<'a>>

Read an event that borrows from the input rather than a buffer.

There is no asynchronous read_event_async() version of this function, because it is not necessary – the contents are already in memory and no IO is needed, therefore there is no potential for blocking.

Examples

use quick_xml::events::Event;
use quick_xml::reader::Reader;

let mut reader = Reader::from_str(r#"
    <tag1 att1 = "test">
       <tag2><!--Test comment-->Test</tag2>
       <tag2>Test 2</tag2>
    </tag1>
"#);
reader.trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
loop {
    match reader.read_event().unwrap() {
        Event::Start(_) => count += 1,
        Event::Text(e) => txt.push(e.unescape().unwrap().into_owned()),
        Event::Eof => break,
        _ => (),
    }
}
assert_eq!(count, 3);
assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);

pub fn read_to_end(&mut self, end: QName<'_>) -> Result<Span>

Reads until the end element is found. Call this function only after you have already read a Start event.

Returns a span that covers the content between the > of the opening tag and the < of the closing tag, or an empty span if expand_empty_elements is set and this method was called after reading an expanded Start event.

Handles nested cases where parent and child elements have literally the same name.

If the corresponding End event is not found, Error::UnexpectedEof is returned. In particular, that error is returned if you call this method without first consuming the corresponding Start event.

The end parameter should contain the name of the end element in the reader's encoding. It is good practice to always obtain it via the BytesStart::to_end() method.

The correctness of the skipped events is not checked if you disabled the check_end_names option.

There is no asynchronous read_to_end_async() version of this function, because it is not necessary – the contents are already in memory and no IO is needed, therefore there is no potential for blocking.

Namespaces

While the Reader does not support namespace resolution, namespaces do not change the algorithm for comparing names. Although the names a:name and b:name are semantically equivalent when both prefixes a and b resolve to the same namespace, </b:name> cannot close <a:name>, because according to the specification:

The end of every element that begins with a start-tag MUST be marked by an end-tag containing a name that echoes the element’s type as given in the start-tag

Examples

This example shows how to skip XML content after reading the start event.

use quick_xml::events::{BytesStart, Event};
use quick_xml::reader::Reader;

let mut reader = Reader::from_str(r#"
    <outer>
        <inner>
            <inner></inner>
            <inner/>
            <outer></outer>
            <outer/>
        </inner>
    </outer>
"#);
reader.trim_text(true);

let start = BytesStart::new("outer");
let end   = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(reader.read_event().unwrap(), Event::Start(start));

// ...then, we could skip all events to the corresponding end event.
// This call will correctly handle nested <outer> elements.
// Note, however, that this method does not handle namespaces.
reader.read_to_end(end.name()).unwrap();

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(reader.read_event().unwrap(), Event::Eof);

pub fn read_text(&mut self, end: QName<'_>) -> Result<Cow<'a, str>>

Reads the content between start and end tags, including any markup. Call this function only after you have already read a Start event.

Handles nested cases where parent and child elements have literally the same name.

This method does not unescape the data it reads; it returns the content of the XML document “as is”. It cannot know what kind of text it is reading: if, for example, the content contains a CDATA section, unescaping it would corrupt the data.
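
To see why blanket unescaping is unsafe, consider a raw span that consists of one CDATA section. The sketch below is std-only and illustrative: `naive_unescape` and `cdata_payload` are hypothetical helpers, not part of quick-xml, and the unescaper handles only the five predefined XML entities.

```rust
/// Naive unescape of the five predefined XML entities (illustrative only).
fn naive_unescape(raw: &str) -> String {
    raw.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        // `&amp;` goes last so that `&amp;lt;` becomes `&lt;`, not `<`.
        .replace("&amp;", "&")
}

/// Returns the literal payload of `raw` if it is exactly one CDATA section.
fn cdata_payload(raw: &str) -> Option<&str> {
    raw.strip_prefix("<![CDATA[")?.strip_suffix("]]>")
}
```

Inside CDATA the characters `&amp;` are literal text, so for the raw span `<![CDATA[a &amp; b]]>` the payload is `a &amp; b`. Passing the whole span through `naive_unescape` first changes the payload to `a & b`, which is a different document.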

Any text will be decoded using the reader's current decoder().

In effect, this method performs the following:

let span = reader.read_to_end(end)?;
let text = reader.decoder().decode(&reader.inner_slice[span]);

Examples

This example shows how to read HTML content from your XML document.

use std::borrow::Cow;
use quick_xml::events::{BytesStart, Event};
use quick_xml::reader::Reader;

let mut reader = Reader::from_str("
    <html>
        <title>This is a HTML text</title>
        <p>Usual XML rules does not apply inside it
        <p>For example, elements not needed to be &quot;closed&quot;
    </html>
");
reader.trim_text(true);

let start = BytesStart::new("html");
let end   = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(reader.read_event().unwrap(), Event::Start(start));
// ...and disable checking of end names because we expect HTML further...
reader.check_end_names(false);

// ...then, we could read text content until close tag.
// This call will correctly handle nested <html> elements.
let text = reader.read_text(end.name()).unwrap();
assert_eq!(text, Cow::Borrowed(r#"
        <title>This is a HTML text</title>
        <p>Usual XML rules does not apply inside it
        <p>For example, elements not needed to be &quot;closed&quot;
    "#));
assert!(matches!(text, Cow::Borrowed(_)));

// Now we can enable checks again
reader.check_end_names(true);

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(reader.read_event().unwrap(), Event::Eof);

impl<R> Reader<R>

Builder methods

pub fn from_reader(reader: R) -> Self

Creates a Reader that reads from a given reader.

pub fn expand_empty_elements(&mut self, val: bool) -> &mut Self

Changes whether empty elements should be split into a Start and an End event.

When set to true, all Empty events produced by a self-closing tag like <tag/> are expanded into a Start event followed by an End event. When set to false (the default), those tags are represented by an Empty event instead.

Note that setting this to true leads to additional allocations needed to store the tag name for the End event. However, if check_end_names is also set, only one additional allocation is performed to support both options.

(false by default)
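
The effect of this option can be modeled as a rewrite of the event stream. The following is a simplified sketch under stated assumptions: `Ev` and `expand_empty` are hypothetical names, and the real reader emits the two events lazily rather than transforming a finished list.

```rust
/// Simplified stand-in for XML events (hypothetical, not quick-xml's `Event`).
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(String),
    End(String),
    Empty(String),
}

/// Rewrites every `Empty(name)` into `Start(name)` followed by `End(name)`.
fn expand_empty(events: Vec<Ev>) -> Vec<Ev> {
    let mut out = Vec::with_capacity(events.len());
    for ev in events {
        match ev {
            // The name is needed twice, hence the extra copy in this model.
            Ev::Empty(name) => {
                out.push(Ev::Start(name.clone()));
                out.push(Ev::End(name));
            }
            other => out.push(other),
        }
    }
    out
}
```

The `clone` in this model mirrors the reason the note above mentions an extra allocation: the tag name has to outlive the Start event so that it can be reported again in the End event.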

pub fn trim_text(&mut self, val: bool) -> &mut Self

Changes whether whitespace before and after character data should be removed.

When set to true, all Text events are trimmed. If an event is empty after trimming, it is not emitted.

Changing this option automatically changes the trim_text_end option.

(false by default).

WARNING: With this option, every Text event is trimmed, which is incorrect when text is delimited by comments, processing instructions, or CDATA sections. To trim data correctly, manually apply BytesText::inplace_trim_start and BytesText::inplace_trim_end only to the events that need it.
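
The problem described in the warning can be reproduced with plain strings. In this std-only sketch, the slice of pieces stands for the Text events on either side of a comment; `trim_each` and `trim_outer_edges` are hypothetical helpers used only for illustration.

```rust
/// Trims every text piece independently, the way a blanket trim option would.
fn trim_each(pieces: &[&str]) -> String {
    pieces.iter().map(|p| p.trim()).collect()
}

/// Joins the pieces first and trims only the outer edges of the combined text.
fn trim_outer_edges(pieces: &[&str]) -> String {
    let joined: String = pieces.concat();
    joined.trim().to_string()
}
```

For a document fragment such as `  Hello <!--c--> world  `, the text pieces are `"  Hello "` and `" world  "`. Trimming each piece glues the words together as `Helloworld`, while trimming only the outer edges keeps the inner whitespace.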

pub fn trim_text_end(&mut self, val: bool) -> &mut Self

Changes whether whitespace after character data should be removed.

When set to true, trailing whitespace is trimmed in Text events. If an event is empty after trimming, it is not emitted.

(false by default).

WARNING: With this option, every Text event is trimmed, which is incorrect when text is delimited by comments, processing instructions, or CDATA sections. To trim data correctly, manually apply BytesText::inplace_trim_start and BytesText::inplace_trim_end only to the events that need it.

pub fn trim_markup_names_in_closing_tags(&mut self, val: bool) -> &mut Self

Changes whether trailing whitespace after the markup name is trimmed in closing tags such as </a >.

If true, the emitted End event is stripped of trailing whitespace after the markup name.

Note that if this is set to false and check_end_names is true, the comparison of markup names will fail erroneously whenever a closing tag contains trailing whitespace.

(true by default)
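
The interaction with check_end_names comes down to a byte-wise comparison. The sketch below is a std-only model: `end_tag_name` is a hypothetical helper that receives the bytes between `</` and `>` and optionally strips trailing XML whitespace (space, tab, CR, LF).

```rust
/// Extracts the name from the inside of a closing tag, e.g. `a ` from `</a >`.
/// With `trim == true`, trailing XML whitespace is removed.
fn end_tag_name(inner: &[u8], trim: bool) -> &[u8] {
    if !trim {
        return inner;
    }
    // Position one past the last non-whitespace byte (0 if there is none).
    let end = inner
        .iter()
        .rposition(|&b| !matches!(b, b' ' | b'\t' | b'\r' | b'\n'))
        .map_or(0, |i| i + 1);
    &inner[..end]
}
```

For `<a></a >`, the untrimmed end name is `a ` (with a trailing space), which is not byte-equal to the start name `a`; the trimmed name is.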

pub fn check_end_names(&mut self, val: bool) -> &mut Self

Changes whether mismatched closing tag names should be detected.

Note that start and end tags must match literally; they cannot have different prefixes even if both prefixes resolve to the same namespace. The XML

<outer xmlns="namespace" xmlns:p="namespace">
</p:outer>

is not valid, even though semantically the start tag is the same as the end tag. The reason is that namespaces are an extension of the original XML specification (without namespaces) and it should be backward-compatible.

When set to false, it won’t check if a closing tag matches the corresponding opening tag. For example, <mytag></different_tag> will be permitted.

If the XML is known to be well-formed (already processed, etc.), this saves extra time.

Note that the emitted End event will not be modified if this is disabled, i.e. it will contain the data of the mismatched end tag.

Note that setting this to true leads to additional allocations needed to store the tag name for the End event. However, if expand_empty_elements is also set, only one additional allocation is performed to support both options.

(true by default)
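
End-name checking can be pictured as a stack of open element names. This is a minimal sketch, assuming a simplified tag stream; `Tag` and `validate` are hypothetical names and the error is a plain string rather than the crate's error type.

```rust
/// Simplified stand-in for tag events (hypothetical).
enum Tag<'a> {
    Open(&'a str),
    Close(&'a str),
}

/// Walks the tags. When `check_end_names` is true, every `Close` must
/// literally match the most recent unclosed `Open`.
fn validate(tags: &[Tag<'_>], check_end_names: bool) -> Result<(), String> {
    // Names of the elements that are currently open, innermost last.
    let mut open: Vec<&str> = Vec::new();
    for tag in tags {
        match tag {
            Tag::Open(name) => open.push(*name),
            Tag::Close(name) => {
                let expected = open.pop();
                if check_end_names && expected != Some(*name) {
                    return Err(format!("expected {:?}, found {:?}", expected, name));
                }
            }
        }
    }
    Ok(())
}
```

With checking enabled, `<mytag></different_tag>` and `<outer></p:outer>` are both rejected in this model; with checking disabled, both pass through unchanged.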

pub fn check_comments(&mut self, val: bool) -> &mut Self

Changes whether comments should be validated.

When set to true, every Comment event is checked to ensure it does not contain --, which is not allowed in XML comments. Most of the time we don’t want comments at all, so we don’t really care about comment correctness; thus the default value is false to improve performance.

(false by default)

impl<R> Reader<R>

Getters

pub fn into_inner(self) -> R

Consumes the Reader, returning the underlying reader.

Can be used to compute the line and column of a parsing error position.

Examples

use std::io::Cursor;
use quick_xml::events::Event;
use quick_xml::reader::Reader;

let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag3>Test 2</tag3>
             </tag1>"#;
let mut reader = Reader::from_reader(Cursor::new(xml.as_bytes()));
let mut buf = Vec::new();

fn into_line_and_column(reader: Reader<Cursor<&[u8]>>) -> (usize, usize) {
    let end_pos = reader.buffer_position();
    let cursor = reader.into_inner();
    let s = String::from_utf8(cursor.into_inner()[0..end_pos].to_owned())
        .expect("can't make a string");
    let mut line = 1;
    let mut column = 0;
    for c in s.chars() {
        if c == '\n' {
            line += 1;
            column = 0;
        } else {
            column += 1;
        }
    }
    (line, column)
}

loop {
    match reader.read_event_into(&mut buf) {
        Ok(Event::Start(ref e)) => match e.name().as_ref() {
            b"tag1" | b"tag2" => (),
            tag => {
                assert_eq!(b"tag3", tag);
                assert_eq!((3, 22), into_line_and_column(reader));
                break;
            }
        },
        Ok(Event::Eof) => unreachable!(),
        _ => (),
    }
    buf.clear();
}