pub struct Reader<R> { /* private fields */ }
A low level encoding-agnostic XML event reader.
Consumes bytes and streams XML Events.
This reader does not manage namespace declarations and is not able to resolve prefixes. If you want these features, use the NsReader.
§Examples
use quick_xml::events::Event;
use quick_xml::reader::Reader;

let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag2>Test 2</tag2>
             </tag1>"#;
let mut reader = Reader::from_str(xml);
reader.config_mut().trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
let mut buf = Vec::new();

// The `Reader` does not implement `Iterator` because it outputs borrowed data (`Cow`s)
loop {
    // NOTE: this is the generic case when we don't know about the input BufRead.
    // when the input is a &str or a &[u8], we don't actually need to use another
    // buffer, we could directly call `reader.read_event()`
    match reader.read_event_into(&mut buf) {
        Err(e) => panic!("Error at position {}: {:?}", reader.error_position(), e),
        // exits the loop when reaching end of file
        Ok(Event::Eof) => break,

        Ok(Event::Start(e)) => {
            match e.name().as_ref() {
                b"tag1" => println!("attributes values: {:?}",
                                    e.attributes().map(|a| a.unwrap().value)
                                    .collect::<Vec<_>>()),
                b"tag2" => count += 1,
                _ => (),
            }
        }
        Ok(Event::Text(e)) => txt.push(e.unescape().unwrap().into_owned()),

        // There are several other `Event`s we do not consider here
        _ => (),
    }
    // if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
    buf.clear();
}
Implementations§
impl<R: AsyncBufRead + Unpin> Reader<R>
pub async fn read_event_into_async<'b>(
    &mut self,
    buf: &'b mut Vec<u8>,
) -> Result<Event<'b>>
Available on crate feature async-tokio only.
An asynchronous version of read_event_into(). Reads the next event into the given buffer.
This is the main entry point for reading XML Events when using an async reader.
See the documentation of read_event_into() for more information.
§Examples
use quick_xml::events::Event;
use quick_xml::reader::Reader;

// This explicitly uses `from_reader("...".as_bytes())` to use a buffered
// reader instead of relying on the zero-copy optimizations for reading
// from byte slices, which provides the sync interface anyway.
let mut reader = Reader::from_reader(r#"
    <tag1 att1 = "test">
        <tag2><!--Test comment-->Test</tag2>
        <tag2>Test 2</tag2>
    </tag1>
"#.as_bytes());
reader.config_mut().trim_text(true);

let mut count = 0;
let mut buf = Vec::new();
let mut txt = Vec::new();
loop {
    match reader.read_event_into_async(&mut buf).await {
        Ok(Event::Start(_)) => count += 1,
        Ok(Event::Text(e)) => txt.push(e.unescape().unwrap().into_owned()),
        Err(e) => panic!("Error at position {}: {:?}", reader.error_position(), e),
        Ok(Event::Eof) => break,
        _ => (),
    }
    buf.clear();
}
assert_eq!(count, 3);
assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);
pub async fn read_to_end_into_async<'n>(
    &mut self,
    end: QName<'n>,
    buf: &mut Vec<u8>,
) -> Result<Span>
Available on crate feature async-tokio only.
An asynchronous version of read_to_end_into().
Reads asynchronously until the end element is found, using the provided buffer as intermediate storage for event content. This function is supposed to be called after you have already read a Start event.
See the documentation of read_to_end_into() for more information.
§Examples
This example shows how you can skip XML content after you read the start event.
use quick_xml::events::{BytesStart, Event};
use quick_xml::reader::Reader;

let mut reader = Reader::from_reader(r#"
    <outer>
        <inner>
            <inner></inner>
            <inner/>
            <outer></outer>
            <outer/>
        </inner>
    </outer>
"#.as_bytes());
reader.config_mut().trim_text(true);
let mut buf = Vec::new();

let start = BytesStart::new("outer");
let end = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(reader.read_event_into_async(&mut buf).await.unwrap(), Event::Start(start));

// ...then, we could skip all events to the corresponding end event.
// This call will correctly handle nested <outer> elements.
// Note, however, that this method does not handle namespaces.
reader.read_to_end_into_async(end.name(), &mut buf).await.unwrap();

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(reader.read_event_into_async(&mut buf).await.unwrap(), Event::Eof);
impl<R: BufRead> Reader<R>
This is an implementation for reading from a BufRead as the underlying byte stream.
pub fn read_event_into<'b>(&mut self, buf: &'b mut Vec<u8>) -> Result<Event<'b>>
Reads the next Event.
This is the main entry point for reading XML Events.
Events borrow buf and can be converted to own their data if needed (uses Cow internally).
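To illustrate the borrowing described above, here is a minimal standard-library sketch; it does not use quick_xml itself. The helpers decode_lossy and take_text are hypothetical names introduced only for this illustration. They show how a Cow can borrow from a buffer and how into_owned() detaches the data so the buffer can be cleared afterwards.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of quick_xml): views the buffer as text,
/// borrowing from `buf` when the bytes are already valid UTF-8.
fn decode_lossy(buf: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(buf)
}

/// Hypothetical helper: copies the text out of the buffer so that the
/// buffer can be reused for the next event.
fn take_text(buf: &mut Vec<u8>) -> String {
    // `borrowed` keeps `buf` borrowed for as long as it is alive...
    let borrowed = decode_lossy(buf);
    // ...so we convert it into an owned `String` first...
    let owned = borrowed.into_owned();
    // ...and only then clear the buffer.
    buf.clear();
    owned
}
```

The same ordering applies to real events: anything that must outlive the buffer has to be converted to owned data before buf.clear() is called.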
Having the possibility to control the internal buffers gives you some additional benefits such as:
- Reduce the number of allocations by reusing the same buffer. For constrained systems, you can call buf.clear() once you are done with processing the event (typically at the end of your loop).
- Reserve the buffer length if you know the file size (using Vec::with_capacity).
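The effect of reusing one buffer can be sketched with the standard library alone. The function name count_reallocations and its inputs are made up for illustration and are not part of quick_xml; the only point is that Vec::clear() drops the contents but keeps the allocated capacity.

```rust
/// Hypothetical stand-in for a read loop: each "event" is copied into the
/// same buffer, which is cleared (but not deallocated) between iterations.
/// Returns how many times the buffer had to grow.
fn count_reallocations(events: &[&[u8]], initial_capacity: usize) -> usize {
    let mut buf: Vec<u8> = Vec::with_capacity(initial_capacity);
    let mut reallocations = 0;

    for event in events {
        let capacity_before = buf.capacity();
        buf.extend_from_slice(event);
        if buf.capacity() != capacity_before {
            reallocations += 1;
        }
        // ... process the event here ...

        // `clear()` sets the length to 0 but keeps the capacity,
        // so the next iteration can reuse the same allocation.
        buf.clear();
    }
    reallocations
}
```

With a sufficiently large initial capacity the buffer never grows, which is the case the two bullet points above aim for.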
§Examples
use quick_xml::events::Event;
use quick_xml::reader::Reader;

let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag2>Test 2</tag2>
             </tag1>"#;
let mut reader = Reader::from_str(xml);
reader.config_mut().trim_text(true);
let mut count = 0;
let mut buf = Vec::new();
let mut txt = Vec::new();
loop {
    match reader.read_event_into(&mut buf) {
        Ok(Event::Start(_)) => count += 1,
        Ok(Event::Text(e)) => txt.push(e.unescape().unwrap().into_owned()),
        Err(e) => panic!("Error at position {}: {:?}", reader.error_position(), e),
        Ok(Event::Eof) => break,
        _ => (),
    }
    buf.clear();
}
assert_eq!(count, 3);
assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);
pub fn read_to_end_into(
    &mut self,
    end: QName<'_>,
    buf: &mut Vec<u8>,
) -> Result<Span>
Reads until the end element is found, using the provided buffer as intermediate storage for event content. This function is supposed to be called after you have already read a Start event.
Returns a span that covers the content between the > of the opening tag and the < of the closing tag, or an empty slice if expand_empty_elements is set and this method was called after reading an expanded Start event.
Manages nested cases where parent and child elements have literally the same name.
If a corresponding End event is not found, an error of type Error::IllFormed will be returned. In particular, that error will be returned if you call this method without consuming the corresponding Start event first.
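The handling of nested elements with the same name can be pictured as a depth counter. The sketch below is an assumption about the general technique, not the library's actual code: the Tok enum and the skip_to_end function are hypothetical and operate on a pre-tokenized list rather than on real XML.

```rust
/// Hypothetical, simplified event type used only for this sketch.
#[derive(Debug)]
enum Tok<'a> {
    Start(&'a str),
    End(&'a str),
    Text(&'a str),
}

/// Returns the index of the `End` token that closes an already-consumed
/// `Start(name)`, or an error if the input ends before it is found.
fn skip_to_end(tokens: &[Tok<'_>], name: &str) -> Result<usize, String> {
    // Number of nested, still-open elements with the same name.
    let mut depth = 0usize;
    for (i, tok) in tokens.iter().enumerate() {
        match tok {
            Tok::Start(n) if *n == name => depth += 1,
            Tok::End(n) if *n == name => {
                if depth == 0 {
                    return Ok(i);
                }
                depth -= 1;
            }
            // Other names and text do not affect the depth.
            _ => {}
        }
    }
    Err(format!("missing end tag for <{}>", name))
}
```

The error branch mirrors the situation described above: if the matching end is never seen, for example because the Start event was not consumed first, the search runs off the end of the input.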
If your reader was created from a string slice or byte array slice, it is better to use the read_to_end() method, because it will not copy bytes into an intermediate buffer.
The provided buf buffer will be filled with the content of only one event at a time. Before each event is read, the buffer will be cleared. If you know an appropriate size for each event, you can preallocate the buffer to reduce the number of reallocations.
The end parameter should contain the name of the end element in the reader encoding. It is good practice to always get that parameter using the BytesStart::to_end() method.
The correctness of the skipped events is not checked if you disabled the check_end_names option.
§Namespaces
While the Reader does not support namespace resolution, namespaces do not change the algorithm for comparing names. Although the names a:name and b:name, where both prefixes a and b resolve to the same namespace, are semantically equivalent, </b:name> cannot close <a:name>, because according to the specification:
The end of every element that begins with a start-tag MUST be marked by an end-tag containing a name that echoes the element’s type as given in the start-tag
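As a rough illustration of this literal comparison, the following standard-library sketch compares qualified names byte by byte. The helpers closes and split_prefix are hypothetical and do not show how quick_xml is implemented; they only demonstrate that prefixes are compared as written, without any namespace lookup.

```rust
/// Hypothetical helper: an end tag closes a start tag only if the
/// qualified names (prefix included) are byte-for-byte identical.
fn closes(start_name: &[u8], end_name: &[u8]) -> bool {
    start_name == end_name
}

/// Hypothetical helper: splits a qualified name into
/// `(prefix, local_name)` at the first `:`.
fn split_prefix(qname: &[u8]) -> (Option<&[u8]>, &[u8]) {
    match qname.iter().position(|&b| b == b':') {
        Some(i) => (Some(&qname[..i]), &qname[i + 1..]),
        None => (None, qname),
    }
}
```

Here a:name and b:name share the local name "name", yet closes() rejects the pair because the prefixes differ.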
§Examples
This example shows how you can skip XML content after you read the start event.
use quick_xml::events::{BytesStart, Event};
use quick_xml::reader::Reader;

let mut reader = Reader::from_str(r#"
    <outer>
        <inner>
            <inner></inner>
            <inner/>
            <outer></outer>
            <outer/>
        </inner>
    </outer>
"#);
reader.config_mut().trim_text(true);
let mut buf = Vec::new();

let start = BytesStart::new("outer");
let end = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(reader.read_event_into(&mut buf).unwrap(), Event::Start(start));

// ...then, we could skip all events to the corresponding end event.
// This call will correctly handle nested <outer> elements.
// Note, however, that this method does not handle namespaces.
reader.read_to_end_into(end.name(), &mut buf).unwrap();

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(reader.read_event_into(&mut buf).unwrap(), Event::Eof);
impl<'a> Reader<&'a [u8]>
This is an implementation for reading from a &[u8] as the underlying byte stream.
This implementation supports not using an intermediate buffer, as the byte slice itself can be used to borrow from.
pub fn read_event(&mut self) -> Result<Event<'a>>
Reads an event that borrows from the input rather than a buffer.
There is no asynchronous read_event_async() version of this function, because it is not necessary: the contents are already in memory and no IO is needed, therefore there is no potential for blocking.
§Examples
use quick_xml::events::Event;
use quick_xml::reader::Reader;

let mut reader = Reader::from_str(r#"
    <tag1 att1 = "test">
        <tag2><!--Test comment-->Test</tag2>
        <tag2>Test 2</tag2>
    </tag1>
"#);
reader.config_mut().trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
loop {
    match reader.read_event().unwrap() {
        // The start event itself is not inspected, only counted
        Event::Start(_) => count += 1,
        Event::Text(e) => txt.push(e.unescape().unwrap().into_owned()),
        Event::Eof => break,
        _ => (),
    }
}
assert_eq!(count, 3);
assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);
pub fn read_to_end(&mut self, end: QName<'_>) -> Result<Span>
Reads until the end element is found. This function is supposed to be called after you have already read a Start event.
Returns a span that covers the content between the > of the opening tag and the < of the closing tag, or an empty slice if expand_empty_elements is set and this method was called after reading an expanded Start event.
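To make the meaning of the returned span concrete, here is a small sketch that uses a plain std::ops::Range<usize> as a stand-in for Span (an assumption; the library's Span may use a different integer type). The content_span helper is hypothetical and deliberately naive: it handles only a single, non-nested element and matches the tag name by simple substring search.

```rust
use std::ops::Range;

/// Hypothetical helper: for an input like `<a>text</a>`, returns the byte
/// range between the `>` of the opening tag and the `<` of the closing tag.
/// Nesting, attributes containing `>`, and comments are not handled.
fn content_span(xml: &str, name: &str) -> Option<Range<usize>> {
    let open = format!("<{}", name);
    let close = format!("</{}>", name);

    let open_at = xml.find(&open)?;
    // The content starts right after the `>` that ends the opening tag.
    let start = open_at + xml[open_at..].find('>')? + 1;
    // The content ends right before the `<` of the closing tag.
    let end = start + xml[start..].find(&close)?;
    Some(start..end)
}
```

Slicing the original input with such a range yields the raw content between the tags, and an element with no content produces an empty range.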
Manages nested cases where parent and child elements have literally the same name.
If a corresponding End event is not found, an error of type Error::IllFormed will be returned. In particular, that error will be returned if you call this method without consuming the corresponding Start event first.
The end parameter should contain the name of the end element in the reader encoding. It is good practice to always get that parameter using the BytesStart::to_end() method.
The correctness of the skipped events is not checked if you disabled the check_end_names option.
There is no asynchronous read_to_end_async() version of this function, because it is not necessary: the contents are already in memory and no IO is needed, therefore there is no potential for blocking.
§Namespaces
While the Reader does not support namespace resolution, namespaces do not change the algorithm for comparing names. Although the names a:name and b:name, where both prefixes a and b resolve to the same namespace, are semantically equivalent, </b:name> cannot close <a:name>, because according to the specification:
The end of every element that begins with a start-tag MUST be marked by an end-tag containing a name that echoes the element’s type as given in the start-tag
§Examples
This example shows how you can skip XML content after you read the start event.
use quick_xml::events::{BytesStart, Event};
use quick_xml::reader::Reader;

let mut reader = Reader::from_str(r#"
    <outer>
        <inner>
            <inner></inner>
            <inner/>
            <outer></outer>
            <outer/>
        </inner>
    </outer>
"#);
reader.config_mut().trim_text(true);

let start = BytesStart::new("outer");
let end = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(reader.read_event().unwrap(), Event::Start(start));

// ...then, we could skip all events to the corresponding end event.
// This call will correctly handle nested <outer> elements.
// Note, however, that this method does not handle namespaces.
reader.read_to_end(end.name()).unwrap();

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(reader.read_event().unwrap(), Event::Eof);
pub fn read_text(&mut self, end: QName<'_>) -> Result<Cow<'a, str>>
Reads content between start and end tags, including any markup. This function is supposed to be called after you have already read a Start event.
Manages nested cases where parent and child elements have literally the same name.
This method does not unescape the data it reads; instead it returns the content of the XML document “as is”. This is because it has no idea what text it reads, and if, for example, it contains a CDATA section, an attempt to unescape its content would spoil the data.
Any text will be decoded using the reader's current decoder().
In effect, this method performs the following code:
let span = reader.read_to_end(end)?;
let text = reader.decoder().decode(&reader.inner_slice[span]);
§Examples
This example shows how you can read HTML content from your XML document.
use std::borrow::Cow;
use quick_xml::events::{BytesStart, Event};
use quick_xml::reader::Reader;

// A raw string is used so that the double quotes inside the document
// do not terminate the string literal.
let mut reader = Reader::from_str(r#"
    <html>
        <title>This is a HTML text</title>
        <p>Usual XML rules does not apply inside it
        <p>For example, elements not needed to be "closed"
    </html>
"#);
reader.config_mut().trim_text(true);

let start = BytesStart::new("html");
let end = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(reader.read_event().unwrap(), Event::Start(start));
// ...and disable checking of end names because we expect HTML further...
reader.config_mut().check_end_names = false;

// ...then, we could read text content until close tag.
// This call will correctly handle nested <html> elements.
let text = reader.read_text(end.name()).unwrap();
assert_eq!(text, Cow::Borrowed(r#"
        <title>This is a HTML text</title>
        <p>Usual XML rules does not apply inside it
        <p>For example, elements not needed to be "closed"
    "#));
assert!(matches!(text, Cow::Borrowed(_)));

// Now we can enable checks again
reader.config_mut().check_end_names = true;

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(reader.read_event().unwrap(), Event::Eof);
impl<R> Reader<R>
Builder methods
pub fn from_reader(reader: R) -> Self
Creates a Reader that reads from a given reader.
pub fn config_mut(&mut self) -> &mut Config
Returns a mutable reference to the parser configuration.
impl<R> Reader<R>
Getters
pub fn into_inner(self) -> R
Consumes the Reader, returning the underlying reader.
Can be used to compute the line and column of a parsing error position.
§Examples
use std::{str, io::Cursor};
use quick_xml::events::Event;
use quick_xml::reader::Reader;

// NOTE: the column asserted below depends on the 16 spaces of
// indentation in front of `<tag3>`.
let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag3>Test 2</tag3>
            </tag1>"#;
let mut reader = Reader::from_reader(Cursor::new(xml.as_bytes()));
let mut buf = Vec::new();

fn into_line_and_column(reader: Reader<Cursor<&[u8]>>) -> (usize, usize) {
    // We know that size cannot exceed usize::MAX because we created parser from single &[u8]
    let end_pos = reader.buffer_position() as usize;
    let cursor = reader.into_inner();
    let s = String::from_utf8(cursor.into_inner()[0..end_pos].to_owned())
        .expect("can't make a string");
    let mut line = 1;
    let mut column = 0;
    for c in s.chars() {
        if c == '\n' {
            line += 1;
            column = 0;
        } else {
            column += 1;
        }
    }
    (line, column)
}

loop {
    match reader.read_event_into(&mut buf) {
        Ok(Event::Start(ref e)) => match e.name().as_ref() {
            b"tag1" | b"tag2" => (),
            tag => {
                assert_eq!(b"tag3", tag);
                assert_eq!((3, 22), into_line_and_column(reader));
                break;
            }
        },
        Ok(Event::Eof) => unreachable!(),
        _ => (),
    }
    buf.clear();
}
pub fn get_mut(&mut self) -> &mut R
Gets a mutable reference to the underlying reader.
Avoid reading from this reader, because doing so will not update the reader’s position and will lead to incorrect error positions. If you want to read, use stream() instead.
pub const fn buffer_position(&self) -> u64
Gets the current byte position in the input data.
pub const fn error_position(&self) -> u64
Gets the byte position of the last error in the input data. If there have been no errors yet, returns 0.
Unlike buffer_position(), it will point to the place where it is reasonable to report the error to the end user. For example, all SyntaxErrors are reported when the parser sees EOF inside of some kind of markup. The buffer_position() will point to the last byte of input, which is not very useful. error_position() will point to the start of the corresponding markup element (i.e. to the < character).
This position is always <= buffer_position().
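Since both positions are byte offsets, reporting an error to a user usually means translating the offset into a line and column. The following is a minimal sketch, assuming the whole input is available as a &str; line_and_column is a hypothetical helper, not a quick_xml function.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts a byte offset (such as the value returned
/// by `error_position()`) into a 1-based line and a 0-based column counted
/// in characters. Returns `None` if the offset is out of range or does not
/// fall on a character boundary.
fn line_and_column(input: &str, byte_pos: u64) -> Option<(usize, usize)> {
    let pos = usize::try_from(byte_pos).ok()?;
    let prefix = input.get(..pos)?;

    let line = 1 + prefix.matches('\n').count();
    // Characters after the last newline (or from the start of the input).
    let column = match prefix.rfind('\n') {
        Some(nl) => prefix[nl + 1..].chars().count(),
        None => prefix.chars().count(),
    };
    Some((line, column))
}
```

Unlike the into_inner() example, this variant only borrows the input, so the reader does not have to be consumed to produce a message.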
pub const fn decoder(&self) -> Decoder
Gets the decoder used to decode bytes read by this reader into strings.
If the encoding feature is enabled, the encoding in use may change after the XML declaration is parsed; otherwise the encoding is fixed to UTF-8.
If the encoding feature is enabled and no encoding is specified in the declaration, the decoder defaults to UTF-8.
pub fn stream(&mut self) -> BinaryStream<'_, R>
Gets direct access to the underlying reader, but tracks the amount of data read and updates Reader::buffer_position() accordingly.
Note that this method gives you access to the internal reader, and data read through it will not be returned in any subsequent events read by the read_event family of methods.
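The idea of tracking how much has been read can be sketched with a small wrapper around any std::io::Read. This illustrates the general technique under a hypothetical name (CountingReader); it is not the actual BinaryStream implementation.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper: forwards reads to `inner` and keeps a running
/// count of the bytes consumed, so the position never gets out of sync.
struct CountingReader<R> {
    inner: R,
    offset: u64,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Every byte handed out through the wrapper advances the offset.
        self.offset += n as u64;
        Ok(n)
    }
}
```

Reading through such a wrapper keeps the offset accurate, whereas reading from the wrapped reader directly (the get_mut() situation) would bypass the counter.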
§Example
This example demonstrates how to read a stream of raw bytes from an XML document. This could be used to implement streaming reads of text, or to read raw binary bytes embedded in an XML document. (Documents with embedded raw bytes are not valid XML, but XML-derived file formats exist where such documents are valid.)
use std::io::{BufRead, Read};
use quick_xml::events::{BytesEnd, BytesStart, Event};
use quick_xml::reader::Reader;

let mut reader = Reader::from_str("<tag>binary << data&></tag>");
//                                 ^    ^               ^     ^
//                                 0    5               21    27

assert_eq!(
    (reader.read_event().unwrap(), reader.buffer_position()),
    // 5 - end of the `<tag>`
    (Event::Start(BytesStart::new("tag")), 5)
);

// Reading directly from underlying reader will not update position
// let mut inner = reader.get_mut();

// Reading from the stream() advances position
let mut inner = reader.stream();

// Read binary data. We must know its size
let mut binary = [0u8; 16];
inner.read_exact(&mut binary).unwrap();
assert_eq!(&binary, b"binary << data&>");
// 21 - end of the `binary << data&>`
assert_eq!(inner.offset(), 21);
assert_eq!(reader.buffer_position(), 21);

assert_eq!(
    (reader.read_event().unwrap(), reader.buffer_position()),
    // 27 - end of the `</tag>`
    (Event::End(BytesEnd::new("tag")), 27)
);

assert_eq!(reader.read_event().unwrap(), Event::Eof);