High performance XML reader/writer.
Description
quick-xml contains two modes of operation:
A streaming API based on the StAX model. This is suited for larger XML documents which cannot be completely read into memory at once.
The user has to explicitly ask for the next XML event, similar to a database cursor. This is achieved by the following two structs:
- Reader: A low-level XML pull-reader where buffer allocation/clearing is left to the user.
- Writer: An XML writer. Can be nested with readers if you want to transform XMLs.
Especially for nested XML elements, the user must keep track of where (how deep) in the XML document the current event is located.
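The depth bookkeeping mentioned above can be sketched without the library. This is a minimal illustration, not quick-xml's API: the `Ev` enum is a hypothetical stand-in for a pull-parser event, and `element_depths` is a name chosen only for this example.

```rust
/// Hypothetical stand-in for a pull-parser event (an assumption for this
/// sketch; quick-xml's own `Event` type has more variants).
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(String), // <name>
    End(String),   // </name>
    Empty(String), // <name/>
    Text(String),  // character data between tags
}

/// Pairs every element name with the depth at which it opens.
/// The root element is at depth 1.
fn element_depths(events: &[Ev]) -> Vec<(String, usize)> {
    let mut depth = 0usize;
    let mut out = Vec::new();
    for ev in events {
        match ev {
            // An opening tag moves one level deeper.
            Ev::Start(name) => {
                depth += 1;
                out.push((name.clone(), depth));
            }
            // An empty element opens and closes at once, so the
            // running depth is unchanged after it.
            Ev::Empty(name) => out.push((name.clone(), depth + 1)),
            // A closing tag moves one level up; saturating_sub avoids
            // underflow if the input contains a stray closing tag.
            Ev::End(_) => depth = depth.saturating_sub(1),
            // Text does not affect nesting.
            Ev::Text(_) => {}
        }
    }
    out
}
```

For the event sequence of `<root><list><item/>hi</list><tail/></root>`, this reports `root` at depth 1, `list` at depth 2, `item` at depth 3 and `tail` at depth 2. With quick-xml, the same counter would be updated in the match arms for `Event::Start`, `Event::End` and `Event::Empty`.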
quick-xml contains optional support for asynchronous reading using tokio.
Furthermore, quick-xml also contains optional Serde support to directly serialize and deserialize from structs, without having to deal with the XML events.
Examples
Features
quick-xml supports the following features:
- async-tokio: Enables support for asynchronous reading from tokio's IO traits by enabling reading events from types implementing tokio::io::AsyncBufRead.
- encoding: Enables support for non-UTF-8 encoded documents. The encoding is inferred from the XML declaration if one is found; otherwise UTF-8 is assumed.

  Currently, only ASCII-compatible encodings are supported, so, for example, UTF-16 will not work (therefore, quick-xml is not standard compliant).

  Thus, quick-xml supports all encodings of encoding_rs except those that are not ASCII-compatible.

  You should stop processing the document when one of those encodings is detected, because the generated events can be wrong and may not reflect the real document structure!

  Because the only unsupported encodings are the ones that are not ASCII-compatible, you can check for that property to detect them:

      use quick_xml::events::Event;
      use quick_xml::reader::Reader;

      // Assumed definition: the original example calls this helper without
      // showing it. Encodes `s` as UTF-16LE bytes prefixed with a BOM.
      fn to_utf16le_with_bom(s: &str) -> Vec<u8> {
          let mut bytes = vec![0xFF, 0xFE]; // UTF-16LE byte order mark
          for unit in s.encode_utf16() {
              bytes.extend_from_slice(&unit.to_le_bytes());
          }
          bytes
      }

      let xml = to_utf16le_with_bom(r#"<?xml encoding='UTF-16'><element/>"#);
      let mut reader = Reader::from_reader(xml.as_slice());
      reader.trim_text(true);

      let mut buf = Vec::new();
      let mut unsupported = false;
      loop {
          // The detected encoding can change as the input is read,
          // so check it before requesting each event.
          if !reader.decoder().encoding().is_ascii_compatible() {
              unsupported = true;
              break;
          }
          buf.clear();
          match reader.read_event_into(&mut buf).unwrap() {
              Event::Eof => break,
              _ => {}
          }
      }
      assert!(unsupported);

  That restriction will be eliminated once issue #158 is resolved.
- escape-html: Enables support for recognizing all HTML 5 entities in the unescape and unescape_with functions. The full list of entities can also be found at https://html.spec.whatwg.org/entities.json.
- overlapped-lists: This feature is for the serde deserializer. It enables support for deserializing lists whose tags are interleaved with tags that do not belong to the list.

  When this feature is enabled, the XML:

      <any-name>
        <item/>
        <another-item/>
        <item/>
        <item/>
      </any-name>

  could be deserialized to a struct:

      use serde::Deserialize;

      #[derive(Deserialize)]
      #[serde(rename_all = "kebab-case")]
      struct AnyName {
          item: Vec<()>,
          another_item: (),
      }

  When this feature is not enabled (the default), only the first element will be associated with the field, and the deserialized type will report an error (duplicated field) when the deserializer encounters a second <item/>.

  Note that enabling this feature can lead to high and even unlimited memory consumption, because the deserializer must check all events up to the end of the container tag (</any-name> in this example) to figure out that there are no more items for a field. If </any-name> or even EOF is not encountered, the parsing will never end, which can lead to a denial-of-service (DoS) scenario.

  Having several lists and overlapped elements for them in XML could also lead to quadratic parsing time, because the deserializer must check the list of events as many times as the number of sequence fields present in the schema.

  To reduce negative consequences, always limit the maximum number of events that the Deserializer will buffer.

  This feature works only with the serialize feature and has no effect if serialize is not enabled.
- serialize: Enables support for serde serialization and deserialization.
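The buffering trade-off described for overlapped-lists can be modeled in a few lines. This is a simplified sketch under stated assumptions, not quick-xml's implementation: `collect_list` and `BufferLimitExceeded` are hypothetical names, and child elements are reduced to their tag names.

```rust
/// Hypothetical error: more non-matching children were seen than the
/// caller allowed to be buffered.
#[derive(Debug, PartialEq)]
struct BufferLimitExceeded {
    limit: usize,
}

/// Simplified model of collecting one list field from a container.
/// `children` holds the child tag names in document order. Returns the
/// number of children matching `field` and the skipped names, which a
/// real deserializer would have to keep for the remaining fields.
fn collect_list<'a>(
    children: &[&'a str],
    field: &str,
    limit: usize,
) -> Result<(usize, Vec<&'a str>), BufferLimitExceeded> {
    let mut items = 0;
    let mut skipped = Vec::new();
    // Every child up to the end of the container has to be examined,
    // because a matching item may still follow a non-matching one.
    for &name in children {
        if name == field {
            items += 1;
        } else if skipped.len() == limit {
            // Refuse to buffer further: this bounds memory use.
            return Err(BufferLimitExceeded { limit });
        } else {
            skipped.push(name);
        }
    }
    Ok((items, skipped))
}
```

For the children of `<any-name>` shown earlier, collecting `item` with a limit of 8 finds three items and buffers `another-item`; with a limit of 0 the same input is rejected as soon as `another-item` is reached. Without any limit, the skipped list can grow as long as the input keeps supplying non-matching children, which is the memory risk the feature description warns about.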
Re-exports
pub use crate::encoding::Decoder;
pub use crate::reader::Reader;
pub use crate::writer::ElementWriter;
pub use crate::writer::Writer;
Modules
Deserializer module (serialize feature)
Serializer (serialize feature)