Crate quick_xml

High performance XML reader/writer.

Description

quick-xml contains two modes of operation:

A streaming API based on the StAX model. This is suited to larger XML documents that cannot be read completely into memory at once.

The user has to explicitly ask for the next XML event, similar to a database cursor. This is achieved by the following two structs:

  • Reader: A low level XML pull-reader where buffer allocation/clearing is left to the user.
  • Writer: An XML writer. Can be nested with readers if you want to transform XML documents.

Especially for nested XML elements, the user must keep track of where (how deep) in the XML document the current event is located.

quick-xml contains optional support for asynchronous reading using tokio.

Furthermore, quick-xml also contains optional Serde support to directly serialize and deserialize from structs, without having to deal with the XML events.

Examples

  • For a reading example see Reader
  • For a writing example see Writer

Features

quick-xml supports the following features:

  • async-tokio — Enables support for asynchronous reading from tokio’s IO-Traits by enabling reading events from types implementing tokio::io::AsyncBufRead.

  • encoding — Enables support for non-UTF-8 encoded documents. The encoding is inferred from the XML declaration if one is found; otherwise UTF-8 is assumed.

    Currently, only ASCII-compatible encodings are supported, so, for example, UTF-16 will not work (therefore, quick-xml is not standard compliant).

    Thus, quick-xml supports all encodings of encoding_rs except these:

      • UTF-16BE
      • UTF-16LE
      • ISO-2022-JP

    You should stop processing the document as soon as one of these encodings is detected, because the generated events can be wrong and may not reflect the real document structure!

    Because these are the only supported encodings that are not ASCII-compatible, you can detect them by checking for ASCII compatibility:

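    use quick_xml::events::Event;
    use quick_xml::reader::Reader;

    // Helper for this example: encodes a string as UTF-16LE with a leading BOM.
    fn to_utf16le_with_bom(string: &str) -> Vec<u8> {
        let mut bytes = vec![0xFF, 0xFE]; // UTF-16LE byte order mark
        for unit in string.encode_utf16() {
            bytes.extend_from_slice(&unit.to_le_bytes());
        }
        bytes
    }

    let xml = to_utf16le_with_bom(r#"<?xml encoding='UTF-16'?><element/>"#);
    let mut reader = Reader::from_reader(xml.as_ref());
    reader.trim_text(true);

    let mut buf = Vec::new();
    let mut unsupported = false;
    loop {
        // Check the detected encoding before reading each event.
        if !reader.decoder().encoding().is_ascii_compatible() {
            unsupported = true;
            break;
        }
        buf.clear();
        match reader.read_event_into(&mut buf).unwrap() {
            Event::Eof => break,
            _ => {}
        }
    }
    assert!(unsupported);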

    That restriction will be eliminated once issue #158 is resolved.

  • escape-html — Enables support for recognizing all HTML 5 entities in the unescape and unescape_with functions. The full list of entities can also be found at https://html.spec.whatwg.org/entities.json.

  • overlapped-lists — This feature of the serde deserializer enables support for deserializing lists whose elements are interleaved with elements that do not belong to the list.

    When this feature is enabled, the XML:

    <any-name>
      <item/>
      <another-item/>
      <item/>
      <item/>
    </any-name>
    

    could be deserialized to a struct:

    #[derive(Deserialize)]
    #[serde(rename_all = "kebab-case")]
    struct AnyName {
      item: Vec<()>,
      another_item: (),
    }

    When this feature is not enabled (default), only the first element will be associated with the field, and the deserialized type will report an error (duplicated field) when the deserializer encounters a second <item/>.

    Note that enabling this feature can lead to high and even unlimited memory consumption, because the deserializer must check all events up to the end of a container tag (</any-name> in this example) to figure out that there are no more items for a field. If </any-name> or even EOF is not encountered, the parsing will never end, which can lead to a denial-of-service (DoS) scenario.

    Having several lists and overlapped elements for them in XML could also lead to quadratic parsing time, because the deserializer must check the list of events as many times as the number of sequence fields present in the schema.

    To reduce negative consequences, always limit the maximum number of events that Deserializer will buffer.

    This feature works only with the serialize feature and has no effect if serialize is not enabled.

  • serde-types — Enables serialization of some types using serde. You will probably rarely need this feature.

    This feature does NOT provide an XML serializer or deserializer. You should use the serialize feature for that instead.

  • serialize — Enables support for serde serialization and deserialization. When this feature is enabled, quick-xml provides a serializer and deserializer for XML.

    This feature does NOT enable serialization of the types inside quick-xml. If you need that, use the serde-types feature.

Re-exports

pub use crate::encoding::Decoder;
pub use crate::reader::Reader;
pub use crate::writer::ElementWriter;
pub use crate::writer::Writer;

Modules

  • de (serialize feature): Serde Deserializer module.
  • encoding: A module for wrappers that encode / decode data.
  • escape: Manage XML character escapes.
  • events: Defines zero-copy XML events used throughout this library.
  • name: Module for handling names according to the W3C Namespaces in XML 1.1 (Second Edition) specification.
  • reader: Contains high-level interface for a pull-based XML parser.
  • se (serialize feature): Module to handle custom serde Serializer.
  • writer: Contains high-level interface for an events-based XML emitter.

Structs

  • NsReader: A low level encoding-agnostic XML event reader that performs namespace resolution.

Enums

  • DeError (serialize feature): (De)serialization error.
  • Error: The error type used by this crate.

Type Definitions

  • Result: A specialized Result type where the error is hard-wired to Error.