Crate quick_xml

High performance XML reader/writer.

§Description

quick-xml contains two modes of operation:

A streaming API based on the StAX model. This is suited for larger XML documents which cannot be read completely into memory at once.

The user has to explicitly ask for the next XML event, similar to a database cursor. This is achieved by the following two structs:

  • Reader: A low level XML pull-reader where buffer allocation/clearing is left to user.
  • Writer: An XML writer. Can be nested with readers if you want to transform XML documents.

Especially for nested XML elements, the user must keep track of where (how deep) in the XML document the current event is located.
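
The depth bookkeeping mentioned above can be sketched without the library. The snippet below is only an illustration: PullEvent and element_depths are hypothetical names, and the enum is a simplified stand-in for quick_xml::events::Event (whose Start, End, Empty, Text and Eof variants it mirrors), not the crate's real type. It shows the usual pattern: increase the depth after an opening tag, decrease it on a closing tag, and leave it unchanged for a self-closing tag.

```rust
/// Hypothetical, simplified stand-in for a pull-parser event.
/// This is NOT quick-xml's `Event` type; it only mirrors its shape.
#[derive(Debug, Clone, PartialEq)]
enum PullEvent<'a> {
    Start(&'a str), // <name>
    End(&'a str),   // </name>
    Empty(&'a str), // <name/>
    Text(&'a str),  // character data
    Eof,            // end of input
}

/// Walks the events in document order and returns `(depth, name)` for
/// every element, where the root element has depth 0.
fn element_depths<'a>(events: &[PullEvent<'a>]) -> Vec<(usize, &'a str)> {
    let mut depth = 0usize;
    let mut seen = Vec::new();
    for event in events {
        match event {
            PullEvent::Start(name) => {
                // Record the element at the current depth, then descend.
                seen.push((depth, *name));
                depth += 1;
            }
            // A self-closing element has no children, so depth is unchanged.
            PullEvent::Empty(name) => seen.push((depth, *name)),
            // Leaving an element: go back up one level.
            PullEvent::End(_) => depth = depth.saturating_sub(1),
            PullEvent::Text(_) => {}
            PullEvent::Eof => break,
        }
    }
    seen
}
```

With a real Reader, the same counter would be updated inside the loop that requests each event; the explicit loop is what makes the cursor-like model visible.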

quick-xml optionally supports asynchronous reading and writing using tokio. To use it, enable the async-tokio feature.

Furthermore, quick-xml contains optional Serde support for serializing and deserializing structs directly, without having to deal with XML events. To use it, enable the serialize feature. Read more about mapping Rust types to XML in the documentation of the de module. Also check the serde_helpers module.

§Examples

  • For a reading example see Reader
  • For a writing example see Writer

§Features

quick-xml supports the following features:

  • async-tokio — Enables asynchronous reading and writing through tokio’s I/O traits. Events can then be read from types implementing tokio::io::AsyncBufRead.

  • encoding — Enables support of non-UTF-8 encoded documents. Encoding will be inferred from the XML declaration if it is found, otherwise UTF-8 is assumed.

    Currently, only ASCII-compatible encodings are supported. For example, UTF-16 will not work (therefore, quick-xml is not standard compliant).

    Thus, quick-xml supports all encodings of encoding_rs except these: UTF_16BE, UTF_16LE, and ISO_2022_JP.

    You should stop processing a document when one of these encodings is detected, because generated events can be wrong and do not reflect a real document structure!

    Because these are the only supported encodings that are not ASCII compatible, you can check for them:

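    use quick_xml::events::Event;
    use quick_xml::reader::Reader;

    // Assumed stand-in for the helper that the example calls but does not
    // show: encodes `s` as UTF-16LE and prepends the byte order mark (FF FE).
    fn to_utf16le_with_bom(s: &str) -> Vec<u8> {
        let mut bytes = vec![0xFF, 0xFE];
        for unit in s.encode_utf16() {
            bytes.extend_from_slice(&unit.to_le_bytes());
        }
        bytes
    }

    let xml = to_utf16le_with_bom(r#"<?xml encoding='UTF-16'><element/>"#);
    let mut reader = Reader::from_reader(xml.as_ref());
    reader.config_mut().trim_text(true);

    let mut buf = Vec::new();
    let mut unsupported = false;
    loop {
        // Stop as soon as the detected encoding is not ASCII-compatible.
        if !reader.decoder().encoding().is_ascii_compatible() {
            unsupported = true;
            break;
        }
        buf.clear();
        match reader.read_event_into(&mut buf).unwrap() {
            Event::Eof => break,
            _ => {}
        }
    }
    assert!(unsupported);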

    This restriction will be eliminated once issue #158 is resolved.

  • escape-html — Enables support for recognizing all HTML 5 entities in the unescape function. The full list of entities can also be found at https://html.spec.whatwg.org/entities.json.

  • overlapped-lists — For the Serde deserializer: enables support for deserializing lists whose items are interleaved with tags that do not belong to the list.

    When this feature is enabled, the XML:

    <any-name>
      <item/>
      <another-item/>
      <item/>
      <item/>
    </any-name>

    could be deserialized to a struct:

    #[derive(Deserialize)]
    #[serde(rename_all = "kebab-case")]
    struct AnyName {
      item: Vec<()>,
      another_item: (),
    }

    When this feature is not enabled (default), only the first element will be associated with the field, and the deserialized type will report an error (duplicated field) when the deserializer encounters a second <item/>.

    Note that enabling this feature can lead to high and even unlimited memory consumption, because the deserializer needs to check all events up to the end of a container tag (</any-name> in this example) to figure out that there are no more items for a field. If </any-name> or even EOF is not encountered, the parsing will never end, which can lead to a denial-of-service (DoS) scenario.

    Having several lists and overlapped elements for them in XML could also lead to quadratic parsing time, because the deserializer must check the list of events as many times as the number of sequence fields present in the schema.

    To reduce negative consequences, always limit the maximum number of events that Deserializer will buffer.

    This feature works only with the serialize feature and has no effect if serialize is not enabled.

  • serde-types — Enables serialization of some quick-xml types using serde. This feature is rarely needed.

    This feature does NOT provide an XML serializer or deserializer. You should use the serialize feature for that instead.

  • serialize — Enables support for serde serialization and deserialization. When this feature is enabled, quick-xml provides serializer and deserializer for XML.

    This feature does NOT enable serialization of the types inside quick-xml. If you need that, use the serde-types feature.
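
As a rough model of the overlapped-lists behaviour described above, the following sketch groups child tag names into per-field counts. It is an illustration under stated assumptions, not quick-xml's implementation: group_children is a hypothetical helper, every field is treated as a list field, and the error text is invented for the example. With overlapping disabled, a second, non-adjacent run of the same tag is reported as a duplicate field; with it enabled, all occurrences are collected into one list.

```rust
use std::collections::BTreeMap;

/// Illustrative model only (hypothetical helper, not part of quick-xml).
/// Counts how many items each field would receive from a sequence of
/// child tags. Assumes every field is a list field.
fn group_children(tags: &[&str], overlapped: bool) -> Result<BTreeMap<String, usize>, String> {
    let mut fields: BTreeMap<String, usize> = BTreeMap::new();
    let mut previous: Option<&str> = None;
    for &tag in tags {
        // Items that directly follow each other belong to the same run.
        let continues_run = previous == Some(tag);
        // A field that was already filled and is seen again after some
        // other tag is only acceptable when overlapping is allowed.
        if fields.contains_key(tag) && !continues_run && !overlapped {
            return Err(format!("duplicate field `{}`", tag));
        }
        *fields.entry(tag.to_string()).or_insert(0) += 1;
        previous = Some(tag);
    }
    Ok(fields)
}
```

For the children of <any-name> in the example above (item, another-item, item, item), this model returns an error when overlapping is disabled and a map with three item entries and one another-item entry when it is enabled. Note that the permissive mode only knows a list is complete once the whole input has been seen, which is the reason the real deserializer has to buffer events.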

Re-exports§

Modules§

  • de (serialize feature)
    Serde Deserializer module.
  • A module for wrappers that encode / decode data.
  • Error management module
  • Manage xml character escapes
  • Defines zero-copy XML events used throughout this library.
  • Module for handling names according to the W3C Namespaces in XML 1.1 (Second Edition) specification
  • Contains low-level parsers of different XML pieces.
  • Contains high-level interface for a pull-based XML parser.
  • se (serialize feature)
    Module to handle custom serde Serializer
  • serde_helpers (serde-types feature)
    Provides helper functions to glue an XML with a serde content model.
  • Contains high-level interface for an events-based XML emitter.

Macros§