Available on crate feature serialize only.
Serde Deserializer module.
Due to the complexity of the XML standard and the fact that serde was developed with JSON in mind, not all serde concepts apply smoothly to XML. As a result, some XML concepts cannot be expressed with serde derives and may require manual deserialization.
The most notable limitation is distinguishing between elements and attributes, since no other format used by serde has such a concept.
Because of that, the mapping is performed on a best-effort basis.
Table of Contents
- Mapping XML to Rust types
- Composition Rules
- Difference between $text and $value special names
- Frequently Used Patterns
Mapping XML to Rust types
Type names are never considered when deserializing, so you can name your types as you wish. Other general rules:
- a struct field name can be represented in XML only as an attribute name or an element name;
- an enum variant name can be represented in XML only as an attribute name or an element name;
- the unit struct, unit type () and unit enum variant can be deserialized from any valid XML content:
  - attribute and element names;
  - attribute and element values;
  - text or CDATA content (including mixed text and CDATA content).
NOTE: examples marked with FIXME: do not work yet; any PRs that fix them are welcome! The message after the marker is a test failure message.
Also, all those tests are marked with an ignore option, although they compile. This is intentional, because rustdoc marks such blocks with an information icon, unlike no_run blocks.
To parse all these XML's... | ...use that Rust type(s) |
---|---|
Content of attributes and text / CDATA content of elements (including mixed
text and CDATA content):
Merging of the text / CDATA content is tracked in the issue #474 and will be available in the next release. |
You can use any type that can be deserialized from an
NOTE: deserialization to non-owned types (i.e. borrow from the input),
such as |
Content of attributes and text / CDATA content of elements (including mixed
text and CDATA content), which represents a space-delimited list, as
specified in the XML Schema specification for
Merging of the text / CDATA content is tracked in the issue #474 and will be available in the next release. |
Use any type that is deserialized using
See the next row to learn where in your struct definition you should use that type.
According to the XML Schema specification, list elements are delimited by one or more space (
NOTE: according to the XML Schema restrictions, you cannot escape those white-space characters, so list elements will never contain them.
In practice you will usually use
NOTE: according to the XML Schema specification, list elements can be delimited only by spaces. Other delimiters (for example, commas) are not allowed. |
A typical XML with attributes. The root tag name does not matter:
|
A structure where each XML attribute is mapped to a field with a name
starting with
Any of these structs can be used to deserialize the XML on the left, depending on how much information you want to extract. Of course, you can combine them with element extractor structs (see below).
NOTE: XML allows an attribute and an element with the same name
inside one element. quick-xml deals with that by prepending a |
A typical XML with child elements. The root tag name does not matter:
|
A structure where each XML child element is mapped to a field.
Each element name becomes the name of a field. The name of the struct itself
does not matter:
Any of these structs can be used to deserialize the XML on the left, depending on how much information you want to extract. Of course, you can combine them with attribute extractor structs (see above).
NOTE: XML allows an attribute and an element with the same name
inside one element. quick-xml deals with that by prepending a |
An XML with an attribute and a child element that have the same name:
|
You MUST specify
|
Optional attributes and elements | |
To parse all these XML's... | ...use that Rust type(s) |
An optional XML attribute that you want to capture.
The root tag name does not matter:
|
A structure with an optional field, renamed according to the requirements for attributes:
When the XML attribute is present, type
|
An optional XML element that you want to capture.
The root tag name does not matter:
|
A structure with an optional field:
When the XML element is present, type
Currently some edge cases exist; they are described in issue #497. |
Choices ( | |
To parse all these XML's... | ...use that Rust type(s) |
An XML with different root tag names:
|
An enum where each variant has the name of a possible root tag. The name of the enum itself does not matter. Any of these types can be used to deserialize any XML on the left, depending on how much information you want to extract:
NOTE: You should have variants for all possible tag names in your enum
or have an |
|
A structure with a field whose type is an
Names of the enum, struct, and struct field with
|
|
A structure with a field whose type is an
Names of the enum, struct, and struct field with
NOTE: if your |
|
A structure with a field of an intermediate type with one field of
Names of the enum and struct do not matter:
|
|
A structure with a field of an intermediate type with one field of
Names of the enum and struct do not matter:
|
Sequences ( | |
To parse all these XML's... | ...use that Rust type(s) |
A sequence inside of a tag without a dedicated name:
|
A structure with a field that has a sequence type, for example,
Use the
Use the
Currently not working. The bug is tracked in #510. See also Frequently Used Patterns. |
A sequence with a strict order, possibly with mixed content
(text / CDATA and tags):
NOTE: this is just an example for showing mapping. XML does not allow multiple root tags – you should wrap the sequence into a tag. |
All elements are mapped to a heterogeneous sequential type: a tuple or a named tuple.
Each element of the tuple should be deserializable from the nested
element content (
NOTE: consecutive text and CDATA nodes are merged into one text node, so you cannot have two adjacent string types in your sequence. Merging of the text / CDATA content is tracked in issue #474 and will be available in the next release. |
A sequence with a non-strict order, possibly with mixed content
(text / CDATA and tags).
NOTE: this is just an example for showing mapping. XML does not allow multiple root tags – you should wrap the sequence into a tag. |
A homogeneous sequence of elements with a fixed or dynamic size:
NOTE: consecutive text and CDATA nodes are merged into one text node, so you cannot have two adjacent string types in your sequence. Merging of the text / CDATA content is tracked in issue #474 and will be available in the next release. |
A sequence with a strict order, possibly with mixed content
(text and tags), inside another element:
|
A structure where all child elements are mapped to one field which has
a heterogeneous sequential type: a tuple or a named tuple. Each element of the
tuple should be deserializable from the full element (
You MUST specify
NOTE: consecutive text and CDATA nodes are merged into one text node, so you cannot have two adjacent string types in your sequence. Merging of the text / CDATA content is tracked in issue #474 and will be available in the next release. |
A sequence with a non-strict order, possibly with mixed content
(text / CDATA and tags), inside another element:
|
A structure where all child elements are mapped to one field which has
a homogeneous sequential type: an array-like container. A container type
You MUST specify
NOTE: consecutive text and CDATA nodes are merged into one text node, so you cannot have two adjacent string types in your sequence. Merging of the text / CDATA content is tracked in issue #474 and will be available in the next release. |
Composition Rules
The XML format is very different from other formats supported by serde.
One such difference is how data in the serialized form is related to
the Rust type. Usually each byte in the data can be associated with only
one field in the data structure. However, XML is an exception.
For example, take this XML:
<any>
  <key attr="value"/>
</any>
and try to deserialize it into the struct AnyName:
use serde::Deserialize;

#[derive(Deserialize)]
struct AnyName {  // AnyName calls `deserialize_struct` on `<any><key attr="value"/></any>`
                  //                                  Used data: ^^^^^^^^^^^^^^^^^^^
    key: Inner,   // Inner   calls `deserialize_struct` on `<key attr="value"/>`
                  //                                  Used data: ^^^^^^^^^^^^
}
#[derive(Deserialize)]
struct Inner {
    #[serde(rename = "@attr")]
    attr: String, // String  calls `deserialize_string` on `value`
                  //                             Used data: ^^^^^
}
The comments show which Deserializer methods are called by each struct's
deserialize method and which input they see. "Used data" shows which
content is actually used for deserializing. As you can see, the name of the inner
<key> tag is used both as a map key / outer struct field name and as part
of the inner struct (although the tag name itself, i.e. key, is not used
by it).
Difference between $text and $value special names
quick-xml supports two special names for fields: $text and $value.
Although they may seem the same, there is a distinction. Two different
names are required mostly for serialization, because quick-xml needs to know
how you want to serialize certain constructs that can be represented
in XML in several different ways.
The only difference is in how complex types and sequences are serialized.
If you are unsure which one to select, begin with $value.
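The field-naming convention used on this page can be summarized with a small standalone sketch. The FieldKind enum and the classify function are hypothetical names invented for this illustration; they use only the standard library and are not part of quick-xml's API. The @ prefix for attributes is the one used by the Inner struct in Composition Rules.

```rust
/// How a (renamed) serde field name is interpreted when mapping to XML.
/// Hypothetical type for illustration only; not part of quick-xml.
#[derive(Debug, PartialEq)]
enum FieldKind<'a> {
    /// `@name`: an XML attribute called `name`.
    Attribute(&'a str),
    /// `$text`: text / CDATA content.
    Text,
    /// `$value`: content whose element names come from the serialized type.
    Value,
    /// Anything else: a child element with that name.
    Element(&'a str),
}

/// Classifies a field name according to the naming convention.
fn classify(name: &str) -> FieldKind<'_> {
    match name {
        "$text" => FieldKind::Text,
        "$value" => FieldKind::Value,
        _ => match name.strip_prefix('@') {
            Some(attr) => FieldKind::Attribute(attr),
            None => FieldKind::Element(name),
        },
    }
}

fn main() {
    for name in ["@attr", "$text", "$value", "key"] {
        println!("{name} -> {:?}", classify(name));
    }
}
```

The sketch only mirrors the naming rules; which of the two special names to pick still depends on how sequences and complex types should be written.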
$text
$text is used when you want to write your XML as text or CDATA content.
More formally, a field with that name represents a simple type definition with
{variety} = atomic or {variety} = union whose basic members are all atomic,
as described in the specification.
As a result, not all types of such fields can be serialized. Only serialization of the following types is supported:
- all primitive types (strings, numbers, booleans)
- unit variants of enumerations (serialized as the name of the variant)
- newtypes (serialization is delegated to the inner type)
- Option of the above (None serializes to nothing)
- sequences (including tuples and tuple variants of enumerations) of the above, excluding None and empty string elements (because it would not be possible to deserialize them back). The elements are separated by space(s)
- unit type () and unit structs (serialize to nothing)
Complex types, such as structs and maps, are not supported in this field.
If you want them, you should use $value.
Sequences are serialized to a space-delimited string, which is why only certain types are allowed in this mode:
use quick_xml::de::from_str;
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    #[serde(rename = "$text")]
    field: Vec<usize>,
}

let obj = AnyName { field: vec![1, 2, 3] };
let xml = to_string(&obj).unwrap();
assert_eq!(xml, "<AnyName>1 2 3</AnyName>");

let object: AnyName = from_str(&xml).unwrap();
assert_eq!(object, obj);
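To make the space-delimited representation concrete, here is a standalone sketch of the same round trip using only the standard library. The helper names to_xs_list and from_xs_list are hypothetical and are not quick-xml functions. The sketch assumes that items are separated by one or more XML whitespace characters (space, tab, carriage return, line feed).

```rust
/// Joins items into a space-delimited string, the way a `$text` sequence is written.
/// Hypothetical helper for illustration; not part of quick-xml.
fn to_xs_list(items: &[usize]) -> String {
    items
        .iter()
        .map(|item| item.to_string())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Splits on runs of XML whitespace and parses every item.
/// Hypothetical helper for illustration; not part of quick-xml.
fn from_xs_list(text: &str) -> Result<Vec<usize>, std::num::ParseIntError> {
    text.split(|c: char| matches!(c, ' ' | '\t' | '\r' | '\n'))
        .filter(|item| !item.is_empty()) // consecutive delimiters produce empty pieces
        .map(|item| item.parse::<usize>())
        .collect()
}

fn main() {
    let text = to_xs_list(&[1, 2, 3]);
    println!("{text}");
    println!("{:?}", from_xs_list(&text));
    // A comma is not a valid delimiter, so the whole text is one malformed item.
    println!("{:?}", from_xs_list("1,2,3").is_err());
}
```

Because the delimiter is whitespace, an item can never contain whitespace itself, which is consistent with None and empty strings being excluded from $text sequences.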
$value
NOTE: a name #content would better explain the purpose of that field,
but $value is used for compatibility with other XML serde crates, which
use that name. This allows you to switch XML crates more smoothly if required.
Representation of primitive types in $value does not differ from their
representation in a $text field. The difference is in how sequences are serialized.
$value serializes each sequence item as a separate XML element. The name
of that element is taken from the serialized type, and because only enums provide
such a name (their variant name), only they should be used for such fields.
$value fields do not support struct types with fields; serialization
of such types ends with an Err(Unsupported). Unit structs and the unit
type () serialize to nothing and can be deserialized from any content.
Serialization and deserialization of a $value field is performed as usual, except
that the name of the XML element is given by the serialized type instead of the
field. This makes it possible to serialize enumerated types where the variant is encoded
as a tag name, and so to represent an XSD xs:choice schema with a Rust enum.
In the example below, field will be serialized as <field/>, because elements
get their names from the field name. It cannot be deserialized, because Enum
expects elements <A/>, <B/> or <C/>, but AnyName looks only for <field/>:
#[derive(Deserialize, Serialize)]
enum Enum { A, B, C }

#[derive(Deserialize, Serialize)]
struct AnyName {
    // <field/>
    field: Enum,
}
If you rename field to $value, then field will be serialized as <A/>,
<B/> or <C/>, depending on its content. It is also possible to
deserialize it from the same elements:
#[derive(Deserialize, Serialize)]
struct AnyName {
    // <A/>, <B/> or <C/>
    #[serde(rename = "$value")]
    field: Enum,
}
Primitives and sequences of primitives
Sequences are serialized to a list of elements. Note that types that do not produce their own tag (i.e. primitives) are written as is, without delimiters:
use quick_xml::de::from_str;
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    #[serde(rename = "$value")]
    field: Vec<usize>,
}

let obj = AnyName { field: vec![1, 2, 3] };
let xml = to_string(&obj).unwrap();
// Note that types that do not produce their own tag are written as is!
assert_eq!(xml, "<AnyName>123</AnyName>");

let object: AnyName = from_str("<AnyName>123</AnyName>").unwrap();
assert_eq!(object, AnyName { field: vec![123] });

// `1 2 3` is mapped to a single `usize` element, which fails to parse.
// It is impossible to deserialize a list of primitives into such a field.
from_str::<AnyName>("<AnyName>1 2 3</AnyName>").unwrap_err();
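Why this round trip loses information can be seen without any XML at all. The sketch below uses only the standard library, and the helper name concat_items is hypothetical (it is not a quick-xml function): writing items back to back with no delimiter erases the boundaries between them, so different sequences produce the same text.

```rust
/// Writes every item back to back with no delimiter, mimicking how primitives
/// in a `$value` sequence end up in the element content.
/// Hypothetical helper for illustration; not part of quick-xml.
fn concat_items(items: &[usize]) -> String {
    items.iter().map(|item| item.to_string()).collect()
}

fn main() {
    // Three different sequences collapse into the same text, so a reader
    // cannot tell which one was written.
    println!("{}", concat_items(&[1, 2, 3]));
    println!("{}", concat_items(&[12, 3]));
    println!("{}", concat_items(&[123]));
}
```

A $text field avoids this ambiguity by separating the items with spaces.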
A particular case of that example is a string $value field, which is probably
the most common use of that special name:
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    #[serde(rename = "$value")]
    field: String,
}

let obj = AnyName { field: "content".to_string() };
let xml = to_string(&obj).unwrap();
assert_eq!(xml, "<AnyName>content</AnyName>");
Structs and sequences of structs
Note that structures do not have a serializable name either (the name of the
type is never used), so it is impossible to serialize a non-unit struct or a
sequence of non-unit structs in a $value field. (Sequences of) unit structs,
however, are serialized as an empty string, because units themselves serialize
to nothing:
use quick_xml::de::from_str;
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct Unit;

#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    // #[serde(default)] is required for deserialization of empty lists.
    // This is a general note, not related to $value
    #[serde(rename = "$value", default)]
    field: Vec<Unit>,
}

let obj = AnyName { field: vec![Unit, Unit, Unit] };
let xml = to_string(&obj).unwrap();
assert_eq!(xml, "<AnyName/>");

let object: AnyName = from_str("<AnyName/>").unwrap();
assert_eq!(object, AnyName { field: vec![] });

let object: AnyName = from_str("<AnyName></AnyName>").unwrap();
assert_eq!(object, AnyName { field: vec![] });

let object: AnyName = from_str("<AnyName><A/><B/><C/></AnyName>").unwrap();
assert_eq!(object, AnyName { field: vec![Unit, Unit, Unit] });
Enums and sequences of enums
Enumerations use the variant name as an element name:
use quick_xml::de::from_str;
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    #[serde(rename = "$value")]
    field: Vec<Enum>,
}

#[derive(Deserialize, Serialize, PartialEq, Debug)]
enum Enum { A, B, C }

let obj = AnyName { field: vec![Enum::A, Enum::B, Enum::C] };
let xml = to_string(&obj).unwrap();
assert_eq!(
    xml,
    "<AnyName>\
        <A/>\
        <B/>\
        <C/>\
     </AnyName>"
);

let object: AnyName = from_str(&xml).unwrap();
assert_eq!(object, obj);
You can have either a $text or a $value field in your structs. Unfortunately,
that is not enforced, so you can theoretically have both, but you should
avoid that.
Frequently Used Patterns
Some XML constructs are used so frequently that it is worth documenting the recommended way to represent them in Rust. The sections below describe them.
<element> lists
Many XML formats wrap lists of elements in an additional container, although this is not required by the XML rules:
<root>
  <field1/>
  <field2/>
  <list><!-- Container -->
    <element/>
    <element/>
    <element/>
  </list>
  <field3/>
</root>
In this case, it is tempting to describe this XML this way:
/// Represents <element/>
type Element = ();

/// Represents <root>...</root>
struct AnyName {
    // Incorrect
    list: Vec<Element>,
}
This will not work, because the <list> element can potentially have attributes
and other elements inside. You should define the struct for <list>
explicitly, as you would in the XSD for that XML:
/// Represents <element/>
type Element = ();

/// Represents <root>...</root>
struct AnyName {
    // Correct
    list: List,
}

/// Represents <list>...</list>
struct List {
    element: Vec<Element>,
}
If you want to simplify your API, you could write a simple function for unwrapping
the inner list and apply it via deserialize_with:
use quick_xml::de::from_str;
use serde::{Deserialize, Deserializer};

/// Represents <element/>
type Element = ();

/// Represents <root>...</root>
#[derive(Deserialize, Debug, PartialEq)]
struct AnyName {
    #[serde(deserialize_with = "unwrap_list")]
    list: Vec<Element>,
}

fn unwrap_list<'de, D>(deserializer: D) -> Result<Vec<Element>, D::Error>
where
    D: Deserializer<'de>,
{
    /// Represents <list>...</list>
    #[derive(Deserialize)]
    struct List {
        // default allows empty list
        #[serde(default)]
        element: Vec<Element>,
    }
    Ok(List::deserialize(deserializer)?.element)
}

assert_eq!(
    AnyName { list: vec![(), (), ()] },
    from_str("
        <root>
          <list>
            <element/>
            <element/>
            <element/>
          </list>
        </root>
    ").unwrap(),
);
Instead of writing such functions manually, you could also try https://lib.rs/crates/serde-query.