Crate scraper

Source
Expand description

HTML parsing and querying with CSS selectors.

scraper is on Crates.io and GitHub.

Scraper provides an interface to Servo’s html5ever and selectors crates, for browser-grade parsing and querying.

§Examples

§Parsing a document

use scraper::Html;

let html = r#"
    <!DOCTYPE html>
    <meta charset="utf-8">
    <title>Hello, world!</title>
    <h1 class="foo">Hello, <i>world!</i></h1>
"#;

let document = Html::parse_document(html);

§Parsing a fragment

use scraper::Html;
let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");

§Parsing a selector

use scraper::Selector;
let selector = Selector::parse("h1.foo").unwrap();

§Selecting elements

use scraper::{Html, Selector};

let html = r#"
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
"#;

let fragment = Html::parse_fragment(html);
let selector = Selector::parse("li").unwrap();

for element in fragment.select(&selector) {
    assert_eq!("li", element.value().name());
}

§Selecting descendent elements

use scraper::{Html, Selector};

let html = r#"
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
"#;

let fragment = Html::parse_fragment(html);
let ul_selector = Selector::parse("ul").unwrap();
let li_selector = Selector::parse("li").unwrap();

let ul = fragment.select(&ul_selector).next().unwrap();
for element in ul.select(&li_selector) {
    assert_eq!("li", element.value().name());
}

§Accessing element attributes

use scraper::{Html, Selector};

let fragment = Html::parse_fragment(r#"<input name="foo" value="bar">"#);
let selector = Selector::parse(r#"input[name="foo"]"#).unwrap();

let input = fragment.select(&selector).next().unwrap();
assert_eq!(Some("bar"), input.value().attr("value"));

§Serializing HTML and inner HTML

use scraper::{Html, Selector};

let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");
let selector = Selector::parse("h1").unwrap();

let h1 = fragment.select(&selector).next().unwrap();

assert_eq!("<h1>Hello, <i>world!</i></h1>", h1.html());
assert_eq!("Hello, <i>world!</i>", h1.inner_html());

§Accessing descendent text

use scraper::{Html, Selector};

let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");
let selector = Selector::parse("h1").unwrap();

let h1 = fragment.select(&selector).next().unwrap();
let text = h1.text().collect::<Vec<_>>();

assert_eq!(vec!["Hello, ", "world!"], text);

Re-exports§

pub use crate::element_ref::ElementRef;
pub use crate::html::Html;
pub use crate::html::HtmlTreeSink;
pub use crate::node::Node;
pub use crate::selector::Selector;

Modules§

element_ref
Element references.
error
Custom error types for diagnostics Includes re-exported error types from dependencies
html
HTML documents and fragments.
node
HTML nodes.
selectable
Provides the Selectable to abstract over collections of elements
selector
CSS selectors.

Enums§

CaseSensitivity

Traits§

Element

Type Aliases§

StrTendril
Primary string tendril type.