spider_cli-2.22.7 is not a library.
Spider CLI
A fast command-line spider (web crawler).
Dependencies
On Linux
- OpenSSL 1.0.1, 1.0.2, 1.1.0, or 1.1.1
Note: You need to have `pkg-config` installed, otherwise `openssl` will not be recognized by cargo.
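On Ubuntu, both are provided by the `pkg-config` and `libssl-dev` packages (for example, `sudo apt-get install pkg-config libssl-dev`).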
Usage
The CLI is a binary, so do not add it to your `Cargo.toml` file; install it with `cargo install spider_cli` instead.
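The CLI can be built in several variants:
- without headless browser rendering
- with headless browser rendering
- with smart mode, which defaults to plain HTTP and uses headless rendering only when needed
- with full resources, so that assets other than web pages are collected too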
CLI
The following commands run the crawler from the command line. If you need logging, pass in the `-v` flag.
Crawl and output all links visited to a file.
Download all HTML to a local destination. Use the `-t` option to pass in the target destination folder.
Set a crawl budget and only crawl one domain.
Set a crawl budget that allows at most 10 pages matching the `/blog/` path and limits the whole crawl to 100 pages.
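The budget descriptions above set a page limit per path. Assuming the CLI's budget option (assumed here to be `--budget`; confirm with `spider --help` for your version) takes a comma-separated list of `path,limit` pairs, with `*` standing for every page, the value can be assembled with a small shell function. `spider_budget` is a hypothetical helper for illustration only; it is not part of spider_cli.

```shell
# Hypothetical helper, not part of spider_cli.
# Joins PATH LIMIT pairs into one comma-separated budget string, assuming the
# budget format is "path,limit[,path,limit...]" with "*" matching every page.
spider_budget() {
  # Require a non-empty, even number of arguments (PATH LIMIT pairs).
  if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
    echo "usage: spider_budget PATH LIMIT [PATH LIMIT ...]" >&2
    return 1
  fi
  local out=""
  while [ "$#" -gt 0 ]; do
    # Each limit must be a non-negative integer.
    case "$2" in
      ''|*[!0-9]*)
        echo "spider_budget: limit must be a non-negative integer, got '$2'" >&2
        return 1
        ;;
    esac
    # Append "path,limit", with a separating comma after the first pair.
    out="${out:+$out,}$1,$2"
    shift 2
  done
  printf '%s\n' "$out"
}

# Example: 100 pages overall, at most 10 under /blog/.
spider_budget '*' 100 /blog/ 10
```

Here `spider_budget '*' 100 /blog/ 10` prints `*,100,/blog/,10`, which, under the assumed format, corresponds to the second budget description: 100 pages in total, with at most 10 under `/blog/`. The result would then be passed as the quoted budget value, e.g. `--budget "$(spider_budget '*' 1)"`.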
All features are available except the `Website` struct's `on_link_find_callback` configuration option.