Shuttle is a library for testing concurrent Rust code, heavily inspired by Loom.
Shuttle focuses on randomized testing, rather than the exhaustive testing that Loom offers. This is a trade-off between soundness and scalability: Shuttle is not sound (a passing Shuttle test does not prove the code is correct), but it scales to much larger test cases than Loom. Empirically, randomized testing is successful at finding most concurrency bugs, which tend not to be adversarial.
§Testing concurrent code
Consider this simple piece of concurrent code:
use std::sync::{Arc, Mutex};
use std::thread;
let lock = Arc::new(Mutex::new(0u64));
let lock2 = lock.clone();
thread::spawn(move || {
    *lock.lock().unwrap() = 1;
});
assert_eq!(0, *lock2.lock().unwrap());
There is an obvious race condition here: if the spawned thread runs before the assertion, the assertion will fail. But writing a unit test that finds this execution is tricky. We could run the test many times and try to “get lucky” by finding a failing execution, but that’s not a very reliable testing approach. Even if the test does fail, it will be difficult to debug: we won’t be able to easily catch the failure in a debugger, and every time we make a change, we will need to run the test many times to decide whether we fixed the issue.
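To make the two possible outcomes concrete, the sketch below forces each ordering by hand using only `std`. The function name `observed_value` and its `writer_first` flag are hypothetical, introduced purely for illustration; joining the spawned thread before or after the read stands in for what the operating system scheduler might happen to do.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns the value the main thread observes. When `writer_first` is true,
/// the spawned thread is joined before the read, forcing the failing order.
fn observed_value(writer_first: bool) -> u64 {
    let lock = Arc::new(Mutex::new(0u64));
    let lock2 = Arc::clone(&lock);
    let writer = move || {
        *lock.lock().unwrap() = 1;
    };
    if writer_first {
        // Writer runs to completion, then the main thread reads.
        thread::spawn(writer).join().unwrap();
        let seen = *lock2.lock().unwrap();
        seen
    } else {
        // Main thread reads before the writer is even started.
        let seen = *lock2.lock().unwrap();
        thread::spawn(writer).join().unwrap();
        seen
    }
}
```

Calling `observed_value(true)` yields the value that would trip the assertion, while `observed_value(false)` yields the value the assertion expects. A real test has no such switch, which is why the failing order is hard to hit on demand.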
§Randomly testing concurrent code with Shuttle
Shuttle avoids this issue by controlling the scheduling of each thread in the program, and scheduling those threads randomly. By controlling the scheduling, Shuttle allows us to reproduce failing tests deterministically. By using random scheduling, with appropriate heuristics, Shuttle can still catch most (non-adversarial) concurrency bugs even though it is not an exhaustive checker.
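The idea of "random but reproducible" scheduling can be sketched with a toy model that uses no threads at all. This is not how Shuttle is implemented; `XorShift`, `run_once`, and `find_failing_seed` are hypothetical names, and the two "tasks" are just the writer and the reader from the example above, each modeled as a single step. The point is only that a schedule chosen by a seeded generator can be replayed exactly by reusing the seed.

```rust
/// Tiny xorshift generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// One execution: returns the order in which the two tasks ran and
/// whether the reader's check (`shared == 0`) held.
fn run_once(seed: u64) -> (Vec<usize>, bool) {
    let mut rng = XorShift(seed | 1); // xorshift state must be non-zero
    let mut shared = 0u64;
    let mut pending = vec![0usize, 1]; // task 0 writes, task 1 reads
    let mut schedule = Vec::new();
    let mut assertion_held = true;
    while !pending.is_empty() {
        // The "scheduler": pick one of the runnable tasks at random.
        let idx = (rng.next() % pending.len() as u64) as usize;
        let task = pending.remove(idx);
        schedule.push(task);
        match task {
            0 => shared = 1,
            _ => assertion_held = shared == 0,
        }
    }
    (schedule, assertion_held)
}

/// Try seeds in order until one produces a failing schedule.
fn find_failing_seed(max_seeds: u64) -> Option<u64> {
    (0..max_seeds).find(|&seed| !run_once(seed).1)
}
```

Once `find_failing_seed` returns a seed, calling `run_once` with that seed again produces the same schedule and the same failure, which is the property that makes debugging tractable.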
A Shuttle version of the above test just wraps the test body in a call to Shuttle’s check_random function, and replaces the concurrency-related imports from std with imports from shuttle:
use shuttle::sync::{Arc, Mutex};
use shuttle::thread;
shuttle::check_random(|| {
    let lock = Arc::new(Mutex::new(0u64));
    let lock2 = lock.clone();
    thread::spawn(move || {
        *lock.lock().unwrap() = 1;
    });
    assert_eq!(0, *lock2.lock().unwrap());
}, 100);
This test detects the assertion failure with extremely high probability (over 99.9999%).
§Testing non-deterministic code
Shuttle supports testing code that uses data non-determinism (random number generation). For example, this test uses the rand crate to generate a random number:
use rand::{thread_rng, Rng};
let x = thread_rng().gen::<u64>();
assert_ne!(x % 10, 7);
Shuttle provides its own implementation of rand that is a drop-in replacement:
use shuttle::rand::{thread_rng, Rng};
shuttle::check_random(|| {
    let x = thread_rng().gen::<u64>();
    assert_ne!(x % 10, 7);
}, 100);
This test will run the body 100 times, and fail if any of those executions fails; the test therefore fails with probability 1-(9/10)^100, or 99.997%. We can increase the 100 parameter to run more executions and increase the probability of finding the failure. Note that Shuttle isn’t doing anything special to increase the probability of this test failing other than running the body multiple times.
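The arithmetic behind that figure can be checked with a short sketch. The helper names `failure_probability` and `iterations_for` are hypothetical; they assume each execution fails independently with the same per-run probability (here 1/10, since one residue out of ten triggers the assertion).

```rust
/// Probability that at least one of `iterations` independent runs fails,
/// given that a single run fails with probability `per_run`.
fn failure_probability(per_run: f64, iterations: u32) -> f64 {
    1.0 - (1.0 - per_run).powi(iterations as i32)
}

/// Smallest number of runs needed to reach at least `target` probability
/// of observing a failure, under the same independence assumption.
fn iterations_for(per_run: f64, target: f64) -> u32 {
    ((1.0 - target).ln() / (1.0 - per_run).ln()).ceil() as u32
}
```

With `per_run = 0.1` and 100 iterations, `failure_probability` gives roughly 0.99997, matching the figure above. Inverting the formula shows how the iteration count scales with the confidence you want.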
When this test fails, Shuttle provides output that can be used to deterministically reproduce the failure:
test panicked in task "task-0" with schedule: "910102ccdedf9592aba2afd70104"
pass that schedule string into `shuttle::replay` to reproduce the failure
We can use Shuttle’s replay function to replay the execution that causes the failure:
use shuttle::rand::{thread_rng, Rng};
shuttle::replay(|| {
    let x = thread_rng().gen::<u64>();
    assert_ne!(x % 10, 7);
}, "910102ccdedf9592aba2afd70104");
This runs the test only once, and is guaranteed to reproduce the failure.
Support for data non-determinism is most useful when combined with support for schedule
non-determinism (i.e., concurrency). For example, an integration test might spawn several
threads, and within each thread perform a random sequence of actions determined by thread_rng
(this style of testing is often referred to as a “stress test”). By using Shuttle to implement
the stress test, we can both increase the coverage of the test by exploring more thread
interleavings and allow test failures to be deterministically reproducible for debugging.
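As a rough illustration of that stress-test style, the sketch below uses only `std` so that it runs without extra dependencies. The function `stress`, the hand-rolled generator `next_random`, and the "balance" invariant are all hypothetical; the generator stands in for thread_rng. Under Shuttle, one would presumably swap the `std` imports for their Shuttle equivalents and wrap the body in a check function, at which point both the random actions and the interleaving become replayable.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Minimal linear congruential generator standing in for `thread_rng`.
fn next_random(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Spawns `workers` threads that each apply `ops` random deposits or
/// withdrawals to a shared balance. Returns (sum of per-thread net
/// changes, final shared balance); the two should always be equal.
fn stress(workers: u64, ops: u32) -> (i64, i64) {
    let balance = Arc::new(Mutex::new(0i64));
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let balance = Arc::clone(&balance);
            thread::spawn(move || {
                let mut state = id + 1; // per-thread seed
                let mut net = 0i64;
                for _ in 0..ops {
                    let amount = (next_random(&mut state) % 100) as i64;
                    let deposit = next_random(&mut state) % 2 == 0;
                    let delta = if deposit { amount } else { -amount };
                    *balance.lock().unwrap() += delta;
                    net += delta;
                }
                net
            })
        })
        .collect();
    let expected: i64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    let actual = *balance.lock().unwrap();
    (expected, actual)
}
```

With plain `std` threads, a violation of the invariant would depend on the operating system's scheduling and would be hard to reproduce, which is the gap the paragraph above describes.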
§Writing Shuttle tests
To test concurrent code with Shuttle, all uses of synchronization primitives from std must be replaced by their Shuttle equivalents. The simplest way to do this is via cfg flags. Specifically, if you enforce that all synchronization primitives are imported from a single sync module in your code, and implement that module like this:
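// Re-exported (rather than privately imported) so that other modules
// in the crate can import these primitives from this module.
#[cfg(all(feature = "shuttle", test))]
pub(crate) use shuttle::{sync::*, thread};
#[cfg(not(all(feature = "shuttle", test)))]
pub(crate) use std::{sync::*, thread};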
Then a Shuttle test can be written like this:
#[cfg(feature = "shuttle")]
#[test]
fn concurrency_test_shuttle() {
    use my_crate::*;
    // ...
}
and be executed by running cargo test --features shuttle.
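A minimal sketch of how application code might consume such a module is shown below. The inline `mod sync` and the function `parallel_count` are hypothetical; the module is written inline, with `pub use` so its contents are visible outside it, only to keep the example in one file. When the `shuttle` feature is not enabled, the `std` branch is selected and the code builds without Shuttle.

```rust
// Hypothetical crate-local module: the single place that chooses between
// std and Shuttle primitives.
mod sync {
    #[cfg(all(feature = "shuttle", test))]
    pub use shuttle::{sync::*, thread};
    #[cfg(not(all(feature = "shuttle", test)))]
    pub use std::{sync::*, thread};
}

// Everything else imports from `sync`, never from `std::sync` directly.
use sync::{thread, Arc, Mutex};

/// Hypothetical code under test: each of `workers` threads adds 1 to a
/// shared counter, and the final count is returned.
pub fn parallel_count(workers: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

Because `parallel_count` names only `sync::{thread, Arc, Mutex}`, the same source compiles against either backend, and a Shuttle test can call it unchanged.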
§Choosing a scheduler and running a test
Shuttle tests need to choose a scheduler to use to direct the execution. The scheduler determines the order in which threads are scheduled. Different scheduling policies can increase the probability of detecting certain classes of bugs (e.g., race conditions), but at the cost of needing to test more executions.
Shuttle has a number of built-in schedulers, which implement the Scheduler trait. They are most easily accessed via convenience methods:
- check_random runs a test using a random scheduler for a chosen number of executions.
- check_pct runs a test using the Probabilistic Concurrency Testing (PCT) algorithm. PCT bounds the number of preemptions a test explores; empirically, most concurrency bugs can be detected with very few preemptions, and so PCT increases the probability of finding such bugs. The PCT scheduler can be configured with a “bug depth” (the number of preemptions) and a number of executions.
- check_dfs runs a test with an exhaustive scheduler using depth-first search. Exhaustive testing is intractable for all but the very simplest programs, and so using this scheduler is not recommended, but it can be useful to thoroughly test small concurrency primitives. The DFS scheduler can be configured with a bound on the depth of schedules to explore.
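To give a feel for why exhaustive exploration becomes intractable, the sketch below counts interleavings by depth-first search over a simplified model. It assumes every step of every thread is a possible scheduling point and that threads never block; `count_interleavings` is a hypothetical helper, not part of Shuttle.

```rust
/// Counts the distinct interleavings of threads whose remaining step
/// counts are given in `remaining`, by depth-first search over which
/// thread takes the next step.
fn count_interleavings(remaining: &mut [u32]) -> u64 {
    if remaining.iter().all(|&r| r == 0) {
        return 1; // every thread finished: one complete schedule
    }
    let mut total = 0;
    for i in 0..remaining.len() {
        if remaining[i] > 0 {
            remaining[i] -= 1; // let thread `i` take one step
            total += count_interleavings(remaining);
            remaining[i] += 1; // backtrack
        }
    }
    total
}
```

Under this model, two threads of two steps each have 6 interleavings, two threads of five steps each have 252, and three threads of two steps each have 90. The count grows combinatorially with both thread count and step count, which is why sampling schedules or bounding preemptions is the practical choice for larger tests.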
When these convenience methods do not provide enough control, Shuttle provides a Runner object for executing a test. A runner is constructed from a chosen scheduler, and then invoked with the Runner::run method. Shuttle also provides a PortfolioRunner object for running multiple schedulers, using parallelism to increase the number of test executions explored.
Modules§
- current: Information about the current thread and current Shuttle execution.
- future: Shuttle’s implementation of an async executor, roughly equivalent to futures::executor.
- hint: Shuttle’s implementation of std::hint.
- lazy_static: Shuttle’s implementation of the lazy_static crate, v1.4.0.
- rand: Shuttle’s implementation of the rand crate, v0.8.
- scheduler: Implementations of different scheduling strategies for concurrency testing.
- sync: Shuttle’s implementation of std::sync.
- thread: Shuttle’s implementation of std::thread.
Macros§
- lazy_static: Declare a new lazy static value, like the lazy_static crate.
- thread_local: Declare a new thread local storage key of type LocalKey.
Structs§
- Config: Configuration parameters for Shuttle.
- PortfolioRunner: A PortfolioRunner is the same as a Runner, except that it can run multiple different schedulers (a “portfolio” of schedulers) in parallel. If any of the schedulers finds a failing execution of the test, the entire run fails.
- Runner: A Runner is the entry-point for testing concurrent code.
Enums§
- FailurePersistence: Specifies how to persist schedules when a Shuttle test fails.
- MaxSteps: Specifies an upper bound on the number of steps a single iteration of a Shuttle test can take, and how to react when the bound is reached.
Functions§
- check_dfs: Run the given function under a depth-first-search scheduler until all interleavings have been explored (but if the max_iterations bound is provided, stop after that many iterations).
- check_pct: Run the given function under a PCT concurrency scheduler for some number of iterations at the given depth. Each iteration will run a (potentially) different randomized schedule.
- check_random: Run the given function under a randomized concurrency scheduler for some number of iterations. Each iteration will run a (potentially) different randomized schedule.
- check_uncontrolled_nondeterminism: Run the given function under a scheduler that checks whether the function contains randomness which is not controlled by Shuttle. Each iteration will check a different random schedule and replay that schedule once.
- replay: Run the given function according to a given encoded schedule, usually produced as the output of a failing Shuttle test case.
- replay_from_file: Run the given function according to a schedule saved in the given file, usually produced as the output of a failing Shuttle test case.