Function gix_dir::walk

source ·
pub fn walk(
    worktree_root: &Path,
    ctx: Context<'_>,
    options: Options<'_>,
    delegate: &mut dyn Delegate,
) -> Result<(Outcome, PathBuf), Error>
Expand description

A function to perform a git-style, unsorted, directory walk.

  • worktree_root - the top-most root of the worktree, which must be a prefix to root.
    • If Options::precompose_unicode is enabled, this path must be precomposed.
    • The starting point of the traversal (traversal root) is calculated from by doing worktree_root + pathspec.common_prefix().
    • Note that if the traversal root leading to this directory or it itself is excluded, it will be provided to Delegate::emit() without further traversal.
    • If Options::precompose_unicode is enabled, all involved paths must be precomposed.
    • Must be contained in worktree_root.
  • ctx - everything needed to classify the paths seen during the traversal.
  • delegate - an implementation of Delegate to control details of the traversal and receive its results.

Returns (outcome, traversal_root), with the traversal_root actually being used for the traversal, useful to transform the paths returned for the user. It’s always within the worktree_root, or the same, but is hard to guess due to additional logic affecting it.

§Performance Notes

In theory, parallel directory traversal can be significantly faster, and what’s possible for our current gix_features::fs::WalkDir implementation is to abstract a filter_entry() method so it works both for the iterator from the walkdir crate as well as from jwalk. However, doing so as initial version has the risk of not being significantly harder if not impossible to implement as flow-control is very limited.

Thus the decision was made to start out with something akin to the Git implementation, get all tests and baseline comparison to pass, and see if an iterator with just filter_entry would be capable of dealing with it. Note that filter_entry is the only just-in-time traversal control that walkdir offers, even though one could consider switching to jwalk and just use its single-threaded implementation if a unified interface is necessary to make this work - jwalk has a more powerful API for this to work.

If that was the case, we are talking about 0.5s for single-threaded traversal (without doing any extra work) or 0.25s for optimal multi-threaded performance, all in the WebKit directory with 388k items to traverse. Thus, the speedup could easily be 2x or more and thus worth investigating in due time.