# sanitise-file-name: an unusually flexible and efficient file name sanitiser
At the time of writing, I believe this to be one of the very best file name sanitisers around (comparing it with extant Rust options like sanitize-filename and sanitize-filename-reader-friendly, and with other implementations I found for environments like Node.js and Python; I didn’t look at anything in C or C++).
- It’s faster: while its flexibility may act against it in some cases (depending on the optimiser), it starts out with the substantial advantage of making exactly one allocation, whereas the alternatives (even Rust ones like sanitize-filename and sanitize-filename-reader-friendly) make at least three or four, normally quite a few more. (Note that I haven’t done any benchmarking comparison.) What’s more, it lets you keep on reusing one large-enough buffer if you want, for amortised *zero* allocations.
- It’s better documented: each option declares precisely what it does, why you might care, and sometimes gives extra suggestions (e.g. “if you want to support HFS+, normalise to NFD first so the length limit is correct”).
- It’s more flexible: you can choose whether you want things like Windows-safety and URL-safety, plus there are more options for producing probably-prettier results (mostly loosely inspired by sanitize-filename-reader-friendly).
- It’s more correct: it doesn’t remove unnecessary characters and *does* remove all necessary characters (a surprisingly rare combination, though certainly not unknown). Length limits are measured correctly in UTF-8 code units (rather than UTF-16 code units, Unicode code points or scalar values), and truncate the base name rather than the extension where possible; now *this* is a feature that I haven’t found in any other library. And if you prefer to append the extension afterwards, I’ve got you covered, including adjusting the length limit to match, which is *also* a feature that I haven’t found in any other library.
- It behaves in a platform- and file-system-neutral way, because matching the local platform’s behaviour is just asking for trouble, especially in cases where you can’t accurately detect the file system in use. Instead, it supports all even vaguely popular file systems by default (which only care about ␀ and `/`), and you can opt out of Windows support since it’s the only one with even mildly cumbersome rules.
  - But ext3cow, which doesn’t allow `@`, is not supported.
  - And HFS+ environments where `:` is reserved are only supported incidentally via Windows-safety; but I believe (without having definitely confirmed this) that that’s pretty much ancient history, Mac OS 9 or so from memory.
- It doesn’t even require `std` or `alloc` (though they’re enabled by default): it can support `tinyvec_string::ArrayString`, requiring no more than 510 bytes under the default options (and only that much because of extension cleverness).
- It uses *The Original And The Best™* English. (That is: *sanitise* instead of *sanitize*, and *file name* instead of *filename*.) As a concession to Americans, though, the functions are also exported under the spelling *sanitize*; but you’ll still have to steel yourselves to spelling it *sanitise* in the crate name.
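To illustrate the length-limit point, here is a minimal, self-contained sketch of truncating a name to a UTF-8 byte limit while keeping its extension. The functions `truncate_keeping_extension` and `prefix_within` are hypothetical and not part of this crate’s API. The sketch assumes the extension is everything from the last `.` onwards (ignoring a leading dot, as in `.bashrc`); the crate’s own handling, including its extension cleverness, may well differ.

```rust
/// Hypothetical helper (a sketch, not this crate's API): truncate `name` so its
/// UTF-8 encoding is at most `limit` bytes, shortening the base name rather than
/// the extension where possible.
fn truncate_keeping_extension(name: &str, limit: usize) -> String {
    // `str::len` counts UTF-8 code units (bytes), which is what the limit measures.
    if name.len() <= limit {
        return name.to_string();
    }
    // Assumption: the extension starts at the last '.', unless that dot is the
    // very first character (a dotfile such as ".bashrc" has no extension).
    let (stem, ext) = match name.rfind('.') {
        Some(i) if i > 0 => name.split_at(i),
        _ => (name, ""),
    };
    if ext.len() >= limit {
        // The extension alone doesn't fit, so fall back to plain truncation.
        return prefix_within(name, limit).to_string();
    }
    let stem = prefix_within(stem, limit - ext.len());
    format!("{stem}{ext}")
}

/// Longest prefix of `s` that is at most `max` bytes and ends on a char boundary.
fn prefix_within(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    // Back off until `end` falls between characters, never inside one.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Because the limit counts bytes, `ééééé.txt` (14 bytes) limited to 9 bytes becomes `éé.txt`: the cut backs off to a character boundary rather than splitting a two-byte `é`.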
Demonstration of the simplest and most convenient form of usage:
```rust
use sanitise_file_name::sanitise;

fn main() {
    // Examples of some of the things it can do:
    // whitespace is collapsed to one space,
    // various ASCII punctuation gets replaced by underscores,
    // outer whitespace is trimmed.
    // (There are reasons for each of these things,
    // and they can all be turned off or customised with options.)
    assert_eq!(
        sanitise(" https://example.com/Some\tfile \u{a0} name .exe "),
        "https___example.com_Some file name.exe",
    );

    // The windows_safe option leads to the addition of the underscore.
    assert_eq!(sanitise("aux.h"), "aux_.h");
}
```
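The `aux.h` case exists because Windows reserves certain device names even when an extension follows them. Here is a minimal sketch of just that check, using the hypothetical helpers `escape_windows_reserved` and `is_windows_reserved` (not this crate’s API). It assumes the reserved part is everything before the first `.`, and that appending `_` to it is an acceptable fix, matching the output shown above.

```rust
/// Hypothetical helper (a sketch, not this crate's implementation): if the part
/// of `name` before its first '.' is a Windows reserved device name, append an
/// underscore to that part, so "aux.h" becomes "aux_.h".
fn escape_windows_reserved(name: &str) -> String {
    let stem_end = name.find('.').unwrap_or(name.len());
    let (stem, rest) = name.split_at(stem_end);
    if is_windows_reserved(stem) {
        format!("{stem}_{rest}")
    } else {
        name.to_string()
    }
}

/// Checks a base name against the common reserved device names.
fn is_windows_reserved(stem: &str) -> bool {
    // Device names are matched case-insensitively.
    let upper = stem.to_ascii_uppercase();
    match upper.as_str() {
        "CON" | "PRN" | "AUX" | "NUL" => true,
        _ => {
            // COM1–COM9 and LPT1–LPT9. Microsoft's documentation lists further
            // variants (such as superscript-digit forms) that this sketch ignores.
            let bytes = upper.as_bytes();
            bytes.len() == 4
                && (upper.starts_with("COM") || upper.starts_with("LPT"))
                && (b'1'..=b'9').contains(&bytes[3])
        }
    }
}
```

Real Windows-safety involves more than this, such as the reserved characters `< > : " / \ | ? *` and the ban on trailing dots and spaces; the sketch deliberately covers only the device-name rule.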
The `sanitise_file_name::Options` docs explain all sanitisation functionality precisely, and all of it is customisable.
This crate supports no_std operation and has several other Cargo features; refer to the root of the crate docs for information.
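The allocation claims made earlier rest on writing output into a caller-provided buffer. Here is a minimal sketch of that pattern, using the hypothetical functions `sanitise_minimal_into` and `total_sanitised_len` (not this crate’s API). They replace only the two characters that most file systems reject, NUL and `/`, so they are not Windows-safe.

```rust
/// Hypothetical helper (a sketch, not this crate's API): writes a minimally
/// sanitised copy of `input` into `out`, replacing NUL and '/' with '_'.
/// `out` is cleared first but keeps its capacity.
fn sanitise_minimal_into(input: &str, out: &mut String) {
    out.clear();
    // Each replacement is one byte for one byte, so the output is exactly
    // `input.len()` bytes long; this does nothing if capacity already suffices.
    out.reserve(input.len());
    for c in input.chars() {
        out.push(match c {
            '\0' | '/' => '_',
            other => other,
        });
    }
}

/// Sanitises every name through one reused buffer, returning the total byte
/// length of the results (just to do something with each one).
fn total_sanitised_len(names: &[&str]) -> usize {
    let mut buf = String::with_capacity(255);
    let mut total = 0;
    for name in names {
        sanitise_minimal_into(name, &mut buf);
        total += buf.len();
    }
    total
}
```

`String::clear` keeps the buffer’s capacity, so once the buffer is large enough, later calls make no allocations at all.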