1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
/*!
This crate provides common routines used in command line applications, with a
focus on routines useful for search oriented applications. As a utility
library, there is no central type or function. However, a key focus of this
crate is to improve failure modes and provide user friendly error messages
when things go wrong.

To the best extent possible, everything in this crate works on Windows, macOS
and Linux.


# Standard I/O

[`is_readable_stdin`] determines whether stdin can be usefully read from. It
is useful when writing an application that changes behavior based on whether
the application was invoked with data on stdin. For example, `rg foo` might
recursively search the current working directory for occurrences of `foo`, but
`rg foo < file` might only search the contents of `file`.


# Coloring and buffering

The [`stdout`], [`stdout_buffered_block`] and [`stdout_buffered_line`] routines
are alternative constructors for [`StandardStream`]. A `StandardStream`
implements `termcolor::WriteColor`, which provides a way to emit colors to
terminals. Its key use is the encapsulation of buffering style. Namely,
`stdout` will return a line buffered `StandardStream` if and only if
stdout is connected to a tty, and will otherwise return a block buffered
`StandardStream`. Line buffering is important for use with a tty because it
typically decreases the latency at which the end user sees output. Block
buffering is used otherwise because it is faster, and redirecting stdout to a
file typically doesn't benefit from the decreased latency that line buffering
provides.

The `stdout_buffered_block` and `stdout_buffered_line` can be used to
explicitly set the buffering strategy regardless of whether stdout is connected
to a tty or not.


# Escaping

The [`escape`](crate::escape()), [`escape_os`], [`unescape`] and
[`unescape_os`] routines provide a user friendly way of dealing with UTF-8
encoded strings that can express arbitrary bytes. For example, you might want
to accept a string containing arbitrary bytes as a command line argument, but
most interactive shells make such strings difficult to type. Instead, we can
ask users to use escape sequences.

For example, `a\xFFz` is itself a valid UTF-8 string corresponding to the
following bytes:

```ignore
[b'a', b'\\', b'x', b'F', b'F', b'z']
```

However, we can
interpret `\xFF` as an escape sequence with the `unescape`/`unescape_os`
routines, which will yield

```ignore
[b'a', b'\xFF', b'z']
```

instead. For example:

```
use grep_cli::unescape;

// Note the use of a raw string!
assert_eq!(vec![b'a', b'\xFF', b'z'], unescape(r"a\xFFz"));
```

The `escape`/`escape_os` routines provide the reverse transformation, which
makes it easy to show user friendly error messages involving arbitrary bytes.


# Building patterns

Typically, regular expression patterns must be valid UTF-8. However, command
line arguments aren't guaranteed to be valid UTF-8. Unfortunately, the standard
library's UTF-8 conversion functions from `OsStr`s do not provide good error
messages. However, the [`pattern_from_bytes`] and [`pattern_from_os`] do,
including reporting exactly where the first invalid UTF-8 byte is seen.

Additionally, it can be useful to read patterns from a file while reporting
good error messages that include line numbers. The [`patterns_from_path`],
[`patterns_from_reader`] and [`patterns_from_stdin`] routines do just that. If
any pattern is found that is invalid UTF-8, then the error includes the file
path (if available) along with the line number and the byte offset at which the
first invalid UTF-8 byte was observed.


# Read process output

Sometimes a command line application needs to execute other processes and
read its stdout in a streaming fashion. The [`CommandReader`] provides this
functionality with an explicit goal of improving failure modes. In particular,
if the process exits with an error code, then stderr is read and converted into
a normal Rust error to show to end users. This makes the underlying failure
modes explicit and gives more information to end users for debugging the
problem.

As a special case, [`DecompressionReader`] provides a way to decompress
arbitrary files by matching their file extensions up with corresponding
decompression programs (such as `gzip` and `xz`). This is useful as a means of
performing simplistic decompression in a portable manner without binding to
specific compression libraries. This does come with some overhead though, so
if you need to decompress lots of small files, this may not be an appropriate
convenience to use.

Each reader has a corresponding builder for additional configuration, such as
whether to read stderr asynchronously in order to avoid deadlock (which is
enabled by default).


# Miscellaneous parsing

The [`parse_human_readable_size`] routine parses strings like `2M` and converts
them to the corresponding number of bytes (`2 * 1<<20` in this case). If an
invalid size is found, then a good error message is crafted that typically
tells the user how to fix the problem.
*/

#![deny(missing_docs)]

mod decompress;
mod escape;
mod hostname;
mod human;
mod pattern;
mod process;
mod wtr;

pub use crate::{
    decompress::{
        resolve_binary, DecompressionMatcher, DecompressionMatcherBuilder,
        DecompressionReader, DecompressionReaderBuilder,
    },
    escape::{escape, escape_os, unescape, unescape_os},
    hostname::hostname,
    human::{parse_human_readable_size, ParseSizeError},
    pattern::{
        pattern_from_bytes, pattern_from_os, patterns_from_path,
        patterns_from_reader, patterns_from_stdin, InvalidPatternError,
    },
    process::{CommandError, CommandReader, CommandReaderBuilder},
    wtr::{
        stdout, stdout_buffered_block, stdout_buffered_line, StandardStream,
    },
};

/// Returns true if and only if stdin is believed to be readable.
///
/// When stdin is readable, command line programs may choose to behave
/// differently than when stdin is not readable. For example, `command foo`
/// might search the current directory for occurrences of `foo` where as
/// `command foo < some-file` or `cat some-file | command foo` might instead
/// only search stdin for occurrences of `foo`.
///
/// Note that this isn't perfect and essentially corresponds to a heuristic.
/// When things are unclear (such as if an error occurs during introspection to
/// determine whether stdin is readable), this prefers to return `false`. That
/// means it's possible for an end user to pipe something into your program and
/// have this return `false` and thus potentially lead to ignoring the user's
/// stdin data. While not ideal, this is perhaps better than falsely assuming
/// stdin is readable, which would result in blocking forever on reading stdin.
/// Regardless, commands should always provide explicit fallbacks to override
/// behavior. For example, `rg foo -` will explicitly search stdin and `rg foo
/// ./` will explicitly search the current working directory.
pub fn is_readable_stdin() -> bool {
    use std::io::IsTerminal;

    #[cfg(unix)]
    fn imp() -> bool {
        use std::{
            fs::File,
            os::{fd::AsFd, unix::fs::FileTypeExt},
        };

        let stdin = std::io::stdin();
        let fd = match stdin.as_fd().try_clone_to_owned() {
            Ok(fd) => fd,
            Err(err) => {
                log::debug!(
                    "for heuristic stdin detection on Unix, \
                     could not clone stdin file descriptor \
                     (thus assuming stdin is not readable): {err}",
                );
                return false;
            }
        };
        let file = File::from(fd);
        let md = match file.metadata() {
            Ok(md) => md,
            Err(err) => {
                log::debug!(
                    "for heuristic stdin detection on Unix, \
                     could not get file metadata for stdin \
                     (thus assuming stdin is not readable): {err}",
                );
                return false;
            }
        };
        let ft = md.file_type();
        let is_file = ft.is_file();
        let is_fifo = ft.is_fifo();
        let is_socket = ft.is_socket();
        let is_readable = is_file || is_fifo || is_socket;
        log::debug!(
            "for heuristic stdin detection on Unix, \
             found that \
             is_file={is_file}, is_fifo={is_fifo} and is_socket={is_socket}, \
             and thus concluded that is_stdin_readable={is_readable}",
        );
        is_readable
    }

    #[cfg(windows)]
    fn imp() -> bool {
        let stdin = winapi_util::HandleRef::stdin();
        let typ = match winapi_util::file::typ(stdin) {
            Ok(typ) => typ,
            Err(err) => {
                log::debug!(
                    "for heuristic stdin detection on Windows, \
                     could not get file type of stdin \
                     (thus assuming stdin is not readable): {err}",
                );
                return false;
            }
        };
        let is_disk = typ.is_disk();
        let is_pipe = typ.is_pipe();
        let is_readable = is_disk || is_pipe;
        log::debug!(
            "for heuristic stdin detection on Windows, \
             found that is_disk={is_disk} and is_pipe={is_pipe}, \
             and thus concluded that is_stdin_readable={is_readable}",
        );
        is_readable
    }

    #[cfg(not(any(unix, windows)))]
    fn imp() -> bool {
        log::debug!("on non-{{Unix,Windows}}, assuming stdin is not readable");
        false
    }

    !std::io::stdin().is_terminal() && imp()
}

/// Returns true if and only if stdin is believed to be connected to a tty
/// or a console.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stdin() -> bool {
    use std::io::IsTerminal;
    std::io::stdin().is_terminal()
}

/// Returns true if and only if stdout is believed to be connected to a tty
/// or a console.
///
/// This is useful for when you want your command line program to produce
/// different output depending on whether it's printing directly to a user's
/// terminal or whether it's being redirected somewhere else. For example,
/// implementations of `ls` will often show one item per line when stdout is
/// redirected, but will condensed output when printing to a tty.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stdout() -> bool {
    use std::io::IsTerminal;
    std::io::stdout().is_terminal()
}

/// Returns true if and only if stderr is believed to be connected to a tty
/// or a console.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stderr() -> bool {
    use std::io::IsTerminal;
    std::io::stderr().is_terminal()
}