Low-level Rust lexer.
The idea with rustc_lexer is to make a reusable library, by separating out pure lexing and rustc-specific concerns, like spans, error reporting, and interning. So, rustc_lexer operates directly on &str, produces simple tokens which are a pair of type-tag and a bit of original text, and does not report errors, instead storing them as flags on the token.
Tokens produced by this lexer are not yet ready for parsing the Rust syntax. For that, see rustc_parse::lexer, which converts this basic token stream into the wide tokens used by the actual parser.
The purpose of this crate is to convert raw sources into a labeled sequence of well-known token types, so building an actual Rust token stream will be easier.
The main entity of this crate is the TokenKind enum which represents common lexeme types.
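The design described above (tokens as a kind plus a length, with errors stored as flags rather than reported) can be illustrated with a toy sketch. Everything in it is hypothetical: ToyKind, ToyToken, and toy_tokenize are invented for illustration and are not this crate's API. The sketch also ignores escapes, comments, numbers, and most of the real lexical grammar.

```rust
/// Hypothetical, heavily simplified token kinds (not the crate's `TokenKind`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyKind {
    Whitespace,
    Ident,
    /// A string literal. A missing closing quote is recorded as a flag
    /// on the token instead of being reported as an error.
    Str { terminated: bool },
    Unknown,
}

/// A token is only a kind and a byte length; it holds no text and no span.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyToken {
    kind: ToyKind,
    len: usize,
}

/// Lexes `input` into toy tokens. Never fails: malformed input still
/// produces tokens, with problems recorded as flags.
fn toy_tokenize(mut input: &str) -> Vec<ToyToken> {
    let mut tokens = Vec::new();
    while let Some(first) = input.chars().next() {
        let first_len = first.len_utf8();
        let rest = &input[first_len..];
        let (kind, len) = if first.is_whitespace() {
            let n = rest.find(|c: char| !c.is_whitespace()).unwrap_or(rest.len());
            (ToyKind::Whitespace, first_len + n)
        } else if first.is_alphabetic() || first == '_' {
            let n = rest
                .find(|c: char| !(c.is_alphanumeric() || c == '_'))
                .unwrap_or(rest.len());
            (ToyKind::Ident, first_len + n)
        } else if first == '"' {
            match rest.find('"') {
                // Opening quote + body + closing quote.
                Some(n) => (ToyKind::Str { terminated: true }, 1 + n + 1),
                // No closing quote: consume the rest and set the flag.
                None => (ToyKind::Str { terminated: false }, input.len()),
            }
        } else {
            (ToyKind::Unknown, first_len)
        };
        tokens.push(ToyToken { kind, len });
        input = &input[len..];
    }
    tokens
}
```

For the input `let s = "ab`, this sketch yields seven tokens whose lengths sum to the input length, and the last one is Str { terminated: false } with length 3. A caller that cares about diagnostics can inspect the flag and decide how to report it, which is the separation of concerns the description refers to.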
Modules§
- unescape
- Utilities for validating string and char literals and turning them into values they represent.
Structs§
- Cursor
- Peekable iterator over a char sequence.
- GuardedStr
- #"abc"#, ##"a" (fewer closing), or even #"a (unterminated).
- Token
- Parsed token. It doesn’t contain information about data that has been parsed, only the type of the token and its size.
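Because a token records only its type and size, the text it covers has to be recovered by the caller from the original source. A minimal sketch of that bookkeeping follows, assuming the lengths are byte lengths that fall on character boundaries. The function name token_texts is hypothetical and not part of this crate.

```rust
/// Hypothetical helper: given the source and the byte length of each
/// consecutive token, returns the slice of source each token covers.
/// Panics if the lengths run past the end of `source` or split a character.
fn token_texts<'a>(source: &'a str, lens: &[usize]) -> Vec<&'a str> {
    let mut offset = 0;
    let mut texts = Vec::with_capacity(lens.len());
    for &len in lens {
        // A token's text is the `len` bytes starting at the running offset.
        texts.push(&source[offset..offset + len]);
        offset += len;
    }
    texts
}
```

The same running offset is what a consumer would use to build spans on top of the raw token stream.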
Enums§
- Base
- Base of numeric literal encoding according to its prefix.
- DocStyle
- LiteralKind
- Enum representing the literal types supported by the lexer.
- RawStrError
- TokenKind
- Enum representing common lexeme types.
Constants§
- UNICODE_XID_VERSION
- The version of Unicode that this version of unicode-xid is based on.
Functions§
- is_id_continue
- True if c is valid as a non-first character of an identifier. See the Rust language reference for a formal definition of a valid identifier name.
- is_id_start
- True if c is valid as a first character of an identifier. See the Rust language reference for a formal definition of a valid identifier name.
- is_ident
- The passed string is lexically an identifier.
- is_whitespace
- True if c is considered whitespace according to the Rust language definition. See the Rust language reference for definitions of these classes.
- strip_shebang
- rustc allows files to have a shebang, e.g. “#!/usr/bin/rustrun”, but a shebang isn’t a part of Rust syntax.
- tokenize
- Creates an iterator that produces tokens from the input string.
- validate_raw_str
- Validates a raw string literal. Used for getting more information about a problem with a RawStr/RawByteStr with a None field.
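To make the shebang entry above concrete, here is a hedged sketch of what stripping a shebang can look like. The name toy_strip_shebang and its return type are assumptions for illustration, not this crate's signature. The inner-attribute check is deliberately simplified: it skips only whitespace before looking for `[`, whereas a real lexer would also have to account for comments.

```rust
/// Hypothetical helper: returns the byte length of a leading shebang line,
/// or `None` if the input does not start with one.
fn toy_strip_shebang(input: &str) -> Option<usize> {
    let rest = input.strip_prefix("#!")?;
    // `#![...]` is an inner attribute, which is Rust syntax, not a shebang.
    // Simplification: only whitespace is skipped before checking for `[`.
    if rest.trim_start().starts_with('[') {
        return None;
    }
    // The shebang runs up to, but does not include, the first newline.
    Some(2 + rest.find('\n').unwrap_or(rest.len()))
}
```

A caller would slice the source at the returned length before lexing the remainder, so the shebang never reaches the token stream.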