Crate postgres_parser
postgres-parser is a safe wrapper around Postgres' SQL query parser.
Its primary purpose is to make it easy to syntax check SQL statements (individually or en masse) and to provide a parse tree that can be walked, examined, mutated, or transformed.
Technical Details
First things first: as part of its build process, this crate downloads the Postgres source code and builds Postgres to LLVM IR (and then bitcode). That bitcode is ultimately statically linked into the resulting Rust library (rlib), relying on LLVM's "link time optimization" (LTO) features to reduce the Postgres LLVM bitcode to just the symbols/code required by this crate.
That's a lot of work, and it means that building this crate, or any crate that uses it as a dependency, requires the LLVM toolchain to be on the system $PATH.
The justification for this is that, despite the build complexity, we can always stay current with Postgres as it evolves its SQL support and, thus, its parser.
What's in the Box?
There are three primary things. The first two are safe interfaces for parsing SQL statements and evaluating the resulting parse trees. The third is the set of completely unsafe functions and structs upon which the first two are built.
parse_query() and the nodes module
The parse_query() function parses a string of SQL statements and returns a Vec of parsed nodes, or a parse error.
A quick example:
    use postgres_parser::{parse_query, PgParserError};

    let parsetree = parse_query("SELECT * FROM my_table WHERE id = 42; SELECT 2;");
    match parsetree {
        Ok(nodes) => {
            // one node for each statement parsed from the input string above
            for node in nodes {
                // debug-print the node for this query
                println!("{:#?}", node);
            }
        }

        // one possible error, for the first malformed SQL statement
        Err(e) => {
            // panic! needs a format string in current Rust editions
            panic!("{:?}", e);
        }
    }
The nodes represented in the parse tree live in the postgres_parser::nodes module. The top-level node is simply called Node and is an enum with a variant for every possible node type.
An example of walking a parsetree and examining the expected Node:
    use postgres_parser::{parse_query, Node, join_name_list};

    let parsetree = parse_query("DROP TABLE my_schema.my_table;");
    match parsetree {
        Ok(mut nodes) => {
            let node = nodes.pop().unwrap(); // we know we only have 1 node here
            match node {
                Node::DropStmt(dropstmt) => {
                    // dropstmt.objects is a Vec<Node>, where each Node is a Node::List of,
                    // ultimately, Node::Value, where each value is a String
                    for object in dropstmt.objects.unwrap() {
                        // join_name_list() will figure out the hard part for us
                        // this is a common pattern throughout Postgres' parsetree
                        let name = join_name_list(&object).unwrap();
                        assert_eq!(name, "my_schema.my_table");
                    }
                }
                _ => panic!("unexpected node: {:#?}", node),
            }
        }

        // one possible error, for the first malformed SQL statement
        Err(e) => {
            // panic! needs a format string in current Rust editions
            panic!("{:?}", e);
        }
    }
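The name-list pattern in the example above can be illustrated without the crate itself. The following is only a rough sketch using stand-in types: `SketchNode` and `join_names` are hypothetical names invented for this illustration, and they are not the crate's `Node` or `join_name_list()`, whose actual signatures and behavior may differ.

```rust
/// Hypothetical stand-in for a parse tree node; only the two
/// variants needed for this sketch are modeled.
#[derive(Debug)]
enum SketchNode {
    /// A list of child nodes, such as the parts of a qualified name.
    List(Vec<SketchNode>),
    /// A single string value, such as one identifier.
    Str(String),
}

/// Joins a list of string values into a dotted name.
/// Returns `None` if `node` is not a list made up only of strings.
fn join_names(node: &SketchNode) -> Option<String> {
    match node {
        SketchNode::List(parts) => {
            let mut names = Vec::with_capacity(parts.len());
            for part in parts {
                match part {
                    SketchNode::Str(s) => names.push(s.as_str()),
                    // anything other than a string means this is not a name list
                    _ => return None,
                }
            }
            Some(names.join("."))
        }
        _ => None,
    }
}
```

Under these assumptions, a list holding the strings `my_schema` and `my_table` joins to `my_schema.my_table`, mirroring the assertion in the example above.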
The sys module
The sys module is a 100% "bindgen"-generated module from Postgres' header files. In general, it's not expected that users of this crate will interact with this module.
It is upon the items in this module that the rest of postgres-parser is built. The module is public for completeness only.
SqlStatementScanner
The SqlStatementScanner is a simple type that works as an iterator, scanning and parsing a single string of multiple SQL statements one statement at a time.
This is particularly useful for reporting statement-level parse errors, as opposed to the parse_query() function, which simply reports one error for the entire string.
A quick example:
    use postgres_parser::SqlStatementScanner;

    let mut scanner = SqlStatementScanner::new("SELECT 1;\nSELECT 2;").into_iter();

    let first = scanner.next().expect("no first query");
    assert_eq!(first.sql, "SELECT 1;\n"); // note trailing \n -- trailing whitespace after ';' is included
    assert!(first.payload.is_none());
    assert!(first.parsetree.is_ok());

    let second = scanner.next().expect("no second query");
    assert_eq!(second.sql, "SELECT 2;");
    assert!(second.payload.is_none());
    assert!(second.parsetree.is_ok());

    assert!(scanner.next().is_none());
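To make the "one statement at a time" behavior concrete, here is a deliberately naive sketch of a statement-splitting iterator. It is not how SqlStatementScanner is implemented: `NaiveScanner` is a hypothetical type invented for this illustration, it only splits on `;`, and it would mishandle semicolons inside string literals or comments. It does reproduce the detail noted above that whitespace following a `;` stays with the preceding statement.

```rust
/// Hypothetical, simplified stand-in for a statement scanner.
/// It yields each statement as a slice of the original input.
struct NaiveScanner<'a> {
    rest: &'a str,
}

impl<'a> NaiveScanner<'a> {
    fn new(sql: &'a str) -> Self {
        NaiveScanner { rest: sql }
    }
}

impl<'a> Iterator for NaiveScanner<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.rest.is_empty() {
            return None;
        }

        // The statement ends just past the first ';', or at the end of input.
        let mut end = match self.rest.find(';') {
            Some(i) => i + 1,
            None => self.rest.len(),
        };

        // Keep any whitespace that follows the ';' with this statement.
        for c in self.rest[end..].chars() {
            if c.is_whitespace() {
                end += c.len_utf8();
            } else {
                break;
            }
        }

        let (stmt, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(stmt)
    }
}
```

Collecting `NaiveScanner::new("SELECT 1;\nSELECT 2;")` yields the two slices `"SELECT 1;\n"` and `"SELECT 2;"`. A real scanner additionally attaches a parse result to each statement, which is what makes statement-level error reporting possible.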
Serde Support
All of the supported parse tree Node structures implement Serialize and Deserialize and, as such, can be used directly with any of the serde serializers, including serde_json.
    use postgres_parser::parse_query;

    let as_json = serde_json::to_string_pretty(&parse_query("SELECT 1;")).expect("failed to convert to json");
    println!("{}", as_json);
The above would output JSON along these lines (shown here on a single line):
{"SelectStmt":{"targetList":[{"ResTarget":{"val":{"A_Const":{"val":{"int":1},"location":7}},"location":7}}],"op":"SETOP_NONE","all":false}}
Notes on Thread Safety
Postgres is, by design, not thread safe. Rust, on the other hand, is. As we're literally statically linking against the compiled Postgres code, this presents an interesting problem.
The solution postgres-parser has taken is to guard the parse_query() function (which is also used by SqlStatementScanner) with a Rust Mutex. As such, only one query can be parsed at a time.
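The general technique of serializing calls into non-thread-safe code can be sketched as follows. This is an illustration under stated assumptions, not the crate's actual implementation: `PARSER_LOCK`, `raw_parse`, and `guarded_parse` are hypothetical names, and `raw_parse` is a trivial pure-Rust stand-in for a call into the linked Postgres code. It also assumes Rust 1.63 or later, where `Mutex::new` can be used in a `static`.

```rust
use std::sync::Mutex;

/// Global lock that serializes access to the non-thread-safe code.
static PARSER_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical stand-in for a call into non-thread-safe code.
/// Here it just counts the non-empty ';'-separated pieces of the input.
fn raw_parse(sql: &str) -> usize {
    sql.split(';').filter(|s| !s.trim().is_empty()).count()
}

/// Safe entry point: holds the lock for the duration of the raw call,
/// so at most one thread is ever inside `raw_parse` at a time.
fn guarded_parse(sql: &str) -> usize {
    // The lock protects no data of its own, so a poisoned lock
    // (another thread panicked while holding it) is safe to reuse.
    let _guard = PARSER_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    raw_parse(sql)
}
```

Callers on any number of threads can invoke `guarded_parse` freely; the cost of this design is that parsing never runs in parallel, which matches the one-query-at-a-time behavior described above.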
Re-exports
pub use nodes::Node; |
Modules
nodes | Generated types to represent a parse tree in a safe manner as returned from |
sys | Generated types and constants from Postgres' header files necessary to represent a parse tree as raw "C" structures. Also contains various enum types used by this module and the |
Structs
ScannedStatement | An individual SQL statement scanned from a larger set of SQL statements |
SqlStatementScanner | The |
SqlStatementScannerIterator | Iterator for the |
Enums
PgParserError | Represents various errors that can occur while parsing a SQL statement. |
Functions
join_name_list | A common pattern in Postgres' parse trees, when it needs to represent the name of a thing (a table, a field, a view, etc), especially when that name can be qualified, is to represent that name as a List of string Values. |
parse_query | Parse a string of delimited SQL statements. |