//! # Outlines_core
//!
//! The `outlines_core` crate provides a convenient way to:
//!
//! - build regular expressions from JSON schemas
//!
//! - construct an [`index::Index`] object by combining a [`vocabulary::Vocabulary`] and a regular
//!   expression to efficiently map tokens from a given `Vocabulary` to state transitions in a
//!   finite-state automaton
//!
//! ## `json_schema`
//!
//! The [`json_schema`] module provides interfaces to generate a regular expression from a given JSON schema, depending on the input type:
//! - [`json_schema::regex_from_str`]
//! - [`json_schema::regex_from_value`]
//!
//! The whitespace pattern can be customized; otherwise the default [`json_schema::WHITESPACE`] pattern is used.
//!
//! Note that not all JSON Schema features are supported for regex generation: see [Supported Features](json_schema#supported-features).
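//!
//! For instance, a custom whitespace pattern may be passed instead of the default
//! (a minimal sketch; the schema and pattern below are purely illustrative):
//!
//! ```rust
//! # use outlines_core::Error;
//! use outlines_core::json_schema;
//!
//! # fn main() -> Result<(), Error> {
//! let schema = r#"{ "type": "string", "maxLength": 5 }"#;
//!
//! // Supply a custom whitespace pattern instead of `json_schema::WHITESPACE`
//! let regex = json_schema::regex_from_str(schema, Some(r"[ ]?"))?;
//! assert!(!regex.is_empty());
//! # Ok(())
//! # }
//! ```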
//!
//! ## `Index`
//!
//! Once an [`index::Index`] is built, it can be used to evaluate or validate token sequences.
//!
//! ### Complexity and construction cost
//!
//! An `Index` can accommodate large vocabularies and complex regular expressions. However, its size **may** grow
//! significantly with the complexity of the input, as may the time and computational resources required to build it.
//!
//! ## Python bindings
//!
//! Additionally, when built with the `python-bindings` feature, the crate exposes its functionality to Python.
//!
//! ## Support
//!
//! `Outlines_core` is primarily used in the structured text generation project [`outlines`](https://github.com/dottxt-ai/outlines).
//! If you need support, consider reaching out to its maintainers; you can also open an issue or start a discussion
//! on [GitHub](https://github.com/dottxt-ai/outlines-core).
//!
//! ## Example
//!
//! Basic example of how it all fits together.
//!
//! ```rust
//! # use outlines_core::Error;
//! use outlines_core::prelude::*;
//!
//! # fn main() -> Result<(), Error> {
//! // Define a JSON schema
//! let schema = r#"{
//!     "type": "object",
//!     "properties": {
//!         "name": { "type": "string" },
//!         "age": { "type": "integer" }
//!     },
//!     "required": ["name", "age"]
//! }"#;
//!
//! // Generate a regular expression from it
//! let regex = json_schema::regex_from_str(schema, None)?;
//! println!("Generated regex: {}", regex);
//!
//! // Create a `Vocabulary` from a pretrained large language model (manual construction is also possible)
//! let vocabulary = Vocabulary::from_pretrained("openai-community/gpt2", None)?;
//!
//! // Create a new `Index` from the regex and the given `Vocabulary`
//! let index = Index::new(&regex, &vocabulary)?;
//!
//! let initial_state = index.initial_state();
//! println!("Is initial state {} a final state? {}", initial_state, index.is_final_state(&initial_state));
//!
//! let allowed_tokens = index.allowed_tokens(&initial_state).expect("Some allowed tokens");
//! println!("Allowed tokens at initial state are {:?}", allowed_tokens);
//!
//! let token_id = allowed_tokens.first().expect("First token");
//! println!("Next state for the token_id {} is {:?}", token_id, index.next_state(&initial_state, token_id));
//! println!("Final states are {:?}", index.final_states());
//! println!("Index has exactly {} transitions", index.transitions().len());
//! # Ok(())
//! # }
//! ```

pub mod error;
pub mod index;
pub mod json_schema;
pub mod prelude;
pub mod primitives;
pub mod vocabulary;

pub use error::{Error, Result};

#[cfg(feature = "python-bindings")]
mod python_bindings;