The Easiest Rust Interface for Local LLMs
# For Mac (CPU and GPU), Windows (CPU and CUDA), or Linux (CPU and CUDA)
[dependencies]
llm_client = "*"
This will download and build llama.cpp. See build.md for other features and backends like mistral.rs.
use llm_client::*;

let llm_client = LlmClient::llama_cpp()
    .mistral7b_instruct_v0_3() // Uses a preset model
    .init() // Downloads model from Hugging Face and starts the inference interface
    .await?;
Several of the most common models are available as presets. Loading from local models is also fully supported. See models.md for more information.
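If you want to point at a specific model rather than a preset, the sketch below shows the general shape. The builder methods `hf_quant_file_url` and `local_quant_file_path` are assumptions here; models.md documents the actual loader API.

// Sketch only: loading a specific GGUF instead of a preset.
// `hf_quant_file_url` and `local_quant_file_path` are assumed method names; see models.md.
let llm_client = LlmClient::llama_cpp()
    .hf_quant_file_url("https://huggingface.co/<repo>/<model>.Q6_K.gguf") // illustrative URL
    .init()
    .await?;

let llm_client = LlmClient::llama_cpp()
    .local_quant_file_path("/path/to/<model>.Q6_K.gguf") // illustrative path
    .init()
    .await?;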
An Interface for Deterministic Signals from Probabilistic LLM Vibes
Reasoning with Primitive Outcomes
A constraint-enforced CoT process for reasoning. First, we get the LLM to 'justify' an answer in plain English. This allows the LLM to 'think' by outputting the stream of tokens required to come to an answer. Then we take that 'justification' and prompt the LLM to parse it for the answer. See the workflow for implementation details (a schematic sketch follows the list below).
- Currently supports returning booleans, u32s, and strings from a list of options
- Can be `None` when run with `return_optional_primitive()`
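Conceptually, the workflow is two inference passes: an unconstrained pass that produces the justification, and a constrained pass that parses that justification into the primitive. The sketch below is a schematic illustration with a stand-in inference function, not the crate's internal code; the crate's actual usage API follows it.

// Schematic only: a stand-in for one inference call.
fn call_llm(_prompt: &str) -> String {
    unimplemented!("placeholder for a real inference call")
}

// Schematic of the two-pass reasoning workflow for a boolean outcome.
fn reason_boolean(instructions: &str, supporting_material: &str) -> bool {
    // Pass 1: let the model 'think' in plain English.
    let justification = call_llm(&format!(
        "{instructions}\n\n{supporting_material}\n\nJustify your answer step by step."
    ));
    // Pass 2: constrain the model to parse its own justification into the primitive.
    let answer = call_llm(&format!(
        "Justification:\n{justification}\n\nBased only on the justification, answer 'true' or 'false'."
    ));
    answer.trim().eq_ignore_ascii_case("true")
}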
// boolean outcome
let reason_request = llm_client.reason().boolean();
reason_request
    .instructions()
    .set_content("Does this email subject indicate that the email is spam?");
reason_request
    .supporting_material()
    .set_content("You'll never believe these low, low prices!!!");
let res: bool = reason_request.return_primitive().await.unwrap();
assert_eq!(res, true);
// u32 outcome
let reason_request = llm_client.reason().integer();
reason_request.primitive.lower_bound(0).upper_bound(10000);
reason_request
    .instructions()
    .set_content("How many times is the word 'llm' mentioned in these comments?");
reason_request
    .supporting_material()
    .set_content(hacker_news_comment_section); // a String of scraped comments, loaded elsewhere
// Can be None
let response: Option<u32> = reason_request.return_optional_primitive().await.unwrap();
assert!(response.is_some());
// string from a list of options outcome
let mut reason_request = llm_client.reason().exact_string();
reason_request
    .instructions()
    .set_content("Based on this readme, what is the name of the creator of this project?");
reason_request
    .supporting_material()
    .set_content(readme); // the readme text, loaded elsewhere
reason_request
    .primitive
    .add_strings_to_allowed(&["shelby", "jack", "gaben"]);
let response: String = reason_request.return_primitive().await.unwrap();
assert_eq!(response, "shelby");
See the reason example for more
Decisions with N number of Votes Across a Temperature Gradient
Runs the reasoning process above N times, where N is the number of votes needed to reach a consensus. We dynamically alter the temperature to ensure an accurate consensus. See the workflow for implementation details (sketched below).
- Supports primitives that implement the reasoning trait
- The consensus vote count can be set with `best_of_n_votes()`
- By default, `dynamic_temperature` is enabled, and each 'vote' is taken at a different point along a temperature gradient
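Schematically, the decision loop re-runs the reasoning workflow once per vote, spreading the votes across a temperature gradient and taking the majority. The sketch below is an illustration with a stand-in for a single reasoning run, not the crate's internal code; the crate's actual usage API follows it.

// Schematic only: a stand-in for one run of the reasoning workflow at a given temperature.
fn reason_boolean_at(_temperature: f32) -> bool {
    unimplemented!("placeholder for one reasoning run")
}

// Best-of-N voting across a temperature gradient.
fn decide_boolean(best_of_n_votes: u32) -> bool {
    let mut yes_votes = 0;
    for vote in 0..best_of_n_votes {
        // Spread the votes along a gradient, e.g. from 0.0 up toward 1.0.
        let temperature = vote as f32 / best_of_n_votes as f32;
        if reason_boolean_at(temperature) {
            yes_votes += 1;
        }
    }
    // Majority wins.
    yes_votes * 2 > best_of_n_votes
}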
// An integer decision request
let decision_request = llm_client.reason().integer().decision();
decision_request.best_of_n_votes(5);
decision_request
    .instructions()
    .set_content("How many fingers do you have?");
let response = decision_request.return_primitive().await.unwrap();
assert_eq!(response, 5);
See the decision example for more
Structured Outputs and NLP
- Data extraction, summarization, and semantic splitting on text.
- The only NLP workflow currently implemented is URL extraction.
Basic Primitives
A generation where the output is constrained to one of the defined primitive types. See the currently implemented primitive types. These primitives are used throughout the other workflows, but only some of them are available as the output of specific workflows like reason and decision.
- These are fairly easy to add, so feel free to open an issue if you'd like one added.
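As a hedged sketch of what a basic primitive request might look like, assuming the workflow mirrors the reason builder shown above (the `basic_primitive()` entry point is an assumption; the linked example below has the real API):

// Sketch only: `basic_primitive()` is an assumed entry point mirroring the reason API above.
let basic_primitive_request = llm_client.basic_primitive().boolean();
basic_primitive_request
    .instructions()
    .set_content("Does the supporting material mention Rust?");
let response: bool = basic_primitive_request.return_primitive().await.unwrap();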
See the basic_primitive example
API LLMs
- Basic support for API-based LLMs. Currently: Anthropic, OpenAI, and Perplexity
- Perplexity does not currently return documents, but it does create its responses from live data
let llm_client = LlmClient::perplexity().sonar_large().init();
let mut basic_completion = llm_client.basic_completion();
basic_completion
    .prompt()
    .add_user_message()
    .set_content("Hello world?");
let response = basic_completion.run().await?;
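The other API providers follow the same builder shape. A hedged sketch; the `anthropic()` and `openai()` entry points and the preset method names below are assumptions, so check the docs for the actual presets:

// Sketch only: provider entry points and preset names are assumed here.
let llm_client = LlmClient::anthropic().claude_3_haiku().init();
let llm_client = LlmClient::openai().gpt_4_o_mini().init();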
See the basic_completion example
Configuring Requests
- All requests and workflows implement the `RequestConfigTrait`, which gives access to the parameters sent to the LLM
- These settings are normalized across both local and API requests
let llm_client = LlmClient::llama_cpp()
    .available_vram(48)
    .mistral7b_instruct_v0_3()
    .init()
    .await?;

let mut basic_completion = llm_client.basic_completion();

// Illustrative values; any supported setting can be adjusted this way.
basic_completion
    .temperature(1.5)
    .frequency_penalty(0.9)
    .max_tokens(200);
More Resources
See the main repo for more documentation.