Bindings to the llama.cpp library.
As llama.cpp is a very fast-moving target, this crate does not attempt to create a stable API with all the Rust idioms. Instead it provides safe wrappers around nearly direct bindings to llama.cpp. This makes it easier to keep up with changes in llama.cpp, but it does mean that the API is not as ergonomic as it could be.
§Examples
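As a minimal, hedged sketch of the usual entry point (assuming the `LlamaBackend`, `LlamaModelParams`, and `LlamaModel` items from the module list below; exact signatures may drift as llama.cpp moves, and `"model.gguf"` is a placeholder path):

```rust
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::LlamaModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the llama.cpp backend; it must outlive every model
    // and context created from it.
    let backend = LlamaBackend::init()?;

    // Load a GGUF model from disk with default parameters.
    // "model.gguf" is a placeholder path.
    let params = LlamaModelParams::default();
    let model = LlamaModel::load_from_file(&backend, "model.gguf", &params)?;

    println!("model loaded: {} tokens in vocabulary", model.n_vocab());
    Ok(())
}
```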
§Feature Flags
- `cuda` enables CUDA GPU support.
- `sampler` adds the [`context::sample::sampler`] struct for a more rusty way of sampling.
Modules§
- `context` - Safe wrapper around `llama_context`.
- `llama_backend` - Representation of an initialized llama backend.
- `llama_batch` - Safe wrapper around `llama_batch`.
- `model` - A safe wrapper around `llama_model`.
- `sampling` - Safe wrapper around `llama_sampler`.
- `timing` - Safe wrapper around `llama_timings`.
- `token` - Safe wrappers around `llama_token_data` and `llama_token_data_array`.
- `token_type` - Utilities for working with `llama_token_type` values.
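These modules compose in a fixed order: initialize a backend once, load a model against it, create a context from the model, and feed tokens through a batch into `decode`. A hedged sketch of that pipeline (method names such as `new_context`, `str_to_token`, `AddBos`, and `LlamaBatch::add` are assumptions about the current API, not guarantees):

```rust
use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::llama_batch::LlamaBatch;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::{AddBos, LlamaModel};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = LlamaBackend::init()?;
    let model =
        LlamaModel::load_from_file(&backend, "model.gguf", &LlamaModelParams::default())?;

    // Create an inference context over the loaded model.
    let mut ctx = model.new_context(&backend, LlamaContextParams::default())?;

    // Tokenize the prompt, prepending a beginning-of-sequence token.
    let tokens = model.str_to_token("The quick brown fox", AddBos::Always)?;

    // Stage the tokens in a batch (capacity 512, one sequence),
    // requesting logits only for the final position.
    let mut batch = LlamaBatch::new(512, 1);
    let last = tokens.len().saturating_sub(1);
    for (i, token) in tokens.into_iter().enumerate() {
        batch.add(token, i as i32, &[0], i == last)?;
    }

    // Run the model over the batch.
    ctx.decode(&mut batch)?;
    Ok(())
}
```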
Enums§
- `ApplyChatTemplateError` - Failed to apply the model's chat template.
- `ChatTemplateError` - An error occurred while getting the chat template from a model.
- `DecodeError` - Failed to decode a batch.
- `EmbeddingsError` - An embedding-related function failed.
- `EncodeError` - Failed to encode a batch.
- `LLamaCppError` - All errors that can occur in the llama-cpp crate.
- `LlamaContextLoadError` - Failed to load a context.
- `LlamaLoraAdapterInitError` - An error that can occur when initializing a LoRA adapter.
- `LlamaLoraAdapterRemoveError` - An error that can occur when removing a LoRA adapter.
- `LlamaLoraAdapterSetError` - An error that can occur when setting a LoRA adapter.
- `LlamaModelLoadError` - An error that can occur when loading a model.
- `NewLlamaChatMessageError` - Failed to create a new chat message.
- `StringToTokenError` - Failed to convert a string to a token sequence.
- `TokenToStringError` - An error that can occur when converting a token to a string.
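Each operation surfaces its own error enum rather than one catch-all type, with `LLamaCppError` as the crate-wide aggregate. A small hedged sketch of handling a load failure (the path `missing.gguf` is a placeholder):

```rust
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::LlamaModel;

fn main() {
    let backend = LlamaBackend::init().expect("failed to initialize backend");

    // load_from_file returns LlamaModelLoadError specifically, so callers
    // can report load failures without matching a crate-wide error type.
    let params = LlamaModelParams::default();
    match LlamaModel::load_from_file(&backend, "missing.gguf", &params) {
        Ok(model) => println!("loaded model ({} vocab tokens)", model.n_vocab()),
        Err(err) => eprintln!("model load failed: {err}"),
    }
}
```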
Functions§
- `ggml_time_us` - Get the time in microseconds according to ggml.
- `llama_supports_mlock` - Checks whether mlock is supported.
- `llama_time_us` - Get the time in microseconds according to llama.cpp.
- `max_devices` - Get the maximum number of devices according to llama.cpp (generally the number of CUDA devices).
- `mlock_supported` - Checks whether memory locking is supported according to llama.cpp.
- `mmap_supported` - Checks whether memory mapping is supported according to llama.cpp.
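These are free functions at the crate root, mostly thin probes of the linked llama.cpp build. A sketch of querying them (assuming the crate is imported as `llama_cpp_2`; only the names listed above are used):

```rust
fn main() {
    // Probe capabilities of the linked llama.cpp build; per the list
    // above, these are free functions at the crate root.
    println!("max devices:   {}", llama_cpp_2::max_devices());
    println!("mmap support:  {}", llama_cpp_2::mmap_supported());
    println!("mlock support: {}", llama_cpp_2::mlock_supported());

    // Timing helpers, e.g. for rough benchmarking.
    let start = llama_cpp_2::llama_time_us();
    // ... do work ...
    let elapsed = llama_cpp_2::llama_time_us() - start;
    println!("elapsed: {elapsed} us");
}
```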
Type Aliases§
- `Result` - A fallible result from a llama.cpp function.
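A convenience alias; a hedged guess, consistent with `LLamaCppError` above, is that it fixes the error type to `LLamaCppError`, so fallible helpers can be written as:

```rust
// A sketch assuming `llama_cpp_2::Result<T>` expands to
// `std::result::Result<T, LLamaCppError>`.
fn elapsed_since(start_us: i64) -> llama_cpp_2::Result<i64> {
    // llama_time_us comes from the function list above.
    Ok(llama_cpp_2::llama_time_us() - start_us)
}
```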