Bindings to the llama.cpp library.
As llama.cpp is a very fast-moving target, this crate does not attempt to create a stable API with all the Rust idioms. Instead it provides safe wrappers around nearly direct bindings to llama.cpp. This makes it easier to keep up with changes in llama.cpp, but it does mean that the API is not as ergonomic as it could be.
§Examples
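As a minimal, hedged sketch of the usual entry point (assuming the `LlamaBackend`, `LlamaModelParams`, and `LlamaModel` items from the module list below; exact signatures may drift as llama.cpp moves, and `"model.gguf"` is a placeholder path):

```rust
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::LlamaModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the llama.cpp backend; it must outlive every model
    // and context created from it.
    let backend = LlamaBackend::init()?;

    // Load a GGUF model from disk with default parameters.
    // "model.gguf" is a placeholder path.
    let params = LlamaModelParams::default();
    let model = LlamaModel::load_from_file(&backend, "model.gguf", &params)?;

    println!("model loaded: {} tokens in vocabulary", model.n_vocab());
    Ok(())
}
```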
§Feature Flags
- `cuda` enables CUDA GPU support.
- `sampler` adds the [`context::sample::sampler`] struct for a more rusty way of sampling.
Modules§
- `context` - Safe wrapper around `llama_context`.
- `llama_backend` - Representation of an initialized llama backend.
- `llama_batch` - Safe wrapper around `llama_batch`.
- `model` - A safe wrapper around `llama_model`.
- `sampling` - Safe wrapper around `llama_sampler`.
- `timing` - Safe wrapper around `llama_timings`.
- `token` - Safe wrappers around `llama_token_data` and `llama_token_data_array`.
- `token_type` - Utilities for working with `llama_token_type` values.
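These modules compose in a fixed order: initialize a backend once, load a model against it, create a context from the model, and feed tokens through a batch into `decode`. A hedged sketch of that pipeline (method names such as `new_context`, `str_to_token`, `AddBos`, and `LlamaBatch::add` are assumptions about the current API, not guarantees):

```rust
use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::llama_batch::LlamaBatch;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::{AddBos, LlamaModel};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = LlamaBackend::init()?;
    let model =
        LlamaModel::load_from_file(&backend, "model.gguf", &LlamaModelParams::default())?;

    // Create an inference context over the loaded model.
    let mut ctx = model.new_context(&backend, LlamaContextParams::default())?;

    // Tokenize the prompt, prepending a beginning-of-sequence token.
    let tokens = model.str_to_token("The quick brown fox", AddBos::Always)?;

    // Stage the tokens in a batch (capacity 512, one sequence),
    // requesting logits only for the final position.
    let mut batch = LlamaBatch::new(512, 1);
    let last = tokens.len().saturating_sub(1);
    for (i, token) in tokens.into_iter().enumerate() {
        batch.add(token, i as i32, &[0], i == last)?;
    }

    // Run the model over the batch.
    ctx.decode(&mut batch)?;
    Ok(())
}
```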
Enums§
- `ApplyChatTemplateError` - Failed to apply the model's chat template.
- `ChatTemplateError` - An error occurred while getting the chat template from a model.
- `DecodeError` - Failed to decode a batch.
- `EmbeddingsError` - An embedding-related function failed.
- `EncodeError` - Failed to encode a batch.
- `LLamaCppError` - All errors that can occur in the llama-cpp crate.
- `LlamaContextLoadError` - Failed to load a context.
- `LlamaLoraAdapterInitError` - An error that can occur when initializing a LoRA adapter.
- `LlamaLoraAdapterRemoveError` - An error that can occur when removing a LoRA adapter.
- `LlamaLoraAdapterSetError` - An error that can occur when setting a LoRA adapter.
- `LlamaModelLoadError` - An error that can occur when loading a model.
- `NewLlamaChatMessageError` - Failed to create a new chat message.
- `StringToTokenError` - Failed to convert a string to a token sequence.
- `TokenToStringError` - An error that can occur when converting a token to a string.
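Each operation surfaces its own error enum rather than one catch-all type, with `LLamaCppError` as the crate-wide aggregate. A small hedged sketch of handling a load failure (the path `missing.gguf` is a placeholder):

```rust
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::LlamaModel;

fn main() {
    let backend = LlamaBackend::init().expect("failed to initialize backend");

    // load_from_file returns LlamaModelLoadError specifically, so callers
    // can report load failures without matching a crate-wide error type.
    let params = LlamaModelParams::default();
    match LlamaModel::load_from_file(&backend, "missing.gguf", &params) {
        Ok(model) => println!("loaded model ({} vocab tokens)", model.n_vocab()),
        Err(err) => eprintln!("model load failed: {err}"),
    }
}
```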
Functions§
- `ggml_time_us` - Get the time in microseconds according to ggml.
- `llama_supports_mlock` - Checks whether mlock is supported.
- `llama_time_us` - Get the time in microseconds according to llama.cpp.
- `max_devices` - Get the maximum number of devices according to llama.cpp (generally the number of CUDA devices).
- `mlock_supported` - Checks whether memory locking is supported according to llama.cpp.
- `mmap_supported` - Checks whether memory mapping is supported according to llama.cpp.
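These are free functions at the crate root, mostly thin probes of the linked llama.cpp build. A sketch of querying them (assuming the crate is imported as `llama_cpp_2`; only the names listed above are used):

```rust
fn main() {
    // Probe capabilities of the linked llama.cpp build; per the list
    // above, these are free functions at the crate root.
    println!("max devices:   {}", llama_cpp_2::max_devices());
    println!("mmap support:  {}", llama_cpp_2::mmap_supported());
    println!("mlock support: {}", llama_cpp_2::mlock_supported());

    // Timing helpers, e.g. for rough benchmarking.
    let start = llama_cpp_2::llama_time_us();
    // ... do work ...
    let elapsed = llama_cpp_2::llama_time_us() - start;
    println!("elapsed: {elapsed} us");
}
```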
Type Aliases§
- `Result` - A fallible result from a llama.cpp function.
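A convenience alias; a hedged guess, consistent with `LLamaCppError` above, is that it fixes the error type to `LLamaCppError`, so fallible helpers can be written as:

```rust
// A sketch assuming `llama_cpp_2::Result<T>` expands to
// `std::result::Result<T, LLamaCppError>`.
fn elapsed_since(start_us: i64) -> llama_cpp_2::Result<i64> {
    // llama_time_us comes from the function list above.
    Ok(llama_cpp_2::llama_time_us() - start_us)
}
```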