Academic paper management and metadata retrieval library.
learner is a flexible library for managing academic papers that emphasizes user choice
and interoperability with existing tools. Unlike monolithic paper management solutions,
it focuses on providing robust metadata handling and storage while allowing users to
choose their own tools for viewing, annotating, and organizing papers.
§Core Features
- Multi-source Retrieval:
  - arXiv (supporting both new-style “2301.07041” and old-style “math.AG/0601001” identifiers)
  - IACR (International Association for Cryptologic Research)
  - DOI (Digital Object Identifier)
  - Extensible retriever system for adding new sources
- Flexible Storage and Organization:
  - Configurable document storage locations
  - User-controlled directory structure
  - Separation of metadata and document storage
  - Integration with existing file organization
- Rich Metadata Management:
  - Comprehensive paper metadata
  - Author information with affiliations
  - Publication dates and version tracking
  - Abstract text and citations
  - Custom metadata fields
- Database Operations:
  - Type-safe query building
  - Full-text search capabilities
  - Composable operations using the command pattern
  - Robust error handling
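The identifier formats listed above differ enough that a retriever can often tell sources apart from the string alone. The following is a minimal, standard-library-only sketch of such detection. The `Source` enum and `detect_source` function are hypothetical and are not learner's API, and the shape rules (4+4 or 4+5 digits for new-style arXiv, `archive/` plus 7 digits for old-style arXiv, `YYYY/NNN` to `YYYY/NNNNN` for IACR ePrint, a `10.` prefix for DOI) are simplifying assumptions.

```rust
/// Paper sources this sketch can recognise (hypothetical enum, not learner's API).
#[derive(Debug, PartialEq)]
enum Source {
    Arxiv,
    Iacr,
    Doi,
}

/// True if `s` is non-empty and consists only of ASCII digits.
fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// New-style arXiv: `YYMM.NNNN` or `YYMM.NNNNN`, optionally followed by a version like `v2`.
fn is_new_arxiv(id: &str) -> bool {
    // Strip a trailing version suffix such as "v2" if present.
    let id = match id.rfind('v') {
        Some(pos) if all_digits(&id[pos + 1..]) => &id[..pos],
        _ => id,
    };
    match id.split_once('.') {
        Some((yymm, num)) => {
            yymm.len() == 4 && all_digits(yymm) && (num.len() == 4 || num.len() == 5) && all_digits(num)
        }
        None => false,
    }
}

/// Old-style arXiv: `archive[.subject]/YYMMNNN`, e.g. "math.AG/0601001".
fn is_old_arxiv(archive: &str, number: &str) -> bool {
    archive.starts_with(|c: char| c.is_ascii_alphabetic())
        && archive.chars().all(|c| c.is_ascii_alphabetic() || c == '-' || c == '.')
        && number.len() == 7
        && all_digits(number)
}

/// Guess which source an identifier belongs to; `None` if no rule matches.
fn detect_source(raw: &str) -> Option<Source> {
    let id = raw.trim();
    if let Some((prefix, suffix)) = id.split_once('/') {
        if prefix.starts_with("10.") && !suffix.is_empty() {
            Some(Source::Doi)
        } else if prefix.len() == 4 && all_digits(prefix) && (3..=5).contains(&suffix.len()) && all_digits(suffix) {
            Some(Source::Iacr)
        } else if is_old_arxiv(prefix, suffix) {
            Some(Source::Arxiv)
        } else {
            None
        }
    } else if is_new_arxiv(id) {
        Some(Source::Arxiv)
    } else {
        None
    }
}

fn main() {
    for id in ["2301.07041", "math.AG/0601001", "2016/260", "10.1145/1327452.1327492", "not-an-id"] {
        println!("{id:>26} -> {:?}", detect_source(id));
    }
}
```

The checks run from most to least specific for slash-containing IDs, because DOIs, IACR ePrint IDs, and old-style arXiv IDs all contain a `/`.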
§Getting Started
use learner::{
    database::{Add, OrderField, Query},
    paper::Paper,
    prelude::*,
    Learner,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize with default configuration
    let mut learner = Learner::builder().build().await?;

    // Fetch papers from any supported source
    let arxiv_paper = learner.retriever.get_paper("2301.07041").await?;
    let doi_paper = learner.retriever.get_paper("10.1145/1327452.1327492").await?;
    println!("Retrieved: {}", arxiv_paper.title);

    // Store paper with its PDF
    Add::complete(&arxiv_paper).execute(&mut learner.database).await?;

    // Search the database
    let papers = Query::text("quantum computing")
        .order_by(OrderField::PublicationDate)
        .descending()
        .execute(&mut learner.database)
        .await?;

    // Find papers by author
    let author_papers = Query::by_author("Alice Researcher")
        .execute(&mut learner.database)
        .await?;

    Ok(())
}
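The query chain in the example (`Query::text`, `order_by`, `descending`, `execute`) follows a builder pattern: each call refines the query value, and nothing runs until `execute`. The sketch below imitates that shape over an in-memory slice so it runs with only the standard library. The `Paper` fields, the `Criterion` enum, and the matching rules (case-insensitive substring search, exact author match) are assumptions for illustration, not learner's async full-text search implementation.

```rust
use std::cmp::Ordering;

/// Minimal stand-in for a stored paper (hypothetical fields).
#[derive(Debug)]
struct Paper {
    title: String,
    authors: Vec<String>,
    abstract_text: String,
    publication_year: u32,
}

/// What a query matches on.
enum Criterion {
    Text(String),   // lower-cased search phrase
    Author(String), // exact author name
}

#[derive(Clone, Copy)]
enum OrderField {
    Title,
    PublicationDate,
}

/// Builder that accumulates a criterion and ordering before execution.
struct Query {
    criterion: Criterion,
    order: Option<OrderField>,
    descending: bool,
}

impl Query {
    fn new(criterion: Criterion) -> Self {
        Self { criterion, order: None, descending: false }
    }

    fn text(phrase: &str) -> Self {
        Self::new(Criterion::Text(phrase.to_lowercase()))
    }

    fn by_author(name: &str) -> Self {
        Self::new(Criterion::Author(name.to_string()))
    }

    fn order_by(mut self, field: OrderField) -> Self {
        self.order = Some(field);
        self
    }

    fn descending(mut self) -> Self {
        self.descending = true;
        self
    }

    /// Filter, then optionally sort, returning references into `papers`.
    fn execute(self, papers: &[Paper]) -> Vec<&Paper> {
        let mut hits: Vec<&Paper> = papers
            .iter()
            .filter(|p| match &self.criterion {
                Criterion::Text(q) => {
                    p.title.to_lowercase().contains(q.as_str())
                        || p.abstract_text.to_lowercase().contains(q.as_str())
                }
                Criterion::Author(a) => p.authors.iter().any(|name| name == a),
            })
            .collect();
        if let Some(field) = self.order {
            hits.sort_by(|a, b| {
                let ord: Ordering = match field {
                    OrderField::Title => a.title.cmp(&b.title),
                    OrderField::PublicationDate => a.publication_year.cmp(&b.publication_year),
                };
                if self.descending { ord.reverse() } else { ord }
            });
        }
        hits
    }
}

/// Illustrative sample data (not real papers).
fn sample_papers() -> Vec<Paper> {
    let paper = |title: &str, author: &str, abstract_text: &str, year: u32| Paper {
        title: title.to_string(),
        authors: vec![author.to_string()],
        abstract_text: abstract_text.to_string(),
        publication_year: year,
    };
    vec![
        paper("Quantum Computing Basics", "Alice Researcher", "An introductory survey.", 2019),
        paper("Lattice Signatures", "Bob Scholar", "Security against quantum computing adversaries.", 2021),
        paper("Error Correction Codes", "Alice Researcher", "Fault tolerance for quantum computing.", 2023),
    ]
}

fn main() {
    let papers = sample_papers();
    let recent = Query::text("quantum computing")
        .order_by(OrderField::PublicationDate)
        .descending()
        .execute(&papers);
    for p in &recent {
        println!("{} ({})", p.title, p.publication_year);
    }
    let by_alice = Query::by_author("Alice Researcher").order_by(OrderField::Title).execute(&papers);
    println!("{} papers by Alice Researcher", by_alice.len());
}
```

Because `execute` consumes the builder, a query value cannot be half-configured and then reused by accident; each search is built and run once.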
§Module Organization
The library is organized into focused, composable modules:
- paper: Core paper types and metadata management
  - Paper struct with comprehensive metadata
  - Multi-source identifier handling
  - Author information management
- database: Storage and querying functionality
  - Type-safe query building
  - Full-text search implementation
  - Document storage management
  - Command pattern operations
- clients: API clients for paper sources
  - Source-specific implementations
  - Response parsing and validation
  - Error handling and retry logic
- retriever: Configurable paper retrieval system
  - Automatic source detection
  - XML and JSON response handling
  - Custom field mapping
- prelude: Common imports for ergonomic use
  - Essential traits
  - Common type definitions
  - Error types
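The clients entry mentions retry logic without detailing the policy. One common approach is exponential backoff, sketched below with only the standard library. The `retry` and `backoff_delay` helpers are hypothetical, and whether learner's clients double the delay per attempt and cap it like this is an assumption.

```rust
use std::thread;
use std::time::Duration;

/// Delay after failed attempt `attempt` (0-based): `base * 2^attempt`, never more than `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt)).min(max)
}

/// Run `op` until it succeeds or `max_attempts` attempts have failed
/// (it always runs at least once). `op` receives the 0-based attempt number.
fn retry<T, E, F>(max_attempts: u32, base: Duration, max: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            // Transient failure: wait, then try again.
            Err(_) => {
                thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated request that fails twice before succeeding.
    let result: Result<&str, String> = retry(5, Duration::from_millis(10), Duration::from_millis(100), |attempt| {
        if attempt < 2 {
            Err(format!("transient failure on attempt {attempt}"))
        } else {
            Ok("response body")
        }
    });
    println!("{result:?}");
}
```

A real client would usually retry only errors it considers transient (timeouts, rate limiting) and return parse or validation errors immediately.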
§Design Philosophy
learner is built on several key principles:
- User Control: Users should have full control over document storage and organization, allowing integration with their existing workflows and tools.
- Separation of Concerns: Clear separation between metadata management and document storage, enabling flexible integration with external tools.
- Type Safety: Database operations and API interactions are designed to be type-safe and verified at compile time when possible.
- Extensibility: The command pattern for database operations and configurable retrievers make the system easy to extend.
- Error Handling: Clear error types and propagation make it easy to handle and debug issues at every level.
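The command pattern named under Extensibility can be sketched as a trait implemented by one small struct per operation, each carrying its own inputs and declaring its own output type. The `Command` trait, `Database` type, and `Add`/`Remove`/`Count` structs below are hypothetical stand-ins (a synchronous in-memory map rather than learner's async database), meant only to show why new operations can be added without modifying the database type.

```rust
use std::collections::HashMap;

/// Minimal stand-in for paper metadata (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct Paper {
    id: String,
    title: String,
}

/// In-memory stand-in for the metadata store.
#[derive(Default)]
struct Database {
    papers: HashMap<String, Paper>,
}

/// One database operation packaged as a value: inputs live in the struct,
/// and each command declares its own output type.
trait Command {
    type Output;
    fn execute(&self, db: &mut Database) -> Result<Self::Output, String>;
}

/// Insert a paper, refusing duplicates by identifier.
struct Add {
    paper: Paper,
}

impl Command for Add {
    type Output = ();
    fn execute(&self, db: &mut Database) -> Result<(), String> {
        if db.papers.contains_key(&self.paper.id) {
            return Err(format!("paper {} already stored", self.paper.id));
        }
        db.papers.insert(self.paper.id.clone(), self.paper.clone());
        Ok(())
    }
}

/// Remove a paper by identifier, returning it if it existed.
struct Remove {
    id: String,
}

impl Command for Remove {
    type Output = Option<Paper>;
    fn execute(&self, db: &mut Database) -> Result<Option<Paper>, String> {
        Ok(db.papers.remove(&self.id))
    }
}

/// Count stored papers.
struct Count;

impl Command for Count {
    type Output = usize;
    fn execute(&self, db: &mut Database) -> Result<usize, String> {
        Ok(db.papers.len())
    }
}

fn main() -> Result<(), String> {
    let mut db = Database::default();
    let paper = Paper { id: "2301.07041".into(), title: "Example title".into() };
    Add { paper: paper.clone() }.execute(&mut db)?;
    // A second Add with the same identifier is rejected.
    if let Err(e) = (Add { paper }).execute(&mut db) {
        println!("rejected: {e}");
    }
    println!("stored: {}", Count.execute(&mut db)?);
    let removed = Remove { id: "2301.07041".into() }.execute(&mut db)?;
    println!("removed: {:?}", removed.map(|p| p.title));
    Ok(())
}
```

Adding an operation means adding a struct and an `impl Command`; `Database` itself never changes, and each command's `Output` type keeps call sites type-checked.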
§Configuration
The library can be configured through TOML files or programmatically:
# ~/.learner/config.toml
database_path = "~/.local/share/learner/papers.db"
storage_path = "~/Documents/papers"
retrievers_path = "~/.learner/retrievers"
// Programmatic configuration (inside an async function that returns a Result)
use std::path::PathBuf;
use learner::{Config, Learner};

let config = Config::default()
    .with_storage_path(&PathBuf::from("~/papers"))
    .with_database_path(&PathBuf::from("~/.papers.db"));
let learner = Learner::with_config(config).await?;
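Note that `std::path::PathBuf::from` does not expand `~`: `PathBuf::from("~/papers")` names a directory literally called `~`. Whether learner expands tilde paths itself is not stated here, so when building paths in code it may be safer to expand them first. The sketch below uses a hypothetical `expand_tilde` helper and assumes the home directory comes from the `HOME` environment variable (usually unset on Windows, where `USERPROFILE` is the conventional equivalent).

```rust
use std::env;
use std::path::PathBuf;

/// Expand a leading `~` (alone or followed by `/`) using the HOME environment variable.
/// Anything else, including `~user/...` forms, is returned unchanged.
fn expand_tilde(path: &str) -> PathBuf {
    if let Some(home) = env::var_os("HOME") {
        if path == "~" {
            return PathBuf::from(home);
        }
        if let Some(rest) = path.strip_prefix("~/") {
            return PathBuf::from(home).join(rest);
        }
    }
    PathBuf::from(path)
}

fn main() {
    for raw in ["~/papers", "~/.papers.db", "/srv/papers"] {
        println!("{raw} -> {}", expand_tilde(raw).display());
    }
}
```

The expanded value could then be passed as `&expand_tilde("~/papers")` wherever the configuration methods take a `&PathBuf`.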
§Modules
- Database management and operations for academic paper metadata.
- Error types for the learner library.
- Text formatting utilities for standardizing document titles and filenames.
- Client implementation for interacting with Ollama LLMs.
- Core paper management and metadata types for academic paper handling.
- PDF parsing and content extraction functionality.
- Common traits and types for ergonomic imports.
- Paper retrieval and metadata extraction framework.
§Structs
- Core configuration for the library.
- Main entry point for the library.
- Builder for creating configured Learner instances.