yore 1.1.0

Yore

A Rust library for decoding and encoding character sets based on OEM code pages.

Features

  • Fast performance *
  • Minimal memory usage with Cow and shrink_to_fit
  • Easy-to-use API
  • Broad range of supported code pages
  • Handles code pages with redefined ASCII characters (<0x80), such as '٪' in CP864
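
As a rough illustration of the Cow point in the list above, the following standalone sketch borrows the input when it is pure ASCII and allocates only when a high byte has to be translated. It is not yore's implementation: decode_toy and high_half are hypothetical names, the table covers just three CP850 bytes, and the ASCII shortcut assumes a code page that leaves bytes below 0x80 unchanged (which, per the last bullet, does not hold for pages such as CP864).

```rust
use std::borrow::Cow;

/// Hypothetical partial table: three CP850 bytes above 0x7F.
/// A real code page defines up to 128 entries in this range.
fn high_half(byte: u8) -> Option<char> {
    match byte {
        0x81 => Some('ü'),
        0x82 => Some('é'),
        0x84 => Some('ä'),
        _ => None, // not covered by this toy table
    }
}

/// Decodes `bytes`, borrowing the input when it is pure ASCII.
/// On failure, returns the offset of the first byte without a mapping.
fn decode_toy(bytes: &[u8]) -> Result<Cow<'_, str>, usize> {
    if bytes.is_ascii() {
        // ASCII is already valid UTF-8, so the input can be reused as-is.
        // Assumption: the code page does not redefine any byte below 0x80.
        let text = std::str::from_utf8(bytes).expect("ASCII is valid UTF-8");
        return Ok(Cow::Borrowed(text));
    }

    // At least one high byte is present: build an owned String.
    let mut out = String::with_capacity(bytes.len());
    for (offset, &byte) in bytes.iter().enumerate() {
        if byte.is_ascii() {
            out.push(byte as char);
        } else {
            out.push(high_half(byte).ok_or(offset)?);
        }
    }
    // Release any excess capacity left over from growth.
    out.shrink_to_fit();
    Ok(Cow::Owned(out))
}
```

With this sketch, decode_toy(b"text") yields a Cow::Borrowed that points into the input, while an input containing 0x82 yields a Cow::Owned string and an unmapped byte such as 0xFF yields Err with its offset.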

Usage

Add yore to your Cargo.toml file.

[dependencies]
yore = "1.1.0"

Examples

Using a specific code page

use yore::code_pages::{CP857, CP850};
use yore::{DecodeError, EncodeError};

// Vec contains ASCII "text"
let bytes = vec![116, 101, 120, 116];
// Vec contains ASCII "text " and codepoint 231
let bytes_undefined = vec![116, 101, 120, 116, 32, 231]; 

// Notice that decoding CP850 can't fail because it is completely defined
assert_eq!(CP850.decode(&bytes), "text");

// However, CP857 can fail
assert_eq!(CP857.decode(&bytes).unwrap(), "text");

// "text " + codepoint 231, which CP857 cannot decode
let result: Result<_, DecodeError> = CP857.decode(&bytes_undefined);
assert!(result.is_err());

// Lossy decoding won't fail due to fallback
assert_eq!(CP857.decode_lossy(&bytes_undefined), "text �");

// Encoding
assert_eq!(CP850.encode("text").unwrap(), bytes);
let result: Result<_, EncodeError> = CP850.encode("text 🦀");
assert!(result.is_err());
assert_eq!(CP850.encode_lossy("text 🦀", 231), bytes_undefined);
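
The difference between strict and lossy encoding can be sketched without the library. The toy functions below are hypothetical and assume a code page that can represent only ASCII; they are meant to show the shape of the fallback behaviour, not how yore implements it. Each character that cannot be encoded becomes exactly one fallback byte, which is consistent with the six-byte result in the example above.

```rust
/// Hypothetical stand-in for a code page's encoding table:
/// this toy version can represent ASCII characters only.
fn encode_char_toy(c: char) -> Option<u8> {
    if c.is_ascii() {
        Some(c as u8)
    } else {
        None
    }
}

/// Strict encoding: stops at the first character with no mapping
/// and reports that character as the error.
fn encode_strict_toy(text: &str) -> Result<Vec<u8>, char> {
    text.chars()
        .map(|c| encode_char_toy(c).ok_or(c))
        .collect()
}

/// Lossy encoding: never fails; every unmappable character is
/// replaced by the single byte `fallback`.
fn encode_lossy_toy(text: &str, fallback: u8) -> Vec<u8> {
    text.chars()
        .map(|c| encode_char_toy(c).unwrap_or(fallback))
        .collect()
}
```

Note that the substitution is per character, not per UTF-8 byte: the crab emoji occupies four bytes in UTF-8 but produces one fallback byte in the output.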

Using a trait object

use yore::CodePage;
fn do_something(code_page: &dyn CodePage, bytes: &[u8]) {
    println!("{}", code_page.decode(bytes).unwrap());
}

Supported code pages

Identifier  Name          Description
437         ibm437        OEM United States
737         ibm737        OEM Greek (formerly 437G); Greek (DOS)
775         ibm775        OEM Baltic; Baltic (DOS)
850         ibm850        OEM Multilingual Latin 1; Western European (DOS)
852         ibm852        OEM Latin 2; Central European (DOS)
855         ibm855        OEM Cyrillic (primarily Russian)
857         ibm857        OEM Turkish; Turkish (DOS)
860         ibm860        OEM Portuguese; Portuguese (DOS)
861         ibm861        OEM Icelandic; Icelandic (DOS)
862         dos-862       OEM Hebrew; Hebrew (DOS)
863         ibm863        OEM French Canadian; French Canadian (DOS)
864         ibm864        OEM Arabic; Arabic (864)
865         ibm865        OEM Nordic; Nordic (DOS)
866         cp866         OEM Russian; Cyrillic (DOS)
869         ibm869        OEM Modern Greek; Greek, Modern (DOS)
874         windows-874   Thai (Windows)
910         ibm910        IBM-PC APL2
1250        windows-1250  ANSI Central European; Central European (Windows)
1251        windows-1251  ANSI Cyrillic; Cyrillic (Windows)
1252        windows-1252  ANSI Latin 1; Western European (Windows)
1253        windows-1253  ANSI Greek; Greek (Windows)
1254        windows-1254  ANSI Turkish; Turkish (Windows)
1255        windows-1255  ANSI Hebrew; Hebrew (Windows)
1256        windows-1256  ANSI Arabic; Arabic (Windows)
1257        windows-1257  ANSI Baltic; Baltic (Windows)
1258        windows-1258  ANSI/OEM Vietnamese; Vietnamese (Windows)

* Benchmarks

encoding_rs supports only a few of the encodings that oem_cp and yore support. Additionally, encoding_rs focuses on streaming use cases.

Refer to the bench crate for more details.