pub struct U16Str { /* private fields */ }
16-bit wide string slice with undefined encoding.
U16Str is to U16String as OsStr is to OsString.
U16Str values are string slices that do not have a defined encoding. While it is sometimes assumed that they contain possibly invalid or ill-formed UTF-16 data, they may be used for any wide encoded string. This is because U16Str is intended to be used with FFI functions, where proper encoding cannot be guaranteed. If you need string slices that are always valid UTF-16 strings, use Utf16Str instead.
Because U16Str does not have a defined encoding, no restrictions are placed on mutating or indexing the slice. This means that even if the string contained properly encoded UTF-16 or other encoding data, mutating or indexing may result in malformed data. Convert to a Utf16Str if retaining proper UTF-16 encoding is desired.
§FFI considerations
U16Str is not aware of nul values and may or may not be nul-terminated. It is intended to be used with FFI functions that directly use string length, where the strings are known to have proper nul-termination already, or where strings are merely being passed through without modification.
U16CStr should be used instead if nul-aware strings are required.
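To make the difference between length-based and nul-aware handling concrete, here is a minimal sketch on a plain u16 slice. It uses only the standard library rather than widestring itself, and the helper name units_before_nul is hypothetical.

```rust
/// Hypothetical helper: the units a nul-aware consumer would see, i.e.
/// everything before the first 0 (or the whole slice if there is none).
fn units_before_nul(units: &[u16]) -> &[u16] {
    let end = units.iter().position(|&u| u == 0).unwrap_or(units.len());
    &units[..end]
}
```

For the units [0x0048, 0x0069, 0x0000, 0x0021], a length-based view keeps all four elements, interior nul included, while the nul-aware view stops after the first two.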
§Examples
The easiest way to use U16Str outside of FFI is with the u16str! macro to convert string literals into UTF-16 string slices at compile time:
use widestring::u16str;
let hello = u16str!("Hello, world!");
You can also convert any u16 slice directly:
use widestring::{u16str, U16Str};
let sparkle_heart = [0xd83d, 0xdc96];
let sparkle_heart = U16Str::from_slice(&sparkle_heart);
assert_eq!(u16str!("💖"), sparkle_heart);
// This unpaired UTF-16 surrogate is invalid UTF-16, but is perfectly valid in U16Str
let malformed_utf16 = [0x0, 0xd83d]; // Note that nul values are also valid and untouched
let s = U16Str::from_slice(&malformed_utf16);
assert_eq!(s.len(), 2);
When working with FFI, it is useful to create a U16Str from a pointer and a length:
use widestring::{u16str, U16Str};
let sparkle_heart = [0xd83d, 0xdc96];
let sparkle_heart = unsafe {
    U16Str::from_ptr(sparkle_heart.as_ptr(), sparkle_heart.len())
};
assert_eq!(u16str!("💖"), sparkle_heart);
Implementations§
impl U16Str
pub unsafe fn from_ptr<'a>(p: *const u16, len: usize) -> &'a U16Str
Constructs a wide string slice from a pointer and a length.
The len argument is the number of elements, not the number of bytes. No copying or allocation is performed; the resulting value is a direct reference to the pointer bytes.
§Safety
This function is unsafe as there is no guarantee that the given pointer is valid for len elements.
In addition, the data must meet the safety conditions of std::slice::from_raw_parts. In particular, the returned string reference must not be mutated for the duration of lifetime 'a, except inside an UnsafeCell.
§Panics
This function panics if p is null.
§Caveat
The lifetime for the returned string is inferred from its usage. To prevent accidental misuse, it’s suggested to tie the lifetime to whichever source lifetime is safe in the context, such as by providing a helper function taking the lifetime of a host value for the string, or by explicit annotation.
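One way to follow this suggestion is a helper method whose signature ties the returned slice to a borrow of the owning value. The sketch below uses std::slice::from_raw_parts from the standard library rather than widestring; the type Host and its methods are hypothetical stand-ins for whatever owns the buffer in your FFI code.

```rust
/// Hypothetical owner of a wide-string buffer (stands in for an FFI handle).
pub struct Host {
    data: Vec<u16>,
}

impl Host {
    pub fn new(data: Vec<u16>) -> Self {
        Host { data }
    }

    /// Raw pointer and element count, as an FFI call might report them.
    fn raw_parts(&self) -> (*const u16, usize) {
        (self.data.as_ptr(), self.data.len())
    }

    /// The elided lifetime binds the returned slice to `&self`, so the
    /// borrow checker rejects any use of the slice after `Host` is dropped.
    pub fn units(&self) -> &[u16] {
        let (ptr, len) = self.raw_parts();
        assert!(!ptr.is_null());
        // SAFETY: `ptr` and `len` describe `self.data`, which stays alive and
        // unmodified for as long as the shared borrow of `self` exists.
        unsafe { std::slice::from_raw_parts(ptr, len) }
    }
}
```

With widestring, the unsafe call inside such a helper would be U16Str::from_ptr instead, and the same signature pattern would constrain the inferred lifetime.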
pub unsafe fn from_ptr_mut<'a>(p: *mut u16, len: usize) -> &'a mut U16Str
Constructs a mutable wide string slice from a mutable pointer and a length.
The len argument is the number of elements, not the number of bytes. No copying or allocation is performed; the resulting value is a direct reference to the pointer bytes.
§Safety
This function is unsafe as there is no guarantee that the given pointer is valid for len elements.
In addition, the data must meet the safety conditions of std::slice::from_raw_parts_mut.
§Panics
This function panics if p is null.
§Caveat
The lifetime for the returned string is inferred from its usage. To prevent accidental misuse, it’s suggested to tie the lifetime to whichever source lifetime is safe in the context, such as by providing a helper function taking the lifetime of a host value for the string, or by explicit annotation.
pub const fn from_slice(slice: &[u16]) -> &U16Str
Constructs a wide string slice from a slice of character data.
No checks are performed on the slice. It may be of any encoding and may contain invalid or malformed data for that encoding.
pub fn from_slice_mut(slice: &mut [u16]) -> &mut U16Str
Constructs a mutable wide string slice from a mutable slice of character data.
No checks are performed on the slice. It may be of any encoding and may contain invalid or malformed data for that encoding.
pub fn to_ustring(&self) -> U16String
Copies the string reference to a new owned wide string.
pub const fn as_slice(&self) -> &[u16]
Converts to a slice of the underlying elements of the string.
pub fn as_mut_slice(&mut self) -> &mut [u16]
Converts to a mutable slice of the underlying elements of the string.
pub const fn as_ptr(&self) -> *const u16
Returns a raw pointer to the string.
The caller must ensure that the string outlives the pointer this function returns, or else it will end up pointing to garbage.
The caller must also ensure that the memory the pointer (non-transitively) points to is never written to (except inside an UnsafeCell) using this pointer or any pointer derived from it. If you need to mutate the contents of the string, use as_mut_ptr.
Modifying the container referenced by this string may cause its buffer to be reallocated, which would also make any pointers to it invalid.
pub fn as_mut_ptr(&mut self) -> *mut u16
Returns an unsafe mutable raw pointer to the string.
The caller must ensure that the string outlives the pointer this function returns, or else it will end up pointing to garbage.
Modifying the container referenced by this string may cause its buffer to be reallocated, which would also make any pointers to it invalid.
pub fn as_ptr_range(&self) -> Range<*const u16>
Returns the two raw pointers spanning the string slice.
The returned range is half-open, which means that the end pointer points one past the last element of the slice. This way, an empty slice is represented by two equal pointers, and the difference between the two pointers represents the size of the slice.
See as_ptr for warnings on using these pointers. The end pointer requires extra caution, as it does not point to a valid element in the slice.
This function is useful for interacting with foreign interfaces which use two pointers to refer to a range of elements in memory, as is common in C++.
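As a small illustration of the half-open range, the sketch below recovers the element count from the two pointers. It works on a plain u16 slice with the standard library's as_ptr_range, which this method mirrors; the function name len_via_ptr_range is hypothetical.

```rust
/// Recovers the element count from the half-open pointer range of a slice.
fn len_via_ptr_range(units: &[u16]) -> usize {
    let range = units.as_ptr_range();
    // SAFETY: both pointers come from the same slice and `end` is never
    // before `start`, so the offset is non-negative and in bounds.
    unsafe { range.end.offset_from(range.start) as usize }
}
```

An empty slice produces two equal pointers, so the computed length is zero.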
pub fn as_mut_ptr_range(&mut self) -> Range<*mut u16>
Returns the two unsafe mutable pointers spanning the string slice.
The returned range is half-open, which means that the end pointer points one past the last element of the slice. This way, an empty slice is represented by two equal pointers, and the difference between the two pointers represents the size of the slice.
See as_mut_ptr for warnings on using these pointers. The end pointer requires extra caution, as it does not point to a valid element in the slice.
This function is useful for interacting with foreign interfaces which use two pointers to refer to a range of elements in memory, as is common in C++.
pub const fn len(&self) -> usize
Returns the length of the string as number of elements (not number of bytes).
pub fn into_ustring(self: Box<U16Str>) -> U16String
Converts a boxed wide string slice into an owned wide string without copying or allocating.
pub fn display(&self) -> Display<'_, U16Str>
Returns an object that implements Display for printing strings that may contain non-Unicode data.
This method assumes the string is intended to be UTF-16 encoded, but handles ill-formed UTF-16 sequences lossily. The returned struct implements the Display trait by decoding the string as lossy UTF-16 without performing the heap allocation that to_string_lossy would.
By default, invalid Unicode data is replaced with U+FFFD REPLACEMENT CHARACTER (�). If you wish to simply skip any invalid Unicode data and forego the replacement, you may use the alternate formatting with {:#}.
§Examples
Basic usage:
use widestring::U16Str;
// 𝄞mus<invalid>ic<invalid>
let s = U16Str::from_slice(&[
0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834,
]);
assert_eq!(format!("{}", s.display()),
"𝄞mus�ic�"
);
Using alternate formatting style to skip invalid values entirely:
use widestring::U16Str;
// 𝄞mus<invalid>ic<invalid>
let s = U16Str::from_slice(&[
0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834,
]);
assert_eq!(format!("{:#}", s.display()),
"𝄞music"
);
pub fn get<I>(&self, i: I) -> Option<&U16Str>
Returns a subslice of the string.
This is the non-panicking alternative to indexing the string. Returns None whenever the equivalent indexing operation would panic.
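The cases in which a checked range lookup returns None can be sketched with the standard library's slice get, which behaves analogously on a plain u16 slice. The helper name checked_range is hypothetical, and widestring itself is not used here.

```rust
/// Checked sub-range lookup: `None` instead of a panic when the range is
/// out of bounds or its start exceeds its end.
fn checked_range(units: &[u16], start: usize, end: usize) -> Option<&[u16]> {
    units.get(start..end)
}
```

For a two-element slice, the range 0..1 yields the first element, while 1..3 (past the end) and 2..1 (start after end) both yield None.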
pub fn get_mut<I>(&mut self, i: I) -> Option<&mut U16Str>
Returns a mutable subslice of the string.
This is the non-panicking alternative to indexing the string. Returns None whenever the equivalent indexing operation would panic.
pub unsafe fn get_unchecked<I>(&self, i: I) -> &U16Str
Returns an unchecked subslice of the string.
This is the unchecked alternative to indexing the string.
§Safety
Callers of this function are responsible for ensuring that these preconditions are satisfied:
- The starting index must not exceed the ending index;
- Indexes must be within bounds of the original slice.
Failing that, the returned string slice may reference invalid memory.
pub unsafe fn get_unchecked_mut<I>(&mut self, i: I) -> &mut U16Str
Returns a mutable, unchecked subslice of the string.
This is the unchecked alternative to indexing the string.
§Safety
Callers of this function are responsible for ensuring that these preconditions are satisfied:
- The starting index must not exceed the ending index;
- Indexes must be within bounds of the original slice.
Failing that, the returned string slice may reference invalid memory.
pub fn split_at(&self, mid: usize) -> (&U16Str, &U16Str)
Divide one string slice into two at an index.
The argument, mid, should be an offset from the start of the string.
The two slices returned go from the start of the string slice to mid, and from mid to the end of the string slice.
To get mutable string slices instead, see the split_at_mut method.
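Because no encoding is enforced, a split can land in the middle of a surrogate pair. The sketch below shows this on a plain u16 slice using the standard library's split_at and String::from_utf16; it does not call widestring, and the helper name split_validity is hypothetical.

```rust
/// Splits `units` at `mid` and reports whether each half is valid UTF-16.
fn split_validity(units: &[u16], mid: usize) -> (bool, bool) {
    let (left, right) = units.split_at(mid);
    (
        String::from_utf16(left).is_ok(),
        String::from_utf16(right).is_ok(),
    )
}
```

Splitting the pair [0xd83d, 0xdc96] at offset 1 leaves an unpaired surrogate on each side, so neither half decodes as UTF-16 even though the whole slice does.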
pub fn split_at_mut(&mut self, mid: usize) -> (&mut U16Str, &mut U16Str)
Divide one mutable string slice into two at an index.
The argument, mid, should be an offset from the start of the string.
The two slices returned go from the start of the string slice to mid, and from mid to the end of the string slice.
To get immutable string slices instead, see the split_at method.
impl U16Str
pub fn to_os_string(&self) -> OsString
Decodes a string reference to an owned OsString.
This makes a string copy of the U16Str. Since U16Str makes no guarantees that its encoding is UTF-16 or that the data is valid UTF-16, there is no guarantee that the resulting OsString will have a valid underlying encoding either.
Note that the encoding of OsString is platform-dependent, so on some platforms this may perform an encoding conversion, while on other platforms (such as Windows) no changes to the string will be made.
§Examples
use widestring::U16String;
use std::ffi::OsString;
let s = "MyString";
// Create a wide string from the string
let wstr = U16String::from_str(s);
// Create an OsString from the wide string
let osstr = wstr.to_os_string();
assert_eq!(osstr, OsString::from(s));
pub fn to_string(&self) -> Result<String, Utf16Error>
Decodes this string to a String if it contains valid UTF-16 data.
This method assumes this string is encoded as UTF-16 and attempts to decode it as such.
§Failures
Returns an error if the string contains any invalid UTF-16 data.
§Examples
use widestring::U16String;
let s = "MyString";
// Create a wide string from the string
let wstr = U16String::from_str(s);
// Create a regular string from the wide string
let s2 = wstr.to_string().unwrap();
assert_eq!(s2, s);
pub fn to_string_lossy(&self) -> String
Decodes the string to a String even if it is invalid UTF-16 data.
This method assumes this string is encoded as UTF-16 and attempts to decode it as such. Any invalid sequences are replaced with U+FFFD REPLACEMENT CHARACTER, which looks like this: �
§Examples
use widestring::U16String;
let s = "MyString";
// Create a wide string from the string
let wstr = U16String::from_str(s);
// Create a regular string from the wide string
let lossy = wstr.to_string_lossy();
assert_eq!(lossy, s);
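The examples for to_string and to_string_lossy only cover well-formed input. To show how strict and lossy decoding differ on ill-formed data, here is a sketch built on the standard library's String::from_utf16 and String::from_utf16_lossy. The function names are hypothetical, and widestring's own implementation is not shown or assumed.

```rust
/// Strict decoding: `Some(text)` only when every unit is valid UTF-16.
fn decode_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Lossy decoding: each unpaired surrogate becomes U+FFFD.
fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

Given the units for "Hi" followed by a lone high surrogate 0xd83d, the strict version returns None while the lossy version returns "Hi" followed by the replacement character.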
pub fn chars(&self) -> CharsUtf16<'_>
Returns an iterator over the chars of a string slice.
As this string has no defined encoding, this method assumes the string is UTF-16. Since it may consist of invalid UTF-16, the iterator returned by this method is an iterator over Result<char, DecodeUtf16Error> instead of chars directly. If you would like a lossy iterator over chars directly, instead use chars_lossy.
It’s important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a ‘character’ is. Iteration over grapheme clusters may be what you actually want. That functionality is not provided by this crate.
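To show what iterating over Result<char, DecodeUtf16Error> looks like in practice, the sketch below uses the standard library's std::char::decode_utf16, which yields the same item type. It is an illustration on a plain u16 slice under that assumption, not widestring's implementation, and the name decode_units is hypothetical.

```rust
/// Decodes `units` as UTF-16, keeping an error for each unpaired surrogate.
fn decode_units(units: &[u16]) -> Vec<Result<char, std::char::DecodeUtf16Error>> {
    std::char::decode_utf16(units.iter().copied()).collect()
}
```

A surrogate pair collapses into a single Ok item, so four input units may yield only three results; an unpaired surrogate surfaces as an Err carrying the offending value.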
pub fn chars_lossy(&self) -> CharsLossyUtf16<'_>
Returns a lossy iterator over the chars of a string slice.
As this string has no defined encoding, this method assumes the string is UTF-16. Since it may consist of invalid UTF-16, the iterator returned by this method will replace unpaired surrogates with U+FFFD REPLACEMENT CHARACTER (�). This is a lossy version of chars.
It’s important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a ‘character’ is. Iteration over grapheme clusters may be what you actually want. That functionality is not provided by this crate.
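The replacement behavior can be sketched with the standard library alone by mapping each decoding error to the replacement character. This is an illustration on a plain u16 slice, not widestring's implementation, and the name decode_units_lossy is hypothetical.

```rust
/// Decodes `units` as UTF-16, substituting U+FFFD for unpaired surrogates.
fn decode_units_lossy(units: &[u16]) -> String {
    std::char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(std::char::REPLACEMENT_CHARACTER))
        .collect()
}
```

Each unpaired surrogate is replaced individually, so well-formed characters on either side of it are preserved.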
pub fn char_indices(&self) -> CharIndicesUtf16<'_>
Returns an iterator over the chars of a string slice, and their positions.
As this string has no defined encoding, this method assumes the string is UTF-16. Since it may consist of invalid UTF-16, the iterator returned by this method is an iterator over Result<char, DecodeUtf16Error> as well as their positions, instead of chars directly. If you would like a lossy indices iterator over chars directly, instead use char_indices_lossy.
The iterator yields tuples. The position is first, the char is second.
pub fn char_indices_lossy(&self) -> CharIndicesLossyUtf16<'_>
Returns a lossy iterator over the chars of a string slice, and their positions.
As this string slice may consist of invalid UTF-16, the iterator returned by this method replaces unpaired surrogates with U+FFFD REPLACEMENT CHARACTER (�) and yields the position of each character. This is a lossy version of char_indices.
The iterator yields tuples. The position is first, the char is second.
Trait Implementations§
impl AddAssign<&U16Str> for U16String
fn add_assign(&mut self, rhs: &U16Str)
Performs the += operation.
impl BorrowMut<U16Str> for U16String
fn borrow_mut(&mut self) -> &mut U16Str
impl<'a> Extend<&'a U16Str> for U16String
fn extend<T>(&mut self, iter: T)
where
    T: IntoIterator<Item = &'a U16Str>,
fn extend_one(&mut self, item: A)
This is a nightly-only experimental API. (extend_one)
fn extend_reserve(&mut self, additional: usize)
This is a nightly-only experimental API. (extend_one)