icu_segmenter
Segment strings by lines, graphemes, words, and sentences.
This module is published as its own crate (icu_segmenter) and as part of the icu crate. See the latter for more details on the ICU4X project.
This module contains segmenter implementations for the following rules:
- Line segmenter that is compatible with Unicode Standard Annex #14, Unicode Line Breaking Algorithm, with options to tailor line-breaking behavior for the CSS line-break and word-break properties.
- Grapheme cluster segmenter, word segmenter, and sentence segmenter that are compatible with Unicode Standard Annex #29, Unicode Text Segmentation.
Examples
Line Break
Find line break opportunities:
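use icu::segmenter::LineSegmenter;

let segmenter = LineSegmenter::new_auto();

// Collect the byte offset of every line break opportunity.
let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(&breakpoints, &[0, 6, 13, 17, 23, 29, 36]);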
See [LineSegmenter] for more examples.
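Break opportunities are byte offsets into the string, so a caller still has to decide which of them to use. The following is a minimal sketch of greedy line wrapping using only the standard library. The helper name wrap_at_breakpoints and the max_bytes parameter are hypothetical and not part of icu_segmenter. It measures width in bytes for simplicity, whereas real layout code would measure rendered width.

```rust
/// Greedily wraps `text` into lines of at most `max_bytes` bytes, breaking
/// only at the given break opportunities.
///
/// Assumes `breakpoints` is sorted ascending, starts at 0, ends at
/// `text.len()`, and that every offset lies on a UTF-8 char boundary.
/// A single unbreakable run longer than `max_bytes` is left to overflow.
fn wrap_at_breakpoints<'a>(
    text: &'a str,
    breakpoints: &[usize],
    max_bytes: usize,
) -> Vec<&'a str> {
    let mut lines = Vec::new();
    let mut line_start = 0;
    // Furthest break opportunity seen so far on the current line.
    let mut last_fit = 0;
    for &bp in breakpoints {
        // If extending the line to `bp` would overflow, close the line at
        // the previous opportunity (when there is one past the line start).
        if bp - line_start > max_bytes && last_fit > line_start {
            lines.push(&text[line_start..last_fit]);
            line_start = last_fit;
        }
        last_fit = bp;
    }
    // Emit whatever remains after the last closed line.
    if line_start < text.len() {
        lines.push(&text[line_start..]);
    }
    lines
}
```

For example, calling wrap_at_breakpoints("The quick brown fox", &[0, 4, 10, 16, 19], 10) yields the two lines "The quick " and "brown fox". Trailing spaces stay attached to the line they end.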
Grapheme Cluster Break
Find all grapheme cluster boundaries:
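use icu::segmenter::GraphemeClusterSegmenter;

let segmenter = GraphemeClusterSegmenter::new();

let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello 🗺")
    .collect();
// World Map (U+1F5FA) is encoded in four bytes in UTF-8.
assert_eq!(&breakpoints, &[0, 1, 2, 3, 4, 5, 6, 10]);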
See [GraphemeClusterSegmenter] for more examples.
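Boundaries are UTF-8 byte offsets, not character counts, and a grapheme cluster can span more than one code point. The standard-library sketch below lists where each char starts, which is useful for comparison. The helper name char_byte_offsets is hypothetical and not part of icu_segmenter.

```rust
/// Returns the UTF-8 byte offset at which each `char` of `text` starts,
/// followed by the total byte length of `text`.
fn char_byte_offsets(text: &str) -> Vec<usize> {
    let mut offsets: Vec<usize> = text.char_indices().map(|(i, _)| i).collect();
    offsets.push(text.len());
    offsets
}
```

For "Hello 🗺" this gives [0, 1, 2, 3, 4, 5, 6, 10], because the final code point occupies four bytes. For "e\u{301}" (the letter e followed by a combining acute accent) it gives [0, 1, 3]. Under the UAX #29 rules the accent belongs to the same grapheme cluster as the e, so a grapheme cluster segmenter would not report a boundary at offset 1.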
Word Break
Find all word boundaries:
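use icu::segmenter::WordSegmenter;

let segmenter = WordSegmenter::new_auto();

// Boundaries surround every word, space, and punctuation mark.
let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(&breakpoints, &[0, 5, 6, 11, 12, 13, 16, 17, 22, 23, 28, 29, 35, 36]);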
See [WordSegmenter] for more examples.
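A list of boundaries is often more useful once turned back into substrings. As a minimal sketch using only the standard library, consecutive pairs of boundaries can be used to slice the original string. The helper name split_at_breakpoints is hypothetical and not part of icu_segmenter.

```rust
/// Slices `text` into the segments delimited by consecutive boundaries.
///
/// Assumes `breakpoints` is sorted ascending and that every offset is a
/// valid UTF-8 char boundary within `text`, as segmenter output is.
fn split_at_breakpoints<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    breakpoints
        .windows(2) // each window is a [start, end] pair of byte offsets
        .map(|w| &text[w[0]..w[1]])
        .collect()
}
```

With the boundaries [0, 5, 6, 11] for "Hello World", this returns "Hello", " ", and "World". The space is its own segment, so callers that want only words need to filter the result.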
Sentence Break
Segment the string into sentences:
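use icu::segmenter::SentenceSegmenter;

let segmenter = SentenceSegmenter::new();

// Each boundary marks the byte offset where a sentence starts or ends.
let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(&breakpoints, &[0, 13, 36]);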
See [SentenceSegmenter] for more examples.
More Information
For more information on development, authorship, contributing, etc., please visit the ICU4X home page.