pub trait OutlierDetection<T>where
T: GeoFloat,{
// Required methods
fn outliers(&self, k_neighbours: usize) -> Vec<T>;
fn prepared_detector(&self) -> PreparedDetector<'_, T>;
fn generate_ensemble(&self, bounds: RangeInclusive<usize>) -> Vec<Vec<T>>;
fn ensemble_min(&self, bounds: RangeInclusive<usize>) -> Vec<T>;
fn ensemble_max(&self, bounds: RangeInclusive<usize>) -> Vec<T>;
}
Expand description
Calculate the Local Outlier Factor of a set of points
Based on: Breunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). LOF: identifying density-based local outliers. In ACM Int. Conf. on Management of Data, pages 93-104. doi: 10.1145/335191.335388
LOF is an unsupervised algorithm that uses local data for anomaly detection.
Outlier vs inlier classification is highly dependent on the shape of the data. LOF values <= 1 can generally be considered inliers, but e.g. a highly concentrated, uniform dataset could result in points with a LOF of 1.05 being outliers. LOF scores should thus be evaluated in the context of the dataset as a whole in order to classify outliers.
If you wish to run multiple outlier detection processes with differing neighbour counts in order
to build up data for more robust detection (see p. 100-1 above), you can use the OutlierDetection::prepared_detector
method, which retains
the spatial index and point set between runs for greater efficiency. The OutlierDetection::generate_ensemble
method
will efficiently run the LOF algorithm over a contiguous range of neighbour inputs,
allowing aggregations to be carried out over the resulting data.
Required Methods§
Sourcefn outliers(&self, k_neighbours: usize) -> Vec<T>
fn outliers(&self, k_neighbours: usize) -> Vec<T>
The LOF algorithm. k_neighbours
specifies the number of neighbours to use for local outlier
classification. The paper linked above (see p. 100) suggests a k_neighbours
value of 10 - 20
as a lower bound for “real-world”
data.
§Note on Erroneous Input
If k_neighbours
>= points in the set, or k_neighbours
< 1, all input points will be returned with an LOF score of 1.
If there are at least k_neighbours
duplicate points of an input point, LOF scores can be ∞
or NaN
.
It is thus advisable to deduplicate or otherwise ensure the uniqueness of the input points.
§Note on Returned Points
Outlier scores are always returned corresponding to input point order
§Examples
§MultiPoint
use approx::assert_relative_eq;
use geo::OutlierDetection;
use geo::{point, MultiPoint};
let v = vec![
point!(x: 0.0, y: 0.0),
point!(x: 0.0, y: 1.0),
point!(x: 3.0, y: 0.0),
point!(x: 1.0, y: 1.0),
];
let lofscores = v.outliers(2);
// The third point is an outlier, resulting in a large LOF score
assert_relative_eq!(lofscores[2], 3.0);
// The last point is an inlier, resulting in a small LOF score
assert_relative_eq!(lofscores[3], 1.0);
§Computing indices, sorting by LOF score
use geo::OutlierDetection;
use geo::{point, MultiPoint};
// these points contain 4 strong outliers
let v = vec![
point!(x: 0.16, y: 0.14),
point!(x: 0.15, y: 0.33),
point!(x: 0.37, y: 0.25),
point!(x: 0.3 , y: 0.4),
point!(x: 0.3 , y: 0.1),
point!(x: 0.3 , y: 0.2),
point!(x: 1.3 , y: 2.3),
point!(x: 1.7 , y: 0.2),
point!(x: 0.7 , y: -0.9),
point!(x: 0.21, y: 2.45),
point!(x: 0.8 , y: 0.7),
point!(x: 0.9 , y: 0.7),
point!(x: 0.8 , y: 0.6),
point!(x: 0.73, y: 0.65),
point!(x: 0.9 , y: 0.6),
point!(x: 1.0, y: 0.6),
point!(x: 1.0, y: 0.7),
point!(x: 0.25, y: 0.29),
point!(x: 0.2 , y: 0.2),
];
let lofs = &mut v.outliers(3);
let mut idx_lofs: Vec<(usize, f64)> = lofs
.iter()
.enumerate()
.map(|(idx, score)| (idx, *score))
.collect();
// sort by LOF score
idx_lofs.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
// most likely outliers first
idx_lofs.reverse();
// four outliers, LOF scores way above 10
idx_lofs
.iter()
.take(4)
.for_each(|score| assert!(score.1 > 10.0));
Sourcefn prepared_detector(&self) -> PreparedDetector<'_, T>
fn prepared_detector(&self) -> PreparedDetector<'_, T>
Create a prepared outlier detector allowing multiple runs to retain the spatial index in use.
A PreparedDetector
can efficiently recompute outliers with different k_neigbhours
values.
Sourcefn generate_ensemble(&self, bounds: RangeInclusive<usize>) -> Vec<Vec<T>>
fn generate_ensemble(&self, bounds: RangeInclusive<usize>) -> Vec<Vec<T>>
Perform successive runs with k_neighbours
values between bounds
,
generating an ensemble of LOF scores, which may be aggregated using e.g. min, max, or mean
§Examples
use geo::OutlierDetection;
use geo::{point, Point, MultiPoint};
let v: Vec<Point<f64>> = vec![
point!(x: 0.16, y: 0.14),
point!(x: 0.15, y: 0.33),
point!(x: 0.37, y: 0.25),
point!(x: 0.3 , y: 0.4),
point!(x: 0.3 , y: 0.1),
point!(x: 0.3 , y: 0.2),
point!(x: 1.3 , y: 2.3),
point!(x: 1.7 , y: 0.2),
point!(x: 0.7 , y: -0.9),
point!(x: 0.21, y: 2.45),
point!(x: 0.8 , y: 0.7),
point!(x: 0.9 , y: 0.7),
point!(x: 0.8 , y: 0.6),
point!(x: 0.73, y: 0.65),
point!(x: 0.9 , y: 0.6),
point!(x: 1.0, y: 0.6),
point!(x: 1.0, y: 0.7),
point!(x: 0.25, y: 0.29),
point!(x: 0.2 , y: 0.2),
];
let ensemble = v.generate_ensemble((2..=5));
// retain the maximum LOF value for each point for all runs
// this will result in a single Vec
let aggregated = ensemble[1..].iter().fold(ensemble[0].clone(), |acc, xs| {
acc.iter()
.zip(xs)
.map(|(elem1, elem2)| elem1.min(*elem2))
.collect()
});
assert_eq!(v.len(), aggregated.len());
Sourcefn ensemble_min(&self, bounds: RangeInclusive<usize>) -> Vec<T>
fn ensemble_min(&self, bounds: RangeInclusive<usize>) -> Vec<T>
Convenience method to efficiently calculate the minimum values of an LOF ensemble
Sourcefn ensemble_max(&self, bounds: RangeInclusive<usize>) -> Vec<T>
fn ensemble_max(&self, bounds: RangeInclusive<usize>) -> Vec<T>
Convenience method to efficiently calculate the maximum values of an LOF ensemble