geo::algorithm::outlier_detection

Trait OutlierDetection

Source
pub trait OutlierDetection<T>
where T: GeoFloat,
{ // Required methods fn outliers(&self, k_neighbours: usize) -> Vec<T>; fn prepared_detector(&self) -> PreparedDetector<'_, T>; fn generate_ensemble(&self, bounds: RangeInclusive<usize>) -> Vec<Vec<T>>; fn ensemble_min(&self, bounds: RangeInclusive<usize>) -> Vec<T>; fn ensemble_max(&self, bounds: RangeInclusive<usize>) -> Vec<T>; }
Expand description

Calculate the Local Outlier Factor of a set of points

Based on: Breunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). LOF: identifying density-based local outliers. In ACM Int. Conf. on Management of Data, pages 93-104. doi: 10.1145/335191.335388

LOF is an unsupervised algorithm that uses local data for anomaly detection.

Outlier vs inlier classification is highly dependent on the shape of the data. LOF values <= 1 can generally be considered inliers, but e.g. a highly concentrated, uniform dataset could result in points with a LOF of 1.05 being outliers. LOF scores should thus be evaluated in the context of the dataset as a whole in order to classify outliers.

If you wish to run multiple outlier detection processes with differing neighbour counts in order to build up data for more robust detection (see p. 100-1 above), you can use the OutlierDetection::prepared_detector method, which retains the spatial index and point set between runs for greater efficiency. The OutlierDetection::generate_ensemble method will efficiently run the LOF algorithm over a contiguous range of neighbour inputs, allowing aggregations to be carried out over the resulting data.

Required Methods§

Source

fn outliers(&self, k_neighbours: usize) -> Vec<T>

The LOF algorithm. k_neighbours specifies the number of neighbours to use for local outlier classification. The paper linked above (see p. 100) suggests a k_neighbours value of 10 - 20 as a lower bound for “real-world” data.

§Note on Erroneous Input

If k_neighbours >= points in the set, or k_neighbours < 1, all input points will be returned with an LOF score of 1. If there are at least k_neighbours duplicate points of an input point, LOF scores can be or NaN. It is thus advisable to deduplicate or otherwise ensure the uniqueness of the input points.

§Note on Returned Points

Outlier scores are always returned corresponding to input point order

§Examples
§MultiPoint
use approx::assert_relative_eq;
use geo::OutlierDetection;
use geo::{point, MultiPoint};

let v = vec![
    point!(x: 0.0, y: 0.0),
    point!(x: 0.0, y: 1.0),
    point!(x: 3.0, y: 0.0),
    point!(x: 1.0, y: 1.0),
];

let lofscores = v.outliers(2);
// The third point is an outlier, resulting in a large LOF score
assert_relative_eq!(lofscores[2], 3.0);
// The last point is an inlier, resulting in a small LOF score
assert_relative_eq!(lofscores[3], 1.0);
§Computing indices, sorting by LOF score
 use geo::OutlierDetection;
 use geo::{point, MultiPoint};

 // these points contain 4 strong outliers
 let v = vec![
     point!(x: 0.16, y: 0.14),
     point!(x: 0.15, y: 0.33),
     point!(x: 0.37, y: 0.25),
     point!(x: 0.3 , y: 0.4),
     point!(x: 0.3 , y: 0.1),
     point!(x: 0.3 , y: 0.2),
     point!(x: 1.3 , y: 2.3),
     point!(x: 1.7 , y: 0.2),
     point!(x: 0.7 , y: -0.9),
     point!(x: 0.21, y: 2.45),
     point!(x: 0.8 , y: 0.7),
     point!(x: 0.9 , y: 0.7),
     point!(x: 0.8 , y: 0.6),
     point!(x: 0.73, y: 0.65),
     point!(x: 0.9 , y: 0.6),
     point!(x: 1.0, y: 0.6),
     point!(x: 1.0, y: 0.7),
     point!(x: 0.25, y: 0.29),
     point!(x: 0.2 , y: 0.2),
 ];
 let lofs = &mut v.outliers(3);
 let mut idx_lofs: Vec<(usize, f64)> = lofs
     .iter()
     .enumerate()
     .map(|(idx, score)| (idx, *score))
     .collect();
 // sort by LOF score
 idx_lofs.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
 // most likely outliers first
 idx_lofs.reverse();
 // four outliers, LOF scores way above 10
 idx_lofs
     .iter()
     .take(4)
     .for_each(|score| assert!(score.1 > 10.0));
Source

fn prepared_detector(&self) -> PreparedDetector<'_, T>

Create a prepared outlier detector allowing multiple runs to retain the spatial index in use. A PreparedDetector can efficiently recompute outliers with different k_neigbhours values.

Source

fn generate_ensemble(&self, bounds: RangeInclusive<usize>) -> Vec<Vec<T>>

Perform successive runs with k_neighbours values between bounds, generating an ensemble of LOF scores, which may be aggregated using e.g. min, max, or mean

§Examples
 use geo::OutlierDetection;
 use geo::{point, Point, MultiPoint};
 let v: Vec<Point<f64>> = vec![
     point!(x: 0.16, y: 0.14),
     point!(x: 0.15, y: 0.33),
     point!(x: 0.37, y: 0.25),
     point!(x: 0.3 , y: 0.4),
     point!(x: 0.3 , y: 0.1),
     point!(x: 0.3 , y: 0.2),
     point!(x: 1.3 , y: 2.3),
     point!(x: 1.7 , y: 0.2),
     point!(x: 0.7 , y: -0.9),
     point!(x: 0.21, y: 2.45),
     point!(x: 0.8 , y: 0.7),
     point!(x: 0.9 , y: 0.7),
     point!(x: 0.8 , y: 0.6),
     point!(x: 0.73, y: 0.65),
     point!(x: 0.9 , y: 0.6),
     point!(x: 1.0, y: 0.6),
     point!(x: 1.0, y: 0.7),
     point!(x: 0.25, y: 0.29),
     point!(x: 0.2 , y: 0.2),
 ];
 let ensemble = v.generate_ensemble((2..=5));
 // retain the maximum LOF value for each point for all runs
 // this will result in a single Vec
 let aggregated = ensemble[1..].iter().fold(ensemble[0].clone(), |acc, xs| {
     acc.iter()
         .zip(xs)
         .map(|(elem1, elem2)| elem1.min(*elem2))
         .collect()
 });
 assert_eq!(v.len(), aggregated.len());
Source

fn ensemble_min(&self, bounds: RangeInclusive<usize>) -> Vec<T>

Convenience method to efficiently calculate the minimum values of an LOF ensemble

Source

fn ensemble_max(&self, bounds: RangeInclusive<usize>) -> Vec<T>

Convenience method to efficiently calculate the maximum values of an LOF ensemble

Implementations on Foreign Types§

Source§

impl<T> OutlierDetection<T> for [Point<T>]
where T: GeoFloat + Sum,

Source§

fn outliers(&self, k_neighbours: usize) -> Vec<T>

Source§

fn prepared_detector(&self) -> PreparedDetector<'_, T>

Source§

fn generate_ensemble(&self, bounds: RangeInclusive<usize>) -> Vec<Vec<T>>

Source§

fn ensemble_min(&self, bounds: RangeInclusive<usize>) -> Vec<T>

Source§

fn ensemble_max(&self, bounds: RangeInclusive<usize>) -> Vec<T>

Implementors§

Source§

impl<T> OutlierDetection<T> for MultiPoint<T>
where T: GeoFloat + Sum,