Outlier Robust Geodesic K-means Algorithm for High Dimensional Data

This paper proposes an outlier-robust geodesic K-means algorithm for high-dimensional data. The proposed algorithm makes three novel contributions. First, it employs a shared nearest neighbour (SNN) based distance metric to construct the nearest neighbour data model. Second, it combines the notion of geodesic distance with the well-known local outlier factor (LOF) model to distinguish outliers from inlier data. Third, it introduces a new ad hoc strategy to integrate outlier scores into geodesic distances. Numerical experiments with synthetic and real-world remote sensing spectral data show the effectiveness of the proposed algorithm in clustering high-dimensional data, in terms of both overall clustering accuracy and average precision.
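To make the second contribution concrete, the following is a minimal sketch of LOF computed on geodesic rather than Euclidean distances. It is not the authors' implementation. For simplicity, the nearest neighbour graph is weighted by plain Euclidean edge lengths instead of the paper's SNN-based metric. The strategy for folding outlier scores back into the geodesic distances, and the K-means step itself, are not specified in the abstract and are not reproduced here. The function names (`knn_lists`, `geodesic_distances`, `lof_scores`) are hypothetical. The sketch assumes distinct points and a connected kNN graph, because otherwise distances become zero or infinite.

```python
import heapq
import math


def knn_lists(points, k):
    """Indices of the k nearest neighbours (Euclidean) of each point, self excluded."""
    n = len(points)
    out = []
    for i in range(n):
        order = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        out.append([j for _, j in order[:k]])
    return out


def geodesic_distances(points, k):
    """All-pairs shortest-path lengths over the symmetrised kNN graph.

    Assumption: edges are weighted by Euclidean length (the paper uses an
    SNN-based metric instead). Runs Dijkstra once from every source.
    """
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i, nbrs in enumerate(knn_lists(points, k)):
        for j in nbrs:
            w = math.dist(points[i], points[j])
            adj[i][j] = w  # keep an edge if j is a kNN of i or vice versa
            adj[j][i] = w
    dist = []
    for s in range(n):
        d = [math.inf] * n
        d[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > d[u]:
                continue  # stale heap entry
            for v, w in adj[u].items():
                if du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(heap, (d[v], v))
        dist.append(d)
    return dist


def lof_scores(dist, k):
    """Standard LOF evaluated on a precomputed (here geodesic) distance matrix."""
    n = len(dist)
    # k nearest neighbours of each point under the supplied distance, self excluded
    nbrs = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
            for i in range(n)]
    k_dist = [dist[i][nbrs[i][-1]] for i in range(n)]
    lrd = []
    for i in range(n):
        # reachability distance: max(k-distance of neighbour, actual distance)
        reach = [max(k_dist[o], dist[i][o]) for o in nbrs[i]]
        lrd.append(k / sum(reach))  # local reachability density
    # LOF = mean neighbour density divided by own density
    return [sum(lrd[o] for o in nbrs[i]) / (k * lrd[i]) for i in range(n)]


# Usage: a 4x4 grid of inliers plus one distant point (index 16).
pts = [(float(x), float(y)) for x in range(4) for y in range(4)] + [(20.0, 20.0)]
scores = lof_scores(geodesic_distances(pts, k=4), k=4)
# The distant point receives by far the largest LOF score.
```

Scores near 1 indicate inliers, and larger values indicate points whose geodesic neighbourhood is much sparser than that of their neighbours. Replacing the Euclidean edge weights with an SNN-based distance would be the natural next step toward the paper's first contribution.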

