k Nearest Neighbor

K-nearest-neighbor (kNN) classification is one of the simplest and most fundamental classification methods, and it should be one of the first choices for a classification study when there is little or no prior knowledge about the distribution of the data. It was developed from the need to perform discriminant analysis when reliable parametric estimates of probability densities are unknown or difficult to determine. In an unpublished US Air Force School of Aviation Medicine report in 1951, Fix and Hodges introduced a non-parametric method for pattern classification that has since become known as the k-nearest-neighbor rule (Fix & Hodges, 1951). Later, in 1967, some of the formal properties of the k-nearest-neighbor rule were worked out; for instance, it was shown that for k=1 and n→∞ the k-nearest-neighbor classification error is bounded above by twice the Bayes error rate (Cover & Hart, 1967). Once such formal properties were established, a long line of investigation ensued, including new rejection approaches (Hellman, 1970), refinements with respect to the Bayes error rate (Fukunaga & Hostetler, 1975), distance-weighted approaches (Dudani, 1976; Bailey & Jain, 1978), soft computing methods (Bermejo & Cabestany, 2000), and fuzzy methods (Jozwik, 1983; Keller et al., 1985). (Source: http://www.scholarpedia.org/article/K-nearest_neighbor)
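For readers who want to see the rule itself, the following is a minimal sketch of k-nearest-neighbor classification, not a reference implementation. It assumes Euclidean distance and a simple majority vote among the k closest training points, neither of which is specified in the text above. The function name `knn_predict` and the toy data are hypothetical and exist only for illustration.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Assumptions (not from the source text): Euclidean distance, unweighted votes.
    """
    # Distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors, sorting on distance only
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Unweighted majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two well-separated clusters
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.15, 0.15), k=3))
print(knn_predict(points, labels, (0.95, 1.0), k=1))
```

With k=1 the function reduces to the single-nearest-neighbor rule to which the Cover and Hart bound applies. The distance-weighted variants cited above would replace the unweighted vote with one that weights each neighbor by its distance.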