Citation Connor R, Cardillo FA, Moss R & Rabitti F (2013) Evaluation of Jensen-Shannon distance over sparse data. In: Brisaboa N, Pedreira O & Zezula P (eds.) Similarity Search and Applications: 6th International Conference, SISAP 2013, A Coruña, Spain, October 2-4, 2013, Proceedings. Lecture Notes in Computer Science, 8199. Similarity Search and Applications: 6th International Conference, SISAP 2013, Coruna, Spain, 02.10.2013-04.10.2013. Berlin, Heidelberg: Springer Verlag, pp. 163-168. https://doi.org/10.1007/978-3-642-41062-8_16
Abstract Jensen-Shannon divergence is a symmetrised, smoothed version of Kullback-Leibler divergence. It has been shown to be the square of a proper distance metric, and has other properties which make it an excellent choice for many high-dimensional spaces in ℝ*.
The metric as defined is, however, expensive to evaluate. In sparse spaces over many dimensions the intrinsic dimensionality of the metric space is typically very high, making similarity-based indexing ineffectual. Exhaustive searching over large data collections may be infeasible.
Using a property that allows the distance to be evaluated from only those dimensions which are non-zero in both arguments, and through the identification of a threshold function, we show that the cost of the function can be dramatically reduced.
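The property mentioned above can be illustrated with a minimal sketch. It assumes base-2 logarithms, vectors that each sum to 1, and a sparse representation as Python dicts mapping dimension index to non-zero value. Writing h(x) = -x log2(x), the divergence can be expressed as 1 - (1/2) Σ [h(v_i) + h(w_i) - h(v_i + w_i)], and each summand vanishes whenever v_i or w_i is zero, so only shared non-zero dimensions need to be visited. The function names `js_distance_sparse` and `js_distance_reference` are hypothetical and not taken from the paper, and the threshold function is not shown here because the abstract does not define it.

```python
import math


def h(x):
    # h(x) = -x * log2(x), with h(0) taken as 0 by convention.
    return -x * math.log2(x) if x > 0 else 0.0


def js_distance_sparse(v, w):
    """Jensen-Shannon distance from dimensions non-zero in both v and w.

    v and w are dicts {dimension: value}; each is assumed to sum to 1.
    """
    # Iterate over the shorter vector and probe the longer one.
    if len(v) > len(w):
        v, w = w, v
    acc = 0.0
    for i, vi in v.items():
        wi = w.get(i, 0.0)
        if vi > 0 and wi > 0:
            acc += h(vi) + h(wi) - h(vi + wi)
    divergence = 1.0 - 0.5 * acc
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


def js_distance_reference(v, w):
    """Reference form: average Kullback-Leibler divergence to the midpoint,
    evaluated over every dimension that is non-zero in either argument."""
    total = 0.0
    for i in set(v) | set(w):
        vi, wi = v.get(i, 0.0), w.get(i, 0.0)
        mi = 0.5 * (vi + wi)
        if vi > 0:
            total += 0.5 * vi * math.log2(vi / mi)
        if wi > 0:
            total += 0.5 * wi * math.log2(wi / mi)
    return math.sqrt(max(total, 0.0))
```

In this sketch, two vectors with no dimensions in common are at distance 1 without any logarithm being evaluated, which is the case where very sparse data benefits most.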
Keywords Sparse data; inverted index; sparse vector; intrinsic dimensionality; large data collection
Journal Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)