Title: A family of metrics connecting Jaccard distance to normalized information distance
by Bjørn Kjos-Hanssen (University of Hawaii at Manoa) as part of Computability theory and applications
Distance metrics are used in a wide variety of scientific contexts. In a 2001 paper in the journal Bioinformatics, M. Li, Badger, Chen, Kwong, and Kearney introduced an information-based sequence distance. It is analogous to the famous Jaccard distance on finite sets. Soon thereafter, M. Li, Chen, X. Li, Ma and Vitányi (2004) rejected that distance in favor of what they call the normalized information distance (NID). Raff and Nicholas (2017) proposed a return to the Bioinformatics distance, based on further application-informed criteria.
We attempt to shed some light on this “dispute” by showing that the Jaccard distance and the NID analogue form the extreme points of the set of metrics within a family of semimetrics studied by Jiménez, Becerra, and Gelbukh (2013).
The NID is based on Kolmogorov complexity, and Terwijn, Torenvliet and Vitányi (2011) showed that it is neither upper semicomputable nor lower semicomputable. Our result gives a 2-dimensional family including the NID as an extreme point. It would be interesting to know if any of these functions are semicomputable.