Google Distance Between Words (with Alberto J. Evangelista).
Frontiers in Undergraduate Research, University of Connecticut, Spring 2006.
arXiv: 0901.4180 [cs.CL]
The normalized Google distance of Cilibrasi and Vitanyi was studied. An explicit example of the failure of the triangle inequality was found:
\[d(\text{Rolling Stones}, \text{salmonflies})>d(\text{Rolling Stones}, \text{Beatles})+d(\text{Beatles}, \text{salmonflies}).\]
(This is probably due to people misspelling “beetles”.)
Surprisingly (?) this paper was cited by:
Wu, Lin, and Liu: An exploratory study of navigating Wikipedia semantically: model and application (published in LNCS, Online Communities and Social Computing: 4th International Conference).
Worawitphinyo, Gao and Jabeen: Improving Suffix Tree Clustering with New Ranking and Similarity Measures (published in LNCS, Advanced Data Mining and Applications
7th International Conference, ADMA 2011, Beijing, China, December 17-19, 2011, Proceedings, Part II)
Lee Jun Choi,; Rashid, Nur’Aini Abdul;
Adapting normalized google similarity in protein sequence comparison (This paper appears in: Information Technology, 2008. ITSim 2008. International Symposium on)
Khushboo Thakkar et al. / International Journal on Computer Science and Engineering (IJCSE): Test Model for Text Categorization and
Text Summarization
http://efreedom.com/Question/9-19264/Determining-Similarity-Words
Kolmogorov Complexity in perspective Part II: Classification, InformationProcessing and Duality, by Marie Ferbus-Zanda (published in Synthese)
The language specialisation of the Google search
engine, by Volker Schatz