Semantic distance using search engines

Google Distance Between Words (with Alberto J. Evangelista).
Frontiers in Undergraduate Research, University of Connecticut, Spring 2006.
arXiv: 0901.4180 [cs.CL]
The normalized Google distance of Cilibrasi and Vitányi was studied, and an explicit example of the failure of the triangle inequality was found:
\[d(\text{Rolling Stones}, \text{salmonflies})>d(\text{Rolling Stones}, \text{Beatles})+d(\text{Beatles}, \text{salmonflies}).\]
(This is probably due to people misspelling “beetles” as “Beatles”, so that pages about insects count as hits for the band.)
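To make the failure concrete, here is a minimal sketch of the normalized Google distance, using the usual definition NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts search hits and N is the number of indexed pages. The hit counts below are invented for illustration and are not the ones measured in the paper. The names `ngd` and `violates_triangle` are hypothetical helpers, not part of any library.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google distance from hit counts.

    fx, fy: number of pages containing x and y, respectively
    fxy:    number of pages containing both x and y
    n:      total number of indexed pages (a normalizing constant)
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def violates_triangle(d_xz, d_xy, d_yz):
    """True if d(x, z) > d(x, y) + d(y, z)."""
    return d_xz > d_xy + d_yz


# Made-up counts (assumption): y co-occurs often with both x and z,
# while x and z almost never appear on the same page.
N = 10**10
d_xy = ngd(10**6, 10**6, 10**5, N)  # about 0.25
d_yz = ngd(10**6, 10**6, 10**5, N)  # about 0.25
d_xz = ngd(10**6, 10**6, 10**2, N)  # about 1.0

print(violates_triangle(d_xz, d_xy, d_yz))
```

With these counts the middle term plays the role of “Beatles”: it is close to both other terms, yet those two are far from each other, so the distance is not a metric.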

Surprisingly (?) this paper was cited by:
Wu, Lin, and Liu: An exploratory study of navigating Wikipedia semantically: model and application (published in LNCS, Online Communities and Social Computing: 4th International Conference).

Worawitphinyo, Gao, and Jabeen: Improving Suffix Tree Clustering with New Ranking and Similarity Measures (published in LNCS, Advanced Data Mining and Applications: 7th International Conference, ADMA 2011, Beijing, China, December 17-19, 2011, Proceedings, Part II).

Lee, Jun Choi and Rashid, Nur’Aini Abdul: Adapting normalized google similarity in protein sequence comparison (published in Information Technology, 2008. ITSim 2008. International Symposium on).

Khushboo Thakkar et al.: Test Model for Text Categorization and Text Summarization (published in International Journal on Computer Science and Engineering (IJCSE)).

http://efreedom.com/Question/9-19264/Determining-Similarity-Words

Kolmogorov Complexity in perspective Part II: Classification, Information Processing and Duality, by Marie Ferbus-Zanda (published in Synthese)

The language specialisation of the Google search engine, by Volker Schatz

Bjørn Kjos-Hanssen

Professor of Mathematics, University of Hawaii at Manoa