Wikidata:WikidataCon 2017/Notes/Extracting a database of etymological relationships from Wiktionary

Abstract

In this talk I will show progress in extracting a database of etymological relationships from Wiktionary. In particular, I have worked with data extracted from the English Wiktionary and generated an RDF database of lexical information and etymological relationships. I have also created an interactive visualization, a graphical etymology dictionary, in which users can search for a word (in principle in any language) and visualize its etymological tree, i.e., the tree of its ancestors and of the descendants of those ancestors, as well as the descendants of the word itself. Etymological trees are multilingual trees that show how words in different languages have evolved from a common ancestor. By clicking on words in the tree, users can also see the lexical data associated with them, such as part of speech (POS) and definitions.
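The tree described above can be computed as two graph traversals: walk up from the word to collect its ancestors, then walk down from the word and from each ancestor to collect their descendants. The sketch below is a minimal illustration under stated assumptions, not etytree's implementation: the relationships are assumed to be available as hypothetical (word, direct ancestor) pairs, each word written as "language-code:lemma", and the sample edges and helper names (build_index, reachable, etymological_tree) are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample edges (word, direct ancestor); words are "language-code:lemma".
# Illustrative only, not records taken from the etytree database.
EDGES = [
    ("en:mother", "ang:mōdor"),
    ("ang:mōdor", "gem-pro:*mōdēr"),
    ("de:Mutter", "goh:muoter"),
    ("goh:muoter", "gem-pro:*mōdēr"),
    ("gem-pro:*mōdēr", "ine-pro:*méh₂tēr"),
    ("la:māter", "ine-pro:*méh₂tēr"),
    ("it:madre", "la:māter"),
]


def build_index(edges):
    """Index the (word, ancestor) edges in both directions."""
    parents = defaultdict(set)   # word -> its direct ancestors
    children = defaultdict(set)  # word -> words derived directly from it
    for word, ancestor in edges:
        parents[word].add(ancestor)
        children[ancestor].add(word)
    return parents, children


def reachable(start, graph):
    """Return every node reachable from `start` (iterative DFS), excluding `start`."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(start)  # guard against cycles leading back to the start word
    return seen


def etymological_tree(word, edges):
    """Return (ancestors, relatives) of `word`.

    relatives = descendants of the word and of each of its ancestors,
    minus the word itself and its ancestors.
    """
    parents, children = build_index(edges)
    ancestors = reachable(word, parents)
    relatives = set()
    for node in ancestors | {word}:
        relatives |= reachable(node, children)
    relatives -= ancestors | {word}
    return ancestors, relatives


if __name__ == "__main__":
    ancestors, relatives = etymological_tree("en:mother", EDGES)
    print("ancestors:", sorted(ancestors))
    print("relatives:", sorted(relatives))
```

Walking down from the ancestors is what makes the tree multilingual: starting from English "mother", the traversal reaches German "Mutter" and Italian "madre" only through their shared ancestors.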

This is an IEG project centered on Wiktionary that could produce data for Wikidata once a structure for lexical data is available in Wikidata (see the Wikidata for Wiktionary project). For this purpose, as suggested before (see here), the primary sources tool could be used, since the data would need a validation step.

The slides above are screenshots of the interactive visualizations produced by etytree, the graphical and multilingual etymology dictionary. The tool is currently under development and can be tested at etytree. Users can click on language tags and words to see definitions. We have also started a Wikidata project: Etymology.