Wikidata:Lexicographical data/Documentation/Languages/ta

natural language, modern language
Subclass of: Tamil languages
Native label: தமிழ்
Located in the administrative territorial entity: Tamil Nadu, Singapore, Sri Lanka
Has tense: present tense, past tense, future tense
Has grammatical gender: masculine, feminine, neuter
Writing system: Tamil script, Vatteluttu, Koleluttu, Arabic script
Ethnologue language status: 2 Provincial
Studied in: Tamilology
Related category: Category:Tamil pronunciation
Time of earliest written record: 4th century BCE
Wikimedia language code: ta

This is the documentation page for Tamil (Q5885) lexemes in WikiProject Wikidata:Lexicographical data. It is intended to help contributors add and improve Tamil lexeme content. Tamil is a Dravidian language spoken by over 75 million people in South Asia, mainly in Tamil Nadu, Sri Lanka, Puducherry and Singapore. It is an agglutinative language.

Wikidata:Lexemes aims to provide CC0-licensed, structured lexicographical data that anyone can use for any purpose, including Wiktionary, the upcoming Abstract Wikipedia and external projects.


Every lexeme entry has the following layout:

  • Lemma (dictionary form) of the lexeme, used as the title or headword. It is written in Tamil script. Every lexeme entry has a lexeme ID.
    • Language and lexical category: the language of the lexeme should be Tamil (Q5885), and the lexical category must also be specified. The lexical category should be as broad as possible and based on the Tamil linguistic ontology.
    • Senses: the different meanings of the same word.
    • Forms: the different inflected forms of the lexeme, including case forms.
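The layout above can be pictured as a small data structure. The following is a minimal sketch only: the class and field names are hypothetical and are not the Wikibase data model, and the example lemma, gloss and form are illustrative values that do not come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    representation: str                      # the inflected word, in Tamil script
    grammatical_features: list = field(default_factory=list)  # item IDs, e.g. a case


@dataclass
class Sense:
    gloss: str                               # short description of one meaning


@dataclass
class Lexeme:
    lemma: str                               # dictionary form, in Tamil script
    lexical_category: str                    # item ID, e.g. "Q1084" for noun
    language: str = "Q5885"                  # Tamil
    senses: list = field(default_factory=list)
    forms: list = field(default_factory=list)


# Illustrative entry (assumed example, not taken from this page).
maram = Lexeme(lemma="மரம்", lexical_category="Q1084")
maram.senses.append(Sense(gloss="tree"))
maram.forms.append(Form("மரத்தை", ["Q146078"]))  # accusative case form
```

A real lexeme also carries a lexeme ID assigned by Wikidata, which is left out here because it cannot be chosen by the contributor.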

Structure and properties

Common properties to be added for lexeme entries are given below:


Lemma is the dictionary form (base form) of the word/lexeme.



See sense properties by usage


See forms by grammatical feature
  • See Tamil grammar (Q3535154) for cases, tenses and other inflections.
  • Tamil nouns are inflected for number and grammatical case. Nine grammatical cases are described for Tamil:
| Case | Suffix | Transliteration of suffix |
|---|---|---|
| nominative case (Q131105) | -∅ | |
| accusative case (Q146078) | -ஐ | -ai |
| instrumental case (Q192997) | -ஆல், -கொண்டு | -āl, -(aik) koṇṭu |
| sociative case (Q3773161) | -ஓடு, -உடன் | -ōṭu, -uṭaṉ |
| dative case (Q145599) | -(க்)கு, -இன்பொருட்டு, -இந்நிமித்தம் | -(k)ku, -iṉ poruṭṭu, -iṉ nimittam |
| ablative case (Q156986) | -இலிருந்து, -இடமிருந்து, -இனின்று | -il(ē) iruntu [irrational], -iṭam iruntu [rational], -iṉiṉṟu |
| genitive case (Q146233) | -அது, -ஆது, -உடைய | -atu, -ātu, -uṭaiya |
| locative case (Q202142) | -இல், -இடம் | -il(ē) [irrational], -iṭam [rational] |
| vocative case (Q185077) | -ஏ | -ē |
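When adding forms, it can help to have the case table available as a lookup from case name to grammatical-feature item. The sketch below simply copies the table into a dictionary. The names `TAMIL_CASES` and `case_name_for` are hypothetical, and the code deliberately does not try to generate inflected forms, since attaching a suffix in Tamil usually involves stem changes that a plain string concatenation would get wrong.

```python
# Case name -> (grammatical-feature item, suffixes as listed in the table above).
# The nominative is unmarked, so its suffix list is empty.
TAMIL_CASES = {
    "nominative": ("Q131105", []),
    "accusative": ("Q146078", ["-ஐ"]),
    "instrumental": ("Q192997", ["-ஆல்", "-கொண்டு"]),
    "sociative": ("Q3773161", ["-ஓடு", "-உடன்"]),
    "dative": ("Q145599", ["-(க்)கு", "-இன்பொருட்டு", "-இந்நிமித்தம்"]),
    "ablative": ("Q156986", ["-இலிருந்து", "-இடமிருந்து", "-இனின்று"]),
    "genitive": ("Q146233", ["-அது", "-ஆது", "-உடைய"]),
    "locative": ("Q202142", ["-இல்", "-இடம்"]),
    "vocative": ("Q185077", ["-ஏ"]),
}


def case_name_for(qid):
    """Return the case name for a grammatical-feature item ID, or None."""
    for name, (case_qid, _suffixes) in TAMIL_CASES.items():
        if case_qid == qid:
            return name
    return None
```

For example, `TAMIL_CASES["locative"][0]` gives the item to use as the grammatical feature of a locative form, and `case_name_for("Q146078")` maps an item found on an existing form back to its case name.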

See also this Tamil Wikipedia article: w:ta:வேற்றுமை (தமிழ் இலக்கணம்)


To do

Lexicographical Coverage

See also: WD:Lexicographical data/Statistics
  • The lexeme form coverage for Tamil is given below. These statistics use corpus data from the Leipzig Corpora Collection.
  • Forms in Wikidata: 6,463
  • Forms in Wikipedia: 31,721
  • Tokens: 2,539,025
  • Covered forms: 1,089 (3.4%)
  • Missing forms: 30,632 (96.6%)
  • Covered tokens: 283,052 (11.1%)
  • Missing tokens: 2,255,973 (88.9%)
  • Most frequent missing forms
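The percentages in the list above follow directly from the raw counts: coverage is the covered count divided by the corpus total, and the missing count is the remainder. A minimal sketch of that arithmetic, with hypothetical variable and function names:

```python
# Counts copied from the list above.
corpus_forms = 31_721      # distinct forms in the corpus
covered_forms = 1_089      # corpus forms that already exist in Wikidata
corpus_tokens = 2_539_025  # running words in the corpus
covered_tokens = 283_052   # tokens whose form already exists in Wikidata


def percent(part, whole):
    """Share of `whole` made up by `part`, as a percentage with one decimal."""
    return round(100 * part / whole, 1)


missing_forms = corpus_forms - covered_forms
missing_tokens = corpus_tokens - covered_tokens

form_coverage = percent(covered_forms, corpus_forms)
token_coverage = percent(covered_tokens, corpus_tokens)
```

Token coverage is higher than form coverage because the forms already in Wikidata tend to be frequent ones, so each covered form accounts for many tokens.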


Main page: WD:Lexicographical data/Ideas of queries

1) Get all existing lexemes in Tamil: results

The following query uses these:

  • Items: Tamil (Q5885)
    # All lexemes whose language is Tamil, with their lemmas
    SELECT ?lexeme ?lemma WHERE {
      ?lexeme dct:language wd:Q5885 ;
              wikibase:lemma ?lemma .
    }

2) Get the count of lexemes in Tamil grouped by lexical category.
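The query text for this item is not shown on this page. The following is a sketch of one way such a count could be written, not necessarily the query the page links to. It is wrapped in Python so that a request URL for the Wikidata Query Service can be built from it. The names `CATEGORY_COUNT_QUERY` and `query_url` are hypothetical, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed query: group Tamil lexemes by lexical category and count them.
# The prefixes used here are predefined on the Wikidata Query Service.
CATEGORY_COUNT_QUERY = """
SELECT ?category ?categoryLabel (COUNT(?lexeme) AS ?count) WHERE {
  ?lexeme dct:language wd:Q5885 ;
          wikibase:lexicalCategory ?category .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?category ?categoryLabel
ORDER BY DESC(?count)
"""


def query_url(query):
    """Build a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = query_url(CATEGORY_COUNT_QUERY)
```

The query text can also be pasted as-is into the query service web interface, which is usually the easier route when exploring results by hand.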

3) Query for all Tamil nouns missing an accusative case form: query

The following query uses these:

  • Items: Tamil (Q5885), noun (Q1084), accusative case (Q146078)
      # Tamil nouns that have at least one form but no form marked as accusative
      SELECT DISTINCT ?l ?lemma WHERE {
        ?l a ontolex:LexicalEntry ;
           dct:language wd:Q5885 ;
           wikibase:lexicalCategory wd:Q1084 ;
           wikibase:lemma ?lemma ;
           ontolex:lexicalForm ?form .
        ?form ontolex:representation ?word .
        MINUS {
          ?l ontolex:lexicalForm/wikibase:grammaticalFeature wd:Q146078 .
        }
      }



Citable external resources


WD:Tools/Lexicographical data