Wikidata talk:Lexicographical data/Archive/2021/03

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

English Wiktionary languages

As we finally got the occasion to index all languages of Wikisource, I figured I should do the same for Wiktionary. Looking at it counts 5272 subcategories:

About 1000 were created today; others were created in 2017 but had been missing their P910 until today.
Some may already have their language defined on the Wiktionary category page with a Wikidata item (?!)

wiktionary:Wiktionary:Statistics/generated has detailed stats and counts 4205 "language headers", 2800 of them with fewer than 10 "gloss definitions". --- Jura 00:48, 1 March 2021 (UTC)

Focus languages for improvements to the lexicographic extension of Wikidata and Abstract Wikipedia

Hi. We would like to find two or three language communities who would be good matches to help to start and guide some long-term improvements to the lexicographic data part of Wikidata, and the closely related work in the Wikifunctions wiki and the Abstract Wikipedia project, over the next few years. Participating communities will hopefully find that this project will lead to long-term growth in content in Wikipedia and Wiktionary in and about their language. See Wikidata:Lexicographical data/Focus languages for more information. Please help us identify potential good matches. More details are on that page. Thank you! Quiddity (WMF) (talk) 00:09, 4 March 2021 (UTC)

Demonyms created as items, instead of lexemes

Haligonians (Q104720536) is listed for deletion at Wikidata:Requests_for_deletions#Q104720536. It's currently an instance of demonym (Q217438). --- Jura 21:45, 8 March 2021 (UTC)

30 Lexic-o-days, events and challenges about lexicographical data

Hello all,

I'm glad to announce that 30 Lexic-o-days, a series of events, projects and challenges around lexicographical data, will start on March 15th. There will be discussions and presentations, but also activities like improving the documentation of Lexemes or editing challenges. The goal of this event is to gather people who edit Lexemes so they can discuss the content and work together. You can find the schedule and all relevant links on this page.

This format is a first experiment and its content is powered by the community: if you have ideas or wishes for the discussions, you're very welcome to set up an appointment or to create a task on the related Phabricator board! We're also keeping an open list of ideas here. Discussions about Lexemes, or summaries of future discussions that will take place during the event, should be documented on the project page or its talk page.

If you have questions or need help to participate, feel free to contact me. I'm looking forward to your participation! Cheers, Lea Lacroix (WMDE) (talk) 12:32, 9 March 2021 (UTC)

Lexicographic coverage of Wikipedia

lang | Forms in Wikidata | Forms in Wikipedia | Tokens | Covered forms | Covered forms (%) | Missing forms | Covered tokens | Covered tokens (%) | Missing tokens | Notes
en | 64494 | 963849 | 1529657229 | 41654 | 4.3 | 922195 | 1345425070 | 88 | 184232159 |
et | 1606320 | 131017 | 16958786 | 72103 | 55 | 58914 | 13378652 | 78.9 | 3580134 |
da | 31671 | 112469 | 31303226 | 17096 | 15.2 | 95373 | 24553257 | 78.4 | 6749969 |
sv | 145092 | 224857 | 72905148 | 39899 | 17.7 | 184958 | 55812149 | 76.6 | 17092999 |
fr | 17572 | 554837 | 480077448 | 25562 | 4.6 | 529275 | 356747841 | 74.3 | 123329607 |
de | 21654 | 1035287 | 609589775 | 15705 | 1.5 | 1019582 | 381764040 | 62.6 | 227825735 |
sk | 65104 | 119565 | 18106551 | 35858 | 30 | 83707 | 10763899 | 59.4 | 7342652 |
ru | 909795 | 795720 | 292922734 | 158112 | 19.9 | 637608 | 136397968 | 46.6 | 156524766 |
he | 324189 | 250049 | 77382629 | 53709 | 21.5 | 196340 | 35040268 | 45.3 | 42342361 |
cs | 106648 | 273215 | 73639616 | 26526 | 9.7 | 246689 | 32663946 | 44.4 | 40975670 |
es | 4069 | 450682 | 410174431 | 5763 | 1.3 | 444919 | 169742085 | 41.4 | 240432346 |
it | 547 | 399799 | 285251736 | 860 | 0.2 | 398939 | 93932894 | 32.9 | 191318842 |
pl | 15161 | 386124 | 118098912 | 7071 | 1.8 | 379053 | 37465675 | 31.7 | 80633237 |
no | 5090 | 158509 | 49548917 | 2499 | 1.6 | 156010 | 12431637 | 25.1 | 37117280 | (w/ 10+ tokens)
fi | 7067 | 294157 | 47543118 | 6404 | 2.2 | 287753 | 11613274 | 24.4 | 35929844 |
nl | 183 | 277692 | 131747011 | 280 | 0.1 | 277412 | 25134658 | 19.1 | 106612353 |
bn | 27490 | 649854 | 13474320 | 5478 | 0.8 | 644376 | 2109462 | 18.6 | 11364858 | selection slightly different
pt | 226 | 244299 | 159495159 | 347 | 0.1 | 243952 | 25751182 | 16.1 | 133743977 |
hi | 151 | 54671 | 18940269 | 106 | 0.2 | 54565 | 2859937 | 15.1 | 16080332 |
ms | 358 | 58423 | 16381541 | 521 | 0.9 | 57902 | 2446411 | 14.9 | 13935130 |
ca | 116 | 182535 | 109381764 | 97 | 0.1 | 182438 | 11036460 | 10.1 | 98345304 |
hr | 54 | 137158 | 28734051 | 52 | 0 | 137106 | 1311223 | 4.6 | 27422828 |
uk | 735 | 421935 | 115177791 | 710 | 0.2 | 421225 | 4254846 | 3.7 | 110922945 |
lv | 38 | 60997 | 8034762 | 36 | 0.1 | 60961 | 264719 | 3.3 | 7770043 |
bg | 166 | 124069 | 33507484 | 152 | 0.1 | 123917 | 347412 | 1 | 33160072 |
ro | 24 | 134905 | 41483262 | 38 | 0 | 134867 | 337091 | 0.8 | 41146171 |
ar | 202 | 248809 | 69904516 | 35 | 0 | 248774 | 245740 | 0.4 | 69658776 |
id | 13 | 113157 | 40564473 | 13 | 0 | 113144 | 111944 | 0.3 | 40452529 |
fa | 45 | 107982 | 45328313 | 27 | 0 | 107955 | 111378 | 0.2 | 45216935 |
hu | 131 | 290408 | 65863565 | 102 | 0 | 290306 | 133238 | 0.2 | 65730327 |
ko | 24 | 292283 | 34022746 | 22 | 0 | 292261 | 74177 | 0.2 | 33948569 |
lt | 34 | 101819 | 13119499 | 20 | 0 | 101799 | 16277 | 0.1 | 13103222 |
tr | 21 | 165383 | 30066752 | 35 | 0 | 165348 | 39628 | 0.1 | 30027124 |
el | 9 | 132176 | 40493491 | 9 | 0 | 132167 | 4024 | 0 | 40489467 |
sl | 2 | 115894 | 19773777 | 3 | 0 | 115891 | 2133 | 0 | 19771644 |
sr | 9 | 201413 | 42242364 | 2 | 0 | 201411 | 682 | 0 | 42241682 |
th | 3 | 28208 | 2090490 | 2 | 0 | 28206 | 634 | 0 | 2089856 |
tl | 1 | 22614 | 3604273 | 2 | 0 | 22612 | 1092 | 0 | 3603181 |
vi | 5 | 71119 | 75800297 | 6 | 0 | 71113 | 6510 | 0 | 75793787 |

Wikidata:Lexicographical_coverage has some interesting stats; the table above is a summary of them. There is some discussion about it at Wikidata_talk:Lexicographical_coverage. --- Jura 13:28, 9 March 2021 (UTC)
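For readers who want to check how the derived columns relate to the raw counts, here is a minimal Python sketch. It assumes that each percentage is the covered count divided by the corresponding Wikipedia total and that each "missing" column is the difference; this is inferred from the figures in the table, not stated on the source page. The function name `coverage` is hypothetical.

```python
def coverage(forms_in_wikipedia, covered_forms, tokens, covered_tokens):
    """Derive the 'missing' and '%' columns of one table row.

    Assumption: percentage = covered / total, rounded to one decimal.
    """
    return {
        "missing_forms": forms_in_wikipedia - covered_forms,
        "covered_forms_pct": round(100 * covered_forms / forms_in_wikipedia, 1),
        "missing_tokens": tokens - covered_tokens,
        "covered_tokens_pct": round(100 * covered_tokens / tokens, 1),
    }


# Raw counts copied from the "en" row of the table above.
en = coverage(
    forms_in_wikipedia=963849,
    covered_forms=41654,
    tokens=1529657229,
    covered_tokens=1345425070,
)
print(en)
```

Under these assumptions the English row is reproduced: few distinct forms are covered, but those forms account for most of the running text.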

Prospection around a « verb conjugation » gadget

Hi, I have a small project: a gadget that shows verb conjugations using Wikidata data. I experimented in SPARQL, for example with this query that finds the forms of French verbs in different tenses:

Here is a first attempt for French:

select ?verbe ?verbeLabel ?tempsLabel ?LabelArticle ?repre {
  # tenses, each with a rank used only for sorting
  values (?temps ?rangTemps ?tempsLabel) { 
    (wd:Q192613 1 "présent") 
    (wd:Q442485 2 "passé simple") # passé simple
    (wd:Q17081589 2 "passé simple(fr)")
    (wd:Q1475560 3 "futur simple")
    (wd:Q1336020 5 "passé composé")
  }
  # grammatical persons, each with a rank used only for sorting
  values (?article ?rangArticle ?LabelArticle) { 
    (wd:Q51929218 1 "je") 
    (wd:Q51929369 2 "tu")
    (wd:Q51929447 3 "il/elle")
    (wd:Q51929447 4 "il")
    (wd:Q52431970 5 "elle")
    (wd:Q51929290 6 "nous")
    (wd:Q51929403 7 "vous")
    (wd:Q51929517 8 "ils/elles")
    (wd:Q52432983 9 "ils")
    (wd:Q52433019 10 "elles")
  }
  # French lexemes (Q150) and their forms
  ?verbe ontolex:lexicalForm ?forme ; 
         dct:language wd:Q150 ;
         wikibase:lemma ?verbeLabel .
  # keep only forms tagged with both a listed tense and a listed person
  ?forme wikibase:grammaticalFeature ?temps ;
         wikibase:grammaticalFeature ?article ;
         ontolex:representation ?repre .
  # SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by ?verbe ?rangTemps ?rangArticle

(Either my query is bad or there are not many French verb forms right now.)
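As a rough illustration of what the gadget could do with the rows returned by the query above, the following Python sketch groups them into a per-verb, per-tense table. The keys `verbeLabel`, `tempsLabel`, `LabelArticle` and `repre` mirror the query's selected variables; the function name `conjugation_table` and the sample rows are hypothetical and hand-written for illustration, they are not actual Wikidata results. The sketch relies on the rows already being sorted by the query's `order by` clause.

```python
from collections import defaultdict


def conjugation_table(rows):
    """Group result rows into {lemma: {tense: [(pronoun, form), ...]}}.

    Input order is preserved, so rows should already be sorted by the query.
    """
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        table[row["verbeLabel"]][row["tempsLabel"]].append(
            (row["LabelArticle"], row["repre"])
        )
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {lemma: dict(tenses) for lemma, tenses in table.items()}


# Hypothetical rows shaped like the query output (not real query results).
sample = [
    {"verbeLabel": "bouger", "tempsLabel": "présent", "LabelArticle": "je", "repre": "bouge"},
    {"verbeLabel": "bouger", "tempsLabel": "présent", "LabelArticle": "tu", "repre": "bouges"},
    {"verbeLabel": "bouger", "tempsLabel": "futur simple", "LabelArticle": "je", "repre": "bougerai"},
]

table = conjugation_table(sample)
for tense, forms in table["bouger"].items():
    print(tense + ":", ", ".join(f"{pronoun} {form}" for pronoun, form in forms))
```

A real gadget would also need to handle missing forms, since the coverage of French verb forms is still sparse.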

I wondered how to generalize this query to cover languages other than French, so I tried to build another query that finds the personal pronouns in all languages:

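select ?trait_grammatical ?mot ?lang {
  # every lexeme with its lemma and language
  ?lex wikibase:lemma ?mot ; dct:language ?lang .
  # the item that one of its senses points to via "item for this sense" (P5137)
  ?lex ontolex:sense/wdt:P5137 ?trait_grammatical .
  # keep only items that are subclasses of grammatical person (Q690940)
  ?trait_grammatical wdt:P279* wd:Q690940 .
}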

Far from complete as well. Same question: is my query bad? It uses item for this sense (P5137) to link the lexeme senses to a grammatical person (Q690940). Is this a good approach? author TomT0m / talk page 15:47, 13 March 2021 (UTC)

@TomT0m: thanks for this query, it allowed me to find and correct mistakes. first-person singular (Q51929218) should not be used as a grammatical feature; it should be first person (Q21714344) + singular (Q110786) (likewise, passé simple (Q17081589) should not be used). You can look at bouger (L10251) for a not-too-bad example (some details still need to be improved, see Talk:Q12547192 for instance). PS: third-person masculine plural (Q52432983) and third-person feminine plural (Q52433019) are indeed grammatical persons in French, but they're not used for conjugation inflections. Regards, VIGNERON (talk) 11:24, 24 March 2021 (UTC)
@VIGNERON: mmm I think I’d prefer to keep forms such as first-person singular (Q51929218), it would make the query easier (by putting them as subclass of the other non combined items you can achieve the same effect I guess) author  TomT0m / talk page 11:35, 24 March 2021 (UTC)
@TomT0m: well, for obvious reasons (no new items to create, translate and manage) and since both methods are equivalent, it has been decided to go with the easier one and not to use combined items, see Wikidata talk:Lexicographical data/Archive/2019/12#Use of combined grammatical features like “second-person plural” or Wikidata talk:Lexicographical data/Archive/2018/06#Missing items for grammatical features for instance. BTW, right now, we only have 47 persons; most of them don't have much data, not even labels in English or French :( and some others are not even in subclasses (I may improve them a bit but I already have a lot to do and days only have 24 hours). Cheers, VIGNERON (talk) 14:15, 24 March 2021 (UTC)
@TomT0m: actually I don't like the approach with grammatical person (Q690940). you (L482) (and e.g. tu (L9096)) are not a grammatical person (Q690940) (which would make them a grammatical category (Q980357)) but a personal pronoun (Q468801), i.e. like thou (Q4466935). Each of them just has some grammatical person (Q690940), like second-person singular (Q51929369). --Infovarius (talk) 21:07, 26 March 2021 (UTC)
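To make the modelling discussed earlier in this thread concrete (separate features such as first person (Q21714344) + singular (Q110786) instead of a combined item), here is a minimal Python sketch that builds a SPARQL query requiring every listed feature on a form. The helper name `forms_query` is hypothetical; the prefixes are assumed to be the ones predefined on the Wikidata Query Service, and the item IDs are the ones quoted in this discussion.

```python
def forms_query(language_qid, feature_qids):
    """Build a SPARQL query for forms that carry ALL listed grammatical features."""
    if not feature_qids:
        raise ValueError("at least one grammatical feature is required")
    # One wikibase:grammaticalFeature triple per feature, joined with ';'.
    features = " ;\n         ".join(
        f"wikibase:grammaticalFeature wd:{qid}" for qid in feature_qids
    )
    return f"""SELECT ?lemma ?repre WHERE {{
  ?verbe dct:language wd:{language_qid} ;
         wikibase:lemma ?lemma ;
         ontolex:lexicalForm ?forme .
  ?forme {features} ;
         ontolex:representation ?repre .
}}"""


# French (Q150), présent (Q192613), first person (Q21714344), singular (Q110786)
query = forms_query("Q150", ["Q192613", "Q21714344", "Q110786"])
print(query)
```

With separate features, a conjugation table becomes a cross product of tense, person and number lists, at the cost of one more triple per form than with combined items.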

Open up lexicographical data to Wiktionary

Given that the lexeme namespace in Wikidata (or lexemes as Wikibase entities) could benefit from more contributors, how could we open it up to Wiktionary contributors? --- Jura 10:59, 18 March 2021 (UTC)

See Wikidata:Wiktionary and phab:T212843. Cheers, VIGNERON (talk) 16:54, 24 March 2021 (UTC)

radical (P5280) for non-CJKV languages

What's the status of this? special:ListProperties?datatype=wikibase-lexeme gives no hints but something mentioned at lexeme talk:L1 made me wonder. Arlo Barnes (talk) 00:18, 30 October 2020 (UTC)

@Arlo Barnes: what is your question or proposal exactly? radical (P5280) is not meant for lexemes. The talk on L1 was old; two months later, combines lexemes (P5238) was created, which can be used for that, I think. Cheers, VIGNERON (talk) 07:46, 1 March 2021 (UTC)
@VIGNERON: Thank you, I think that answers any questions I had. Arlo Barnes (talk) 19:58, 1 March 2021 (UTC)