Wikidata talk:Lexical Masks

Great idea

Hi @Bruno771, Denny:, this is really a great idea. I have some remarks/questions:

  • where can I find the list of all masks? (or are there really only 2 masks?)
  • (if it doesn't exist yet) I'd like to try to create a mask for French nouns (which are pretty simple); is there a good tutorial somewhere?
  • I'm wondering whether it's a good idea to create masks for Breton (which is a very irregular language). Can ShEx do things like "a Breton noun should have either a singular/plural or a collective/singulative/plural of singulative"?

Cheers, VIGNERON (talk) 10:15, 10 March 2020 (UTC)

@Bruno771, Denny: for the first point, I did a search and found other ShEx schemas about Lexemes (including a lot in Danish created by @Fnielsen:). I added them to Wikidata:Lexical Masks#List of existing masks; is that all right? Cheers, VIGNERON (talk) 11:38, 11 March 2020 (UTC)

Hi @VIGNERON:, thanks for your message!

  • We are gradually adding more and more languages (the masks exist in another format, it's just a matter of converting them). Denny can probably prioritize French so you can play with them soon.
  • Irregular language: Yes, we'd love to test our approach with corner cases! We can have different masks, and the ShEx will check that the entry passes one of them. For example, in French we'll have two noun masks (one for 4-form entries like "ami, amis, amie, amies" and one for 2-form entries like "table, tables"). Let me check with Denny about the best way for you to set up Breton.

Bruno771 (talk) 13:23, 11 March 2020 (UTC)
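
The "passes one of them" check described above can be sketched in Python. This is only an illustration of the logic, not the actual ShEx: the mask names, the feature labels and the `matching_masks` helper are all assumptions made for the example.

```python
from collections import Counter

# Each mask lists the feature combination expected for every form slot.
# Mask names and feature labels are illustrative, not the real ShEx schemas.
FRENCH_NOUN_MASKS = {
    "two_forms": [{"singular"}, {"plural"}],
    "four_forms": [
        {"masculine", "singular"},
        {"masculine", "plural"},
        {"feminine", "singular"},
        {"feminine", "plural"},
    ],
}


def slot_counts(feature_sets):
    """Count feature combinations, ignoring the order of the forms."""
    return Counter(frozenset(features) for features in feature_sets)


def matching_masks(forms, masks=FRENCH_NOUN_MASKS):
    """Return the names of the masks that the entry's forms satisfy exactly.

    `forms` is a list of (spelling, features) pairs.
    """
    found = slot_counts(features for _, features in forms)
    return [name for name, slots in masks.items() if slot_counts(slots) == found]


table = [("table", {"singular"}), ("tables", {"plural"})]
ami = [
    ("ami", {"masculine", "singular"}),
    ("amis", {"masculine", "plural"}),
    ("amie", {"feminine", "singular"}),
    ("amies", {"feminine", "plural"}),
]

print(matching_masks(table))    # ['two_forms']
print(matching_masks(ami))      # ['four_forms']
print(matching_masks(ami[:3]))  # [] : no mask fits, so the entry would be reported
```

An entry is accepted as soon as the returned list is non-empty, which corresponds to trying the masks one after the other.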

Hi! Yay! Thanks for the encouraging words.
For us the best start on Breton would be a few example entries that you consider complete, possibly with some explanations regarding their structure, and then we'll get right to trying to create some ShEx files for these.
I'll get to French soon and upload the results. --Denny (talk) 01:05, 12 March 2020 (UTC)
@Denny: thanks a lot. Let's start with French so I can better understand how it works, and then move on to Breton (some good examples could be gwez (L62) and ki (L69), still very incomplete but the essentials are there). Basically, a Breton noun can have 2 structures: a singular/plural (as in most languages) *or* a collective/singulative (the singulative can itself have a plural; plurals of anything, including plurals of plurals, are common in Breton, even in the first case). Breton words (not just nouns) can also have mutations, depending on the first letter(s) of the word. So we have (a bit simplified):
Initial letter(s)                            | sing/plur | coll/sing
no mutation (any letter except those below)  |           |
'k', 't' or 'p'                              | ki (L69)  |
'g', 'gw', 'd' or 'b'                        |           | gwez (L62)
'm'                                          |           |
I'll try to create and complete a Lexeme for each case of this table. Cheers, VIGNERON (talk) 08:13, 12 March 2020 (UTC)
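
To decide which row of the simplified table a lemma falls in, a minimal Python sketch could look like this. The `MUTATION_ROWS` list and the `mutation_row` helper are hypothetical names, and the rule only looks at the initial letters listed in the table, so it inherits the same simplification.

```python
# Rows of the simplified table, each with the initial letters that select it.
# The row labels are only for illustration.
MUTATION_ROWS = [
    (("gw", "g", "d", "b"), "g/gw/d/b"),
    (("k", "t", "p"), "k/t/p"),
    (("m",), "m"),
]


def mutation_row(lemma):
    """Return the table row for a lemma, based on its first letter(s) only."""
    word = lemma.lower()
    for prefixes, row in MUTATION_ROWS:
        # str.startswith accepts a tuple and matches any of its prefixes.
        if word.startswith(prefixes):
            return row
    return "no mutation"


print(mutation_row("ki"))    # k/t/p
print(mutation_row("gwez"))  # g/gw/d/b
```

Combined with the noun structure (sing/plur or coll/sing), this gives one candidate mask per cell of the table.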

I uploaded the French standard noun here, and on my first run on 100 random entries, the following four Lexemes were reported as incomplete or erroneous: somnambule (L13959), plâtras (L14245), notaire (L15820), and fils (L15917). Does this look right and make sense? --Denny (talk) 00:33, 13 March 2020 (UTC)

@Denny: yes, that's right and it makes sense. For the first 3 lexemes, the grammatical gender (P5185) was missing; I added it. That said, there are probably a few cases where P5185 is optional, but they are limited. For the 4th one, I'm guessing it's because masculine (Q499327) is used as a grammatical feature. For this specific entity I'm not sure, but I'm sure there are a lot of cases where it's correct (see "ami/amie" above, for instance), so gender should be allowed here. PS: could you please ping me, so I don't miss your answer. Cheers, VIGNERON (talk) 19:03, 14 March 2020 (UTC)
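
The missing-gender problem found in the first three lexemes can be checked independently of ShEx. Here is a small Python sketch, assuming lexemes are available as dictionaries with an `id` and a `claims` mapping keyed by property ID (a simplified shape loosely modelled on the entity JSON); the sample IDs are placeholders, not real Lexemes.

```python
GENDER_PROPERTY = "P5185"  # grammatical gender, as referenced in the discussion


def missing_gender(lexemes):
    """Return the IDs of lexemes that carry no grammatical gender statement."""
    return [
        lexeme["id"]
        for lexeme in lexemes
        if not lexeme.get("claims", {}).get(GENDER_PROPERTY)
    ]


# Hypothetical sample data: the first entry has no P5185 statement at all.
sample = [
    {"id": "L-example-1", "claims": {}},
    {"id": "L-example-2", "claims": {"P5185": [{"value": "Q499327"}]}},
]

print(missing_gender(sample))  # ['L-example-1']
```

This only covers gender stated on the lexeme itself; gender given as a grammatical feature of a form, as with fils, would need a separate rule.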

Hi @VIGNERON, I updated the guidelines to run the ShEx here. We have another ShEx (not implemented yet) that should capture entries with 4 forms (ami, amie, amies, amis). We are still not sure how to make the two ShEx run in parallel (or one after the other). Bruno771 (talk) 14:28, 20 March 2020 (UTC)

Basque failures

I am not exactly sure where to put them, but I ran the ShEx mask for Basque on the first 1500 or so nouns (alphabetically), and here's the list of Lexemes that were flagged as potentially problematic:

  1. Lehendakariorde (L60781)
  2. abdominal (L54733)
  3. abstraktu (L60670)
  4. absurdo (L60685)
  5. adarzabal (L60661)
  6. adieraztea (L222139)
  7. adopzio (L49964)
  8. ageri (L60635)
  9. agindupeko (L60684)
  10. agintaritza (L49920)
  11. aglomeratu (L230988)
  12. aguazil (L60674)
  13. aharrari (L231001)
  14. aita-amaginarreba (L231018)
  15. aitaren (L60667)
  16. albistegi (L50700)
  17. aldameneko (L60650)
  18. alkoholemia (L51448)
  19. altu (L60631)
  20. anai-arreba (L60679)
  21. anestesiko (L60630)
  22. angurri (L50555)
  23. antidopin (L60640)
  24. antolabide (L51802)
  25. antsietate (L49418)
  26. apar (L50864)
  27. aranondo (L50297)
  28. arbin (L230859)
  29. ardozale (L60637)
  30. argi-iturri (L230930)
  31. armadun (L184455)
  32. arranpalo (L60665)
  33. arrosa-leiho (L60636)
  34. aseezin (L184449)
  35. (L35035)
  36. atoi (L50310)

Not exactly sure where to put this, so I put it here for now, but we need a better workflow :) --Denny (talk) 03:24, 15 April 2020 (UTC)
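
As one possible step towards a workflow, a list like the one above could be generated from a batch run. The following Python sketch assumes each lexeme is a dictionary with `lemma` and `id` keys and that `is_valid` is whatever check is being run (for example a wrapper around a ShEx validator); both are assumptions for illustration, and the sample entries are placeholders.

```python
def failure_report(lexemes, is_valid):
    """Return numbered report lines for every lexeme that fails the check."""
    failures = [lexeme for lexeme in lexemes if not is_valid(lexeme)]
    return [
        f"{number}. {lexeme['lemma']} ({lexeme['id']})"
        for number, lexeme in enumerate(failures, start=1)
    ]


# Hypothetical batch: "ok" stands in for the result of the real validation.
batch = [
    {"lemma": "alpha", "id": "L-example-1", "ok": False},
    {"lemma": "beta", "id": "L-example-2", "ok": True},
    {"lemma": "gamma", "id": "L-example-3", "ok": False},
]

for line in failure_report(batch, lambda lexeme: lexeme["ok"]):
    print(line)
# 1. alpha (L-example-1)
# 2. gamma (L-example-3)
```

The output could then be posted to a dedicated report page instead of the talk page.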