Wikidata talk:Lexicographical data/Archive/2019/06

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

About the possibility of easily adding regular verbs in Portuguese

Hello everyone. People in Brazil have just started adding more lexemes to Wikidata. However, adding information manually is time-consuming and inefficient. We noticed that Wikidata:Wikidata_Lexeme_Forms requires 84 forms in Portuguese, and we shouldn't need to enter all 84 by hand for regular verbs, since their conjugation can be automated. We would still need to add these forms to be able to add other grammatical variants.

We are wondering whether this can be automated in the future, or whether Wikidata:Wikidata_Lexeme_Forms can be adapted to the target language. Our idea, in the case of Portuguese, would be to use an auxiliary script to generate all the verb forms and then let the user correct any irregularities.

We also want to comment on adding new lexeme data to Wikidata in general. We believe that Wikidata_Lexeme_Forms isn't very efficient, and adding information about Senses isn't available. There's a lot of potential for automation, but also a lot of room to improve the manual input of new lexemes to Wikidata. Thanks! Tetizeraz (talk) 19:55, 23 May 2019 (UTC)

Thanks for your feedback! I believe that, just like for the rest of Wikidata, plenty of additional user tools can be created to improve the editing process and make it more efficient. Wikidata Lexeme Forms was one of the first ones; there is also Ordia's text to Lexemes. If someone is interested in building more around automated forms, I'm sure it would be pretty useful for the rest of the community :) Lea Lacroix (WMDE) (talk) 07:33, 24 May 2019 (UTC)
@Tetizeraz: Wikidata Lexeme Forms has a feature to support automatic creation of lexemes with forms, where the form representations (the actual text) are provided by some other component (e. g. an auxiliary script) but the tool helps out with some other parts – see the documentation and the announcement for more information. (Of course, we need to finish the Portuguese template(s) before that can be used, but hopefully that shouldn’t take too much longer.) --Lucas Werkmeister (talk) 22:42, 3 June 2019 (UTC)
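The "auxiliary script" idea from this thread can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: it covers the present indicative alone (not the full set of forms), and the names `conjugate_present`, `with_overrides` and the person labels are hypothetical, not part of Wikidata Lexeme Forms.

```python
# Present indicative endings for the three regular conjugation classes.
# Assumption: only this one tense is covered; the full template has many more forms.
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "ais", "am"],
    "er": ["o", "es", "e", "emos", "eis", "em"],
    "ir": ["o", "es", "e", "imos", "is", "em"],
}

# Hypothetical labels for the six person/number slots.
PERSONS = ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]


def conjugate_present(infinitive):
    """Generate regular present indicative forms from an infinitive."""
    stem, ending = infinitive[:-2], infinitive[-2:]
    if ending not in PRESENT_ENDINGS:
        raise ValueError(f"not an -ar/-er/-ir infinitive: {infinitive!r}")
    return {
        person: stem + suffix
        for person, suffix in zip(PERSONS, PRESENT_ENDINGS[ending])
    }


def with_overrides(forms, overrides):
    """Let the editor replace generated forms that turn out to be irregular."""
    return {**forms, **overrides}


regular = conjugate_present("falar")
# Generated regularly first, then the irregular first person is corrected by hand.
corrected = with_overrides(conjugate_present("ouvir"), {"1sg": "ouço"})
```

The generated dictionary could then be handed to whatever tool creates the forms, with the editor reviewing it first, which matches the "generate, then correct irregularities" workflow proposed above.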

Modeling gender change of Dutch nouns

A gradual development in the use of Dutch nouns is that a large group of traditionally feminine words are being used as masculine words. I am hoping to avoid the rather pointless discussion about whether these nouns should exclusively get the property feminine or masculine. Depending on where a user's interests lie on the scales of "written-spoken", "old-new" and "South-North", one or the other may be preferable. This is different from the situation where a word has historically always had both genders or where there is a true common gender, so I feel using both genders or the common gender on these lexemes would be misleading.

In the official Woordenlijst Nederlandse taal these words are marked "m/v", while the major Dutch dictionary Van Dale uses "v/m" for the same purpose and WikiWoordenboek (Dutch Wiktionary) does the same. A possible solution would be to create a specific item for this purpose, "feminine/masculine". There are linguistic works that could serve as a source document. Another approach would be to use qualified statements. Unfortunately I have not been able to find a property that describes a gradual change, but what I came up with is: grammatical gender (P5185): feminine (Q1775415), with the qualifier followed by (P156): masculine (Q499327). Although I would like to have lots of special items describing the marvels of the Dutch language, I can imagine that in the long run the purposes of Wikidata are better served by modeling them with a more limited set of items. But my experience in crafting statements like this is rather limited, so please share your remarks and suggestions. --MarcoSwart (talk) 10:02, 4 June 2019 (UTC)

@MarcoSwart: We are pretty free with items, I think it would be fine to have special items for this sort of thing. Assuming it's not at the level of one new item for each word? ArthurPSmith (talk) 10:10, 4 June 2019 (UTC)
Maybe use start time (P580)/end time (P582) as qualifiers on grammatical gender (P5185)? --Infovarius (talk) 10:47, 4 June 2019 (UTC)
@ArthurPSmith: No, in this case it is more the other way around: there are ultimately tens of thousands of lexemes that would be described this way, so it is useful to have a clear standard. I will explore your suggestion, while awaiting more reactions.
@Infovarius: The reason why I did not use the properties you mention is that the underlying process is a gradual one, spanning centuries, with different speeds for different words and in different regions and even different speakers. There is no meaningful way to define a beginning or end, as all that is known is the direction of the development and which nouns are concerned.
I have followed the suggestion of ArthurPSmith and created Dutch f/m shift (Q64448167).
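For readers comparing the two modeling options in this thread, here is a rough Python sketch of how each statement might look as data. It follows the general shape of Wikibase statement JSON but is simplified (no IDs or hashes), and the helper names are hypothetical.

```python
def item_snak(prop, qid):
    """Build a simplified value snak pointing at an item."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": qid},
        },
    }


def qualified_gender_statement():
    """Option 1: feminine (Q1775415) qualified with followed by (P156) masculine (Q499327)."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": item_snak("P5185", "Q1775415"),
        "qualifiers": {"P156": [item_snak("P156", "Q499327")]},
    }


def single_item_gender_statement():
    """Option 2: one dedicated item, Dutch f/m shift (Q64448167), as the gender value."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": item_snak("P5185", "Q64448167"),
    }
```

The second option keeps each lexeme's statement shorter, which matters when the same pattern is repeated on tens of thousands of lexemes.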

Are there any plans to connect the interwiki links on the Wiktionaries to Wikidata? I talked about that with User:Lea Lacroix (WMDE) and she proposed a new property for that (since Wikidata has one page for only one Lexeme). What do you think? Greetings 2A01:112F:742:C00:C421:468F:940E:B1F8 18:32, 20 May 2019 (UTC)

Can you give an example? I thought the Wiktionary interwiki links were automatic now (based on the string form of the word)? ArthurPSmith (talk) 15:22, 21 May 2019 (UTC)
Links between Wiktionary entries are automated indeed, without connection to Wikidata (see Wikidata:Wiktionary/Sitelinks). Here we're talking about linking a Lexeme on Wikidata to a Wiktionary entry I believe. Lea Lacroix (WMDE) (talk) 16:20, 21 May 2019 (UTC)
I do not see how it would be possible because the Wikidata and Wiktionary architectures are completely different. On Wikidata, there is one page per lexeme. On Wiktionary there is one page per spelling. Of course, there may be several lexemes for a given language, and the spelling may be used in different languages that may have several lexemes as well. And I speak for only one linguistic version of the Wiktionary. How to link to a specific lexeme on a given linguistic version of the Wiktionary? I am a bit lost. Pamputt (talk) 18:38, 21 May 2019 (UTC)
FWIW, my 2014-10 proposal included the concept of a Representation (R) entity for this very purpose. (Indeed, the intent was that it would be the first entity created in order to most quickly leverage the existing Wiktionary data and improve interwiki coverage.) I'm not sure how much attention was paid to my proposal when Denny and others finalized the plan that was actually implemented, but I note that his 2015-05 proposal includes representations only as strings. — GPHemsley (talk) 11:01, 10 June 2019 (UTC)
Just to comment that using a property to link to Wiktionary would be a really bad idea in the long term. If the page on Wiktionary is moved/deleted, then the sitelinks would get updated, but the property wouldn't. For Commons, we have Commons category (P373) that tries to do something similar, but there's a big backlog there of P373 values that need to be corrected/removed to match the sitelinks and/or the actual location of the relevant info on Commons. Also, if there's a sitelink, then it's possible to access the information on Wikidata from the project, but that's not the case for property links - so if Lexeme content is going to be used on Wiktionaries in the future, then they need to be connected with a sitelink. Thanks. Mike Peel (talk) 14:18, 23 May 2019 (UTC)

@2A01:112F:742:C00:C421:468F:940E:B1F8: what is the context, and why would you need such a link? In most cases they are trivial to find, as Wiktionary titles are based on spelling. Gebäude (L29296) is obviously equivalent to wikt:Gebäude#German (based on the main lemma, but depending on the context you may want to use the forms and/or the senses). Cdlt, VIGNERON (talk) 09:23, 2 June 2019 (UTC)
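To make the "based on spelling" point concrete, here is a small Python sketch that derives a Wiktionary URL from a lemma and a language heading. The function name and the choice of the English edition as default are assumptions for illustration; section headings differ between Wiktionary editions, so the anchor part is only a guess outside en.wiktionary.

```python
from urllib.parse import quote


def wiktionary_url(lemma, language_heading, edition="en"):
    """Build a Wiktionary URL from a lemma; the page title is just the spelling."""
    title = quote(lemma.replace(" ", "_"))
    # Assumption: the language section anchor equals the heading text.
    anchor = quote(language_heading.replace(" ", "_"))
    return f"https://{edition}.wiktionary.org/wiki/{title}#{anchor}"


url = wiktionary_url("Gebäude", "German")
```

Note that this maps a lexeme to a page section, not the other way round: one page may still hold several lexemes with the same spelling, which is the difficulty raised earlier in this thread.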

I would imagine that interwikis are an intermediate step for getting data from Wikidata Lexeme. Due to importance of the topic I'm creating another post below: #Getting data in Wiktionary. --Infovarius (talk) 10:40, 4 June 2019 (UTC)

Getting data in Wiktionary

@Lea Lacroix (WMDE): are there plans for turning on arbitrary access from Wiktionary? I believe that this step is the main idea behind Lexemes themselves and is long-awaited in Wiktionaries. Just imagine that Wiktionary pages in all language editions can retrieve forms, pronunciations, translations and much more from a corresponding (in some sense) Lexeme! P.S. I thought that the correspondence would be done through some interwikis, but as this is hard to do in some way (any official reasons?) I am OK with a property. But how to see a Lexeme with such a property from the corresponding Wiktionary page? --Infovarius (talk) 10:44, 4 June 2019 (UTC)

Yes, arbitrary access on Wiktionaries is definitely planned :) For now, we want to learn more about what the actual Wiktionary editors would like to do with the data, so we can design functions (parser functions or Lua) adapted to their needs. We're collecting feedback on this ticket, feel free to add a comment there! We're also looking for Wiktionary communities who would be willing to experiment with arbitrary access as a first step before deploying on all of them.
The interwikis on Wiktionary are not using Wikidata. The connection between Lexemes and Wiktionary pages is not that obvious, as discussed on this page over the past weeks. The reason behind both of these things is that Wikidata Lexemes and Wiktionary are not structured in exactly the same way: Wikidata works with Lexemes, while Wiktionaries combine all Lexemes from all languages having the same Lemma on one page. Their interwiki links, unlike the Wikipedia ones that connect two identical concepts, are based only on an identical page title: that's why it was not necessary to connect them with Wikidata; a MediaWiki extension does the job pretty well. (see also Wikidata:Wiktionary/Sitelinks)
In any case, arbitrary access enables calling data from any Wikidata entity, not only the one that would be connected to a page.
I hope that answered your questions, at least partially, if you have more, feel free to ask :) Lea Lacroix (WMDE) (talk) 11:07, 4 June 2019 (UTC)
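As an illustration of what "calling data from any Wikidata entity" could mean for forms, here is a hedged Python sketch that pulls form representations out of a lexeme's JSON. On-wiki this would be done with parser functions or Lua, as mentioned above; Python is used here only to show the data shape. The sample lexeme is hand-written with placeholder IDs and is not real Wikidata content.

```python
def form_representations(lexeme, language_code):
    """Return (form id, text, grammatical feature ids) for each form in one language."""
    results = []
    for form in lexeme.get("forms", []):
        representation = form.get("representations", {}).get(language_code)
        if representation is not None:
            features = form.get("grammaticalFeatures", [])
            results.append((form["id"], representation["value"], features))
    return results


# Hand-written sample in the general shape of lexeme JSON.
# "L0", "L0-F1", "Q_SINGULAR" and "Q_PLURAL" are placeholders, not real IDs.
sample_lexeme = {
    "id": "L0",
    "lemmas": {"de": {"language": "de", "value": "Haus"}},
    "forms": [
        {
            "id": "L0-F1",
            "representations": {"de": {"language": "de", "value": "Haus"}},
            "grammaticalFeatures": ["Q_SINGULAR"],
        },
        {
            "id": "L0-F2",
            "representations": {"de": {"language": "de", "value": "Häuser"}},
            "grammaticalFeatures": ["Q_PLURAL"],
        },
    ],
}

german_forms = form_representations(sample_lexeme, "de")
```

A Wiktionary template could use a lookup of this kind to fill an inflection table for any lexeme, regardless of which page the template is used on.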

Senses

I think the senses should be connected to the Q-items. Those pages already take care of the sense (the meaning behind the word) and with that much more: pictures, categories, and thousands of definitions already written down in different languages. It would be so much easier to add all the information from there than to do it all once again or copy it... 2A01:112F:742:C00:C421:468F:940E:B1F8 19:06, 20 May 2019 (UTC)

Just use item for this sense (P5137). Infovarius (talk) 12:23, 21 May 2019 (UTC)
@Infovarius: True. Although there is the other problem of item for this sense (P5137) only being used on noun senses. I still feel that Property_talk:P5137#Use on Verb, Adjective and Adverb senses (nominalisation) is our best bet to connect non-noun senses to items. Is there any other way of doing it? Liamjamesperritt (talk) 01:50, 22 June 2019 (UTC)

Suppletion and how to indicate it

Hi,

How should we indicate that forms of a word are suppletive?

In ki (L69), I indicated it explicitly, directly in the Grammatical features (see Lexeme:L69#F3).

In go (L3006), GZWDer indicated it implicitly but more precisely with derived from lexeme (P5191) = wend (L8464).

Both methods sound good, but neither is perfect.

The second one seems more elegant and better, but it could be complicated for highly suppletive words (like aller (L750), where almost all of the 100 forms are suppletive). Plus, derived from lexeme (P5191) has an allowed-entity-types constraint (Q52004125) = Wikibase lexeme (Q51885771) (which can be changed, but that needs a discussion). Without changing the constraint, maybe it could also be done the other way round, by putting derived from lexeme (P5191) at the lexeme level and qualifying it with the forms (with subject form (P5830))?

Cheers, VIGNERON (talk) 12:55, 7 June 2019 (UTC)

We do also have object form (P5548) - although that was specified as a qualifier. ArthurPSmith (talk) 18:38, 7 June 2019 (UTC)
The constraint system is utterly alien to me, because changing a constraint does not mean the alerts will actually go away. In this case there is not really any need for discussion, but I don't think the system can even handle (see Help:Property constraints portal/Allowed entity types) using inflected form (Q4423888) as an allowed entity! It's frustrating. Circeus (talk) 22:43, 7 June 2019 (UTC)
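To compare the two placements of derived from lexeme (P5191) discussed in this thread, here is a deliberately simplified Python sketch. It does not use real Wikibase JSON, the function names are hypothetical, and the form ID "L3006-F0" is a placeholder rather than an actual form of go (L3006).

```python
def form_level_statement(form_id, origin_lexeme):
    """Variant A: P5191 is stated directly on the suppletive form."""
    return {"subject": form_id, "property": "P5191", "value": origin_lexeme}


def lexeme_level_statement(lexeme_id, origin_lexeme, form_ids):
    """Variant B: P5191 on the lexeme, qualified with subject form (P5830)."""
    return {
        "subject": lexeme_id,
        "property": "P5191",
        "value": origin_lexeme,
        "qualifiers": [{"property": "P5830", "value": f} for f in form_ids],
    }


# "L3006-F0" is a placeholder form ID used only for illustration.
variant_a = form_level_statement("L3006-F0", "L8464")
variant_b = lexeme_level_statement("L3006", "L8464", ["L3006-F0"])
```

With variant B, a highly suppletive verb would need one statement per origin lexeme, each listing the affected forms as qualifiers, and the subject stays a lexeme, so the existing constraint would not have to change.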