Wikidata talk:Lexicographical data/Archive/2023/11

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Question about combines lexemes P5238

How do I correctly define combines lexemes (P5238) for a word where the prefix is only part of a noun? I'm looking at the Swedish noun abborrfena (L1206838) with the prefix (L1211366), which is not a word in itself but only part of the whole word for perch, i.e. abborre (L235116). Should the prefix be abborr or abborre? If abborr, how do I define it, and how do I correctly link it to abborre?

I checked another case of combines lexemes (P5238) where an s is needed between two words, but that does not help in this case: havsbotten (L242830) consists of hav, -s- and botten, the -s- being an ekstraudstyr (L1153504). Robert (talk) 08:24, 2 November 2023 (UTC)

@Robertsilen: One solution that has been adopted for some German lexemes (and even some Swedish lexemes) is to define a new form with the grammatical feature combining form (Q107614077) and use that in any relevant 'combines' statements: e.g. Krawatten (L36557-F9) or användar (L33166-F9). Mahir256 (talk) 15:57, 2 November 2023 (UTC)
Thanks, I implemented your suggestion. Makes sense. Robert (talk) 20:32, 2 November 2023 (UTC)
See Wikidata:Lexicographical data/Documentation/Languages/de#Form_used_in_compounds for an explanation of why I do it this way for German. - Nikki (talk) 09:52, 28 November 2023 (UTC)
You might also be interested in this discussion we had some time ago :) Vesihiisi (talk) 07:34, 3 November 2023 (UTC)
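To illustrate why a separate combining form helps, here is a minimal sketch. It uses a hypothetical, simplified data model (not the Wikibase JSON format) in which each component of a compound is a lexeme, optionally narrowed to one of its forms. The sketch checks whether the components spell out the compound's lemma. Joining the lemmas abborre + fena gives the wrong spelling, while using a form "abborr" (the kind of form tagged with combining form (Q107614077)) gives abborrfena.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    """One part of a compound (hypothetical model): a lexeme lemma,
    optionally narrowed to a specific form such as a combining form."""
    lemma: str
    form_text: Optional[str] = None  # spelling of the chosen form, e.g. "abborr"

    def surface(self) -> str:
        # Use the specific form's spelling when one is given, else the lemma.
        return self.form_text if self.form_text is not None else self.lemma


def spells_compound(compound_lemma: str, parts: List[Component]) -> bool:
    """Return True if the parts, concatenated in order, spell the compound."""
    return "".join(p.surface() for p in parts) == compound_lemma


# Linking to the lemma alone gives "abborrefena", which is wrong.
naive = [Component("abborre"), Component("fena")]
# Linking to a combining form of abborre gives the correct spelling.
with_form = [Component("abborre", form_text="abborr"), Component("fena")]
# The interfix case from the thread: hav + -s- + botten.
havsbotten = [Component("hav"), Component("-s-", form_text="s"), Component("botten")]

print(spells_compound("abborrfena", naive))       # False
print(spells_compound("abborrfena", with_form))   # True
print(spells_compound("havsbotten", havsbotten))  # True
```

The design choice mirrors the suggestion above: the compound still points at the full lexeme abborre, and the form reference records which spelling actually appears inside the compound.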

lexicographic items

Some items are quite dictionary-like (e.g. Q21121474), so we have subject lexeme (P6254) on them, linking to an appropriate lexeme. But do we have a means to link back from the lexeme? And I am not talking about item for this sense (P5137) on senses, because the word can have different meanings and different corresponding items, but simultaneously can have an item about itself (not the meaning). Infovarius (talk) 12:00, 20 November 2023 (UTC)

Could you use said to be the same as (P460)? ArthurPSmith (talk) 18:48, 28 November 2023 (UTC)
@Infovarius: of course that would violate some constraints but maybe they should be altered for this case? ArthurPSmith (talk) 18:48, 28 November 2023 (UTC)
Thanks, it would work until a better solution is found. --Infovarius (talk) 20:52, 29 November 2023 (UTC)

Standardized corpus of open data sentences

The idea is to create a new JSON-based standard for sharing open data with sentences that can be referenced and used in tools related to Wikidata lexemes. This would allow for tools to easily incorporate sentences from various sources such as the Swedish Parliament, the Swedish Public Employment Service, EU documents, historical documents, and more. The goal is to standardize this process, making it straightforward to support multiple languages where CC0 data is available. I'm currently working on it, see https://github.com/dpriskorn/riksdagen_sentences for details.

I welcome suggestions for improvement so we can start building a multi-million-token dataset that we can use to help our users add example phrases to lexemes. @Fnielsen:--So9q (talk) 07:12, 28 November 2023 (UTC)

There is a similar initiative for Russian: opencorpora.org. It seems half-dead, though. Infovarius (talk) 20:59, 29 November 2023 (UTC)
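As a rough illustration of what a JSON-based sentence record could look like, here is a minimal sketch. The field names (id, text, language, source, license) are assumptions made for this example, not the schema used in the riksdagen_sentences repository. Records are written one JSON object per line (JSON Lines) so that large corpora can be streamed line by line.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class Sentence:
    """Hypothetical sentence record; field names are illustrative only."""
    id: str             # identifier within the source corpus
    text: str           # the sentence itself
    language: str       # language code, e.g. "sv"
    source: str         # name of the originating corpus
    license: str = "CC0-1.0"


def to_jsonl(sentences: Iterable[Sentence]) -> str:
    # ensure_ascii=False keeps characters such as å, ä, ö readable in the output.
    return "\n".join(json.dumps(asdict(s), ensure_ascii=False) for s in sentences)


def from_jsonl(data: str) -> List[Sentence]:
    # Skip blank lines so trailing newlines do not break parsing.
    return [Sentence(**json.loads(line)) for line in data.splitlines() if line.strip()]


records = [Sentence("1", "Abborren har en taggig ryggfena.", "sv", "example")]
print(to_jsonl(records))
```

A tool that helps users add example phrases to lexemes could then filter records by the language field and search the text field for a lexeme's forms.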

Similarities between proverbs that are near-synonyms

Here they talk about similarities between proverbs in different languages as "similar images". We are currently lacking a property to link near-synonyms. Would a new sense property, "similar language image", be a good idea? WDYT? So9q (talk) 04:00, 30 November 2023 (UTC)