Wikidata:Property proposal/word stem type

From Wikidata

Zaliznyak word stem class

Originally proposed at Wikidata:Property proposal/Lexemes

   Not done
Description: Describes the type of word stem
Represents: Zaliznyak's classification (Q66148413)
Data type: String
Domain: Lexemes
Allowed values: string
Allowed units: numbers
Example 1: вода (L189) → 1
Example 2: магия (L57919) → 7
Example 3: Земля (L34843) → 2
Source: wikt:ru:Викисловарь:Использование_словаря_Зализняка (Wiktionary: using Zaliznyak's dictionary)
Planned use: Filling out Russian lexemes
Robot and gadget jobs: A bot is planned that will update these properties for Russian lexemes.
See also: word stem (P5187), paradigm class (P5911)

Zaliznyak stress pattern

Originally proposed at Wikidata:Property proposal/Lexemes

   Not done
Description: Describes the word's stress pattern
Represents: Zaliznyak's classification (Q66148413)
Data type: String
Domain: Lexemes
Allowed values: string
Allowed units: letters
Example 1: вода (L189) → d
Example 2: магия (L57919) → a
Example 3: Земля (L34843) → d
Source: wikt:ru:Викисловарь:Использование_словаря_Зализняка (Wiktionary: using Zaliznyak's dictionary)
Planned use: Filling out Russian lexemes
Robot and gadget jobs: A bot is planned that will update these properties for Russian lexemes.
See also: word stem (P5187), paradigm class (P5911)

Zaliznyak vowel alternation

Originally proposed at Wikidata:Property proposal/Lexemes

   Not done
Description: Indicates whether the word has a vowel alternation
Represents: Zaliznyak's classification (Q66148413)
Data type: String
Domain: Lexemes
Allowed values: yes or no
Example 1: вода (L189) → no
Example 2: магия (L57919) → no
Example 3: Земля (L34843) → yes
Source: wikt:ru:Викисловарь:Использование_словаря_Зализняка (Wiktionary: using Zaliznyak's dictionary)
Planned use: Filling out Russian lexemes
Robot and gadget jobs: A bot is planned that will update these properties for Russian lexemes.
See also: word stem (P5187), paradigm class (P5911)
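The three proposed values are parts of a single Zaliznyak index as used in the Russian Wiktionary. A bot filling these properties might split such an index roughly as in the Python sketch below. It assumes a simplified index shape: a stem-type digit 1–8, an optional asterisk (assumed here, as in Zaliznyak's dictionary, to mark the vowel alternation), a stress letter a–f, and up to two prime marks, which are kept attached to the letter. Real Wiktionary indices can carry further notation that this sketch rejects rather than handles. The names `parse_index` and `ZaliznyakIndex` are hypothetical.

```python
import re
from typing import NamedTuple


class ZaliznyakIndex(NamedTuple):
    stem_type: str          # "1".."8", kept as a string like the proposed String property
    stress_pattern: str     # stress letter plus any primes, e.g. "a", "d", "d'"
    vowel_alternation: str  # "yes" or "no", matching the proposal's allowed values


# Assumed simplified shape: digit, optional "*", letter a-f, zero to two primes.
_INDEX_RE = re.compile(r"^([1-8])(\*?)([a-f])('{0,2})$")


def parse_index(index: str) -> ZaliznyakIndex:
    """Split a simplified Zaliznyak index into the three proposed values."""
    match = _INDEX_RE.match(index.strip())
    if match is None:
        raise ValueError(f"unsupported index: {index!r}")
    stem, star, letter, primes = match.groups()
    return ZaliznyakIndex(stem, letter + primes, "yes" if star else "no")


if __name__ == "__main__":
    # Example 2 from the proposal: магия (L57919) -> stem type 7, stress a
    print(parse_index("7a"))
    # ZaliznyakIndex(stem_type='7', stress_pattern='a', vowel_alternation='no')
```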

Motivation

It is planned to fill in lexemes from Wiktionary. Since the Russian Wiktionary classifies lexemes according to Zaliznyak's classification, several properties are needed to support it. If this property is approved, the following properties will also be needed: the stress pattern and the vowel alternation. This property must be specified as a qualifier on a paradigm class (P5911) statement whose value is Zaliznyak's classification (Q66148413). Iniquity (talk) 20:56, 5 August 2019 (UTC)
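To make the qualifier arrangement concrete, here is a Python sketch of how one such statement might look in the Wikibase JSON data model: a paradigm class (P5911) statement whose value is Q66148413, carrying the three proposed values as string qualifiers. Because the proposals were not done, the qualifier property IDs (`P_STEM_TYPE`, `P_STRESS_PATTERN`, `P_VOWEL_ALTERNATION`) are hypothetical placeholders, not real Wikidata properties, and the helper names are illustrative only.

```python
import json

# Hypothetical placeholders: these properties were never created, so they have no P-numbers.
STEM_TYPE = "P_STEM_TYPE"
STRESS_PATTERN = "P_STRESS_PATTERN"
VOWEL_ALTERNATION = "P_VOWEL_ALTERNATION"


def string_snak(prop: str, value: str) -> dict:
    """A Wikibase value snak holding a plain string."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"value": value, "type": "string"},
        "datatype": "string",
    }


def paradigm_statement(stem_type: str, stress: str, alternation: str) -> dict:
    """paradigm class (P5911) = Zaliznyak's classification (Q66148413), with three qualifiers."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P5911",
            "datavalue": {
                "value": {"entity-type": "item", "numeric-id": 66148413, "id": "Q66148413"},
                "type": "wikibase-entityid",
            },
            "datatype": "wikibase-item",
        },
        "qualifiers": {
            STEM_TYPE: [string_snak(STEM_TYPE, stem_type)],
            STRESS_PATTERN: [string_snak(STRESS_PATTERN, stress)],
            VOWEL_ALTERNATION: [string_snak(VOWEL_ALTERNATION, alternation)],
        },
        "qualifiers-order": [STEM_TYPE, STRESS_PATTERN, VOWEL_ALTERNATION],
    }


if __name__ == "__main__":
    # Values from Example 3 of each proposal: Земля (L34843) -> 2, d, yes
    print(json.dumps(paradigm_statement("2", "d", "yes"), ensure_ascii=False, indent=2))
```

Keeping all three values as qualifiers on one statement ties them to the Zaliznyak classification specifically, so another classification could be added as a separate paradigm class statement without mixing values.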

Useful links:

Discussion

 Comment The data type should probably be "item", with items created for each word stem type. ArthurPSmith (talk) 17:53, 6 August 2019 (UTC)
In Russian, these are just numbers, and there are generally about 8 for each part of speech. Do you think it is worth creating separate items? Or do other languages have the same property? Iniquity (talk) 17:59, 6 August 2019 (UTC)
 Comment If it's just for Zaliznyak's classification, the label should probably reflect that. --- Jura 18:02, 6 August 2019 (UTC)
Yes, I thought about it, but if several languages may have a similar classification, wouldn't it be better to use a common property? Or is it better to create a narrower one first and change it later? Iniquity (talk) 18:04, 6 August 2019 (UTC)
Changing it around once created isn't really a good idea. It's generally easier to build and maintain a property with a clear scope. If one language has different ways of classifying the same thing, this might not work out well in a single property, as multiple values in a single property are harder to maintain (at least, that's my view). --- Jura 14:06, 7 August 2019 (UTC)
I think you're right, especially since it is not known whether similar classifications exist in other languages. Would it be okay to call it "word stem type according to Zaliznyak's classification"? Iniquity (talk) 17:30, 7 August 2019 (UTC)
Maybe it can be done shorter, but I'm not really the best person to ask about the terminology and translation to use. --- Jura 11:35, 8 August 2019 (UTC)
I think something like that would work. Iniquity (talk) 15:24, 8 August 2019 (UTC)
Using multiple value types in the same property makes processing data a bit harder. If the property is specifically Zaliznyak's, it shouldn't be used for anything else (e.g. we don't have a property "book ID", we have a property "ISBN-10"), because there is a clear expectation of the data it will hold. Of course this is more relevant to a free-form string type. Items can have their own instance-of/subclass-of, but querying for such data is still always messier and slower. --Yurik (talk) 15:29, 8 August 2019 (UTC)
Per the chat discussion, I think it would be far better to have the values be items rather than strings. There are very few, well-defined types (1-8 for the first, a-d for the second), and each type's item could describe what it is in multiple languages and carry its own statements. The remaining question is how many of these properties we need, and whether the values should go in several properties (e.g. 1-8 in one property, a-d in another) or in just one property with multiple values. Also, if there is only one, should it be Zaliznyak-specific, or applicable to all languages that have a word classification system (which most probably do)? So, to sum up: how about an all-language word classifier property with multiple values, e.g. "word classification" = "Zaliznyak class 1", "Zaliznyak type a" (two values for the same property; the property is not Zaliznyak-specific). --Yurik (talk) 21:06, 8 August 2019 (UTC)
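For comparison, the item-valued alternative suggested above (one language-neutral property, several item values) might look like the following Python sketch in the same Wikibase JSON shape. Everything here is hypothetical: neither the general "word classification" property nor per-value items such as "Zaliznyak class 1" exist, so `P_WORD_CLASSIFICATION` and the `Q_ZALIZNYAK_*` IDs are placeholders, and only the two kinds of value named in the comment (stem class and stress type) are modelled.

```python
# Hypothetical placeholders: no such property or items exist on Wikidata.
WORD_CLASSIFICATION = "P_WORD_CLASSIFICATION"
VALUE_ITEMS = {
    ("stem_type", "1"): "Q_ZALIZNYAK_CLASS_1",
    ("stem_type", "2"): "Q_ZALIZNYAK_CLASS_2",
    ("stem_type", "7"): "Q_ZALIZNYAK_CLASS_7",
    ("stress_pattern", "a"): "Q_ZALIZNYAK_TYPE_A",
    ("stress_pattern", "d"): "Q_ZALIZNYAK_TYPE_D",
}


def item_snak(prop: str, item_id: str) -> dict:
    """A Wikibase value snak pointing at an item."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {"entity-type": "item", "id": item_id},
            "type": "wikibase-entityid",
        },
        "datatype": "wikibase-item",
    }


def classification_statements(stem_type: str, stress: str) -> list:
    """One statement per value, all under the same property.

    Raises KeyError if a value has no (placeholder) item in VALUE_ITEMS.
    """
    keys = [("stem_type", stem_type), ("stress_pattern", stress)]
    return [
        {
            "type": "statement",
            "rank": "normal",
            "mainsnak": item_snak(WORD_CLASSIFICATION, VALUE_ITEMS[key]),
        }
        for key in keys
    ]


if __name__ == "__main__":
    # Example 1 from the proposals: вода (L189) -> class 1, type d
    for statement in classification_statements("1", "d"):
        print(statement["mainsnak"]["datavalue"]["value"]["id"])
```

A consumer then matches on item IDs instead of parsing strings, and multilingual labels live on the items; the trade-off, as noted earlier in the discussion, is that several kinds of value share one property.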