Wikidata:Lexicographical data/Best practices/pl
This page serves as a repository of best practices established over time by different lexeme contributors, often after discussion of them in other fora. These may be discussed on this page's talk page if desired.
Should there be a lexeme for it?
- There should be some evidence of the existence of a lexeme in a language at the time that lexeme is created on Wikidata.
- The better documented a language is in general, the more the above should be treated as a requirement rather than merely a best practice.
- As a result, for languages like English, French, Spanish, Mandarin, Russian, and Arabic, which are supported by nation-states and which, by virtue of being used to communicate all sorts of information among very large groups of people, are expected to have diverse vocabularies, this should be taken as obligatory regardless of one's fluency in that language.
- For less well-documented languages like Breton, Sindhi, Acehnese, and Guarani, this remains merely a strong recommendation: once a resource is found for that language, attempts should be made to use it as evidence for as many existing lexemes in that language as possible.
- For even less well-documented languages like Skolt Sami, Igbo, Angika, and Cia-Cia, this is much less binding even as a recommendation—especially when you are a native speaker of that language and can thus vouch for the use of a particular lexeme in your language community.
- The evidence for the existence of a lexeme may be indicated in a number of ways:
- through adding an external identifier to a lexical resource which describes the lexeme in question; or
- through adding a described by source (P1343) statement pointing to a resource where the lexeme is described (qualified with page(s) (P304) or other specifiers of where in that resource that description occurs); or
- through adding a described at URL (P973) statement pointing to an online resource where the lexeme is described (the most specific URL for this if possible, or with the same sorts of qualifiers that might be used for P1343); or
- through adding a usage example (P5831) statement demonstrating use of that lexeme in some external source (where this source is provided as a reference); or
- through adding a gloss quote (P8394) statement on one of the lexeme's senses providing how the corresponding meaning of that lexeme is expressed in some resource (this resource provided as a reference).
- In general, while individual words that aren't merely inflections of other words might warrant lexemes, non-idiomatic phrases typically do not warrant them, since they may be treated as the sum of their parts.
- This does not necessarily discount the addition of non-idiomatic meaning senses to lexemes which do have idiomatic meanings, however, and which have those idiomatic meanings as senses already.
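The ways of indicating evidence listed in this section could be checked mechanically. The following is only a minimal sketch, assuming the lexeme is available as a Python dict shaped like the Wikibase entity JSON (a `claims` mapping keyed by property ID, a `senses` list, and a `datatype` field on each main snak). The function names `has_referenced` and `has_existence_evidence` are hypothetical, and the sketch does not judge the quality of the source.

```python
# Properties named on this page as ways of indicating evidence.
DESCRIBED_PROPERTIES = ("P1343", "P973")  # described by source, described at URL
USAGE_EXAMPLE = "P5831"                   # expected to carry a reference
GLOSS_QUOTE = "P8394"                     # placed on senses, expected to carry a reference


def has_referenced(statements):
    """Return True if at least one statement has a non-empty reference list."""
    return any(s.get("references") for s in statements)


def has_existence_evidence(lexeme):
    """Return True if the lexeme dict shows any of the listed kinds of evidence.

    Assumes `lexeme` follows the Wikibase entity JSON layout (an assumption
    of this sketch, not something this page specifies).
    """
    claims = lexeme.get("claims", {})

    # "described by source" or "described at URL" on the lexeme itself.
    if any(claims.get(p) for p in DESCRIBED_PROPERTIES):
        return True

    # Any external identifier counts here; the sketch cannot tell whether
    # the identifier actually points at a lexical resource.
    for statements in claims.values():
        if any(s.get("mainsnak", {}).get("datatype") == "external-id"
               for s in statements):
            return True

    # A usage example only counts when its source is given as a reference.
    if has_referenced(claims.get(USAGE_EXAMPLE, [])):
        return True

    # A gloss quote on any sense, again only when referenced.
    return any(
        has_referenced(sense.get("claims", {}).get(GLOSS_QUOTE, []))
        for sense in lexeme.get("senses", [])
    )
```

A check like this can only flag lexemes that lack evidence. Whether that is a problem still depends on how well documented the language is, as described above.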
Lemmata
- The lemma of a lexeme should ideally be the representation of that lexeme that is provided in a dictionary. What representation this is will generally depend on the lexeme's language and lexical category.
- Take Indo-European languages: for nouns and adjectives, this may reflect some combination of nominative case, singular number, and masculine gender; for verbs, this may be the infinitival or verbal noun form.
- Other languages may present lemmata differently, for which a non-exhaustive list is given below:
- An Arabic verb generally uses the masculine third-person singular perfect active indicative as a lemma ('كَتَبَ' for 'to write').
- A Korean verb generally uses the verb stem followed by the dedicated citation suffix '-다' ('가다' for 'to go').
- An isiZulu verb generally uses the verb stem on its own, including the final vowel 'a' ('shaywa' for 'to be struck').
- If there are multiple scripts in which a language is generally written, it is desirable for the lemma to contain a representation for each script.
- Where a correspondence in representation exists between multiple related scripts, repeating that correspondence may not be necessary.
- For those Mandarin lexemes which have not been affected by character simplification, a single lemma with code 'zh' suffices.
- For those Esperanto lexemes which do not change under 'hsistemo' or 'xsistemo', a single lemma with code 'eo' suffices.
Lexical categories
- In general, an instance of (P31) value on a lexeme should be more specific than the lexeme's lexical category.
- Thus if abbreviation (Q102786) is a lexical category, there is no need to re-add it as a P31.
Lexeme statements
Derivations
Acronyms should qualify derived from lexeme (P5191) with mode of derivation (P5886) acronym (Q101244) (see ffs (L406751)).
Forms
To help establish the existence and use of a lexeme, at least one form should be referenced—perhaps on a usage example (P5831) statement qualified with subject form (P5830) [the form in question], or on another statement (described by source (P1343), attested as (P7855) or attested in (P5323) are possible other properties). The goal is to have all forms attested or referenced with at least one date, preferably with these dates years apart.
Senses
Translations
- In general, it is good practice to avoid adding translation (P5972) statements between "every" pair of possible translations.
- If there is a path of P5972 statements between two senses, that is enough to establish a link between them.
- If every such word is connected to the same item via item for this sense (P5137), then the P5137 links are also enough to establish links between them.
- This is analogous to not stating directly that Manhattan (Q11299) located in the administrative territorial entity (P131) New York (Q1384) (we can infer that through it being P131 New York City (Q60) first), and similarly not stating directly that art museum (Q207694) subclass of (P279) museum (Q33506) (we can infer that through it being P279 museum of culture (Q28737012) first).
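The inference described above amounts to a reachability check on a graph of senses. The following is a minimal sketch under stated assumptions: translation (P5972) statements are treated as undirected edges, the sense IDs in the usage note are made up for illustration, and the helper names `linked_by_translations` and `linked_by_item` are hypothetical.

```python
from collections import defaultdict, deque


def linked_by_translations(edges, start, goal):
    """Return True if a path of translation statements joins two senses.

    `edges` is an iterable of (sense_id, sense_id) pairs, one per P5972
    statement; each pair is treated as undirected (an assumption).
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    # Breadth-first search from `start` until `goal` is reached.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return True
        for neighbour in graph[current] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return False


def linked_by_item(sense_items, a, b):
    """Return True if two senses share an item via item for this sense (P5137).

    `sense_items` maps a sense ID to the set of item IDs it points at.
    """
    return bool(sense_items.get(a, set()) & sense_items.get(b, set()))
```

For example, with the made-up edges `[("L1-S1", "L2-S1"), ("L2-S1", "L3-S1")]`, the senses `L1-S1` and `L3-S1` are already linked through `L2-S1`, so a direct P5972 statement between them adds nothing new.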