Wikidata:Lexicographical data/Layout

This page is a work in progress, not an article or policy, and may be incomplete and/or unreliable.
Please offer suggestions on the talk page.

This is a specification about how data can be modeled (organized) in Wikidata. Everyone can edit this page before Wiktionary support is deployed (and ideally Wiktionary support should not be deployed before the page is completed).

The page also includes properties needed for organizing lexeme data. All terms in <> refer to proposed properties, with the datatype (which may be omitted) following in parentheses; terms in {} refer to proposed items.

Lemma

Part of Lexeme

The lemma is a human-readable representation of the lexeme. For scripts that distinguish upper and lower case, the entry name usually begins with a lowercase letter. For example, use work for the English noun and verb, not Work. Words that begin with a capital letter in running text are exceptions. For prefixes, suffixes and other morphemes in most languages, place the character "-" where the morpheme attaches to other words.

Most symbols such as # or | are allowed in a lemma and should be written as such. The title of a matched-pair entry consists of both the left and the right symbol, with a space in between.

Lexical category / Part of speech

Part of Lexeme

This is a reference to a concrete Item.

Allowed POS headers:

Parts of speech: Adjective, Adverb, Ambiposition, Article, Circumposition, Classifier, Conjunction, Contraction, Counter, Determiner, Ideophone, Interjection, Noun, Numeral, Participle, Particle, Postposition, Preposition, Pronoun, Proper noun, Verb
Morphemes: Circumfix, Combining form, Infix, Interfix, Prefix, Root, Suffix
Symbols and characters: Diacritical mark, Letter, Ligature, Number, Punctuation mark, Syllable, Symbol
Phrases: Phrase, Proverb, Prepositional phrase
Lojban-specific parts of speech: Brivla, Cmavo, Gismu, Lujvo, Rafsi

Transliteration

As Statement of Form

See [1].

Alternative forms

As Form of Lexeme, or Representation of Form

Alternative forms to be provided as other Forms:

hyphenation/compounds: tea cup, tea-cup, teacup
style variation: naiveté, naïveté
uncertain capitalization: laser, LASER

Alternative forms to be provided as another language/variant Representation within the same Form:

regional variations: color, colour; center, centre
different scripts: реч, reč (Serbo-Croatian for word)
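
The distinction above can be sketched as a small data structure. This is only an illustration, assuming a Form holds a mapping from a language/variant code to a spelling; the class name Form, the helper spellings, and the codes en, en-US, en-GB, sh-Cyrl and sh-Latn are hypothetical and not part of any agreed model.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    # Maps a language/variant code (hypothetical codes) to a spelling.
    representations: dict[str, str] = field(default_factory=dict)


# Hyphenation/compound variants: modeled as separate Forms.
tea_cup_forms = [
    Form({"en": "tea cup"}),
    Form({"en": "tea-cup"}),
    Form({"en": "teacup"}),
]

# Regional or script variants: one Form with several Representations.
color = Form({"en-US": "color", "en-GB": "colour"})
word = Form({"sh-Cyrl": "реч", "sh-Latn": "reč"})


def spellings(form: Form) -> list[str]:
    """Return all spellings of one Form, ordered by variant code."""
    return [form.representations[code] for code in sorted(form.representations)]
```

Under this sketch, spellings(color) lists both regional spellings of a single Form, while the three tea cup variants remain three independent Forms.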

Unclear:

historical variations: anæmia, anaemia; coördinate, coordinate

Description

As Statement of Lexeme or Item

<glyph description> (multilingual text) is a visual description of the current symbol. See Wiktionary:Votes/2016-08/Description

Glyph origin

As Statement of Lexeme or Item

Maybe we could have a single property "glyph derived from", but to model compound characters we have to use a more complex layout.

Layout 1

<glyph origin> (item)

{borrowing}, qualifier <of lexeme>/<of form>/of (P642) e.g. A (Latin letter) => Α (Greek letter)
{combination}, qualifier <of lexeme>/<of form>/of (P642) e.g. Ǳ => D and Z
{modification}, qualifier <of lexeme>/<of form>/of (P642) e.g. G => C
{pictogram}, qualifier <glyph interpretation> (multilingual text) e.g. 日 => sun
{(simple) ideogram}, qualifier <glyph interpretation> e.g. 一 => one
{compound ideographs}, qualifier <of lexeme>/<of form>/of (P642) and <glyph interpretation> e.g. 武 => 止 and 戈; army going on expedition
{phono-semantic compound}, qualifier <semantic part> (lexeme/form/item) and <phonetic part> (lexeme/form/item) e.g. 菜 => semantic 艹 + phonetic 采

Note that we currently use based on (P144), but that property is intended for works.

Layout 2

<glyph derived from> (lexeme/form/item), qualifier <mode of derivation> (item) and <glyph interpretation>
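
To compare the two layouts, the example 菜 (semantic 艹 + phonetic 采) can be written out both ways. This is a rough sketch using plain dictionaries; the dictionary keys, the constant PART_QUALIFIERS and both helper functions are assumptions made for illustration, and glyphs are written as strings where the proposal would reference lexemes, forms or items.

```python
# Layout 1: the statement value is the mode of derivation (an item);
# the source glyphs are attached as qualifiers.
layout_1 = {
    "property": "glyph origin",
    "value": "phono-semantic compound",
    "qualifiers": {"semantic part": ["艹"], "phonetic part": ["采"]},
}

# Layout 2: one statement per source glyph; the mode is a qualifier.
layout_2 = [
    {
        "property": "glyph derived from",
        "value": "艹",
        "qualifiers": {"mode of derivation": ["phono-semantic compound"]},
    },
    {
        "property": "glyph derived from",
        "value": "采",
        "qualifiers": {"mode of derivation": ["phono-semantic compound"]},
    },
]

# Qualifiers that point at source glyphs in Layout 1 (names from the proposal).
PART_QUALIFIERS = ("of lexeme", "of form", "semantic part", "phonetic part")


def sources_layout_1(statement: dict) -> list[str]:
    """Collect the source glyphs from the qualifiers of a Layout 1 statement."""
    return [
        glyph
        for name in PART_QUALIFIERS
        for glyph in statement["qualifiers"].get(name, [])
    ]


def sources_layout_2(statements: list[dict]) -> list[str]:
    """Collect the source glyphs from the values of Layout 2 statements."""
    return [s["value"] for s in statements]
```

Both helpers recover the same source glyphs, but in this sketch Layout 2 no longer records which part is semantic and which is phonetic unless a further qualifier is added.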

Etymology

As Statement of Lexeme, or of Form

Etymology is preferably stored in lexemes, except for suppletive forms (e.g. went). Only the direct etymology is needed. It is better to create separate lexemes for different etymologies than to use a qualifier to indicate this.

Similarly, there are two possible layouts for etymology.

Layout 1

<etymology> (item): back-formation (Q989162)/contamination (Q287903)/calque (Q204826)/..., qualifier <of lexeme>/<of form> e.g. Wikipedia is a contamination (Q287903) of wiki +‎ encyclopedia

Layout 2

This is the current layout at Wikidata:Wiktionary/Data model examples/hard (adjective, English).

<derived from> (form/item), qualifier <mode of derivation> (item)

Note that this layout has several flaws:

It may result in many duplicated qualifiers.
It's not easy to handle multiple possible etymologies.
Onomatopoeia cannot be handled.
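
Two of the flaws listed above can be illustrated by converting a Layout 1 statement into Layout 2. The following is a minimal sketch, not a proposed implementation; the dictionary shape and the function to_layout_2 are hypothetical, and lexemes are written as plain strings.

```python
def to_layout_2(layout_1_statement: dict) -> list[dict]:
    """Expand one Layout 1 statement into one Layout 2 statement per source."""
    mode = layout_1_statement["value"]
    sources = layout_1_statement["qualifiers"].get("of lexeme", [])
    return [
        {
            "property": "derived from",
            "value": source,
            # The same mode of derivation is repeated on every statement.
            "qualifiers": {"mode of derivation": mode},
        }
        for source in sources
    ]


# Layout 1: "Wikipedia" is a contamination (Q287903) of wiki + encyclopedia.
wikipedia_layout_1 = {
    "property": "etymology",
    "value": "Q287903",
    "qualifiers": {"of lexeme": ["wiki", "encyclopedia"]},
}

# An onomatopoeia has a mode of derivation but no source lexeme.
onomatopoeia_layout_1 = {
    "property": "etymology",
    "value": "onomatopoeia",
    "qualifiers": {},
}

wikipedia_layout_2 = to_layout_2(wikipedia_layout_1)
onomatopoeia_layout_2 = to_layout_2(onomatopoeia_layout_1)
```

In this sketch wikipedia_layout_2 carries the qualifier twice, once per source, and onomatopoeia_layout_2 is empty because Layout 2 has no statement value to attach the mode to.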

Common property

<compound of> (phrase) e.g. "seeing is believing" -> "seeing", "is", "believing"
<akin to> (lexeme)

"phrase" may be a new datatype that is an ordered list of lexemes/senses. It could instead be modeled as a list of values ordered by series ordinal (P1545), but this is not useful for translation (as a term can be translated to different phrases in different languages).

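The alternative of ordering values by series ordinal (P1545) could look roughly like the sketch below. The dictionary shape and the function ordered_components are assumptions for illustration; ordinals are written as strings here and converted to integers before sorting.

```python
def ordered_components(statements: list[dict]) -> list[str]:
    """Sort <compound of> values by their series ordinal (P1545) qualifier."""
    ranked = sorted(statements, key=lambda s: int(s["qualifiers"]["P1545"]))
    return [s["value"] for s in ranked]


# "seeing is believing", stored in arbitrary order with explicit ordinals.
seeing_is_believing = [
    {"value": "believing", "qualifiers": {"P1545": "3"}},
    {"value": "seeing", "qualifiers": {"P1545": "1"}},
    {"value": "is", "qualifiers": {"P1545": "2"}},
]

components = ordered_components(seeing_is_believing)
```

The order is recoverable, but each component is still a separate statement, so the phrase as a whole cannot be used as a single value, for example as the target of a translation.
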
Pronunciation

As Statement of Form

See IPA transcription (P898) and pronunciation audio (P443). Sometimes <refers to sense> should be used as a qualifier.

Regional or accent-specific pronunciation may be indicated by the qualifier valid in place (P3005) and by the new properties <accent> (item), <standard in place> (item) and <variant in place> (item).

As rhyme is a transitive relation, we use a single property <rhyme> (item) to express rhyme, rather than listing all words that rhyme with a specific word (as in Wikidata:Wiktionary/Data model examples/hard (adjective, English)).
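
With a single <rhyme> value per word, the full list of rhyming words can be derived by grouping rather than stored. A minimal sketch, assuming each word has exactly one rhyme value; the identifiers rhyme-ard and rhyme-at and the function rhyme_groups are hypothetical stand-ins for items.

```python
from collections import defaultdict


def rhyme_groups(rhyme_of: dict[str, str]) -> dict[str, list[str]]:
    """Group words by the value of their single <rhyme> statement."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word, rhyme in rhyme_of.items():
        groups[rhyme].append(word)
    return dict(groups)


# Hypothetical rhyme identifiers; in the proposal each would be an item.
rhyme_of = {
    "hard": "rhyme-ard",
    "card": "rhyme-ard",
    "guard": "rhyme-ard",
    "cat": "rhyme-at",
}

groups = rhyme_groups(rhyme_of)
```

Adding a new rhyming word then requires one statement on that word instead of an update to every word it rhymes with.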

<syllabification> (multilingual text?)
<word with diacritical signs> (monolingual text?) e.g. līber, كِتَاب

Morphology

As Statement of Lexeme

Separate lexemes(?) may be created for different morphological features.

<type of declension> (item)
<noun class> (item)
<grammatical gender> (item)?

Inflection

As Form of Lexeme

Each form defines how a lexeme changes based on a specific syntactic role or mode it may take in a sentence. A form includes a list of grammatical features that define the syntactic role to which the form applies. These are given as references to concrete Items.
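
Selecting a form by its grammatical features can be sketched as a subset test. The class Form, the function find_forms and the feature labels are illustrative assumptions; in the model described above the features would be references to Items rather than strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    representation: str
    # Grammatical features; plain labels stand in for Item references.
    features: frozenset[str]


def find_forms(forms: list[Form], wanted: set[str]) -> list[str]:
    """Return the representations of all forms carrying every wanted feature."""
    return [f.representation for f in forms if wanted <= f.features]


work_verb = [
    Form("work", frozenset({"infinitive"})),
    Form("works", frozenset({"present", "third person", "singular"})),
    Form("worked", frozenset({"past"})),
    Form("working", frozenset({"present participle"})),
]
```

For example, find_forms(work_verb, {"present", "singular"}) selects the form whose feature list contains both requested features.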

Definitions

As Sense of Lexeme

Each definition is a sense. The sense can be qualified by statements:

<grammatical property> (item): countable, uncountable, etc.
valid in place (P3005)
<apply to variant> (item)
<register> (item)
<subject field> (item) (reuse existing field of work (P101)?)
<connotation> (item)? (pejorative, etc.)
<grammatical frame> (string)
<classifier> (lexeme)
<attested since> (time)
<attested until> (time)

Usage notes

As Statement of Sense

<usage note> (multilingual text)

Example sentences and quotations

As Statement of Sense

<example sentence> (multilingual text)
<quotation> (multilingual text)

Semantic relations

As Statement of Sense

<synonym> (sense)
<antonym> (sense)
<hypernym> (sense)
<hyponym> (sense)
<meronym> (sense)
<holonym> (sense)
<troponym> (sense)
<coordinate term> (sense)?

Derived terms

As Statement of Sense

<derived term> (lexeme)

Related terms

As Statement of Lexeme

<related terms> (lexeme)?

Translations and denotements

As Statement of Sense

Translations can be done either from sense to sense, or by a sense referencing a common Wikidata item. If the latter is done, the translations will be automatically displayed and kept up to date. This is only possible when the translation is symmetric and transitive, which is often not the case, but it holds frequently enough to merit a specific implementation.

<equivalent concept> (item): book => book (Q571)
<related concept> (item): hot => temperature (Q11466)
<translation> (sense/phrase): is this needed? A concept may have thousands of translations. --GZWDer (talk) 05:49, 31 August 2017 (UTC)
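
Deriving translations from a shared item, as described above, can be sketched as a grouping step. This is an illustration only; the tuple layout, the function translations_via_concept and the German and French entries are assumptions, while book => book (Q571) comes from the list above.

```python
from collections import defaultdict


def translations_via_concept(
    senses: list[tuple[str, str, str]],
) -> dict[str, list[str]]:
    """Map each concept item to the 'language:lemma' senses that reference it."""
    by_concept: dict[str, list[str]] = defaultdict(list)
    for language, lemma, concept in senses:
        by_concept[concept].append(f"{language}:{lemma}")
    return dict(by_concept)


# Each sense carries one <equivalent concept> statement (language, lemma, item).
senses = [
    ("en", "book", "Q571"),
    ("de", "Buch", "Q571"),
    ("fr", "livre", "Q571"),
]

translations = translations_via_concept(senses)
```

Every sense that points at the same item becomes a translation of the others without any sense-to-sense statement, which is why this only works when the relation is symmetric and transitive.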

Descendants

As Statement of Lexeme/Sense

<descendant> (lexeme)?

Identifiers and references

As Statement of Lexeme

Anagrams

As Statement of Form

Anagrams can be queried by common alphagram value.

<alphagram> (string)
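
An alphagram is the letters of a word arranged in sorted order, so words that are anagrams of each other share the same value. A minimal sketch, assuming the value is computed on the lowercased spelling and that diacritics are left untouched; both function names are hypothetical.

```python
from collections import defaultdict


def alphagram(word: str) -> str:
    """Return the letters of the lowercased word in sorted order."""
    return "".join(sorted(word.lower()))


def anagram_groups(words: list[str]) -> dict[str, list[str]]:
    """Group words by alphagram, keeping only groups with at least two words."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word in words:
        groups[alphagram(word)].append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}


found = anagram_groups(["listen", "silent", "enlist", "hard"])
```

Storing the alphagram as a string on each Form would make such a grouping a simple equality query on that property.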
