Wikidata:Lexicographical data/Documentation/Languages/he


This is the documentation page for Lexicographical data of Hebrew (Q9288), the main and official language of Israel (Q801), also used as the main liturgical language of Judaism (Q9268). The documentation is mostly geared toward recording lexemes of Modern Hebrew (Q8141), but it should be borne in mind that other historical varieties of Hebrew exist, such as Biblical Hebrew (Q1982248), Mishnaic Hebrew (Q1649362), or Medieval Hebrew (Q2712572). Forms that are used exclusively in one of the older varieties should be qualified as such.

The modeling recommendations described here are not finalized. As the work progresses, they may change.

Status

As of mid-2024, many Hebrew nouns and verbs were automatically exported from Hspell (Q6936841). However, there is a lot more work to do:

  • Coverage of nouns, adjectives, and verbs is considerable, but far from complete.
  • Coverage of other parts of speech is very partial.
  • Adding roots has only begun.
  • Some lemmas and forms have incorrectly vocalized representations, and some don't have them at all.
  • Many lexemes lack senses.
  • Many lexemes lack statements that these recommendations require.

Script and spelling variants

Hebrew is written using the Hebrew alphabet (Q33513), which is an abjad (Q185087): each letter basically stands for a consonant, and the vowels are not written (or rather, they are written differently from the way they are written in an alphabet (Q9779)).

The standard writing of Hebrew words using only the letters of this script (and not diacritical marks) has most recently been codified by the Academy of the Hebrew Language (Q190400) in its Plene spelling standard of the Academy of the Hebrew Language, 2017 (Q84822205). Lexemes should be entered using this spelling standard with the plain language code he. (Do not use he-x-Q84822205 for this.)

Since the 5th century, a system of diacritic vowel marks has been elaborated, known as the Tiberian vocalization (Q21283070). In modern use, this system is mostly used with liturgical texts, poetry, children's books, and dictionaries, or when disambiguation between similar-looking words is needed. The vocalized spelling of a lexeme should be entered as a spelling variant with the language code he-x-Q21283070.
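The two language codes above can be checked mechanically. The following is a minimal sketch, not an official tool: it assumes that "diacritical marks" can be approximated by the Unicode category Mn (nonspacing mark), and the function names are hypothetical. It is meant for ordinary lexemes; roots follow their own rules and are not covered here.

```python
import unicodedata

PLAIN_CODE = "he"                  # plain spelling, no diacritics (from the text above)
VOCALIZED_CODE = "he-x-Q21283070"  # Tiberian vocalization (from the text above)


def has_points(text: str) -> bool:
    """Return True if the text contains any nonspacing mark, such as niqqud."""
    return any(unicodedata.category(ch) == "Mn" for ch in text)


def check_spelling_variants(variants: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    plain = variants.get(PLAIN_CODE)
    vocalized = variants.get(VOCALIZED_CODE)
    if plain is None:
        problems.append("missing plain 'he' spelling")
    elif has_points(plain):
        problems.append("plain 'he' spelling contains diacritics")
    if vocalized is not None and not has_points(vocalized):
        problems.append("vocalized variant has no diacritics")
    return problems


# Example with the spellings of L492064.
print(check_spelling_variants({"he": "עכשיו", "he-x-Q21283070": "עַכְשָׁו"}))
```

The check only looks at the presence of marks; it does not verify that the vocalization itself is correct.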

Some Hebrew words have alternate spellings. These can be entered as spelling variants to make them findable, but they should be tagged by the type of alternation. Several of those are defined at the moment, and more may be added:

Alternate Hebrew spelling practices
System | Examples | Comments
Tiberian vocalization (Q21283070) | עכשיו/עַכְשָׁו (L492064) | Should be added to all lexemes except roots. (N.B.: Some other exceptions may be added in the future.)
writing Hebrew nly/h roots with a final he (Q125521576) | גלי/גלה (L1320375) | See the "Roots" section for details.
writing Alef at the end of Hebrew words loaned from Aramaic (Q125560819) | קושיה/קושיא/קֻשְׁיָה (L218055); קופסה/קופסא/קֻפְסָה (L67821) |
alternate Hebrew spelling with Alef (Q125560856) | בינרי/בינארי/בִּינָרִי (L205772); טונלי/טונאלי/טוֹנָלִי (L211311) |
alternate Hebrew spelling with Yod in words with quadriliteral roots (Q125560888) | אפשר/איפשר/אִפְשֵׁר (L205270) |

Online references

You can link lexemes to online references.

Identifier properties

Roots

In Semitic languages, native words, as well as some loanwords, are typically derived from a Semitic root (Q266273). In Hebrew, this is true of all verbs and of most nouns.

A Semitic root is an abstract sequence of consonants. It cannot be used as a word in a language, but it has a generic sense, sometimes more than one. True words are created by inserting vowels between these consonants, adding a prefix (Q134830) or a suffix (Q102047), and sometimes changing the consonants themselves.

The roots themselves can and should be stored as lexeme items. The lemma must be entered only with the language code "he"; a vocalized variant is not needed. The lemma must include only the consonantal root letters; don't use periods (.), hyphens (- or ־), or gershayim (Q5553090) (" or ״). If the last letter of the root has a final form (Q5449465), write it in that form (םןץףך and not מנצפכ). If a root has the letter sin (שׂ), add the dot on the left; if it has shin, don't add the dot on the right.
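The lemma rules for roots can be expressed as a small validator. This is only a sketch under stated assumptions: the function name is hypothetical, "root letters" is taken to mean the Unicode range U+05D0 to U+05EA, and the sin dot is taken to be the combining character U+05C2 placed directly after shin.

```python
SHIN = "\u05e9"     # ש
SIN_DOT = "\u05c2"  # combining sin dot, the only mark accepted in a root lemma
# Regular letters that have a final form: מ נ צ פ כ
LETTERS_WITH_FINAL_FORM = {"\u05de", "\u05e0", "\u05e6", "\u05e4", "\u05db"}


def is_hebrew_letter(ch: str) -> bool:
    """True for the letters alef through tav, including final forms."""
    return "\u05d0" <= ch <= "\u05ea"


def root_lemma_problems(lemma: str) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    for i, ch in enumerate(lemma):
        if is_hebrew_letter(ch):
            continue
        if ch == SIN_DOT and i > 0 and lemma[i - 1] == SHIN:
            continue  # sin is written with its dot
        # Periods, hyphens, maqaf, gershayim, the shin dot, vowels, etc.
        problems.append(f"unexpected character U+{ord(ch):04X}")
    letters = [ch for ch in lemma if is_hebrew_letter(ch)]
    if letters and letters[-1] in LETTERS_WITH_FINAL_FORM:
        problems.append("last letter should use its final form")
    return problems


print(root_lemma_problems("כפר"))    # a well-formed root lemma
print(root_lemma_problems("כ.ת.ב"))  # periods are flagged
```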

A few specific roots are notable enough to have Wikipedia articles and corresponding Wikidata Q items, such as Ḥ-M-D (Q3138823) and K-T-B (Q6322778). These should be linked using the item for this sense (P5137) property. (N.B.: This recommendation may change. Perhaps there's a more appropriate property.)

Some roots are homographs: they have the same letters, but a different meaning, and possibly a different etymology. When the etymology and the meaning are definitely different, create separate lexeme items. When the etymologies and the meanings may be related, add several senses to the same lexeme. Example of separate lexemes: כפר (L1320773) (forgive, atone), כפר (L1320774) (challenge authority, plead not guilty), and כפר (L1320775) (village). (N.B.: Dividing this precisely is challenging, so this recommendation may be updated in the future.)

As of April 2024, very few Hebrew roots have been added. Ideally, all roots should be entered, and all verbs and all relevant nouns should have a root statement.
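To see how far this goal is from being met, one can list lexemes that still lack a root statement. The sketch below only builds a SPARQL query string; it is an illustration, not one of this page's official queries. It assumes the prefixes predefined by the Wikidata Query Service (wd:, wdt:, dct:, wikibase:), uses the root (P5920) property mentioned elsewhere on this page, and assumes that Q24905 is the item used as the lexical category "verb"; verify that ID before relying on the results. The function name is hypothetical.

```python
HEBREW = "Q9288"     # Hebrew (from this page)
ROOT_PROP = "P5920"  # root (from this page)
VERB = "Q24905"      # ASSUMPTION: item used as the lexical category "verb"


def lexemes_missing_root_query(category_qid: str = VERB, limit: int = 100) -> str:
    """Build a query listing Hebrew lexemes of one category without a root statement."""
    return f"""SELECT ?lexeme ?lemma WHERE {{
  ?lexeme dct:language wd:{HEBREW} ;
          wikibase:lexicalCategory wd:{category_qid} ;
          wikibase:lemma ?lemma .
  FILTER NOT EXISTS {{ ?lexeme wdt:{ROOT_PROP} ?root . }}
}}
LIMIT {limit}"""


print(lexemes_missing_root_query())
```

The printed query can be pasted into the query service. Passing "Q1084" (noun) as the category lists nouns instead, although loan nominals without a Semitic root will also appear in that list.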

Useful queries:

Root categorization

Roots that are used only for nominals and not for verbs must have the instance of (P31) property, with the value being nominal root (Q125544382). (N.B.: This recommendation may change; see the talk page.)

Every root that can be used for verbs belongs to a weakness group (גזרה (Q12404900)); the strong verbs (גזרת השלמים (Q125521603)) count as one of these groups. Every such root lexeme must have the instance of (P31) property, with the value being its weakness group. (N.B.: Perhaps there should be a dedicated property for this.)

Some particular notes about specific groups follow.

Hebrew roots with third letter y/h (Q113383478)

For this weakness group, there are two traditions for writing the root: with the letter yod (י (L65516)) or with the letter he (ה (L64762)) at the end. Some dictionaries and textbooks use one system, some use the other, and some use both, and students of Hebrew may search using either. Therefore, these roots should be stored both ways so that they are easy to find.

Enter both forms in one lexeme item: the form with the letter yod as the lemma with the language code "he", and the form with the letter he as the lemma with language code he-x-Q125521576 (the code refers to a "mini-orthography": writing Hebrew nly/h roots with a final he (Q125521576)).

For an example, see גלי/גלה (L1320375).
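As an illustration, the two lemma representations of such a root can be derived from the yod form. This is a minimal sketch with a hypothetical function name; it assumes the root is given as a plain string ending in yod.

```python
YOD = "\u05d9"  # י
HE = "\u05d4"   # ה
HE_FINAL_CODE = "he-x-Q125521576"  # "mini-orthography" code from the text above


def yh_root_lemmas(yod_form: str) -> dict[str, str]:
    """Map language codes to the two spellings of a y/h root."""
    if not yod_form.endswith(YOD):
        raise ValueError("expected a root written with a final yod")
    return {"he": yod_form, HE_FINAL_CODE: yod_form[:-1] + HE}


# The root of L1320375.
print(yh_root_lemmas("גלי"))
```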

To do

  • Finalize which property to use for linking to Q items about roots. The property item for this sense (P5137) is probably not ideal; a new property should probably be created.

Parts of speech

According to the Center of Educational Technology, we can divide the Hebrew vocabulary into six major parts of speech, using the following scheme:

To this list, we may add some minor parts of speech:

There is a single article (Q103184) in Hebrew: ה/הַ (L7396).

Nouns and adjectives

There is no clear-cut distinction between nouns and adjectives in Hebrew. Adjectives can often stand on their own as nouns. For instance, חכם/חָכָם (L65269) can mean either "wise" as an adjective or "a wise man" as a noun. Usually, if a nominal can act both as a noun and as an adjective, it is better to classify it as an adjective. In traditional Hebrew grammar, both are considered nominal (Q503992) (שם/שֵׁם (L68396)).

The lemma of a nominal is typically its masculine-singular form.

Nouns and adjectives of Semitic origin are derived from a Semitic root (Q266273), which is most often triconsonantal. These should be linked with the property root (P5920).

Unlike Hebrew verbs, all of which have a Semitic root, some Hebrew nominals don't have one that can be clearly defined in terms of traditional Semitic grammar description, especially those that were loaned from non-Semitic languages.

The nominals have the following inflection features:

grammatical number (Q104083)
It can be singular (Q110786), plural (Q146786), or in some cases dual (Q110022). Some nouns, such as מים/מַיִם (L66237) water or שמיים/שָׁמַיִם (L68414) sky are plurale tantum (Q138246), and should thus have the grammatical number (P11054) property marked with the value plural (Q146786) (the lexical category should still be noun (Q1084)).
grammatical gender (Q162378)
It can be either masculine (Q499327) or feminine (Q1775415). Adjectives and nouns denoting animate beings inflect for gender, while nouns denoting inanimate things have a fixed gender and should use the grammatical gender (P5185) property.
state (Q70797774)
Nominals are said to be in the construct state (Q1641446) when modified directly by a following noun. Otherwise, they are in the absolute state (Q70798722). For masculine nouns, these forms are often the same in the unvocalized writing. Nouns may also have a special form, often discernible only when they are vocalized, when they are suffixed by the pronominal possessive enclitics. This is called the pronominal state (Q115767254), e.g. the form ילד/יַלְדּ (L65603-F9) of ילד/יֶלֶד (L65603). The pronominal state form need only be indicated when it is different from the construct state form. In some cases, there may be two distinct pronominal forms, which differ only in vocalization, and which alternate depending on the form of the enclitic (and especially on the position of the stress of the full word). In such cases, both should be listed, as is done in שם/שֵׁם (L68396).
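The inflection features above can be combined into a checklist of feature sets for the forms of a nominal. The following is only a sketch of that idea, with a hypothetical function name: it covers singular and plural, the absolute and construct states, and optionally gender, and it deliberately leaves out the dual and the pronominal state, which apply only to some lexemes.

```python
from itertools import product

# Item IDs as given in the feature descriptions above.
SINGULAR, PLURAL = "Q110786", "Q146786"
MASCULINE, FEMININE = "Q499327", "Q1775415"
ABSOLUTE, CONSTRUCT = "Q70798722", "Q1641446"


def nominal_feature_sets(inflects_for_gender: bool) -> list[tuple[str, ...]]:
    """List the feature combinations to look for among a nominal's forms."""
    axes = [(SINGULAR, PLURAL), (ABSOLUTE, CONSTRUCT)]
    if inflects_for_gender:
        # Adjectives and nouns denoting animate beings.
        axes.insert(0, (MASCULINE, FEMININE))
    return list(product(*axes))


for features in nominal_feature_sets(inflects_for_gender=False):
    print(features)
```

A plurale tantum noun would need only the plural combinations, so the list should be read as a starting point rather than a requirement.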

Note: Nouns can be modified by the pronominal enclitics, which are possessive (Q2105891). While some see this as inflection, these forms are better seen as augmented by enclitics. Thus, there is no need to explicitly list the possessed forms of nouns in their lexeme entry (contrary to current usage), as this inflates the number of forms significantly.

Verbs

Hebrew verbs are always derived from a consonantal Semitic root (Q266273), which should be indicated by the root (P5920) property. That root is always put in a derived stem (Q17119048), which should be indicated by the conjugation class (P5186) property. The voice (Q211101) of the verb depends on the derived stem. The seven derived stems available in Hebrew are the following, paired by voice where relevant:

Verbs have the following inflection features:

grammatical tense (Q177691)
In Modern Hebrew, this can be future tense (Q501405), past tense (Q1994301) or present tense (Q192613). The present tense originates in a nominal form (a participle (Q814722)), and as such, it can also be listed separately as a noun (e.g. כותב/כּוֹתֵב (L65724) and the corresponding verbal form כותב (L212243-F5)). As part of the verbal paradigm, there is no need to indicate its state (Q70797774) morphology.
grammatical person (Q690940)
This can be one of first person (Q21714344), second person (Q51929049), or third person (Q51929074). The person inflection is not apparent in the present tense (Q192613). The imperative (Q22716) can be assumed to be always in the second person.
grammatical gender (Q162378)
This can be either masculine (Q499327) or feminine (Q1775415). It is always marked in the present tense (Q192613). In the future tense (Q501405), the first person (Q21714344) is never marked for gender. In the past tense (Q1994301), the first person (Q21714344) and the third person (Q51929074) plural (Q146786) are never marked for gender.
grammatical number (Q104083)
This can be either singular (Q110786) or plural (Q146786).
grammatical mood (Q184932)
The majority of Hebrew verbal forms are indicative (Q682111), and this need not be marked explicitly. On morphological grounds, we can distinguish two further moods in Modern Hebrew: imperative (Q22716) and infinitive (Q179230). The productive Modern infinitive form is always the one that begins with the prefix ל/לְ (L1319984), whereas Biblical Hebrew (Q1982248) also has the absolute infinitive form, corresponding to the Arabic masdar (Q97662006). Biblical Hebrew has two more moods, which aren't productive in Modern Hebrew: the cohortative (Q500726) (in first person only, e.g. אלכה/אֵלְכָה (L184903-F34)) and the jussive (Q462367) (e.g. יהי/יְהִי (L207795-F25)).
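The gender-marking rules stated under grammatical gender can be written as a small predicate, which may help when deciding whether a verb form should carry a gender feature. This is a sketch with a hypothetical function name and plain-string arguments; it encodes only the cases that the text explicitly describes as never marked.

```python
TENSES = {"past", "present", "future"}
NUMBERS = {"singular", "plural"}


def gender_never_marked(tense: str, person: int, number: str) -> bool:
    """Return True for the cases described above as never marked for gender.

    A False result means only that the text does not rule out gender marking.
    """
    if tense not in TENSES or number not in NUMBERS or person not in (1, 2, 3):
        raise ValueError("unknown tense, person, or number")
    if tense == "future":
        return person == 1
    if tense == "past":
        return person == 1 or (person == 3 and number == "plural")
    return False  # the present tense always marks gender


print(gender_never_marked("past", 3, "plural"))
print(gender_never_marked("present", 1, "singular"))
```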

In accordance with the traditional Hebrew grammar description and the practice in most published dictionaries, the lemma of the verb is the past tense (Q1994301) third person (Q51929074) masculine (Q499327) singular (Q110786) form.

Note: Verbs can be augmented by the pronominal object enclitics. While some see this as inflection, these forms are better seen as augmented by enclitics. As the inflection pattern is quite regular, there is no need to explicitly list the object forms of verbs in their lexeme entry (contrary to current usage), as this inflates the number of forms significantly.

To do

Some more topics should be discussed in this document, but aren't discussed yet:

  • The current state: which words are already entered, what is left to be done, etc.
  • How to indicate that a loan nominal doesn't need a Semitic root. (This can be useful to see which nominals need a root value, and which don't.)
  • How to store absolute infinitive (מקור מוחלט).
  • How to store conjunctions and prepositions (and to actually create pages for all of them).
  • How to list proper nouns.

See also