Wikidata:Property proposal/hyphenation


hyphenation

Originally proposed at Wikidata:Property proposal/Lexemes

   Done: hyphenation (P5279) (Talk and documentation)
Data type: String
Domain: Forms on Lexemes
Example:
  • example@en → ex‧am‧ple
  • hyphenate@en → hy‧phen‧ate
  • orthography@en → or‧thog‧ra‧phy

Motivation

See Wikidata talk:Lexicographical data/Archive/2017/08#Syllabification?. Authorities sometimes differ on hyphenation, e.g., dictionary is hyphenated as dic‧tion‧ar‧y according to the WordReference Random House Unabridged Dictionary of American English but as dic‧tio‧nary according to Merriam-Webster. Both variants can be stated with a reference.

Which separator to use? There is U+2027 (HYPHENATION POINT), but the English Wiktionary uses U+00B7 (MIDDLE DOT), the Wikidata model example Leiter (noun, German) uses a hyphen, and some dictionaries use a vertical line. In the Russian Wiktionary, a green hyphen is used at break points and a red middle dot at syllable boundaries where a word cannot be hyphenated, e.g., о·гры́-зок or веч-но-дви-га-те-ле-стро·е́-ни·е. In German, there are also break points that do not correspond to syllable boundaries (e.g., Lin‧oleum). In any case, I think this property should not cover phonological syllabification (for which the IPA transcription can be used, e.g., French /ɛɡ.zɑ̃pl/).
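To make the separator question concrete, here is a minimal Python sketch, assuming U+2027 ends up as the stored separator. It converts the English Wiktionary's middle-dot notation to that separator and reads the break positions out of a value. The function names are hypothetical and not part of any Wikidata tooling; the conversion must not be applied to the Russian Wiktionary notation, where the middle dot marks a position that is not a break point.

```python
HYPHENATION_POINT = "\u2027"  # assumed storage separator
MIDDLE_DOT = "\u00b7"         # separator used by the English Wiktionary


def from_middle_dot(text: str) -> str:
    """Convert English-Wiktionary-style notation to U+2027 notation."""
    return text.replace(MIDDLE_DOT, HYPHENATION_POINT)


def break_offsets(hyphenated: str, sep: str = HYPHENATION_POINT):
    """Return the plain spelling and the code-point offsets of its break points."""
    word = hyphenated.replace(sep, "")
    offsets = []
    pos = 0  # position in the plain spelling
    for ch in hyphenated:
        if ch == sep:
            offsets.append(pos)
        else:
            pos += 1
    return word, offsets
```

For example, break_offsets applied to ex‧am‧ple yields the spelling "example" with breaks before offsets 2 and 4. Offsets count code points, so combining marks such as stress accents count as separate positions.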

How to handle words spelled with a hyphen like three-dimensional? One can hyphenate between three- and dimensional but no additional hyphen is inserted. After all, we do not hyphenate like this:

three--
dimensional

But like this:

three-
dimensional

So what to write? Still three-‧di‧men‧sion‧al, or rather three‧di‧men‧sion‧al, or simply three-di‧men‧sion‧al (which would mean that there are two different separators)? For the second option, compare the traditional German orthography: it was written Schiffahrt and Zucker but hyphenated Schiff‧fahrt and Zuk‧ker; today it is Schifffahrt and Zu‧cker.
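To illustrate the first option only (three-‧di‧men‧sion‧al, with the spelled hyphen kept and a separator after it), a consumer could break lines as in the following sketch. This is an assumption about how such values might be used, not a decision of the proposal; break_at is a hypothetical helper.

```python
SEP = "\u2027"  # HYPHENATION POINT, assumed separator


def break_at(hyphenated: str, index: int, sep: str = SEP) -> tuple[str, str]:
    """Split a word at its index-th break point (1-based).

    A hyphen is appended to the first part unless the spelling
    already ends in a hyphen at that position.
    """
    parts = hyphenated.split(sep)
    if not 1 <= index < len(parts):
        raise ValueError("no such break point")
    head = "".join(parts[:index])
    tail = "".join(parts[index:])
    if not head.endswith("-"):
        head += "-"  # ordinary break: insert the line-end hyphen
    return head, tail
```

Breaking three-‧di‧men‧sion‧al at its first point gives "three-" and "dimensional" without a doubled hyphen, while breaking at the second point gives "three-di-" and "mensional".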

On another note, the German word Interesse, for instance, may be hyphenated as Inter‧esse based on morphology but also as Inte‧resse based on pronunciation. In the German Wiktionary, all break points are indicated in a single occurrence of the spelling, so we have “In·te·r·es·se”. But this may be confusing because r by itself would barely make sense. In Wikidata, we could give two variants instead:

  • In‧te‧res‧se
  • In‧ter‧es‧se

This is rather theoretical because a single word rarely extends over three lines. As a more complex example:

  • Ge‧ri‧a‧trie
  • Ge‧ri‧at‧rie
  • Ger‧ia‧trie
  • Ger‧iat‧rie

(In the German Wiktionary, all break points are economically subsumed into “Ge·r·i·a·t·rie”.)
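The relation between the separate variants and the combined German Wiktionary notation can be sketched in Python: the combined form is the union of the break points of all variants. The function name merge_variants is made up for this illustration, and U+2027 is assumed as the separator.

```python
SEP = "\u2027"  # HYPHENATION POINT, assumed separator


def merge_variants(variants, sep: str = SEP) -> str:
    """Combine several hyphenation variants of one spelling into a
    single string that shows the union of all break points."""
    spellings = {v.replace(sep, "") for v in variants}
    if len(spellings) != 1:
        raise ValueError("variants must share one spelling")
    word = spellings.pop()

    points = set()
    for variant in variants:
        pos = 0  # position in the plain spelling
        for ch in variant:
            if ch == sep:
                points.add(pos)
            else:
                pos += 1

    # Put a separator before every character whose offset is a break point.
    return "".join((sep if i in points else "") + ch
                   for i, ch in enumerate(word))
```

Applied to the four Geriatrie variants above, this produces Ge‧r‧i‧a‧t‧rie. The reverse direction is lossy: the combined string alone does not tell which break points may be used together, which is the argument for storing the variants separately.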

IvanP (talk) 17:30, 1 June 2018 (UTC)

Discussion

@ArthurPSmith, Duesentrieb, Jura1, JAn Dudík, IvanP: ✓ Done: hyphenation (P5279). − Pintoch (talk) 08:34, 9 June 2018 (UTC)