Wikidata:Property proposal/Cuneiform character in this lexeme

Cuneiform character in this lexeme

Originally proposed at Wikidata:Property proposal/Lexemes

   Withdrawn
Description: Cuneiform character(s) this lexeme consists of
Represents: cuneiform sign (Q23017336)
Data type: Item
Domain: lexeme, form
Example 1: ga/𒂷 (L726974) → 𒂷 (Q87555355)
Example 2: dumu/𒌉 (L643788) → 𒌉 (Q87556519)
Example 3: dingir/𒀭 (L724542) → 𒀭 (Q87555087)
Planned use: Linking Sumerian lexemes to Unicode representations. In the future, this property may also be used to link Akkadian, Hittite, and lexemes in further languages written in the cuneiform script.
See also: Han character in this lexeme (P5425), which links Han Chinese characters in Japanese and Chinese lexemes to Unicode

This property is now discussed at https://www.wikidata.org/wiki/Wikidata:Property_proposal/character_in_this_lexeme

Motivation

Sumerian, Akkadian, Hittite, and other languages are written in the cuneiform script. Wikidata currently lacks a property to link lexeme representations in these languages to their Unicode code points, analogous to Han character in this lexeme (P5425), which links Han Chinese characters in Chinese, Japanese, and Vietnamese lexemes to their respective representations in Wikidata. – The preceding unsigned comment was added by Situxx (talk • contribs) at 12:51, November 10, 2022 (UTC).
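
As a rough illustration of what such a link starts from, the following Python sketch lists the Unicode code points of the cuneiform characters in a lexeme representation. The helper names `is_cuneiform` and `cuneiform_code_points` are hypothetical and not part of any Wikidata tooling; the ranges are those of the three Unicode cuneiform blocks.

```python
# Code point ranges of the Unicode cuneiform blocks.
CUNEIFORM_RANGES = [
    (0x12000, 0x123FF),  # Cuneiform
    (0x12400, 0x1247F),  # Cuneiform Numbers and Punctuation
    (0x12480, 0x1254F),  # Early Dynastic Cuneiform
]


def is_cuneiform(ch):
    """Return True if the single character ch lies in a cuneiform block."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in CUNEIFORM_RANGES)


def cuneiform_code_points(representation):
    """List the cuneiform characters of a representation as U+XXXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in representation if is_cuneiform(ch)]


# The three example representations from the proposal.
for sign in ["𒂷", "𒌉", "𒀭"]:
    print(sign, cuneiform_code_points(sign))
```

Each code point found this way would correspond to one item of type cuneiform sign (Q23017336) that the proposed property could point to; looking up that item is outside the scope of this sketch.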

Discussion

  •  Oppose I don't think Han character in this lexeme (P5425) should set a precedent. Items for Han characters and Unicode itself have a lot of extra linguistic information for Han characters which wouldn't be on the lexemes (e.g. stroke count (P5205), grade of kanji (P5277), radical (P5280), ideographic description sequence (P5753)) and that doesn't seem to be the case here. - Nikki (talk) 12:23, 25 November 2022 (UTC)
    Thanks for your comment.
    We can decompose cuneiform characters into wedge compositions, in which the different wedge types are distinguished by their direction on the unit circle.
    That also makes it possible to state a "stroke count" (more precisely, a "wedge count") or to express the cuneiform sign in different sign description languages.
    A contested subject in research is the stroke order, as it often cannot be reconstructed from a cuneiform tablet alone.
    We also have "radicals" in cuneiform. About 600 of the cuneiform code points are composed of subparts reused from other characters.
    The "stroke count" or wedge count varies over the centuries, and modeling the etymology per time period would also be feasible, I suppose.
    However, I am not aware of any definition of a degree of difficulty. I think nobody has defined that yet.
    Ideographic description sequences should also exist which I would derive from the character compositions.
    So this would also be available, even though the aforementioned properties are most likely only to be used for Han Chinese characters.
    I will link to some publications to substantiate my claims here:
    PaleoCodage: Character description language for cuneiform https://academic.oup.com/dsh/article/36/Supplement_2/ii127/6421811 (also briefly discusses "radicals")
    Gottstein System: Search system for cuneiform signs based on the composition of wedge types https://www.materiale-textkulturen.de/mtc_blog/2012_005_Gottstein.pdf
    Wedge Order: https://www.researchgate.net/publication/301608208_Current_Research_in_Cuneiform_Palaeography_Proceedings_of_the_Workshop_organised_at_the_60_Rencontre_Assyriologique_Internationale_Warsaw_2014 (page 1)
    That being said, that is also kind of the point of this proposal. The properties available for Han characters are in my opinion applicable to the cuneiform script as well, but only defined for Han characters.
    If we had more general properties that would also cover the cuneiform script, I would happily use those, but it would seem that they are not available. Situxx (talk) 18:07, 25 November 2022 (UTC)
     Support Situxx is correct, and I fully support this proposal (as an Assyriologist). There is as much complexity in cuneiform characters, if not more. Assyriologists also mark these distinguishing characteristics for each character at the level of each wedge as they differ from the 'gestalt' as defined during the Old Babylonian period (https://www.wikidata.org/wiki/Q114877765) and the subtle changes over time are quite dramatic and informative. The scribal tradition becomes especially complex during the Neo-Babylonian period (https://www.wikidata.org/wiki/Q114869307), when scribes intentionally use more complex characters with archaizing features to make their texts look more ancient than they actually are. For example, one of the challenges we face is that many cuneiform characters can be written inside larger signs; these are called compound logograms and can be found in many periods over the 4000 years of cuneiform writing. These compound logograms combine the semantic and morphological qualities from both signs as they are written together. One of the best discussions on this subject is the edited book: The first writing : script invention as history and process [2004] (https://worldcat.org/en/title/224913083). In chapter 4, Jerry Cooper makes a number of comparisons between cuneiform and the Chinese writing system. In doing so, this chapter outlines a number of the similar complexities that exist in cuneiform, and makes some comments about changes in its use by the languages of Sumerian, Akkadian, Hittite, Ugaritic, Elamite, and others over time. The proposal here would allow for Assyriologists to mark these distinguishing characteristics as properties for the development of machine-readable tools and methods. Admndrsn (talk) 19:09, 25 November 2022 (UTC)
  •  Support This seems fine to me (given above arguments, also User:Admndrsn should probably count as an additional supporting comment). ArthurPSmith (talk) 19:28, 29 November 2022 (UTC)
     Support As an Assyriologist and digital humanist, I support the proposal of Situxx. Structurally, there is much to be said for the comparison between cuneiform and Chinese strokes. As said above, though the scripts are culturally unrelated, they share formal elements like the use of logograms and compound signs (see also Gong, Y., Yan, H., & Ge, Y. 2009. The Accounts of the Origin of Writing from Sumer, Egypt and China — A Comparative Perspective. Wiener Zeitschrift Für Die Kunde Des Morgenlandes, 99, 137–158. http://www.jstor.org/stable/23861987). PaleoCodage and the proposal above can become a standard way to describe signs based on the relations of strokes to one another, especially useful in terms of digital palaeography and the annotation framework for the purpose of computational analysis. In short, this should be fully endorsed. Shygordin (talk) 23:15, 30 November 2022 (GMT+2)
  •  Comment we absolutely need a property for that (I mentioned it on Lexeme talk:L1 back in 2018) but I would be more in favour of a general property as Situxx said (either by repurposing or deleting/replacing the Han property). PS: I unmarked this proposal as "ready" as I think we need a bit more time. Cheers, VIGNERON (talk) 12:24, 4 December 2022 (UTC)
    This might be a good idea. If we can find a general solution for all aforementioned properties (also stroke count and so on), that would surely be beneficial. If not, there will be a lot of follow-up property proposals like "wedge count" or similar ones. Situxx (talk) 23:05, 4 December 2022 (UTC)
    Is there any update on the discussion here?
    Could we not simply add this property and define maybe a more general property such as "Unicode character in this lexeme"?
    In doing so we could declare Han character in this lexeme and Cuneiform Character in this lexeme as subproperties of "Unicode character in this lexeme"?
    How about that? Situxx (talk) 20:23, 20 December 2022 (UTC)
    No, "Unicode character in this lexeme" will be problematic in languages with diacritics, for example Arabic. Midleading (talk) 04:13, 12 February 2023 (UTC)
    Why? Could you give an example? So9q (talk) 13:21, 17 February 2023 (UTC)
    I would also be interested in an example.
    In case "Unicode" is an inappropriate term for this superproperty, maybe there is a less specific term we could use?
    Maybe just "character in this lexeme"? Situxx (talk) 19:52, 23 February 2023 (UTC)
    @Situxx: indeed "character in this lexeme" seems to be the right solution (no need to explicitly mention Unicode in the name). @Midleading: what would be the problem with diacritics? ب is ٮ + ◌̣ or دَ is د + َ◌. At this point, since the proposal has changed quite a bit and to make things clearer, I wonder if we should not close this property proposal and simply start a new one. Cheers, VIGNERON (talk) 11:02, 1 May 2023 (UTC)
    I think lexemes should link to ب rather than ٮ + ◌̣. I don't speak Arabic, maybe I'm wrong. Midleading (talk) 12:44, 1 May 2023 (UTC)
    All done and I linked the new proposal right here: https://www.wikidata.org/wiki/Wikidata:Property_proposal/character_in_this_lexeme Situxx (talk) 16:14, 2 May 2023 (UTC)
    @VIGNERON Situxx (talk) 16:15, 2 May 2023 (UTC)
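
The diacritics question raised in the thread can be checked against Unicode itself. The following is a minimal sketch, assuming Python's standard `unicodedata` module; `split_base_and_marks` is a hypothetical helper name. It suggests that دَ is stored as two code points (a base letter plus a combining mark), whereas ب is a single code point that canonical decomposition leaves unchanged, so splitting it into ٮ plus a dot is a graphical analysis rather than something Unicode normalization produces.

```python
import unicodedata


def split_base_and_marks(text):
    """Separate base characters from combining marks (e.g. Arabic vowel signs)."""
    base = [ch for ch in text if unicodedata.combining(ch) == 0]
    marks = [ch for ch in text if unicodedata.combining(ch) != 0]
    return base, marks


DAL_WITH_FATHA = "\u062F\u064E"  # ARABIC LETTER DAL followed by ARABIC FATHA
BEH = "\u0628"                   # ARABIC LETTER BEH

base, marks = split_base_and_marks(DAL_WITH_FATHA)
print([unicodedata.name(ch) for ch in base])   # names of the base letters
print([unicodedata.name(ch) for ch in marks])  # names of the combining marks

# Canonical decomposition (NFD) leaves BEH as a single code point.
print(unicodedata.normalize("NFD", BEH) == BEH)
```

Under this reading, a "character in this lexeme" statement for a vocalized Arabic form would most naturally target the base letters, which matches the preference expressed in the thread for linking to ب as a whole.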