Wikidata:Property proposal/Unicode character
Unicode character (entity)
Originally proposed at Wikidata:Property proposal/Generic
Not done
Description | Unicode character representing the item |
---|---|
Data type | Lexeme |
Example 1 | ellipsis (Q32518) → ⋯ (L291359) |
Example 2 | → invalid ID (L61046) |
Example 3 | → |
Planned use | new items will be created for current values of Unicode character (P487) on instance of (P31) of letter (Q9788). The current values of Unicode character (P487) and Unicode code point (P4213) will be moved to these new items. |
See also | code (P3295), Unicode character (P487) and Unicode code point (P4213) |
Motivation
This is an alternative proposal to Wikidata:Property proposal/Unicode character (item). See its discussion about the advantages and disadvantages of either datatype, notably the massive redundancy created by the item datatype ([1], ~900 triples) compared to lexeme ([2], ~20 triples). The query server currently has issues with items that have a large number of triples. I'm neutral about the actual usefulness of creating all these entities, but L-entities are preferable over items. --- Jura 15:18, 7 April 2020 (UTC)
@GZWDer, Iwan.Aucamp, Tinker_Bell, Infovarius, ArthurPSmith: @Lymantria: --- Jura 15:23, 7 April 2020 (UTC)
Discussion
- Comment see Wikidata:Property proposal/Unicode character (item) to avoid repeating arguments already discussed there. --- Jura 15:18, 7 April 2020 (UTC)
- Comment While we are here, can we alter the labels to disambiguate between Unicode Code Points, Unicode Abstract Character, and a Unicode Character Encoding? -indo (talk) 02:55, 5 November 2020 (UTC)
- Comment The reason you have 900 triples for Q87524936 is because somebody added '...' as the label in almost 450 languages. You would have a WORSE problem with lexemes, if the same lexeme was added repeatedly in 450 languages. I don't see how this helps. ArthurPSmith (talk) 16:38, 7 April 2020 (UTC)
- Please double-check the sample. This is not planned. --- Jura 16:42, 7 April 2020 (UTC)
- I don't think we should concern ourselves with how the query service behaves with labels, given that the same information can also be fetched via Unicode character (P487).--GZWDer (talk) 17:58, 7 April 2020 (UTC)
- Well, item creation isn't affected, so understandably you don't concern yourself and prefer to create redundant triples, making it inefficient to edit by others. --- Jura 05:57, 8 April 2020 (UTC)
- What does "inefficient to edit by others" mean? --GZWDer (talk) 13:51, 8 April 2020 (UTC)
- One edit on a triple of an item with 1000 triples = 1000 deletes, 1000 additions. One edit to a lexeme with 15 triples = 15 deletes, 15 additions. --- Jura 13:19, 13 April 2020 (UTC)
- I agree we shouldn't have to concern ourselves with query server update issues, but the developers asked us to do that given the load issues. You should be well aware of that. --- Jura 13:19, 13 April 2020 (UTC)
- They are fine as long as the query service is not lagging (or the lag is considered in edits). I don't think there will be a large long-term issue, especially as phab:T244590 will be fixed soon.--GZWDer (talk) 23:00, 13 April 2020 (UTC)
- Oppose If performance is the only concern, we can (and should) resolve it using the appropriate namespace. --Tinker Bell ★ ♥ 05:11, 8 April 2020 (UTC)
- @Tinker Bell: So why do you oppose it? In any case, it's not the only issue. Did you read the discussion at Wikidata:Property proposal/Unicode character (item)? --- Jura 05:57, 8 April 2020 (UTC)
- Support for addressing the efficiency problem inherent in using items in the absence of a 'mul/mis' label language for items and in the absence of a way to force said 'mul/mis' label to be the only such label for an item. Mahir256 (talk) 00:33, 18 April 2020 (UTC)
- @Mahir256: But I also want aliases (both multilingual ones, for NFD forms and code points, and monolingual ones, for names, including canonical and alias Unicode names).--GZWDer (talk) 11:56, 18 April 2020 (UTC)
- It's not clear from your alternate proposal samples how that would be solved there and provided in a structured way. As mentioned in the discussion there, there are various ways to include additional information even if you didn't present it in the initial use case. --- Jura 11:15, 19 April 2020 (UTC)
- Weak oppose I doubt Unicode characters should be modeled using Wikidata lexemes. --Matěj Suchánek (talk) 11:06, 4 September 2020 (UTC)
- @Matěj Suchánek: [1] Which features of lexemes do you think are not useful for these? [2] What use would the additional elements (noise) have when these are created as items? --- Jura 11:10, 4 September 2020 (UTC)
- [1] Forms and maybe senses. On the other hand, having only one "label" is an advantage (but I don't think it's a "lemma"). [2] Wikipedias can have articles about some of them. They might be valid values for depicts (P180) on Commons, main subject (P921), and maybe others. --Matěj Suchánek (talk) 11:22, 4 September 2020 (UTC)
- Good point about Commons. I think we hadn't seen that aspect before. Let me think about it. --- Jura 11:42, 4 September 2020 (UTC)
- [1] It would be a form and a lemma (the latter is the name of the label at the top of the entity); see Lexeme:L291359. I think it's much clearer if we have just one such label and an automated description (from Lexical category). The sample also has a sense.
[2] That Wikipedia can have articles about some L-entities isn't a problem limited to these. To use these at Commons, we will eventually have to find a solution too, but this, also, isn't limited to Unicode characters. In the opposite direction, we had to create code (image) (P7415) to link some files there. --- Jura 13:53, 5 September 2020 (UTC)
- @Jura1, Indolering, GZWDer, Tinker Bell, Mahir256, Matěj Suchánek: Closing as Not done for same reasons as Wikidata:Property proposal/Unicode character (item). Maybe we should try to hash this all out on a "WikiProject Unicode" page before any more proposals are made? ArthurPSmith (talk) 17:15, 9 September 2021 (UTC)