Wikidata:Requests for permissions/Bot/GZWDer (flood) 3
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Not done @GZWDer: This request seems to be abandoned; please reopen it if that is not the case. Thanks. Mike Peel (talk) 21:21, 18 January 2022 (UTC)
GZWDer (flood) 3
GZWDer (flood) (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: GZWDer (talk • contribs • logs)
Task/s: Creating items for all Unicode characters
Code: Unavailable for now
Function details: Creating items for 137,439 characters (probably excluding those not in Normalization Forms):
- Label in all languages (if the character is printable; otherwise, only the Unicode name of the character, in English)
- Aliases: U+XXXX in all languages, and the Unicode name of the character in English
- Description in languages with a label of Unicode character (P487)
- instance of (P31) → Unicode character (Q29654788)
- Unicode character (P487)
- Unicode code point (P4213)
- Unicode block (P5522)
- writing system (P282)
- image (P18) (if available)
- HTML entity (P4575) (if available)
- For characters in Han script also many additional properties; see Wikidata:WikiProject CJKV character
For characters that already have items, the existing items will be updated.
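As a rough illustration of the per-character data listed above, the sketch below derives most of those fields with Python's standard `unicodedata` module. The function name `describe_character`, the dict layout (which is not the Wikibase JSON format) and the `P4213` value format are hypothetical. Python's `str.isprintable()` stands in for "printable", and block (P5522), writing system (P282), image (P18) and HTML entity (P4575) would need other data sources.

```python
import unicodedata


def describe_character(ch: str) -> dict:
    """Collect, for one character, the fields listed in the function details.

    Hypothetical layout for illustration only; the Unicode block (P5522) is
    omitted because the standard library does not expose block data.
    """
    cp = ord(ch)
    # Unicode name, or "" for code points without one (e.g. control characters).
    name = unicodedata.name(ch, "")
    return {
        # Printable characters are their own label; otherwise fall back to the English name.
        "label": ch if ch.isprintable() else name,
        "aliases_all_languages": [f"U+{cp:04X}"],
        "aliases_en": [name] if name else [],
        "claims": {
            "P31": "Q29654788",    # instance of: Unicode character
            "P487": ch,            # Unicode character
            "P4213": f"{cp:04X}",  # Unicode code point; value format is an assumption
        },
    }


print(describe_character("\u2126"))
```

Non-printable characters such as U+200B get their English name as the label, and control characters, which have no name in `unicodedata`, would end up with an empty label that a real run would need to handle.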
Question: Do we need only one item for characters with the same normalized form, e.g. Ω (U+03A9, GREEK CAPITAL LETTER OMEGA) and Ω (U+2126, OHM SIGN)?--GZWDer (talk) 23:08, 23 July 2018 (UTC)
- CJKV characters belonging to CJK Compatibility Ideographs (Q2493848) and CJK Compatibility Ideographs Supplement (Q2493862), such as 著 (U+FA5F) (Q55726748) and 著 (U+2F99F) (Q55738328), will need to be split from their normalized form, e.g. 著 (Q54918611), as each of them has different properties. KevinUp (talk) 14:03, 25 July 2018 (UTC)
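The normalization question can be made concrete with Python's standard `unicodedata` module: grouping code points by their NFC form shows which ones Unicode treats as canonically equivalent. This is a sketch; the helper names `fmt` and `group_by_nfc` are hypothetical, and the sample list only uses characters mentioned in this thread.

```python
import unicodedata
from collections import defaultdict


def fmt(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def group_by_nfc(chars):
    """Group characters by their NFC form.

    A group with more than one member is a set of canonically equivalent
    code points, i.e. candidates for the "one item or several" question.
    """
    groups = defaultdict(list)
    for ch in chars:
        groups[unicodedata.normalize("NFC", ch)].append(fmt(ch))
    return dict(groups)


sample = ["\u03A9", "\u2126", "\u8457", "\uFA5F", "\U0002F99F"]
for nfc, members in group_by_nfc(sample).items():
    print(fmt(nfc), "<-", members)
```

OHM SIGN lands in the same group as GREEK CAPITAL LETTER OMEGA, and the compatibility ideographs land with their unified counterpart. Whether each group becomes one item or several is a modelling decision that the grouping itself does not settle.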
Request filed per suggestion on Wikidata:Property proposal/Unicode block.--GZWDer (talk) 23:08, 23 July 2018 (UTC)
- Support I have already expressed my wish to import such a dataset. Matěj Suchánek (talk) 09:25, 25 July 2018 (UTC)
- Support @GZWDer: Thank you for initiating this task. Also, feel free to add yourself as a participant of Wikidata:WikiProject CJKV character. [1] KevinUp (talk) 14:03, 25 July 2018 (UTC)
- Support Thank you for your contribution. If possible, I hope you will also add other code (P3295) values, such as JIS X 0213 (Q6108269) and Big5 (Q858372), to the items you create or update. --Okkn (talk) 16:35, 26 July 2018 (UTC)
- Oppose the use of the flood account for this. Given the problems with unapproved, defective bot runs under the "GZWDer (flood)" account, I'd rather see this done with a new account named "bot", as per policy.
--- Jura 04:50, 31 July 2018 (UTC)
- Perhaps we could do a test run of this bot with some of the 88,889 items required by Wikidata:WikiProject CJKV character and note any potential issues. @GZWDer: You might want to take note of the required account policy. KevinUp (talk) 10:12, 31 July 2018 (UTC)
- This account has had a bot flag for over four years. While most bot accounts contain the word "bot", there is nothing in the bot policy that requires it, and a small number of accounts with the bot flag have different names. As I understand it, there is also no technical difference between an account with a flood flag and an account with a bot flag, except for who can assign and remove the flags. - Nikki (talk) 19:14, 1 August 2018 (UTC)
- The flood account was created and authorized for activities that aren't actually bot activities, whereas this new task is one. Given that defective bot tasks have already been run with the flood account, I don't think any actual bot tasks should be authorized. It's enough that I already had to clean up tens of thousands of GZWDer's edits.
--- Jura 19:46, 1 August 2018 (UTC)
- I am ready to approve this request, after a (positive) decision is taken at Wikidata:Requests for permissions/Bot/GZWDer (flood) 4. Lymantria (talk) 09:11, 3 September 2018 (UTC)
- Wouldn't these fit better into the Lexeme namespace? --- Jura 10:31, 11 September 2018 (UTC)
- There is no language with all Unicode characters as lexemes. KaMan (talk) 14:31, 11 September 2018 (UTC)
- Not really a problem. Language codes provide for such cases. --- Jura 14:42, 11 September 2018 (UTC)
- I'm not talking about the language code but about the language field of the lexeme, where you select the Q-item of the language. KaMan (talk) 14:46, 11 September 2018 (UTC)
- Which is mapped to a language code. --- Jura 14:48, 11 September 2018 (UTC)
- Note I'm going to be inactive due to real-life issues, so this request is On hold for now. Comments are still welcome, but I won't be able to answer them until January 2019.--GZWDer (talk) 12:08, 13 September 2018 (UTC)
- @GZWDer: you could use the new account for this as well. --- Jura 11:16, 29 September 2019 (UTC)
- @Jura1: Do you think this question from last September should block approval of the request? Lymantria (talk) 06:14, 26 March 2020 (UTC)
- @Lymantria: There is now a new discussion at Wikidata:Property proposal/Unicode character (item) on whether these should be items, lexemes or something else. --- Jura 17:30, 26 March 2020 (UTC)
- Support I wonder why this information has not been in Wikidata for so long when many less notable subjects have complete data. --Midleading (talk) 02:38, 31 July 2020 (UTC)
- Oppose This user has no respect for infra's capacity in any way; these accounts, along with two others, have been making Wikidata basically unusable (phab:T242081) for months now. I think all other approvals of this user should be revoked, rather than adding more on top. (Emphasis: this edit is done in my volunteer capacity.) Amir (talk) 17:26, 17 August 2020 (UTC)
- Repeating from another RFP: given that WMDE is going to remove noratelimit from bots, your bot hopefully won't cause more issues, but in my view you have lost your good standing with regard to respecting infra's capacity. Amir (talk) 18:53, 10 October 2020 (UTC)
- While this is open, it is important not to merge letter items with Unicode character items, as Nikki did with ɻ (Q56315451) and ɻ (Q87497973), ƾ (Q56316849) and ƾ (Q87497496), ʎ (Q56315460) and ʎ (Q87498018), etc.; the whole goal of this project is to keep them apart. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 14:02, 25 January 2021 (UTC)
- Is there consensus for that? I am confused as to why this is necessary in most instances, as the whole point of Unicode's rigorous submission process is to ensure that codepoints are semantically unique, and therefore it makes sense to perceive them as representative of the letters (etc.) themselves, rather than unique items in their own right. In most cases where there are multiple code points that represent a specific letter, this is due to either additional characteristic(s) such as bold, circling and so on, or due to compatibility. In such instances, it makes sense to use the depicts/depicted by properties to link them to the primary item. Theknightwho (talk) 22:22, 11 November 2021 (UTC)
- I do not agree that "Unicode's rigorous submission process is to ensure that codepoints are semantically unique"; do you have a source for that? 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 22:23, 11 November 2021 (UTC)
- The definition of "character" on Unicode's website, specifically (1) read in conjunction with (3): (1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape, rather than a specific shape, though in code tables some form of visual representation is essential for the reader’s understanding. (2) Synonym for abstract character. (3) The basic unit of encoding for the Unicode character encoding. (4) The English name for the ideographic written elements of Chinese origin. Theknightwho (talk) 22:34, 11 November 2021 (UTC)
- Having semantic value is not related to the value's uniqueness. By your logic, we would need separate codepoints for the "!" character referring to exclamation in spoken languages, negation in programming, factorials in mathematics and warnings in traffic signing. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 10:21, 12 November 2021 (UTC)
- If neither the Unicode standard nor Wikidata makes a distinction between the different uses of "!", then they are functionally referring to one and the same thing and should therefore be considered the same item. Your logic only follows if Wikidata makes such a distinction, but it does not. I am simply working from the stated definition, and not an arbitrary hypothetical interpretation that neither project uses. Theknightwho (talk) 18:15, 12 November 2021 (UTC)
- "In most cases where there are multiple code points that represent a specific letter, this is due to either [...] bold, circling and so on": the most obvious case is the distinction between capital and small letters. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 18:19, 12 November 2021 (UTC)
- Which suggests that we should have items for uppercase and lowercase A, not the specific Unicode codepoints. Theknightwho (talk) 18:24, 12 November 2021 (UTC)
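The distinction drawn in this exchange, between compatibility variants (bold, circled) and case, can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `nfkc` is hypothetical, and it only shows how Unicode itself relates these code points, not how Wikidata should model them.

```python
import unicodedata


def nfkc(s: str) -> str:
    """Return the NFKC form, which removes compatibility distinctions such as font style or circling."""
    return unicodedata.normalize("NFKC", s)


# Compatibility variants collapse to the base letter under NFKC.
print(nfkc("\U0001D400"))  # MATHEMATICAL BOLD CAPITAL A -> A
print(nfkc("\u24D0"))      # CIRCLED LATIN SMALL LETTER A -> a

# Capital and small letters are distinct characters: normalization does not merge them.
print(nfkc("A") == nfkc("a"))  # False

# Only case folding, a separate operation, treats them as equal.
print("A".casefold() == "a".casefold())  # True
```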
- Just to add to this, I've done a quick check and can find numerous instances of these merges, instigated by different users. They seem entirely appropriate, for the most part. I don't think it is feasible or reasonable to keep up this artificial separation en masse like this (particularly given that it is clearly causing confusion), especially in instances where there is absolutely no ambiguity whatsoever. In instances where more than one codepoint could represent a symbol, it may need subdivisions (e.g. separate items for uppercase/lowercase) in addition to the primary item, with appropriate links, but otherwise I see this artificial separation as conceptually flawed and unnecessarily confusing. Theknightwho (talk) 02:16, 13 November 2021 (UTC)