Wikidata:Property proposal/Aragonario ID

Aragonario ID

Originally proposed at Wikidata:Property proposal/Lexemes

Description: identifier for an Aragonese lexeme in the Aragonario online dictionary
Represents: Aragonario (Q97248069)
Data type: External identifier
Domain: Aragonese lexemes
Allowed values: [1-9][0-9]+
Example 1: abadejo (L660226) → 114
Example 2: columbario (L675573) → 6116
Example 3: abanderato (L647971) → 26598
Planned use: @Aradgl, Uesca: will add this to existing Aragonese lexemes before creating any new Aragonese lexemes
Number of IDs in source: 41,772 Aragonese to Spanish entries as of 2021-09 and 26,449 Spanish to Aragonese entries as of 2021-03 [1]
Expected completeness: always incomplete (Q21873886)
Formatter URL: https://aragonario.aragon.es/words/$1/
See also: Diccionario de la lengua española word (non-ID) (P7790), sense on DHLE (P9583)
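
As an illustration of how the allowed-values pattern and the formatter URL above fit together, here is a minimal Python sketch. The pattern and URL template are copied from the proposal; the function names are hypothetical and not part of any Wikidata tooling.

```python
import re

# Pattern and URL template as given in the proposal.
ALLOWED_VALUES = re.compile(r"[1-9][0-9]+")
FORMATTER_URL = "https://aragonario.aragon.es/words/$1/"


def is_valid_aragonario_id(value: str) -> bool:
    """Return True if the whole string matches the allowed-values pattern."""
    return ALLOWED_VALUES.fullmatch(value) is not None


def aragonario_url(value: str) -> str:
    """Substitute a valid ID for the $1 placeholder in the formatter URL."""
    if not is_valid_aragonario_id(value):
        raise ValueError(f"not a valid Aragonario ID: {value!r}")
    return FORMATTER_URL.replace("$1", value)


print(aragonario_url("114"))  # ID from Example 1
```

Note that the pattern as written requires at least two digits, so a single-digit ID would be rejected by this check.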

Motivation

This property will help establish that the lexemes @Aradgl, Uesca: have been creating en masse actually do exist. Mahir256 (talk) 23:01, 31 August 2022 (UTC)

Discussion

 Support - Nikki (talk) 23:53, 31 August 2022 (UTC)
 Support --Middle river exports (talk) 01:48, 1 September 2022 (UTC)
 Support Bovlb (talk) 14:17, 2 September 2022 (UTC)
 Comment @Mahir256: As Aradgl has just answered, we cannot obtain the Aragonario ID. We do not have access to that data, because the ID exists only on the Aragonario website and not in the original database from which both we and the Aragonario obtain our data. I hope you understand and will let us continue with our work. Uesca (talk) 08:24, 23 September 2022 (UTC)
@Uesca: After some time (slowly) scraping the site, I am ready to add this identifier to the majority of the Aragonese lexemes. Please do not tell me that you are somehow unable to add an identifier to the lexemes you create, as any search of the Aragonario site for a term will yield an identifier within the HTML source that can be used to provide a direct link back to the site for that term. Mahir256 (talk) 13:16, 27 September 2022 (UTC)
@Mahir256 Thanks for your help. Can you help us by adding the Aragonario ID to the Aragonese lexemes that we create from now on? I don't know how to scrape the site to get that ID; I don't have that technical ability. Aradgl (talk) 15:13, 28 September 2022 (UTC)
@Aradgl: No, I will not do that. It should be your responsibility as lexeme importers to do that; if you have the 'technical ability' to mass-create empty lexemes and throw your hands up and say Deo volente (Q2415781) to people adding senses later, I trust you will have the 'technical ability' to at least inform others that these lexemes exist someplace other than in the imagination of their creator. What I can do is inform you of the process I used to add these identifiers, however:
  • I saved the result of the following query as a TSV file "wordlist.tsv":
    select ?l ?lemma ?cat {
      ?l dct:language wd:Q8765 ;
         wikibase:lemma ?lemma ;
         wikibase:lexicalCategory ?cat .
      minus { ?l wdt:P11071 [] }
    }
  • I ran the script at https://paste.toolforge.org/view/0c5c823e, which produces a list of Aragonario entries (and their identifiers and parts of speech) containing the lemmas in the wordlist from the query above.
  • I filtered the result of that script for entries where the lemmas on Wikidata and Aragonario were the same as well as where the parts of speech were the same (requiring some mapping between the ones given on Aragonario and the Wikidata items).
  • (I massaged these into a QuickStatements batch, but you might instead add the returned identifiers to whatever process you are using to create lexemes in the first place.)
Please let me know if there are parts of this process I should explain further, but note that I do not want to continue cleaning up your mess. Mahir256 (talk) 15:38, 28 September 2022 (UTC)
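
The filtering and QuickStatements steps in the list above can be sketched roughly as follows. This is not the script linked in that comment, only a minimal Python illustration under stated assumptions: the wordlist rows carry entity URIs as exported by the query service, the Aragonario part-of-speech labels (`"noun"`, `"adjective"`) and the `POS_TO_ITEM` mapping are hypothetical placeholders, and P11071 is taken from the query above as the property being added.

```python
# Hypothetical mapping from Aragonario part-of-speech labels to Wikidata items.
# The label strings are placeholders; Q1084 (noun) and Q34698 (adjective) are
# the Wikidata items for those lexical categories.
POS_TO_ITEM = {"noun": "Q1084", "adjective": "Q34698"}


def entity_id(uri):
    """Return the trailing ID of an entity URI, e.g. '.../entity/L660226'."""
    return uri.strip("<>").rsplit("/", 1)[-1]


def match_entries(wordlist, entries):
    """Yield (lexeme ID, Aragonario ID) pairs where lemma and part of speech agree.

    wordlist: iterable of (lexeme URI, lemma, lexical category URI)
    entries:  iterable of (Aragonario ID, lemma, part-of-speech label)
    """
    index = {}
    for uri, lemma, category in wordlist:
        index.setdefault((lemma, entity_id(category)), []).append(entity_id(uri))
    for aragonario_id, lemma, pos in entries:
        item = POS_TO_ITEM.get(pos)  # None for unmapped labels, which match nothing
        for lexeme in index.get((lemma, item), []):
            yield lexeme, aragonario_id


def quickstatements_line(lexeme, aragonario_id, prop="P11071"):
    """Format one tab-separated QuickStatements command with a quoted string value."""
    return f'{lexeme}\t{prop}\t"{aragonario_id}"'


# Sample data: the first entry reuses Example 1 from the proposal (the noun
# category is assumed for illustration); the second entry is made up to show
# a part-of-speech mismatch.
wordlist = [("http://www.wikidata.org/entity/L660226", "abadejo",
             "http://www.wikidata.org/entity/Q1084")]
entries = [("114", "abadejo", "noun"), ("999", "abadejo", "adjective")]

for lexeme, aragonario_id in match_entries(wordlist, entries):
    print(quickstatements_line(lexeme, aragonario_id))
```

A lemma that matches several entries with the same part of speech would produce several lines here; in practice such cases probably deserve manual review before upload.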
@Mahir256
Thank you for making things so difficult for us.
Thank you for wasting so much of our time and making us lose months.
Right now the project of uploading the Aragonese lexemes to Wikidata is about to disappear because of the impediments you are putting in our way.
Some people have already abandoned it.
Can you send me an Excel file with all the words in the Aragonario and their IDs? With that I think I could add them to my Excel sheet (which generates JSON) and run it. I'm not sure, though. Uesca (talk) 14:35, 9 October 2022 (UTC)
@Uesca: I am sorry you think the addition of this identifier to lexemes you create is a 'difficult[y]' and that the instructions and code provided in my reply from the 28th, which you did not seem to acknowledge at all, are 'impediments'. As I understand it, only you and @Aradgl: have been mass-creating lexemes, so I certainly regret that anyone else feels dissatisfied with the current situation. But I cannot emphasize enough that I would like Aragonese to thrive as a lexeme language, and that nothing I have done was ever intended to end up 'wasting [you] so much time [and] making [you] lose months', although I am sure the delays between my messages and your replies to them are the major contributor to that time being months rather than weeks or even days. I also feel better knowing that most existing lexemes have identifiers, so that others can now go and more reliably add senses to them even if they do not speak the language. As for a list of 'all the words of the aragonario and its id', that will take a bit of time but I can certainly try to obtain it. Mahir256 (talk) 17:03, 9 October 2022 (UTC)
@Mahir256 Thanks for your reply.
As soon as I have your Excel or CSV file with the Aragonario IDs, I will try to create the lexemes with it.
I understand that you are totally unaware of the situation and reality of Aragonese and of the difficulties of our Wikidata project.
There are more than two of us behind this project, but our resources are very limited and we really are very few.
UNESCO considers that the Aragonese language is in danger of disappearing.
Of course I would love to get back to you within a minute and have hours and hours available to work on Wikidata, but unfortunately that's not the case.
Thanks for everything.
I await your reply. Uesca (talk) 06:10, 11 October 2022 (UTC)
@Uesca, Aradgl: In the interest of signaling to you that I am not, and do not want to be, the cause of your language's death (even if this signal is something you will completely ignore, just as you have completely ignored my insistence that I am aware of the situation your language faces), here is a spreadsheet listing all 67,955 entries in Aragonario (of which 26,332 are in Spanish and the rest in Aragonese). The columns of this spreadsheet are, in order, 1) the Aragonario ID (to be used with the Aragonario ID property), 2) the language of the lexeme, 3) the word at that ID as provided by Aragonario (something you may need to adjust, given how it abbreviates different forms for different genders), 4) the parts of speech at that ID (a single word might have multiple parts of speech, something you will need to adjust/separate), and 5) the various definitions given for those parts of speech at that ID (which, if you want to import as glosses, you will need to clean up and separate by appropriate part of speech). Mahir256 (talk) 17:32, 20 October 2022 (UTC)
Thanks @Mahir256
You are not the cause of the death of our language. You are one more obstacle to ensuring its safety and survival.
Thanks for everything. I am working on publishing the lexemes. I will do it as soon as possible. Uesca (talk) 06:34, 20 January 2023 (UTC)
@Mahir256
If you really want to help our language, remove the condition of having to add the Aragonario ID.
If it were not for that, we would already have the lexemes published and would be in phase 2, adding content and meanings to the published lexemes.
In any case, I am continuing to work on publishing the lexemes with the Aragonario ID. Uesca (talk) 06:40, 20 January 2023 (UTC)
@Uesca: Glad to see you will be adding these IDs now. If you already had senses, references, or external IDs added or ready to add to the lexemes when you initially created them, rather than leaving them empty and providing no certainty that any of these would in fact be added in the future (opening lexemes to "crowdsourcing" is not a guarantee that they will in fact gain senses), then I would not have insisted that the IDs be present in the first place. Even now, if you just went ahead and filled in senses for the few remaining Aragonese lexemes which I hadn't yet added IDs to, and then for any future lexeme creations added senses immediately afterwards to those new lexemes, then I would welcome that. What I do not believe is helpful, and which I wish to stress was my concern from the very start, is creating lots of lexemes all at once with no obvious way (when just looking at the lexemes) for them to be improved later. Having an ID on it is enough of a start that even non-speakers of the language can help to improve the lexemes later. Mahir256 (talk) 06:57, 20 January 2023 (UTC)
@Mahir256, Nikki, Middle river exports, Bovlb, Aradgl, Uesca: ✓ Done enjoy! VIGNERON (talk) 18:18, 27 September 2022 (UTC)