Wikidata:Requests for permissions/Bot/alilmed

alilmed

alilmed (talk · contribs · new items · new lexemes · SUL · Block log · User rights log · User rights · xtools)
Operator: Alilmed (talk · contribs · logs)

Task/s: uploading thousands of new items and editing existing ones, using the database from lieder.net

Code:

Function details: --Alilmed (talk) 16:04, 23 August 2020 (UTC)

  •  Comment I have already written to the user here about the first 100 or so edits made through OpenRefine. I repeat the main points:
    1. regarding existing items: English labels and descriptions should not be changed if they are already present (changes usually make them worse); the statement occupation (P106): author (Q482980) should not be added if a value of occupation (P106) is already present; date of birth (P569) and date of death (P570) with precision "year" should not be added if values with precision "month" or "day" are already present; Support the addition of references to LiederNet author ID (P8234), although they are not strictly necessary
    2. regarding the creation of new items: Oppose any bot creation, as the risk of creating duplicates is too high; I suggest importing two new Mix'n'match catalogs to replace the existing author and composer catalogs; the new catalogs should contain dates of birth/death, so that, if new items are created, dates are added automatically
--Epìdosis 23:50, 23 August 2020 (UTC)
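As an illustration only, the label, description and occupation rules from point 1 could be applied as a filter before upload. This is a minimal sketch, not the operator's actual OpenRefine workflow: the function name `filter_proposed_edits`, the flat dictionary layout and the keys `label_en` and `description_en` are assumptions made for the example.

```python
def filter_proposed_edits(item, proposed):
    """Keep only proposed values that fill a gap on an existing item.

    `item` and `proposed` are plain dicts (a hypothetical, simplified layout):
    'label_en' and 'description_en' hold strings, 'P106' holds a list of QIDs.
    """
    kept = {}
    # Never overwrite an English label or description that is already present.
    for field in ("label_en", "description_en"):
        if field in proposed and not item.get(field):
            kept[field] = proposed[field]
    # Add an occupation (P106) only if the item has no occupation at all.
    if "P106" in proposed and not item.get("P106"):
        kept["P106"] = proposed["P106"]
    return kept


# Example: the item already has a label and an occupation, so only the
# missing description survives the filter.
existing_item = {"label_en": "Nikolai Berg", "P106": ["Q36180"]}
proposed_edit = {
    "label_en": "Nikolay Vasil'yevich Berg",
    "description_en": "author",
    "P106": ["Q482980"],
}
result = filter_proposed_edits(existing_item, proposed_edit)
```

Here `result` contains only the description, because the label and the occupation are already present on the item.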
Hi @Epìdosis:, I totally agree with your remarks regarding already existing items. Regarding replacing the Mix'n'match catalogs: is it possible to do this without losing the "matching" work that has already been done on the previous version of the catalogs? --Beat Estermann (talk) 15:43, 27 August 2020 (UTC)
By the way, dates are already added automatically today, see for example Nikolai Savvich Abaza (Q98687954). Therefore, I don't really see the point of replacing the existing catalogs. --Beat Estermann (talk) 15:54, 27 August 2020 (UTC)
@Beat Estermann: Regarding Mix'n'match catalogs: yes, it is possible to create new catalogs without losing preexisting matches (it is enough to create the new catalogs; Magnus Manske will then merge the old catalogs into them, including the old matches). Anyway, you are right: the two catalogs seem to contain all the available dates, so there is no need to import new catalogs.
Regarding the import using OpenRefine (I've seen your comment here): all the previous remarks remain valid.
"In order to avoid creating many new person items with little information and of limited relevance, I have advised Alilmed to refrain from ingesting new authors and composers that are linked to no more than one song/setting in the Lieder.Net Archive" > I completely agree, but I would be even stricter: not only more than one song/setting, but also at least one birth and/or death date. I would avoid creating items with no dates because they may fail WD:N2 ("It refers to an instance of a clearly identifiable conceptual or material entity": no dates means the subject is not so clearly identifiable).
"Conversely, newly created items that are linked to more than 5 songs/settings in the Lieder.Net Archive should be double checked manually in order to spot potential duplicates, especially when transliteration of names is involved, e.g. "Nikolay Vasil'yevich Berg" → Nikolai Berg (Q1964476)" > sure (!), checking items one by one is vital, also because the example item you created today (Nikolai Savvich Abaza (Q98687954)) in fact already existed (Nikolai Savvich Abaza (Q4054209)). When merging, imprecise dates coming from LiederNet should be cleaned up; when new items are created, searching https://viaf.org/ for a VIAF ID (P214) is very important.
A quick statistic, after reviewing all 114 edits of Alilmed's first import: 54 preexisting items were edited (with the problems described above; all cleaned up); 60 new items were created, of which 30 (!) were duplicates, while the other 30 were not duplicates but often had to be substantially expanded (although in a few cases I was not able to find any other information about the subjects).
Suggestion: try another small import of new items (no more than 50) and let's see if it's possible to bring the percentage of duplicates below 20%. --Epìdosis 18:55, 27 August 2020 (UTC)
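The creation criteria discussed in this comment (more than one linked song/setting, at least one birth or death date, manual review above 5 linked songs/settings) and the 20% duplicate target can be sketched as follows. This is a hedged illustration: the function names, argument names and return labels are invented for the example and do not correspond to any existing tool.

```python
def classify_candidate(linked_works, birth_year=None, death_year=None):
    """Decide what to do with a LiederNet person that has no Wikidata match.

    Returns 'skip', 'create' or 'create_and_review' (hypothetical labels).
    """
    # Persons linked to no more than one song/setting are not ingested.
    if linked_works <= 1:
        return "skip"
    # Without any date the subject is hard to identify clearly (WD:N2).
    if birth_year is None and death_year is None:
        return "skip"
    # Well-connected persons are likely to exist already: flag for manual check.
    if linked_works > 5:
        return "create_and_review"
    return "create"


def duplicate_rate(duplicates, created):
    """Share of newly created items that turned out to be duplicates."""
    return duplicates / created if created else 0.0


# Figures from the first import: 30 duplicates among 60 new items.
first_import_rate = duplicate_rate(30, 60)
meets_target = first_import_rate < 0.20
```

With the figures of the first import, `first_import_rate` is 0.5, so the 20% target is not met. The sketch does not replace the item-by-item check, which the comment above describes as vital for all new items.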
Thanks for your work! And yes, I agree with your suggestion. By the way, what makes you think that, in the case of diverging birth/death dates, the LiederNet information is wrong? Does Wikidata regularly have reliable references for these? --Beat Estermann (talk) 06:12, 28 August 2020 (UTC)
@Beat Estermann: So, my concern is not primarily about diverging dates (e.g. if Wikidata has 1830 and LiederNet 1831, it may be worth also retaining 1831), but about less precise dates (e.g. if Wikidata has 1/6/1831 and LiederNet 1831, it is not worth inserting 1831). If you can distinguish between these two cases, you can insert dates in the first; if it is too complicated to make such a distinction, better not to insert dates at all, at least in this first phase. --Epìdosis 11:16, 28 August 2020 (UTC)
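One way to distinguish the two cases is to compare dates only down to the coarser of the two precisions. The sketch below is an assumption-laden illustration: `PartialDate` and `should_add_date` are hypothetical names, and only the numeric precision codes (9 = year, 10 = month, 11 = day) follow the Wikibase time datatype.

```python
from dataclasses import dataclass

# Wikibase time precision codes: 9 = year, 10 = month, 11 = day.
YEAR, MONTH, DAY = 9, 10, 11


@dataclass(frozen=True)
class PartialDate:
    year: int
    month: int = 0
    day: int = 0
    precision: int = YEAR


def _prefix(date, precision):
    """Return the components of `date` that are meaningful at `precision`."""
    parts = (date.year, date.month, date.day)
    return parts[: precision - YEAR + 1]


def should_add_date(existing, candidate):
    """Return True if `candidate` is worth adding next to the `existing` dates."""
    for known in existing:
        common = min(known.precision, candidate.precision)
        agrees = _prefix(known, common) == _prefix(candidate, common)
        if agrees and candidate.precision <= known.precision:
            # Same date at equal or lower precision: nothing new to add.
            return False
    # Either no date is present, the candidate diverges, or it is more precise.
    return True


wikidata_day = PartialDate(1831, 6, 1, DAY)
wikidata_year = PartialDate(1830)
liedernet_year = PartialDate(1831)

less_precise_case = should_add_date([wikidata_day], liedernet_year)
diverging_case = should_add_date([wikidata_year], liedernet_year)
```

For the two examples from the comment, `less_precise_case` is False (Wikidata has 1/6/1831, LiederNet only 1831) and `diverging_case` is True (1830 versus 1831).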