Wikidata:Requests for permissions/Bot/Marius851000's Bot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Lymantria (talk) 16:36, 5 January 2022 (UTC)
Marius851000's Bot
Operator: Marius851000
Task/s: To link MusicBrainz identifiers to Wikidata items using data matching, via OpenRefine and a custom script
Code: OpenRefine (sometimes via QuickStatements), https://github.com/marius851000/wikidata_musicbrainz_script (subject to change; this code covers only the initial import)
Function details: I made a Rust tool to perform data matching from MusicBrainz to Wikidata using the IMDb ID. I did some initial testing, limited to composers. Then I found it wasn't good enough, so I learned to use OpenRefine. I made some manual corrections and removed duplicates (and fixed Wikidata and MusicBrainz entries, merging some of them). I now plan to import the result, and to apply the same process to every item that has an IMDb URL but no MusicBrainz URL (it may also be used to spot duplicates or inconsistencies, but I don't know how these could be solved automatically). I also plan to perform matching with other identifiers (a second phase would be to import data from MusicBrainz, but I will make another permission request if I do this).
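The matching step described above can be sketched as a join on the shared IMDb ID. This is a minimal illustration under stated assumptions, not the code in the linked repository: the `MbArtist` and `WdItem` structs and the `match_by_imdb` function are hypothetical, and both datasets are assumed to have already been reduced to (identifier, IMDb ID) records. In line with the filtering by human mentioned in the discussion, it keeps only items that are instances of human (P31 = Q5) and have no MusicBrainz artist ID (P434) yet, and it skips any IMDb ID that occurs more than once on either side, since that usually points to a duplicate needing manual review.

```rust
use std::collections::HashMap;

/// Hypothetical record: a MusicBrainz artist and the IMDb ID from its URL relationships.
struct MbArtist {
    mbid: String,
    imdb_id: String,
}

/// Hypothetical record: a Wikidata item with its IMDb ID (P345), whether it is
/// an instance of human (P31 = Q5), and whether it already has P434.
struct WdItem {
    qid: String,
    imdb_id: String,
    is_human: bool,
    has_mbid: bool,
}

/// Join both datasets on IMDb ID and return (QID, MBID) pairs to import.
/// An IMDb ID shared by several records on either side is treated as ambiguous.
fn match_by_imdb(artists: &[MbArtist], items: &[WdItem]) -> Vec<(String, String)> {
    // How many MusicBrainz artists carry each IMDb ID.
    let mut mb_count: HashMap<&str, usize> = HashMap::new();
    for a in artists {
        *mb_count.entry(a.imdb_id.as_str()).or_insert(0) += 1;
    }
    // All Wikidata items carrying each IMDb ID.
    let mut wd_by_imdb: HashMap<&str, Vec<&WdItem>> = HashMap::new();
    for it in items {
        wd_by_imdb.entry(it.imdb_id.as_str()).or_default().push(it);
    }

    let mut matches = Vec::new();
    for a in artists {
        if mb_count[a.imdb_id.as_str()] != 1 {
            continue; // several MusicBrainz artists share this IMDb ID
        }
        match wd_by_imdb.get(a.imdb_id.as_str()).map(|v| v.as_slice()) {
            // Exactly one Wikidata item: keep it only if it is human and not yet linked.
            Some([item]) if item.is_human && !item.has_mbid => {
                matches.push((item.qid.clone(), a.mbid.clone()));
            }
            _ => {} // no item, several items, non-human, or already has P434
        }
    }
    matches
}
```

Skipping ambiguous keys rather than picking one of the candidates trades some recall for precision, which suits an import whose values are not all checked by hand.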
I also don't plan to do a fully unsupervised import, but I won't manually check every individual value for validity.
Planned original/example import: https://hacknews.pmdcollab.org/composer_metabrainz_by_imdb_id.txt --Marius851000 (talk) 21:20, 18 December 2021 (UTC)
- Isn't there already a bot doing this or something similar? Maybe look at Wikidata:Requests for permissions/Bot/Soweego bot 4; you might want to talk to them (they use a slightly different reference format). Generally looks good to me, though. Are you filtering down by instance of (P31) types, e.g. human? BrokenSegue (talk) 21:38, 18 December 2021 (UTC)
- I filtered by human in OpenRefine too (but not in the pre-processing script; there was a specific case of a duo with the occupation composer, so I manually set their QID for the import). I'll take a look at Soweego. The original reason I did this was that I found it was doable in mix-n-match. Marius851000 (talk) 09:52, 19 December 2021 (UTC)
- Soweego does seem pretty interesting, but a bit more complex. I'll spend some time understanding how it works, but my original request still stands. Marius851000 (talk) 17:48, 19 December 2021 (UTC)
- Please make some test edits. --Ymblanter (talk) 20:13, 25 December 2021 (UTC)
- The bot account isn't autoconfirmed, so it couldn't use QuickStatements. Instead, I tried editing directly from OpenRefine, which correctly detected that it was editing too fast but didn't seem to slow down. Anyway, I have imported around 30 statements ( https://www.wikidata.org/wiki/Special:Contributions/Marius851000%27s_Bot ). Before running the test import, I rechecked for items that already had the data, in case some had been updated. Marius851000 (talk) 22:49, 27 December 2021 (UTC)
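Since OpenRefine reportedly detected the edit rate without slowing down, a client-side throttle is one way for a bot to stay under the limit. This is a minimal sketch of a hypothetical helper, not part of OpenRefine or QuickStatements: it enforces a minimum interval between edits and extends the wait when the server asks the client to back off (for example after a rate-limit or `maxlag` error). The interval used is an assumption, not a Wikidata policy figure.

```rust
use std::time::{Duration, Instant};

/// Hypothetical edit throttle: minimum spacing between edits plus server-requested back-off.
struct Throttle {
    min_interval: Duration,
    next_allowed: Option<Instant>,
}

impl Throttle {
    fn new(min_interval: Duration) -> Self {
        Throttle { min_interval, next_allowed: None }
    }

    /// How long the caller must wait at time `now` before sending the next edit.
    fn wait_before_edit(&self, now: Instant) -> Duration {
        match self.next_allowed {
            Some(t) if t > now => t - now,
            _ => Duration::ZERO,
        }
    }

    /// Record that an edit was sent at `now`.
    fn record_edit(&mut self, now: Instant) {
        self.next_allowed = Some(now + self.min_interval);
    }

    /// Record that the server asked to back off for `delay`; keeps the later deadline.
    fn record_backoff(&mut self, now: Instant, delay: Duration) {
        let until = now + delay;
        self.next_allowed = Some(match self.next_allowed {
            Some(t) if t > until => t,
            _ => until,
        });
    }
}
```

In a send loop, one would call `wait_before_edit(Instant::now())`, sleep for the returned duration with `std::thread::sleep`, send the edit, and then call `record_edit(Instant::now())`.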
Regarding the test edits: generally OK, but I wish the references were more specific (you based them on record linkage with the IMDb identifier). That said, that is a bit picky, so I think we should generally lean towards approval here. BrokenSegue (talk) 18:46, 1 January 2022 (UTC)
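One possible way to make the reference more specific is to record in it how the match was made. This sketch emits a QuickStatements V1 (tab-separated) command adding a MusicBrainz artist ID (P434), with a reference made of a reference URL (P854) pointing to the MusicBrainz artist page, the IMDb ID (P345) used as the join key, and a retrieved date (P813) at day precision. This choice of reference properties is an assumption for illustration, not the format the bot or Soweego actually uses, and `quickstatements_line` is a hypothetical helper.

```rust
/// Build one QuickStatements V1 command adding P434 to `qid`, with a reference
/// describing the match. `retrieved` is assumed to be a "YYYY-MM-DD" date.
fn quickstatements_line(qid: &str, mbid: &str, imdb_id: &str, retrieved: &str) -> String {
    let cols = [
        qid.to_string(),
        "P434".to_string(),
        format!("\"{}\"", mbid),
        // Reference URL (P854): the MusicBrainz artist page.
        "S854".to_string(),
        format!("\"https://musicbrainz.org/artist/{}\"", mbid),
        // IMDb ID (P345) that both records share, stored in the same reference.
        "S345".to_string(),
        format!("\"{}\"", imdb_id),
        // Retrieved (P813), day precision (/11).
        "S813".to_string(),
        format!("+{}T00:00:00Z/11", retrieved),
    ];
    cols.join("\t")
}
```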
- I will approve the request and flag the bot in a couple of days, provided that no objections are raised. Lymantria (talk) 10:53, 2 January 2022 (UTC)