Wikidata:Requests for permissions/Bot/MsynBot 8
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 06:25, 24 December 2021 (UTC)
MsynBot
Operator: MisterSynergy
Task/s: remove sitelinks to nonexistent pages on client wikis
Code: (still in preparation)
Function details: The bot will run periodically, at an interval yet to be determined (likely weekly or monthly). It will compare all page titles from the client wikis (queried from the MediaWiki page table of each client wiki) with the sitelinks stored on Wikidata (queried from the Wikibase wb_items_per_site table of Wikidata). Sitelinks that are found on Wikidata but not on the client wikis will be checked individually and removed if the target page indeed does not exist.
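The comparison step described above amounts to a set difference between the sitelinked titles and the titles that exist on a client wiki, followed by a per-page confirmation. Below is a minimal sketch in Python. It assumes both lists have already been fetched. It also assumes the page-table rows have been turned into full titles, because that table stores the namespace number separately and writes titles with underscores rather than spaces. The function names and the `page_exists` callback, which stands in for the individual check, are hypothetical and not taken from the bot's code.

```python
from typing import Callable, Iterable, List, Tuple

Sitelink = Tuple[str, str]  # (item ID, page title)


def normalize_title(title: str) -> str:
    """Use the spaced form of a title, so 'Foo_bar' and 'Foo bar' compare equal."""
    return title.replace("_", " ")


def find_orphaned_sitelinks(
    sitelinks: Iterable[Sitelink], existing_titles: Iterable[str]
) -> List[Sitelink]:
    """Return sitelinks whose title does not occur among the client wiki's pages."""
    existing = {normalize_title(t) for t in existing_titles}
    return sorted(
        (item, title)
        for item, title in sitelinks
        if normalize_title(title) not in existing
    )


def confirm_missing(
    candidates: Iterable[Sitelink], page_exists: Callable[[str], bool]
) -> List[Sitelink]:
    """Keep only candidates that a second, per-page check still reports as missing.

    page_exists is a caller-supplied (hypothetical) lookup, e.g. a live API query,
    which guards against pages created between the bulk query and the edit.
    """
    return [(item, title) for item, title in candidates if not page_exists(title)]


# Usage with made-up data:
sitelinks = [("Q1", "Foo bar"), ("Q2", "Deleted page"), ("Q3", "Recreated page")]
candidates = find_orphaned_sitelinks(sitelinks, ["Foo_bar", "Other page"])
# candidates == [("Q2", "Deleted page"), ("Q3", "Recreated page")]
to_remove = confirm_missing(candidates, lambda t: t == "Recreated page")
# to_remove == [("Q2", "Deleted page")]
```

The second pass matters because the bulk comparison can be stale by the time an edit is made. Only sitelinks that fail both checks would be removed.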
There are currently several thousand sitelinks to nonexistent pages, mainly because some actions on client wikis ("page deletion" or "page move without leaving a redirect behind") are sometimes not synced to the repository automatically. Apparently no bot currently fixes these cases, since some of them have existed for several years. The bot will cover all client wikis of Wikidata.
The code is not quite ready yet, but I am currently testing it on PAWS in order to complete it. Later, the job will be scheduled and run on Toolforge under the msynbot tool account (at least partially, since memory limitations may be problematic). The editing will be done with pywikibot, as with my other bot jobs. A copy of the source code can also be made available onwiki if someone asks. —MisterSynergy (talk) 15:09, 16 December 2021 (UTC)
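The memory limitations mentioned above could arise from holding every title of a large client wiki in a set at once. One common way around this is a merge over two title-sorted streams, which keeps only the current element of each stream in memory. This is sketched here as an assumption, not as the bot's actual approach. The sketch requires both streams to be normalized and sorted in the same order (Python compares strings by code point). If the rows arrive in a different collation, or titles are normalized only after sorting, the merge can report false positives. It cannot report false negatives, and the individual check would still catch any false positives. The function name is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def stream_orphaned_sitelinks(
    sitelinks: Iterable[Tuple[str, str]],
    existing_titles: Iterable[str],
) -> Iterator[Tuple[str, str]]:
    """Yield (item_id, title) sitelinks whose title is missing from existing_titles.

    Both inputs must already be normalized and sorted by title in the same
    code-point order; only one element of each stream is held at a time.
    """
    titles = iter(existing_titles)
    current: Optional[str] = next(titles, None)
    for item, title in sitelinks:
        # Advance the page-title stream until it reaches or passes this sitelink.
        while current is not None and current < title:
            current = next(titles, None)
        # No matching page title at this position: the sitelink target is missing.
        if current != title:
            yield item, title


# Usage with made-up, pre-sorted data:
orphans = list(
    stream_orphaned_sitelinks(
        [("Q1", "A"), ("Q2", "B"), ("Q3", "C"), ("Q4", "D")],
        ["A", "C"],
    )
)
# orphans == [("Q2", "B"), ("Q4", "D")]
```

Because both inputs are plain iterables, they could be fed directly from database cursors or sorted text files without materializing either side.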
- Support of course, a very useful operation that is currently missing. --Epìdosis 16:34, 16 December 2021 (UTC)
- Comment if you haven't written it yet, maybe asking the devs to spend the equivalent time on fixing Wikidata:Report_a_technical_problem#Resolve_data_quality_issues_of_the_sitelink_system would be a better investment. The problem with fixing data quality issues that stem from design issues in Wikibase is that the work is essentially endless. If most cases are caught, the underlying issue may never be addressed, or community repair may even be used as a pretext for not addressing it. Still, we will have the same issue once you discontinue the task, and developers might not even be aware of the unresolved problem (beyond stray tasks in Phabricator). --- Jura 18:04, 16 December 2021 (UTC)
- We don't know whether the devs will fix this problem and, if so, when. For the time being, a community-based fix is probably the best option, as there is by now a five-figure number of sitelinks to nonexistent pages. I would also expect any fix to be incomplete, in the sense that some cases might still be missed no matter what.
- Most of the code is ready; Special:Diff/1544161775 was a test edit. What is still missing is to wrap everything up, deal with the memory limitations that matter for a job like this one, catch some more special cases that can crash the script, and deploy it to Toolforge so that it runs reliably. —MisterSynergy (talk) 18:36, 16 December 2021 (UTC)
- Presumably, if just one user notices it (or mentions it on Wikidata:Report_a_technical_problem), it might not be looked into.
- For future analysis, maybe the edits could be made with a separate bot account; how about User:Sitelink deletion resync? --- Jura 20:35, 16 December 2021 (UTC)
- I will add a task-specific hashtag to the edit summary. That way, the edits of this task can easily be queried. —MisterSynergy (talk) 21:17, 16 December 2021 (UTC)
- The advantage of a separate account is that one wouldn't need to do much analysis to get the basic stats. The name of the account isn't that important. Call it User:MsynBot_8 if you prefer.
- Maybe a tag for each wiki could be interesting, or at least one for enwiki. --- Jura 11:39, 17 December 2021 (UTC)
- An additional project hashtag is a good idea. It would also not be overly complicated to add some text-based logging to the bot and make it accessible to interested users. —MisterSynergy (talk) 12:10, 17 December 2021 (UTC)
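The hashtag and logging ideas in this exchange could look roughly like the following sketch. It covers a task-wide hashtag, the suggested per-wiki hashtag, and one line per removal in a plain-text log. The hashtag name `msynbot_sitelink_cleanup`, the summary wording, and the tab-separated log format are all hypothetical placeholders, since the discussion does not specify the actual tag or log layout.

```python
from datetime import datetime, timezone

# Hypothetical hashtag name; the real task tag is not given in the discussion.
TASK_TAG = "msynbot_sitelink_cleanup"


def edit_summary(site: str, title: str) -> str:
    """Build an edit summary with a task-wide and a per-project hashtag."""
    return (
        f"remove sitelink to nonexistent page {site}:{title} "
        f"#{TASK_TAG} #{TASK_TAG}_{site}"
    )


def log_line(item: str, site: str, title: str, when: datetime) -> str:
    """Format one tab-separated record per removal for a plain-text log file."""
    return "\t".join([when.strftime("%Y-%m-%dT%H:%M:%SZ"), item, site, title])


# Usage with made-up values:
summary = edit_summary("enwiki", "Foo")
# summary == "remove sitelink to nonexistent page enwiki:Foo
#             #msynbot_sitelink_cleanup #msynbot_sitelink_cleanup_enwiki"
record = log_line("Q42", "enwiki", "Foo", datetime(2021, 12, 24, 6, 25, tzinfo=timezone.utc))
# record == "2021-12-24T06:25:00Z\tQ42\tenwiki\tFoo"
```

A per-project hashtag lets the removals on a single wiki, such as enwiki, be counted separately without a dedicated bot account.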
@Lymantria, Ymblanter: are we actually waiting for anything to happen here, or can the task already be approved? —MisterSynergy (talk) 23:06, 23 December 2021 (UTC)
- I still think a separate account would be better, but let's move ahead with this. If I recall correctly, you already did similar deletions in the past, so I suppose we can skip extensive testing. --- Jura 01:26, 24 December 2021 (UTC)