User:BrokenSegue/CounterVandalismBotPlan

Summary

The proposal is to deploy a machine-learning-powered bot to fight vandalism on Wikidata. This will be done in stages as we increase our confidence in the tool.

The goal is to decrease the volume of vandalism that we do not identify quickly.

Deployment outline

I propose this be done in stages as we verify the bot is meeting expected performance metrics:

  1. Announcement of intent / discussion
    • Seek community approval to try to do this
  2. Offline performance measurement
  3. Online edit tagging
    • Bot runs on live edits and applies a new revision tag to edits it would have reverted (if it were fully activated)
    • Users can use the edit tags in recent changes to get a sense of the bot's quality (a rough sketch of this tagging stage follows the list).
  4. Edit reverting trial
    • Monitoring of performance and community / user reaction
    • Collecting / monitoring false positives
    • Post trial community discussion
  5. Edit reverting in production
    • Periodic retraining of the model
  6. (Speculative) Edit patrolling
    • If the model is found to be very accurate, we could consider having the bot patrol edits it is very confident are not vandalism.
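
As a rough illustration of the tagging stage, the sketch below (Python, assuming the requests library, a hypothetical score_edit classifier, and a made-up 0.95 threshold) listens to the public Wikimedia EventStreams recentchange feed, filters to Wikidata edits, and prints the edits it would have tagged. Actually applying a change tag would require a bot account and the MediaWiki tagging API, which is only noted in a comment here.

    # Sketch of stage 3 (online edit tagging). score_edit and TAG_THRESHOLD are
    # placeholders; the real bot would call the trained model and apply a
    # revision tag via the MediaWiki API instead of printing.
    import json
    import requests

    STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"
    TAG_THRESHOLD = 0.95  # hypothetical "would have reverted" cutoff


    def score_edit(event: dict) -> float:
        """Placeholder for the vandalism model: return P(vandalism) for an edit."""
        return 0.0  # the real bot would run the trained classifier here


    def watch_recent_changes() -> None:
        with requests.get(STREAM_URL, stream=True, timeout=60) as resp:
            for line in resp.iter_lines():
                if not line or not line.startswith(b"data: "):
                    continue  # skip SSE comments and keep-alives
                event = json.loads(line[len(b"data: "):])
                if event.get("wiki") != "wikidatawiki" or event.get("type") != "edit":
                    continue
                score = score_edit(event)
                if score >= TAG_THRESHOLD:
                    # Here the real bot would apply a change tag (MediaWiki
                    # action=tag) to the revision instead of printing it.
                    print(event["revision"]["new"], event["title"], round(score, 3))


    if __name__ == "__main__":
        watch_recent_changes()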

Open community questions

  • Do we need/want an anti-vandal bot? Is vandalism a problem? Do we have enough volunteers patrolling? On average there is a new unpatrolled edit every 5 seconds.
  • How accurate does the model need to be to be deployed? 95% accuracy? (See the worked example after this list.)
  • What kind of biases are acceptable in such a bot? How do we ensure German performance is comparable to English? What about very obscure languages?
  • Where do we draw the line between "suspicious edits" ("changing the date of birth of someone by a day") and clear vandalism ("changing Obama's date of birth to 1492")?
  • Should certain classes of user be immune to the bot, or specifically targeted by it ("don't revert confirmed users", "only revert anonymous users", "don't revert QS jobs")? Running on all edits (including bots) is probably not computationally feasible.
  • How should the bot interact with users when it does a revert? Should it leave a note for the user (in what language?)? How long after an edit should it wait before reverting (Wikidata users often make lots of consecutive edits, so an edit that looks like vandalism in isolation may actually be good in the context of the sequence)?
  • What policies should the bot follow (e.g. "don't revert on the same item more than twice per 24 hours")?
  • Do all components of the counter-vandalism bot need to be open source (e.g. would using GPT-4 be against the spirit of the project)?
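
On the accuracy question above: raw accuracy is a misleading target when vandalism is rare. The toy calculation below (all rates made up purely for illustration) shows that a model that is "95% accurate" on both classes can still be wrong most of the time when it decides to revert, which is why a false positive rate target (as with ClueBot below) is probably the more useful framing.

    # Toy numbers only: why "95% accuracy" says little about revert quality.
    vandalism_rate = 0.01          # assume 1 in 100 edits is vandalism
    true_positive_rate = 0.95      # model catches 95% of vandalism
    false_positive_rate = 0.05     # model wrongly flags 5% of good edits

    flagged_vandalism = vandalism_rate * true_positive_rate          # 0.0095
    flagged_good = (1 - vandalism_rate) * false_positive_rate        # 0.0495

    precision = flagged_vandalism / (flagged_vandalism + flagged_good)
    print(f"share of reverts that would be correct: {precision:.0%}")  # ~16%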

Comparison to ClueBot

The effort to build this is modeled on the successful effort to build ClueBot.

ClueBot was run with a target false positive rate of between 0.25% and 0.1%. We will likely need to target an even lower false positive rate given the higher per-user edit volume on Wikidata.
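
One way such a target could be operationalised, sketched below under the assumption that we have an offline validation set of edits labelled as vandalism or not, together with per-edit model scores: pick the lowest revert threshold whose false positive rate on the good edits stays at or below the target. The bot would then only revert edits scoring at or above that threshold.

    # Sketch: choose a revert threshold from labelled validation data so that at
    # most target_fpr of good edits would be reverted. scores and is_vandalism
    # are assumed to come from the offline measurement stage.
    from typing import Sequence


    def pick_threshold(scores: Sequence[float],
                       is_vandalism: Sequence[bool],
                       target_fpr: float = 0.001) -> float:
        good_scores = sorted(s for s, v in zip(scores, is_vandalism) if not v)
        # Number of good edits we are willing to revert by mistake.
        allowed_mistakes = int(target_fpr * len(good_scores))
        # Threshold sits just above the highest-scoring good edit we must spare.
        return good_scores[len(good_scores) - allowed_mistakes - 1] + 1e-9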

Other notes

  • I performed basic statistics on the current ORES scores and they are definitely not good enough to use for this purpose. This was true even at very high score thresholds (precision was still poor). A sketch of this kind of check follows the list.
  • This is separate and distinct from the WMF project to train new vandalism models on Wikidata edit traffic. That project will not result in a bot that reverts edits; it will just expose new, better scores that predict vandalism. The effort of deploying the model into a bot is left to us. It is possible we could leverage their model in the creation of this anti-vandalism bot.
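
For reference, the kind of check behind the ORES note above could look like the sketch below: fetch "damaging" probabilities for a batch of already-labelled revisions from the ORES v3 API and compute precision at a high threshold. The revision IDs and labels are placeholders, and the exact response layout should be verified against the ORES documentation.

    # Sketch of a precision check against ORES "damaging" scores for Wikidata.
    # Revision IDs and vandalism labels are placeholders; the response layout
    # should be double-checked against the ORES API documentation.
    import requests

    ORES_URL = "https://ores.wikimedia.org/v3/scores/wikidatawiki/"


    def damaging_scores(rev_ids):
        resp = requests.get(ORES_URL,
                            params={"models": "damaging",
                                    "revids": "|".join(map(str, rev_ids))},
                            timeout=30)
        resp.raise_for_status()
        scores = resp.json()["wikidatawiki"]["scores"]
        return {rid: scores[str(rid)]["damaging"]["score"]["probability"]["true"]
                for rid in rev_ids}


    def precision_at(threshold, scores, labels):
        """labels[rid] is True when the revision really was vandalism."""
        flagged = [rid for rid, p in scores.items() if p >= threshold]
        if not flagged:
            return None
        return sum(labels[rid] for rid in flagged) / len(flagged)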

Comments?

Put them on the talk page: User talk:BrokenSegue/CounterVandalismBotPlan