User:BrokenSegue/CounterVandalismBotPlan

Summary

The proposal is to deploy a machine-learning-powered bot to fight vandalism on Wikidata. This will be done in stages as we increase our confidence in the tool.

The goal is to decrease the volume of vandalism that we do not identify quickly.

Deployment outline

I propose this be done in stages as we verify the bot is meeting expected performance metrics:

  1. Announcement of intent / discussion
    • Seek community approval to try to do this
  2. Offline performance measurement
  3. Online edit tagging
    • Bot runs on live edits and applies a new revision tag to edits it would have reverted (if it were fully activated)
    • Users can use the edit tags in recent changes to get a sense of the bot's quality (a rough sketch of this tagging stage follows the list).
  4. Edit reverting trial
    • Monitoring of performance and community / user reaction
    • Collecting / monitoring false positives
    • Post trial community discussion
  5. Edit reverting in production
    • Periodic retraining of the model
  6. (Speculative) Edit patrolling
    • If the model is found to be very accurate, we could consider having the bot patrol edits it is very confident are not vandalism.
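
As a rough illustration of the tagging stage, the sketch below (Python, assuming the requests library, a hypothetical score_edit classifier, and a made-up 0.95 threshold) listens to the public Wikimedia EventStreams recentchange feed, filters to Wikidata edits, and prints the edits it would have tagged. Actually applying a change tag would require a bot account and the MediaWiki tagging API, which is only noted in a comment here.

    # Sketch of stage 3 (online edit tagging). score_edit and TAG_THRESHOLD are
    # placeholders; the real bot would call the trained model and apply a
    # revision tag via the MediaWiki API instead of printing.
    import json
    import requests

    STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"
    TAG_THRESHOLD = 0.95  # hypothetical "would have reverted" cutoff


    def score_edit(event: dict) -> float:
        """Placeholder for the vandalism model: return P(vandalism) for an edit."""
        return 0.0  # the real bot would run the trained classifier here


    def watch_recent_changes() -> None:
        with requests.get(STREAM_URL, stream=True, timeout=60) as resp:
            for line in resp.iter_lines():
                if not line or not line.startswith(b"data: "):
                    continue  # skip SSE comments and keep-alives
                event = json.loads(line[len(b"data: "):])
                if event.get("wiki") != "wikidatawiki" or event.get("type") != "edit":
                    continue
                score = score_edit(event)
                if score >= TAG_THRESHOLD:
                    # Here the real bot would apply a change tag (MediaWiki
                    # action=tag) to the revision instead of printing it.
                    print(event["revision"]["new"], event["title"], round(score, 3))


    if __name__ == "__main__":
        watch_recent_changes()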

Open community questions

  • Do we need/want an anti-vandal bot? Is vandalism a problem? Do we have enough volunteers patrolling? On average there is a new unpatrolled edit every 5 seconds.
  • How accurate does the model need to be to be deployed? 95% accuracy? (See the worked example after this list.)
  • What kind of biases are acceptable in such a bot? How do we ensure German performance is comparable to English? What about very obscure languages?
  • Where do we draw the line between "suspicious edits" ("changing the date of birth of someone by a day") and clear vandalism ("changing Obama's date of birth to 1492")?
  • Should certain classes of user be immune to the bot, or specifically targeted by it ("don't revert confirmed users", "only revert anonymous users", "don't revert QS jobs")? Running on all edits (including bots) is probably not computationally feasible.
  • How should the bot interact with users when it does a revert? Should it leave a note for the user (in what language?)? How long after an edit should it wait before reverting (Wikidata users often make lots of consecutive edits, so an edit that looks like vandalism in isolation may actually be good in the context of the sequence)?
  • What policies should the bot follow (e.g. "don't revert on the same item more than twice per 24 hours")?
  • Do all components of the counter-vandalism bot need to be open source (e.g. would using GPT-4 be against the spirit of the project)?
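
On the accuracy question above: raw accuracy is a misleading target when vandalism is rare. The toy calculation below (all rates made up purely for illustration) shows that a model that is "95% accurate" on both classes can still be wrong most of the time when it decides to revert, which is why a false positive rate target (as with ClueBot below) is probably the more useful framing.

    # Toy numbers only: why "95% accuracy" says little about revert quality.
    vandalism_rate = 0.01          # assume 1 in 100 edits is vandalism
    true_positive_rate = 0.95      # model catches 95% of vandalism
    false_positive_rate = 0.05     # model wrongly flags 5% of good edits

    flagged_vandalism = vandalism_rate * true_positive_rate          # 0.0095
    flagged_good = (1 - vandalism_rate) * false_positive_rate        # 0.0495

    precision = flagged_vandalism / (flagged_vandalism + flagged_good)
    print(f"share of reverts that would be correct: {precision:.0%}")  # ~16%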

Comparison to ClueBot

The effort to build this is modeled on the successful effort to build ClueBot.

ClueBot was run with a target false positive rate of between 0.25% and 0.1%. We will likely need to target an even lower false positive rate given the higher per-user edit volume on Wikidata.
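
One way such a target could be operationalised, sketched below under the assumption that we have an offline validation set of edits labelled as vandalism or not, together with per-edit model scores: pick the lowest revert threshold whose false positive rate on the good edits stays at or below the target. The bot would then only revert edits scoring at or above that threshold.

    # Sketch: choose a revert threshold from labelled validation data so that at
    # most target_fpr of good edits would be reverted. scores and is_vandalism
    # are assumed to come from the offline measurement stage.
    from typing import Sequence


    def pick_threshold(scores: Sequence[float],
                       is_vandalism: Sequence[bool],
                       target_fpr: float = 0.001) -> float:
        good_scores = sorted(s for s, v in zip(scores, is_vandalism) if not v)
        # Number of good edits we are willing to revert by mistake.
        allowed_mistakes = int(target_fpr * len(good_scores))
        # Threshold sits just above the highest-scoring good edit we must spare.
        return good_scores[len(good_scores) - allowed_mistakes - 1] + 1e-9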

Other notes

  • I performed basic statistics on the current ORES scores and they are definitely not good enough to use for this purpose. This was true even at very high score thresholds (precision was still poor). A sketch of this kind of check follows the list.
  • This is separate and distinct from the WMF project to train new vandalism models on Wikidata edit traffic. That project will not result in a bot that reverts edits; it will just expose new, better scores that predict vandalism. The effort of deploying the model into a bot is left to us. It is possible we could leverage their model in the creation of this anti-vandalism bot.
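
For reference, the kind of check behind the ORES note above could look like the sketch below: fetch "damaging" probabilities for a batch of already-labelled revisions from the ORES v3 API and compute precision at a high threshold. The revision IDs and labels are placeholders, and the exact response layout should be verified against the ORES documentation.

    # Sketch of a precision check against ORES "damaging" scores for Wikidata.
    # Revision IDs and vandalism labels are placeholders; the response layout
    # should be double-checked against the ORES API documentation.
    import requests

    ORES_URL = "https://ores.wikimedia.org/v3/scores/wikidatawiki/"


    def damaging_scores(rev_ids):
        resp = requests.get(ORES_URL,
                            params={"models": "damaging",
                                    "revids": "|".join(map(str, rev_ids))},
                            timeout=30)
        resp.raise_for_status()
        scores = resp.json()["wikidatawiki"]["scores"]
        return {rid: scores[str(rid)]["damaging"]["score"]["probability"]["true"]
                for rid in rev_ids}


    def precision_at(threshold, scores, labels):
        """labels[rid] is True when the revision really was vandalism."""
        flagged = [rid for rid, p in scores.items() if p >= threshold]
        if not flagged:
            return None
        return sum(labels[rid] for rid in flagged) / len(flagged)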

Comments?

Put them on the talk page: User talk:BrokenSegue/CounterVandalismBotPlan