User:MisterSynergy/deltabot

Notes regarding Pasleim's tools (deltabot, pltools, plnode on Toolforge). This page describes the target state, not the current state.

September 2023

There are ~50 Python scripts running under the DeltaBot and PLbot accounts. Many are in dire need of a rewrite.

  • Merge all wiki-writing bots into DeltaBot; pltools retains some webtools and the jobs that generate input for webtools; plnode remains unchanged.
  • Log in using OAuth (currently bot passwords)
  • Use a single GitHub repo for all scripts (currently only DeltaBot is under version control, and even that repo is outdated)
  • Use the Toolforge jobs framework
  • Use a shared Python venv; target to use the latest Python image
  • Use an individual cronjob for each Python script; request as little memory and CPU as possible (currently, container shell scripts with excessive memory requests are used)
  • Restarting policy (TBD)

Functionality to consider:

  • Python-specific:
    • config as constants
    • context manager for database access, and file IO
    • migrate database connector to mariadb
    • use dict.get()
    • HTTP user agents for every external request
    • Use the logging module (to replace excessive .err files)
    • modernize pywikibot usage
    • use type hints
    • architecture/structure of code
    • variable names with snake_case
    • centralize common functionality into a custom package
  • Write tests for every piece of code
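
Several of the Python-specific points above (config as constants, context managers for file IO, dict.get(), a user agent header, the logging module, type hints, snake_case names) can be combined in one small sketch. All names and values below (USER_AGENT, DEFAULT_LABEL, read_json_file, get_english_label) are hypothetical and not taken from the DeltaBot sources; a local JSON file stands in for whatever database or API a real job would read from.

```python
import json
import logging
from pathlib import Path
from typing import Any

# Config as module-level constants (all values are illustrative assumptions)
USER_AGENT: str = "example-bot/0.1 (hypothetical contact address)"
REQUEST_HEADERS: dict[str, str] = {"User-Agent": USER_AGENT}  # pass to every external request
DEFAULT_LABEL: str = "(no English label)"

# The logging module replaces ad-hoc print statements to stderr
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("example_job")


def read_json_file(path: Path) -> dict[str, Any]:
    """Read a JSON file; the context manager closes the handle even on errors."""
    with path.open(encoding="utf-8") as file_handle:
        return json.load(file_handle)


def get_english_label(entity: dict[str, Any]) -> str:
    """Return the English label of an entity dict, or a default if it is missing."""
    # Chained dict.get() calls avoid a KeyError on incomplete entities
    label = entity.get("labels", {}).get("en", {}).get("value")
    if label is None:
        logger.warning("Entity %s has no English label", entity.get("id", "?"))
        return DEFAULT_LABEL
    return label
```

A job would call read_json_file() on its input and get_english_label() on each entity; entities without an English label produce a log warning instead of an exception.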

February 2024

Status quo after several rearrangements and code updates in fall 2023:

  • User:PLbot is retired from regular operation; its remaining functional jobs have been migrated to DeltaBot
  • User:DeltaBot is the sole active bot; it operates completely in the deltabot tool on Toolforge
  • pltools offers a couple of web tools; plnode serves as a node.js-based backend for some of these tools

DeltaBot

  • All source code is on GitHub in MisterSynergy's account. The sources should also be readable on Toolforge for users with a Toolforge account.
  • The bot uses pywikibot and authenticates via OAuth.
  • The bot uses a single, shared Python 3.11 virtual environment (venv) for all jobs.
    • Required modules (without dependencies) are: mariadb, mwoauth, mwparserfromhell, python-stdnum, pywikibot, requests, requests-oauthlib; there is a (potentially outdated) requirements.txt file at ~/operations/requirements.txt on Toolforge.
    • The bot does not use shared pywikibot anymore, but a locally installed version (8.3.3) in the venv.
  • The bot uses the Toolforge jobs framework for job scheduling.
    • Each job has its own cronjob on k8s (~50 in total).
    • Each job has individual .err and .out logfiles.
    • Each job has tailored memory and cpu requests.
    • Each job restarts exactly once on failure.
    • A single .yaml definition file defines all jobs (~/operations/deltabot_jobs.yaml).
    • Monitoring: k8s-status, Grafana.
  • Although some functionality is used in many jobs/scripts, there is no unifying "deltabot package" that would centralize shared functionality. This is an intentional decision: it keeps each job portable independently of the rest of the bot, in case the overall setup stops working in the future.
  • The default jobs limit at Toolforge (50 per tool) has been raised to 100 for deltabot per request.
  • The source code is licensed as CC0.
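
The job setup described above (tailored memory and CPU requests, exactly one restart on failure, a raised quota of 100 jobs) lends itself to a quick consistency check. The following is a minimal sketch with a hypothetical JobDefinition class and check_jobs function; it does not read ~/operations/deltabot_jobs.yaml, and its field names are assumptions rather than the exact keys of the Toolforge jobs YAML format.

```python
from dataclasses import dataclass

JOBS_QUOTA = 100      # raised per-tool limit mentioned above
EXPECTED_RETRIES = 1  # "restarts exactly once on failure"


@dataclass(frozen=True)
class JobDefinition:
    """Hypothetical in-memory representation of one scheduled job."""
    name: str
    command: str
    schedule: str  # cron expression
    mem: str       # memory request, e.g. "256Mi"
    cpu: str       # CPU request, e.g. "0.1"
    retry: int = EXPECTED_RETRIES


def check_jobs(jobs: list[JobDefinition]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems: list[str] = []
    if len(jobs) > JOBS_QUOTA:
        problems.append(f"{len(jobs)} jobs exceed the quota of {JOBS_QUOTA}")
    seen_names: set[str] = set()
    for job in jobs:
        if job.name in seen_names:
            problems.append(f"duplicate job name: {job.name}")
        seen_names.add(job.name)
        if job.retry != EXPECTED_RETRIES:
            problems.append(
                f"{job.name}: retry is {job.retry}, expected {EXPECTED_RETRIES}"
            )
    return problems
```

Such a check could run before the definition file is loaded, so that duplicate names or a deviating retry setting are noticed early.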

There are still some todos:

  • A few (rather complex) jobs still need to be rewritten. It needs to be figured out whether there is still need for them, though.
  • A couple of jobs have rather excessive execution times on the order of one day. It needs to be figured out whether this is necessary, or whether the runtime can be reduced.
  • The merge-project job has a remaining database/web-tool dependency in the pltools account; as a consequence, it is not fully working right now.
  • There is no code testing in place at all. It needs to be figured out whether testing is worth the effort, because the correctness of most of the code depends on WMF infrastructure and sane data rather than on algorithmic complexity. Currently, Python type hints are used in VSCode to keep an overview of what the scripts do and how functions interact with each other type-wise, but beyond that nothing is in place to validate the code.
  • The venv needs to be refreshed occasionally, particularly the pywikibot library.
  • The ~/logs folder needs to be checked regularly for excessively large logfiles.
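
The logfile check in the last item could be scripted. This is a minimal sketch, assuming the logfiles are the .err and .out files in a flat ~/logs folder; the threshold value and the function name find_large_logfiles are hypothetical.

```python
from pathlib import Path

SIZE_THRESHOLD_BYTES = 50 * 1024 * 1024  # assumed threshold of 50 MiB
LOG_SUFFIXES = {".err", ".out"}


def find_large_logfiles(
    log_dir: Path, threshold_bytes: int = SIZE_THRESHOLD_BYTES
) -> list[tuple[Path, int]]:
    """Return (path, size) pairs for logfiles above the threshold, largest first."""
    if not log_dir.is_dir():
        return []
    findings: list[tuple[Path, int]] = []
    for path in log_dir.iterdir():
        if path.is_file() and path.suffix in LOG_SUFFIXES:
            size = path.stat().st_size
            if size > threshold_bytes:
                findings.append((path, size))
    return sorted(findings, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for logfile, size_bytes in find_large_logfiles(Path.home() / "logs"):
        print(f"{size_bytes / 1024 / 1024:8.1f} MiB  {logfile.name}")
```

The script only reports files; whether to truncate, rotate, or delete them is left to the operator.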

PLbot

  • The bot is retired and there are no plans to reactivate it.
  • Some dysfunctional scripts have not been reactivated, either because the environment has changed or because they are no longer needed; they are still available from MisterSynergy's local computer on request.

pltools and plnode

  • These tools have not been changed and will not receive updates in the foreseeable future. Let's hope that they do not break.