Wikidata:Wikidata curricula/Activities/Pywikibot/Missing label in target language/code
#!/usr/bin/python3
# Global technical parameters
modnm = 'Pywikibot missing_person_label' # Module name (using the Pywikibot package)
pgmid = '2020-10-29 (gvp)' # Program ID and version
"""
Static definitions
"""
# Functional configuration flags
# Restrictions: cannot disable both labels and wikipedia. We need at least one of the options.
usealias = True # Allow using the language aliases (disable with -s)
fallback = True # Allow for English fallback (could possibly be embarrassing for local languages; disable with -t)
notability = True # Notability requirements (disable with -n; this is not encouraged, unless for "no"-cleanup)
safemode = False # Avoid label/description homonym conflicts (can be activated with -x when needed)
uselabels = True # Use the language labels (disable with -l)
wikipedia = True # Allow using Wikipedia article names (best, because Wikipedia is multilingual; disable with -w)
wdquery = '' # Wikidata query (generated from template by default)
itemlist = [] # Item list from input file (parameter -i)
"""
Language setup: 2 or 3 letter ISO 639 code (bad codes are skipped).
Restrictions: the list of languages must be completely changed when using alphabets other than Roman.
Block (and skip) incompatible alphabets or languages (setup other codes for non-Roman alphabets).
Prevent target languages from being processed repeatedly within the same command.
Disallow some languages, e.g. when there are too many errors for a certain language with -a.
Nonexistent codes are skipped, e.g. the cz language code does not exist (cz suggests the country Czechia) - the correct code is cs (Czech language).
In Czech and Polish there are no clear/fixed rules for (foreign) person naming (ova-problem).
Norwegian has 3 language codes: nb, nn, no (nb is the preferred one; no is only used for Wikipedia and is excluded for Wikidata).
"""
reflang = 'en' # Reference language code (English)
refwiki = 'enwiki' # Reference Wikipedia (English)
langdone = ['ar', 'arz', 'bg', 'cs', 'cz', 'fa', 'he', 'hu', 'no', 'pl', 'ru', 'zh'] # Languages done
"""
Default target list of languages; only add compatible languages.
The current values are valid for West-European languages (using the Roman alphabet).
Important: langdone and listlang must be adapted for other alphabets.
Do not use 'no' (Norwegian Wikipedia code).
"""
# List of target languages
listlang = ['nl', 'fr', 'en', 'de', 'it', 'es', 'pt', 'da', 'fi', 'nb', 'sv']
# Technical configuration flags
# Defaults: transparent and safe
debug = False # Can be activated with -d (errors and configuration changes are always shown)
errorstat = True # Show error statistics (disable with -e)
exitfatal = True # Exit on fatal error (can be disabled with -p; please take care)
shell = True # Shell available (command line parameters are available; automatically overruled by PAWS)
showcode = False # Show the generated SPARQL code (activate with -c)
verbose = True # Can be set with -q or -v (better keep verbose to monitor the bot progress)
# Technical parameters
"""
Default error penalty wait factor (can be overruled with -m).
Larger values ensure that maxlag errors are avoided, but temporarily delay processing.
It is advised not to overrule this value.
"""
exitstat = 0 # (default) Exit status
errwaitfactor = 4 # Extra delay after error; best to keep the default value (maximum delay of 4 x 150 = 600 s = 10 min)
maxdelay = 150 # Maximum error delay in seconds (overruling any extreme long processing delays)
minsucrate = 70.0 # Minimum success rate per target language (the script is stopped below this threshold)
# To be set in user-config.py (what parameters is PAWS using?)
"""
noisysleep = 60.0, to avoid the majority/all of the confusing sleep messages
maxlag = 5, to avoid overloading the servers
put_throttle = 1, for maximum transaction speed (bot account required)
"""
# The helptext is displayed with -h
codedoc = """
Add missing language labels to Wikidata items using Wikidata Query and Pywikibot.
It is typically used to amend "national" items, among other uses.
It is highly customizable, works for a large range of instances, and is easy to run.
Basically you only need a SPARQL query that delivers an item list.
Then pagegenerators loops through the item list to amend all missing target language labels.
It copies the target language label from the target or source language Wikipedia article name, or from the source language label (with English as a last resort). At least English should always have a label; Wikipedia is the preferred source, since it is multilingual.
Originally this program was written for human items, but it can be used for any valid instance. Note that for most instances it does not make sense to replicate the label untranslated, unless it is a person name (which is normally language independent).
Parameters:
The parameters are a list of valid ISO 639 language codes.
P1: source language (mandatory)
P2...: list of target languages (can be replaced by -a; more codes can easily be added in the script)
Validations:
No default languages. One source and (at least one) target language is explicitly required.
Target languages must always be different from the source language (languages referring to themselves are ignored).
Duplicate language codes are ignored.
The Wikidata Query system runs on a replicated database (there is a delay from the live database).
The transactions are validated at runtime to avoid conflicting or duplicate updates.
Source aliases are merged with existing aliases.
When the label is equal to any alias, conflicting aliases are removed.
Aliases with unrecognized alphabets are ignored.
Label suffixes are removed (parenthesis, comma); see the sketch after this list.
Pay attention that all language codes share the same alphabet.
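A minimal sketch of the suffix stripping, using the same regular expression (suffre) that the script compiles below:
import re
suffre = re.compile(r'\s*[(,]')        # The first ( or , starts the suffix
label = 'John Doe (politician)'
baselabel = suffre.search(label)       # Find the suffix, if any
if baselabel:
    label = label[:baselabel.start()]  # Canonical form: 'John Doe'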
Internal setup:
Additional parameters could be set in the script (e.g. language lists).
Qualifiers:
-a Process all default language targets (you must first set the source language)
-c Show generated code
-d Enable debug mode (show technical info)
-e Disable error statistics reporting
-f File containing a Wikidata query (SPARQL; not allowed together with -a)
-h Show help text (how to use this program; you see the same text when the command is wrong)
-i Item list file
-l Disable language labels (not allowed together with -w)
-m Fast mode (minimum maxlag wait; but this can increase the number of consecutive error transactions)
-n Disable the notability filter: items having a description, a Wikipedia article, Commons Category, etc.
-p Proceed after fatal error (process next item or language)
-q Set quiet mode (show minimum output; basically only errors)
-s Disable language aliases
-t Translation required (disable automatic English fallback)
-v Set verbose mode (show maximum output)
-V Show version (show the version of the program)
-w Disable sourcing from Wikipedia page names (not allowed together with -l)
-x Exclude target descriptions to avoid homonym conflicts (speeds up the processing)
Qualifiers and parameters are processed in the order of occurrence in the command line.
Some qualifiers are restricted or mutually exclusive.
Some flags have a cumulative effect.
Restrictions:
The chosen source and target languages should all share the same alphabet (P282).
The source language should preferably align with the country, e.g. France and French, to avoid label inconsistencies.
Some countries have multiple languages, which can be pretty confusing... process them after the unique combinations.
Some languages have a Wikipedia language code that is different from the Wikidata code (e.g. Norwegian); see the sketch below.
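A fragment of the main logic below shows how the Wikipedia site codes are derived:
inwiki = inlang + 'wiki'     # e.g. 'nl' -> 'nlwiki'
wpwiki = outlang + 'wiki'    # Default Wikipedia sitecode
if outlang == 'nb':          # Norwegian Bokmål
    wpwiki = 'nowiki'        # Wikipedia uses 'no' where Wikidata uses 'nb' (nn has its own nnwiki)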
Examples:
The log file (stdout) can be redirected.
# Transfer labels from nl to fr:
./missing_person_label.py nl fr
# Transfer labels from nl to a list of languages:
./missing_person_label.py nl en fr de es it pt da no fi sv cs hu pl
# Transfer labels from nl to a predefined list of languages:
./missing_person_label.py -n nl -a
# Copy label from Wikipedia page or source label, but ignore English fallback:
./missing_person_label.py -t xx -a
# Disable Wikipedia and avoid conflicting homonyms:
./missing_person_label.py -w -s en nl
# Generate logfile
./missing_person_label.py en -p -a |tee ena.log
# Run detached -> the program continues to run after a session disconnect, or after an "unsuccessful" language (a log file is created)
nohup ./missing_person_label.py en -p -a > ena.log &
# Move and merge Norwegian language labels, descriptions, aliases
./missing_person_label.py -p -n no -t -w nb
Return status:
The following status is returned to the shell:
0 Normal termination
1 Ctrl-c pressed, program interrupted (multiple Ctrl-c are required when in language update mode)
2 Invalid or missing source or target language (mandatory language pair)
3 Halting program due to huge error count (less than 70% success; use -p to proceed anyway)
4 Unknown, or conflicting qualifiers -l -w
5 SPARQL query is incompatible with -a flag
9 Help requested (-h)
Author:
Geert Van Pamel, 2020-08-03, CC BY-SA 4.0
Transaction status values:
in/out: Source and target language (transaction succeeded; only the label was updated)
Alias: Only the alias was updated/merged
Both: Both the label and the alias were updated
Name: Firstname and Lastname
Noted: Well-documented on Wikipedia (articles), Wikidata (statements) or Wikimedia Commons (media files)
Pict: Picture available (P18 property)
Safe: Avoid homonym conflicts (target description is filled in)
Skip: Duplicate update avoided (target label and alias were previously set, Wikidata query is possibly wrong)
Stop: Finish the current target language (ctrl-c pressed during update; continue with next language, if any)
Trivia: Not notable (no language description, no Wikipedia article, no picture, no important statements)
Error: A (general) error occurred (either data, network, or technical; see detailed error message)
For no/nb the no labels are transferred (moved/merged) from "no" to "nb".
Documentation:
https://www.wikidata.org/wiki/Wikidata:Wikidata_curricula/Activities/Pywikibot/Missing_label_in_target_language
https://www.wikidata.org/wiki/Help:Contents
https://www.wikidata.org/wiki/Help:Label
https://www.wikidata.org/wiki/Help:Alias
https://www.wikidata.org/wiki/Help:Description
https://www.wikidata.org/wiki/Help:Multilingual
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Wikidata_Query_Help
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial
https://www.wikidata.org/wiki/Special:PrefixIndex?prefix=Wikidata:Pywikibot_-_Python_3_Tutorial
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Big_Data
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Iterate_over_a_SPARQL_query
https://www.mediawiki.org/wiki/Manual:Pywikibot
https://www.mediawiki.org/wiki/Manual:Pywikibot/Wikidata
https://www.mediawiki.org/wiki/Manual:Pywikibot/Global_Options
https://www.mediawiki.org/wiki/Manual:Pywikibot/PAWS
https://www.mediawiki.org/wiki/Manual:Pywikibot/user-config.py
https://doc.wikimedia.org/pywikibot/stable/
https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial
https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes (language codes)
Algorithm:
A Wikidata query is executed to get a list of items that have a missing label in the target language.
Then each missing item label is replicated from the Wikipedia article name or the source label to the target language label.
When a Wikipedia page exists in the target or source language, this value is taken (when allowed).
If not, then the English page or label is used as a fallback (when allowed), as in the sketch below.
This program can also be used to correct data errors, e.g. "no" labels.
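A condensed sketch of the label priority chain (the full implementation below additionally builds labels from first/last names and strips suffixes):
if wikipedia and wpwiki in item.sitelinks:                    # 1. Target language Wikipedia page
    label = urlbre.search(str(item.sitelinks[wpwiki])).group(0)
elif wikipedia and inwiki in item.sitelinks:                  # 2. Source language Wikipedia page
    label = urlbre.search(str(item.sitelinks[inwiki])).group(0)
elif uselabels and inlabel != '':                             # 3. Source language label
    label = inlabel
elif wikipedia and fallback and refwiki in item.sitelinks:    # 4. English Wikipedia page
    label = urlbre.search(str(item.sitelinks[refwiki])).group(0)
elif uselabels and fallback and enlabel != '':                # 5. English label
    label = enlabel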
Error handling:
Exceptions are properly reported, and counted.
For each target language detailed statistics are reported.
Ctrl-c causes a return to the next higher callstack level (causing eventual program termination).
The same query might run more than once at the same time; duplicate updates are reported and properly skipped.
Transactions might be skipped when data is not available or has already been updated.
Manual interventions (whenever Error) are required to correct any data exceptions, unless caused by external technical factors.
Data errors generally require manual investigation and correction of the corresponding Wikidata items.
Some technical exceptions do not depend on the data (and are related to network, transactions, and server load).
Those transactions can be recovered by re-executing the same command, once again (no human intervention required).
When most transactions for one language fail, the program stops (a serious problem occurred; possible internal logical error).
At the end total transaction statistics are shown.
Components:
Shell (Linux)
Python
Regular expressions
pywikibot (prerequisite; to be installed on a private client, or PAWS)
pagegenerators (part of pywikibot)
Wikidata Query (SPARQL)
Wikibase (database)
PAWS Jupyter notebook (optional, no shell, no command line parameters)
Platforms:
Most of the (Linux) environments have a shell and support parameters.
Linux
A (headless) Raspberry Pi with Raspbian (ideal for home use: low-powered, small size, permanently connected -> "piwikibot").
On a Raspberry Pi 4 it uses around 5-10% CPU.
Linux on an Oracle VirtualBox (Windows host; best of both worlds; better than Cygwin).
PAWS Jupyter (shared environment; the best solution when you do not own a Linux system)
The PAWS Jupyter environment does not have a shell, so command line parameters are not available; edit the script manually instead.
User interventions:
(Plan Do Check Act)
Think about important missing Wikidata labels
Write a valid Wikidata Query to find those items
Verify the list of items (via Wikidata Query)
Transfer the query into the script
Choose the source language and the list of target languages
Run the script
Monitor the progress
Verify the results
Correct any exceptions manually
Major data selection properties:
P17: Country (company)
P27: Nationality (human)
P31: Instance of (mandatory)
P101: Activity domain
P106: Profession
Preferably include a Wikipedia linkcount condition (> 0 for notable items to avoid flooding the database)
Other properties for specific data selection:
P19: Birth place
P20: Death place
P21: Gender (to be avoided, unless to increase the presence of women...)
P39: Position held
P103: Native language
P282: Alphabet
P569: Birth date
P570: Death date
P571: Creation date
P937: Work location (city)
To further limit the number of replicated rows:
When the number of rows is too large, the query can return a timeout error. To return fewer rows (and avoid a query timeout), the following techniques can be used:
P18: Picture
P214: VIAF ID
P373: Commons category
P800: Notable work
P856: Web site
P990: Voice
P1442: Grave picture
P1472: Commons creator
P3919: Contributed to
Description (to be available, possibly in a specific language, to limit the number of items).
Remove the subclass property suffix (use wdt:P31 instead of wdt:P31/wdt:P279*).
Add a LIMIT statement with a low enough row count.
Environment variables:
LANG: System language
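The default description language is derived from LANG in the code below ('nl' is the hard-coded fallback):
mainlang = os.getenv('LANG', 'nl')[:2]  # e.g. 'fr_BE.UTF-8' -> 'fr'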
Transaction logging:
Sequence number
Timestamp
Elapsed time
Error/Status
Item number
Label
Commons category
Aliases between []
Description
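A hypothetical log line (tab-separated; the item number, label, and description here are invented for illustration):
25	2020-10-29 14:05:11	6.042318	nl/fr	Q12345678	Jan Peeters		['Jan F. Peeters']	Belgisch advocaat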
Execution speed:
The query should never return more than 50000 items (both for query performance and to limit the batch execution elapsed time).
If necessary add a LIMIT statement, or filter more items.
Typical execution speed is 10 items/min, about 14400 items/day (using a non-bot user account).
You may obtain higher execution speeds with a bot account (1 item/s, about 86400 items/day).
When the servers are loaded at full capacity, this might drop to 1 item/min, reducing the speed to only 1440 items/day.
The tool dynamically adapts its execution speed to the load of the Wikidata server.
When the elapsed time per transaction increases (tps decreases), every consecutive error adds additional sleep time.
The next successful transaction resets the additional delay back to 0; see the sketch below and the code for details.
It does not help to run multiple instances for the same Bot account concurrently; the transaction rate will drop pro rata.
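A minimal sketch of this error backoff, mirroring the error handler in the main loop:
deltasecs = int((datetime.now() - now).total_seconds())   # Duration of the failed transaction
if deltasecs >= 30:                                       # Only slow (technical) errors are penalised
    errsleep += errwaitfactor * min(maxdelay, deltasecs)  # Accumulate, at most 4 x 150 = 600 s per error
if errsleep > 0:
    time.sleep(errsleep)                                  # Slow down; a successful edit resets errsleep to 0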
Transaction tuning:
You can set the following global parameters in user-config.py to tune the transaction execution speed:
noisysleep = 60.0, to avoid the majority/all of the confusing sleep messages
maxlag = 5, to avoid overloading the servers
put_throttle = 1, to speed up a bit without overloading the server (bot account required)
max_retries = 4, avoid needless retry
retry_wait = 30, allow minimum wait time
retry_max = 320, allow maximum wait time
It is not advised to use the -m flag (which minimizes the slowdown after a technical error), since it may cause multiple consecutive error transactions.
When the servers are heavily loaded, it is better to wait longer after a technical error, to avoid multiple consecutive transaction errors; see the sketch below.
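A sketch of the corresponding user-config.py fragment (the values suggested above; adapt them to your own account and workload):
noisysleep = 60.0  # Hide sleep messages shorter than one minute
maxlag = 5         # Back off when the servers lag
put_throttle = 1   # One edit per second (bot account required)
max_retries = 4    # Avoid needless retries
retry_wait = 30    # Minimum wait time between retries (s)
retry_max = 320    # Maximum wait time between retries (s)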
Tips:
Keep the SPARQL query as simple as possible, returning only the items you need to process.
First try with one single language pair, and a limited number of items (try your query in Wikidata Query first).
Always include the English language (as source or target) because it makes it easy to exchange data amongst languages.
Use your own language as source for all the others, to "export" your culture internationally.
English can be considered as the central language for Wikidata.
Every Wikidata item should at least have an English label and description.
Filter for items that have at least one Wikipedia article, in any language (Notability; avoid flooding the database).
You might also filter for items having a source language description, for the same reason.
Do not use the -n flag, so that at least one language description (or another notability mark) is required, for the same reason.
Known problems:
Wait until the reporting database is in sync with the live database, to avoid inconsistencies.
Do not rerun the same command immediately again with the same languages.
When no data is obtained, conflicting conditions might be resolved via Wikidata Query.
To avoid a query timeout, try to select less data (extra WHERE clauses and filters).
To limit the number of rows add e.g. a LIMIT 1000, 5000, 10000, 15000, 20000, or 30000 statement.
Human names are mostly language independent, unless another alphabet is used.
Pay attention to non-human instances; their names are usually language dependent.
Wikipedia target language pages can usually resolve the translation problem.
Be careful with multilingual countries; labels could be replicated in the wrong language.
Pay attention to homonyms, which generally require manual intervention to correct the conflicting descriptions.
It might be necessary to amend descriptions in case of label conflicts; apply a suffix (birth year-death year).
Manually apply mutual P1889 (different from) statements to mark those conflicting items.
Or you might need to merge (confirmed) duplicate items.
WARNING: wikibase-form datatype is not supported yet.
When running a Pywikibot script from a PAWS Jupyter OAuth notebook:
* T168222/T252306 - For items having interwiki namespace pages:
WARNING: API error mwoauth-invalid-authorization-invalid-user:
The authorization headers in your request are for a user that does not exist here.
See https://phabricator.wikimedia.org/T168222 and T252306 for more details.
Possible improvements:
Know the number of items before starting the update loop.
Supporting tools:
https://www.wikidata.org (Wikidata)
https://query.wikidata.org (SPARQL)
https://hub.paws.wmcloud.org (PAWS)
Example queries:
There are different possibilities. Work out what is best for you. Be creative. Follow your domains of interest.
Add missing hospital labels:
SELECT DISTINCT ?item WHERE {
VALUES ?hascountry { wdt:P17 }
VALUES ?instance { wd:Q16917 }
VALUES ?country { wd:Q145 wd:Q30 }
?item wdt:P31/wdt:P279* ?instance;
?hascountry ?country;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'en') FILTER(?linkcount > 0) MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = 'nl') } }
Add missing organisation labels: (no subclass wdt:P279* statement)
SELECT DISTINCT ?item WHERE {
VALUES ?hascountry { wdt:P17 }
VALUES ?instance { wd:Q43229 }
VALUES ?country { wd:Q183 }
?item wdt:P31 ?instance;
?hascountry ?country; wdt:P571 ?createdate;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'de') FILTER(?linkcount > 0) MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = 'nl') } }
Add missing politician labels: (include only a single profession)
SELECT DISTINCT ?item WHERE {
VALUES ?hascountry { wdt:P27 }
VALUES ?instance { wd:Q5 }
VALUES ?country { wd:Q31 }
VALUES ?profession { wd:Q82955 }
?item wdt:P31 ?instance;
?hascountry ?country; wdt:P106/wdt:P279* ?profession;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'nl') FILTER(?linkcount > 0) MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = 'de') } }
Add missing activity domain labels: (add the subclass wdt:P279* statement)
SELECT DISTINCT ?item WHERE {
VALUES ?hascountry { wdt:P27 }
VALUES ?instance { wd:Q5 }
VALUES ?country { wd:Q29999 }
VALUES ?domain { wd:Q184485 }
?item wdt:P31 ?instance;
?hascountry ?country; wdt:P101/wdt:P279* ?domain;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'nl') FILTER(?linkcount > 0) MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = 'it') } }
Add missing lastname labels: (note the LIMIT statement)
SELECT DISTINCT ?item WHERE {
VALUES ?instance { wd:Q101352 }
?item wdt:P31 ?instance; wdt:P282 wd:Q8229;
schema:description ?itemDescription;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = "nl") FILTER(?linkcount > 0)
FILTER((LANG(?itemDescription)) = 'nl')
MINUS { ?item wdt:P31 wd:Q4167410. } MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = "fr") } } LIMIT 1000
Move Norwegian language labels:
SELECT DISTINCT ?item WHERE {
VALUES ?instance { wd:Q5 }
VALUES ?hascountry { wdt:P27 }
VALUES ?country { wd:Q38 }
?item wdt:P31 ?instance;
?hascountry ?country;
rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'no') }
SELECT DISTINCT ?item WHERE {
VALUES ?profession { wd:Q483501 }
?item wdt:P31 wd:Q5;
wdt:P106/wdt:P279* ?profession;
skos:altLabel ?itemAlias.
FILTER((LANG(?itemAlias)) = 'no') }
Add missing person labels:
Add missing family names:
Other ideas:
P373: Commons category not having a language label
P1472: Commons creator, not having a Commons category
P1612: Commons institution, not having a Commons category
P1889: Duplicate labels: add P1889 for any items having the same language label (homonyms, any language)
Technical documentation:
(technical configuration)
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Setting_up_Shop
https://www.wikidata.org/wiki/Wikidata:Creating_a_bot
(only for programmers)
https://docs.python.org (official documentation)
https://docs.python.org/3/library/re.html
https://docs.python.org/3/howto/sorting.html
https://www.w3schools.com/python/ (very nice site)
https://www.w3schools.com/python/python_datatypes.asp
https://www.w3schools.com/python/python_lists.asp
(caveat)
https://stackoverflow.com/questions/47986224/list-assignment-in-python (list assignment by reference; take a copy!)
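A minimal illustration of this caveat (the reason the script copies alias lists with [:]):
a = ['x']
b = a          # Assignment by reference: a and b are the same list object
b.append('y')  # a is now also ['x', 'y']!
c = a[:]       # Take a shallow copy instead
c.append('z')  # a is unchanged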
Configuration:
user-config.py (not required for PAWS because OAuth is used)
Adapt the Wikidata query in this source code
Design principles:
Keep it simple, generic, structured, extendible, strict, clean, well documented, modular, understandable, following standards and guidelines, user friendly, stable, safe, efficient, error-resistant, and collaborative for our human volunteers. Let the system and the servers do most of the work without overloading them; the programmer is lazy.
Origin:
A prototype (proof of concept) was first developed using Wikidata Query and QuickStatements,
with a manual intervention to create the transaction file with Excel, see
https://www.wikidata.org/wiki/User:Geertivp/training/Wikidata_Query/Missing_label_in_target_language
The current program is an MVP, see https://en.wikipedia.org/wiki/Minimum_viable_product, but it is still evolving.
Can also run on PAWS (unsolved problem T168222/T252306 for items having interwiki namespace pages).
Database:
Wikidata (Wikibase)
Programming this tool requires quite a lot of insight into how an RDF database works, more specifically Wikidata, and SPARQL.
It also requires decent knowledge of Python, Linux, and/or PAWS.
Automatic corrections:
Aliases being equal to the label will be removed from source and target languages.
"""
# List the required modules
import os # Operating system: getenv
import re # Regular expressions (very handy!)
import sys # System: argv, exit (get the parameters, terminate the program)
import time # sleep
import pywikibot # API interface to Wikidata
import urllib.parse # URL encoding/decoding (e.g. Wikidata Query URL)
from datetime import datetime # now, strftime, delta time, total_seconds
from pywikibot import pagegenerators as pg # Wikidata Query interface
def fatal_error(errcode, errtext):
"""
A fatal error has occurred; print the error message and exit with an error code
"""
global exitstat
exitstat = errcode
print(errtext)
if exitfatal: # unless we ignore fatal errors
sys.exit(errcode)
else:
print('Proceed after fatal error')
def wd_proc_all_items_for_lang(inlang, outlang):
"""
Main logic: Executed for each of the target languages.
A Wikidata query gets the initial list of items; the function then adds the missing labels.
It is generated at runtime.
Main parameters: inlang, outlang.
Craft your query using the template below: (un)comment the respective statements, and add additional properties and filters.
You should limit the number of selected Q-numbers.
Depending on the instance, additional query statements should be added.
For e.g. wd:Q5 (human), nationality and either profession or activity domain should be included.
Tip: prepare, verify, and troubleshoot the query using https://query.wikidata.org.
"""
global exitstat
global langcount
global langdone
global totranscount
"""
Here you can design your own query by (un)commenting and changing property values.
Only (DISTINCT) item is required/allowed in the results.
The source and target language filter the resulting item list.
"""
# Can be overruled with wdquery
querytxt = """# Search for Belgian citizens with a missing language label
SELECT DISTINCT ?item WHERE {
VALUES ?instance { # One single instance
#wd:Q166118 # archive
#wd:Q4830453 # company
#wd:Q101352 # last name
wd:Q5 # human
#wd:Q215380 # music band
#wd:Q43229 # organisation
#wd:Q16917 # hospital
}
VALUES ?hascountry { # Choose one single property
#wdt:P17 # country
#wdt:P495 # country of origin
wdt:P27 # nationality
}
VALUES ?country { # Please process in sequence to avoid language conflicts
# You can group countries that share the same source language
#wd:Q145 # UK
#wd:Q30 # USA
#wd:Q142 # France
#wd:Q32 # Luxembourg
#wd:Q183 # Germany
wd:Q39 # Switzerland
#wd:Q40 # Austria
#wd:Q31 # Belgium
#wd:Q29999 # Netherlands
#wd:Q38 # Italy
#wd:Q29 # Spain
#wd:Q20 # Norway
#wd:Q34 # Sweden
#wd:Q33 # Finland
#wd:Q35 # Denmark
#wd:Q36 # Poland
#wd:Q298 # Chile
#wd:Q155 # Brazil
#wd:Q928 # Philippines
#wd:Q15180 # Soviet Union
#wd:Q129286 # British India
#wd:Q70972 # Kingdom of France
#wd:Q1747689 # Ancient Rome
#wd:Q11768 # Ancient Egypt
#wd:Q179293 # Castile
#wd:Q948 # Tunisia
#wd:Q668 # India
#wd:Q851 # Saudi Arabia
#wd:Q93180 # Seleucid Empire
#wd:Q9683 # Tang dynasty
#wd:Q408 # Australia
#wd:Q172579 # Kingdom of Italy
}
VALUES ?profession { # Multiple values are allowed
wd:Q40348 # lawyer
#wd:Q12961474 # anarchist
#wd:Q11513337 # athlete
#wd:Q33999 # actor
#wd:Q212238 # civil servant
#wd:Q482980 # author
#wd:Q1281618 # sculptor
#wd:Q15214752 # comedian
#wd:Q5716684 # dancer
#wd:Q193391 # diplomat
#wd:Q39631 # physician
#wd:Q188094 # economist
#wd:Q4964182 # philosopher
#wd:Q2526255 # film director
#wd:Q169470 # physicist
#wd:Q33231 # photographer
#wd:Q81096 # engineer
#wd:Q1930187 # journalist
#wd:Q483501 # artist
#wd:Q1028181 # painter
#wd:Q47064 # military personnel
#wd:Q5322166 # designer
#wd:Q82955 # politician
#wd:Q42603 # priest
#wd:Q121594 # professor
#wd:Q16533 # judge
#wd:Q1028181 # painter (duplicate)
#wd:Q2066131 # sportsperson
#wd:Q186360 # nurse
#wd:Q937857 # football player
#wd:Q901 # scientist
#wd:Q2309784 # cyclist
#wd:Q170790 # mathematician
#wd:Q177220 # singer
#wd:Q55645123 # health professional
}
VALUES ?domain { # One single value
#wd:Q11023 # engineering
#wd:Q11190 # medicine
#wd:Q21198 # computer science
#wd:Q15980804 # media
#wd:Q184485 # performing arts
#wd:Q336 # science
#wd:Q395 # mathematics
#wd:Q349 # sport
wd:Q2695280 # technique
}
?item wdt:P31 ?instance; # Main query options
#?item wdt:P31/wdt:P279* ?instance # Many more rows; might impact performance
?hascountry ?country; # Filter on country
#wdt:P19 ?birthplace;
#wdt:P20 ?deathplace;
wdt:P39 [];
#wdt:P101/wdt:P279* ?domain;
#wdt:P103 ?nativelang;
#wdt:P106 ?profession;
#wdt:P106/wdt:P279* ?profession;
#wdt:P131 wd:Q456544; # located in a specific city
#wdt:P214 ?viafid;
#wdt:P282 wd:Q8229; # Roman alphabet (e.g. last name)
#wdt:P373 ?commonscat;
#wdt:P569 ?birthdate;
#wdt:P570 ?deathdate;
#wdt:P571 ?createdate;
#wdt:P646 ?freebaseid;
#wdt:P734 ?lastname;
#wdt:P735 ?firstname;
#wdt:P1705 ?labelofflang;
wikibase:sitelinks ?linkcount;
#schema:description ?itemDescription;
#skos:altLabel ?itemAlias;
rdfs:label ?itemLabel.
#?article schema:about ?item; # Copy the label from the target language Wikipedia page
#schema:inLanguage '""" + outlang + """';
#schema:isPartOf <https://""" + outlang + """.wikipedia.org/>.
FILTER(?linkcount > 0) # increasing notability (recommended)
FILTER((LANG(?itemLabel)) = '""" + inlang + """')
#FILTER((LANG(?itemAlias)) = '""" + inlang + """')
#FILTER((LANG(?itemDescription)) = '""" + inlang + """')
#FILTER(?birthdate >= "1920-01-01T00:00:00Z"^^xsd:dateTime) # limit number of rows
#FILTER(?createdate >= "2000-01-01T00:00:00Z"^^xsd:dateTime) # limit number of rows
MINUS { ?item wdt:P31 wd:Q4167410. } # Wikipedia redirect page
MINUS { ?item wdt:P31 wd:Q4167836. } # Wikimedia category
#MINUS { ?item wdt:P18 ?picture. } # Skip picture with PAWS, see T168222
# Search items with missing label for the target language (mandatory)
MINUS { ?item rdfs:label ?label. FILTER((LANG(?label)) = '""" + outlang + """') }
}
LIMIT 50000 # take lower values to avoid any query timeout
# Possibly add more properties or filters
"""
# Verify that the target language is OK
if outlang == inlang or outlang in langdone or not langre.search(outlang): # Skip input language, duplicate and bad language codes (generated by PAWS first run)
print('Skipping language %s' % outlang)
return
# Print preferences
if verbose or debug:
print('\nUse labels:\t%s' % uselabels)
print('Avoid homonym:\t%s' % safemode)
print('Use aliases:\t%s' % usealias)
print('Fallback on English:\t%s' % fallback)
print('Use Wikipedia:\t%s' % wikipedia)
print('Notability:\t%s' % notability)
print('\nShow code:\t%s' % showcode)
print('Verbose mode:\t%s' % verbose)
print('Debug mode:\t%s' % debug)
print('Exit on fatal error:\t%s' % exitfatal)
print('Error wait factor:\t%d' % errwaitfactor)
langcount += 1 # Process the next target language
"""
Build the SPARQL query.
You can paste the resulting code into Wikidata Query to troubleshoot.
Tip: click on the diamond to reformat.
"""
squery = querytxt # Generate or overrule the query
if wdquery != '':
squery = wdquery
squery = humsqlre.sub(' ', squery) # Convert to human readable (remove comments)
if verbose:
print('\nQuery %d for %s %s:' % (langcount, inlang, outlang))
print(squery) # Show human readable formatted query (useful to debug)
squery = comsqlre.sub(' ', squery) # Convert to computer readable (remove duplicate whitespace)
if showcode or debug:
print('\nSPARQL query for %s %s:' % (inlang, outlang))
print(squery) # Show computer readable query (useful to debug)
# Execute the SPARQL query to get the item list
# The list could possibly be empty
generator = pg.WikidataSPARQLPageGenerator(squery, site=wikidata_site)
# Loop initialisation
transcount = 0 # Total transaction counter
notecount = 0 # Notability count
pictcount = 0 # Picture count
unotecount = 0 # Notability problem
safecount = 0 # Safe transaction
dupcount = 0 # Duplicate counter
errcount = 0 # Error counter
errsleep = 0 # Technical error penalty (sleep delay in seconds)
# Let the user know what is happening while the data is being queried
if verbose:
print('\nReplicating language labels from %s to %s' % (inlang, outlang))
# Set the Wikipedia instance
inwiki = inlang + 'wiki'
wpwiki = outlang + 'wiki' # Set default Wikipedia sitecode
if outlang == 'nb': # nn has separate nnwiki
wpwiki = 'nowiki' # The Norwegian Wikipedia code is different from the Wikidata language code
# Transaction timing
now = datetime.now() # Start the main transaction timer
status = 'Start' # Force loop entry
# Process all items in the list
for item in generator: # Main loop for all DISTINCT items
if status != 'Stop': # Ctrl-c pressed -> stop this target language in a proper way
if debug:
print(item) # Format: [[wikidata:Q...]]
# Transaction initialisation required for status reporting (mind the error handling)
transcount += 1 # New transaction for this language pair
status = inlang + '/' + outlang # Default status OK, unless an error occurs
label = '' # Item labels
alias = [] # No alias yet (note the list datatype)
descr = '' # No description yet
wpart = '' # No Wikipedia page yet
commonscat = '' # Commons category
try: # Error trapping (prevents premature exit on transaction error)
item.get() # Get the data item (problem T168222/T252306 with interwiki namespace pages with PAWS OAuth)
"""
Get the labels and aliases for the source language and English
"""
inlabel = ''
if inlang in item.labels:
inlabel = item.labels[inlang]
enlabel = ''
if reflang in item.labels:
enlabel = item.labels[reflang]
"""
The target label is missing; the script will search for the best one.
Priority is given to Wikipedia page names (by language, target language first).
Then the source language label.
Then the English Wikipedia page name, or the English label, as a fallback - if allowed.
Any suffixes are stripped from the label, as required by the label guidelines.
"""
if 'P735' in item.claims and 'P734' in item.claims: # Firstname and lastname
label = item.claims['P735'][0].getTarget() + ' ' + item.claims['P734'][0].getTarget()
status = 'Name'
elif wikipedia and wpwiki in item.sitelinks: # Get target sitelink
sitelink = item.sitelinks[wpwiki]
linklabel = urlbre.search(str(sitelink)) # Output URL supersedes source label
wpart = linklabel.group(0)
label = wpart # Get Wikipedia page
elif wikipedia and inwiki in item.sitelinks: # Get source sitelink
sitelink = item.sitelinks[inwiki]
linklabel = urlbre.search(str(sitelink)) # Input URL supersedes source label
wpart = linklabel.group(0)
label = wpart
elif uselabels and inlabel != '': # Get source label, if no URL
label = inlabel
elif wikipedia and fallback and refwiki in item.sitelinks: # Get English sitelink
sitelink = item.sitelinks[refwiki]
linklabel = urlbre.search(str(sitelink)) # English URL supersedes source label
wpart = linklabel.group(0)
label = wpart
elif uselabels and fallback and enlabel != '':
label = enlabel # Default English label
if label != '':
baselabel = suffre.search(label) # Remove () suffix, if any (we do the same for a comma)
if baselabel:
label = label[:baselabel.start()] # Get canonical form
"""
Try to get any useful alias; note that a copy of the list is made (lists are assigned by reference).
"""
# Get the aliases; remark the list
inalias = []
if inlang in item.aliases:
inalias = item.aliases[inlang]
enalias = []
if reflang in item.aliases:
enalias = item.aliases[reflang]
if outlang in item.aliases:
alias = item.aliases[outlang]
elif usealias and inalias != []: # Get source alias
alias = inalias[:] # list datatype (take a copy!)
elif usealias and fallback and enalias != []: # Get source alias, if no URL
alias = enalias[:] # list datatype (take a copy!)
# Insert missing alias from label
if uselabels and inlabel != '' and label != inlabel and not inlabel in alias:
alias.insert(0, inlabel)
elif uselabels and fallback and enlabel != '' and label != enlabel and not enlabel in alias:
alias.insert(0, enlabel)
# Cleanup duplicate aliases and aliases in non-native alphabets
if inlabel in inalias:
inalias.remove(inlabel) # Filter label (alias <> label)
# (We do not touch enalias => shared resource)
if label in alias:
alias.remove(label) # Filter label (alias <> label)
for ai in alias[:]: # Iterate over a copy, because elements are removed
if not romanre.search(ai): # Filter non-Roman strings
alias.remove(ai)
"""
The user should see the description in any of the available languages.
Get optional data before the update transaction.
Perform this first because we need it in case of an update error.
We take the target, source, or English description in that order.
"""
indescr = ''
if inlang in item.descriptions:
indescr = item.descriptions[inlang]
outdescr = ''
if outlang in item.descriptions:
outdescr = item.descriptions[outlang]
if outdescr == '':
outdescr = indescr
if 'P18' in item.claims:
status = 'Pict'
pictcount += 1 # Possible duplicate counting...
if outdescr != '': # Get item description in target language
descr = outdescr
if safemode: # Avoid homonym label/description conflicts
status = 'Safe'
elif indescr != '': # input item description
descr = indescr
elif mainlang in item.descriptions: # Main language item description
descr = item.descriptions[mainlang]
elif 'nl' in item.descriptions: # Dutch item description
descr = item.descriptions['nl']
elif 'fr' in item.descriptions: # French item description
descr = item.descriptions['fr']
elif 'en' in item.descriptions: # English item description
descr = item.descriptions['en']
elif 'de' in item.descriptions: # German item description
descr = item.descriptions['de']
elif 'es' in item.descriptions: # Spanish item description
descr = item.descriptions['es']
elif 'it' in item.descriptions: # Italian item description
descr = item.descriptions['it']
elif wpart != '' or status in ['Pict'] or 'P373' in item.claims or 'P39' in item.claims or 'P214' in item.claims or 'P800' in item.claims or 'P856' in item.claims or 'P990' in item.claims or 'P1442' in item.claims or 'P1472' in item.claims or 'P3919' in item.claims: # Avoid items that have no description, unless they have a Wikipedia page, picture, Commons category, function, VIAF, notable work, website, voice, grave picture, Commons creator, contributed to
status = 'Noted'
notecount += 1
elif notability:
status = 'Trivia'
"""
For additional/optional information.
"""
# Get Commons category/creator
if 'P373' in item.claims: # Wikimedia Commons Category
commonscat = item.claims['P373'][0].getTarget()
elif 'P1472' in item.claims: # Wikimedia Commons Creator
commonscat = item.claims['P1472'][0].getTarget()
"""
This is the main functionality. We have gathered all required data now.
Now we verify which type of update can safely be performed, without overwriting any existing label/alias values.
Note that there can be a replication delay between the live and the reporting database, so you should take care of unexpected transactions (possibly caused by concurrent users).
We remove duplicate labels from source aliases. We do not add duplicate labels to the target language aliases.
errsleep must only be reset when the update transaction was successful.
We prefer editEntity because it is generic (avoiding the specific editLabels and editAliases, which would have been more complex).
"""
if status == 'Trivia': # Not notable - do not replicate
unotecount += 1
elif status == 'Safe': # Avoid homonym naming/description conflicts
safecount += 1
elif outlang in item.labels and item.labels[outlang] != label or label == '': # Target label was already updated by other user
if label != '' and not label in alias: # Add label in alias
alias.append(label)
if item.labels[outlang] in alias: # Remove duplicate alias
alias.remove(item.labels[outlang])
if inlang == 'no' and outlang in ['nb', 'nn']: # Remove language labels
for ai in inalias: # Merge aliases
if not ai in alias:
alias.append(ai)
item.editEntity( {'labels': {inlang: ''}, 'aliases': {outlang: alias, inlang: []}, 'descriptions': {outlang: outdescr, inlang: ''}}, summary=transcmt ) # Update only the alias (rare)
status = 'Alias'
errsleep = 0 # Successful transaction -> Reset any pending penalty wait
elif not outlang in item.aliases and alias != []:
item.editAliases(aliases={outlang: alias, inlang: inalias}, summary=transcmt ) # Update only the alias
status = 'Alias'
errsleep = 0 # Successful transaction -> Reset any pending penalty wait
else:
status = 'Skip' # Both label and alias already set (nothing to do here)
dupcount += 1
elif outlang in item.aliases or alias == []: # Update only the label
if inlang == 'no' and outlang in ['nb', 'nn']: # Remove language labels
for ai in inalias: # Merge aliases
if not ai in alias:
alias.append(ai)
item.editEntity( {'labels': {outlang: label, inlang: ''}, 'aliases': {outlang: alias, inlang: []}, 'descriptions': {outlang: outdescr, inlang: ''}}, summary=transcmt ) # Move the label from inlang to outlang
else:
item.editEntity( {'labels': {outlang: label}, 'aliases': {inlang: inalias}}, summary=transcmt )
errsleep = 0 # Successful transaction -> Reset any pending penalty wait
else: # Update both the target item language label and alias
if inlang == 'no' and outlang in ['nb', 'nn']: # Remove language labels
item.editEntity( {'labels': {outlang: label, inlang: ''}, 'aliases': {outlang: alias, inlang: []}, 'descriptions': {outlang: outdescr, inlang: ''}}, summary=transcmt )
else:
item.editEntity( {'labels': {outlang: label}, 'aliases': {outlang: alias, inlang: inalias}}, summary=transcmt )
status = 'Both'
errsleep = 0 # Successful transaction -> Reset any pending penalty wait
"""
Error handling section
"""
except KeyboardInterrupt:
status = 'Stop' # Ctrl-c trap; process next language, if any
dupcount += 1 # Press Ctrl-c a second time to stop the script completely
exitstat = 1
except: # Attempt error recovery
if exitfatal: # Stop on first error
raise
status = 'Error' # Handle any generic error
errcount += 1
deltasecs = int((datetime.now() - now).total_seconds()) # Calculate technical error penalty
if deltasecs >= 30: # Technical error; for transactional errors there is no wait time increase
errsleep += errwaitfactor * min(maxdelay, deltasecs)
# Technical errors get additional penalty wait
# Consecutive technical errors accumulate the wait time, until the first successful transaction
# We limit the delay to a multiple of maxdelay seconds
if errsleep > 0: # Allow the servers to catch up; slowdown the transaction rate
print('%d seconds maxlag wait' % errsleep)
time.sleep(errsleep)
"""
The transaction was either executed correctly, or an error occurred.
Possibly a system error message was already issued.
We will report the results here, as much as we can, one line per item.
"""
# Get the elapsed time in seconds and the timestamp in string format
prevnow = now # Transaction status reporting
now = datetime.now() # Refresh the timestamp to time the following transaction
if verbose or status in ['Error', 'Stop']: # Print transaction results
isotime = now.strftime("%Y-%m-%d %H:%M:%S") # Only needed to format output
totsecs = (now - prevnow).total_seconds() # Elapsed time for this transaction
print('%d\t%s\t%f\t%s\t%s\t%s\t%s\t%s\t%s' % (transcount, isotime, totsecs, status, item.getID(), label, commonscat, alias, descr))
"""
One language has been completed.
Print the transaction success rate for the current target language.
When the success rate is < 100%, the reason for the errors should be investigated.
"""
# Report aggregate statistics for this language pair
suctranscount = transcount - unotecount - safecount - dupcount - errcount
totranscount += suctranscount
if transcount > 0: # Avoid division by zero error
sucrate = 100.0 * suctranscount / transcount
if errorstat:
print('\nStatistics for language %s %s:\n%d skipped\n%d unsafe\n%d errors\n%d unnotable\n%d notable\n%d pictures\n%d done\n%d transactions\n%f%% successful' % ( inlang, outlang, dupcount, safecount, errcount, unotecount, notecount, pictcount, suctranscount, transcount, sucrate) )
if sucrate < minsucrate:
fatal_error(3, 'Halting program due to huge error count') # Can be overruled with -p
else:
print('No transactions for language %s' % outlang)
# Prevent the same target language from being processed repeatedly (avoid duplicate transactions)
langdone.append(outlang)
if debug:
print('Languages done: %s' % sorted(langdone))
def wd_proc_all_languages():
"""
Process a standard list of languages.
You need to always give one mandatory source language as P1.
Not compatible with a SPARQL query.
"""
if not langre.search(inlang): # Source language required (fatal error)
fatal_error(2, 'Input language missing or invalid; use -h for help')
sys.exit(2) # Required parameter, must stop
if wdquery != '':
fatal_error(5, 'SPARQL query incompatible with -a flag; use -h for help')
sys.exit(5) # Fatal error, must stop
if debug:
print('Process target languages %s' % listlang)
for outlang in listlang:
wd_proc_all_items_for_lang(inlang, outlang) # Execute all items for one language
def show_help_text():
# Show program help and exit (only show head text)
helptxt = re.search(r'^(.*\n)+\nDocumentation:\n\n.+\n', codedoc)
if helptxt:
print(helptxt.group(0)) # Show helptext
sys.exit(9) # Must stop
def show_prog_version():
# Show program version
print('%s version %s' % (modnm, pgmid))
def get_next_param():
"""
Get the next command parameter, and handle any qualifiers
"""
global showcode
global debug
global errorstat
global errwaitfactor
global exitfatal
global fallback
global notability
global safemode
global usealias
global uselabels
global verbose
global wikipedia
global wdquery
cpar = sys.argv.pop(0) # Get next command parameter
if debug:
print('Parameter %s' % cpar)
if cpar.startswith('-a'): # all languages
wd_proc_all_languages()
elif cpar.startswith('-c'): # code check
showcode = True
print('Show generated code')
elif cpar.startswith('-d'): # debug mode
debug = True
print('Setting debug mode')
elif cpar.startswith('-e'): # error stat
errorstat = False
print('Disable error statistics')
elif cpar.startswith('-f'): # file Wikidata query
fname = sys.argv.pop(0)
print('Reading Wikidata query from "%s"' % fname)
with open(fname, 'r') as query_file:
wdquery = query_file.read()
wdquery = urllib.parse.unquote(wdquery) # Decode URL
if showcode:
print(wdquery)
elif cpar.startswith('-h'): # help
show_help_text()
elif cpar.startswith('-l'): # language labels
if not wikipedia:
fatal_error(4, 'Conflicting qualifiers -l -w')
uselabels = False
print('Disable label reuse')
elif cpar.startswith('-m'): # fast mode
errwaitfactor = 1
print('Setting fast mode')
elif cpar.startswith('-n'): # notability
notability = False
print('Disable notability mode')
elif cpar.startswith('-p'): # proceed after fatal error
exitfatal = False
print('Setting proceed after fatal error')
elif cpar.startswith('-q'): # quiet mode
verbose = False
print('Setting quiet mode')
elif cpar.startswith('-s'): # alias (synonym) usage
usealias = False
print('Disable alias reuse')
elif cpar.startswith('-t'): # translation required (no English)
fallback = False
print('Disable translation fallback')
elif cpar.startswith('-v'): # verbose mode
verbose = True
print('Setting verbose mode')
elif cpar.startswith('-V'): # Version
show_prog_version()
elif cpar.startswith('-w'): # disallow Wikipedia
if not uselabels:
fatal_error(4, 'Conflicting qualifiers -l -w')
wikipedia = False
print('Disable Wikipedia reuse')
elif cpar.startswith('-x'): # safe mode
safemode = True
print('Setting safe mode')
elif cpar.startswith('-'): # unrecognized qualifier (fatal error)
fatal_error(4, 'Unrecognized qualifier; use -h for help')
return cpar # Return the parameter or the qualifier to the caller
# Main program entry
# First identify the program
if verbose:
show_prog_version() # Print the module name
try:
pgmnm = sys.argv.pop(0) # Get the name of the executable
if debug:
print('%s version %s' % (pgmnm, pgmid)) # Physical program
except:
shell = False
print('No shell available') # Most probably running on PAWS Jupyter
"""
Start main program logic
Precompile the Regular expressions, once (for efficiency reasons; they will be used in loops)
"""
humsqlre = re.compile(r'\s*#.*\n') # Human readable query, remove all comments including LF
comsqlre = re.compile(r'\s+') # Computer readable query, remove duplicate whitespace
urlbre = re.compile(r'[^\[\]]+') # Remove URL square brackets (keep the article page name)
suffre = re.compile(r'\s*[(,]') # Remove () and , suffix (keep only the base label)
langre = re.compile(r'^[a-z]{2,3}$') # Verify for valid ISO 639 language codes (2 or 3 lowercase letters)
romanre = re.compile(r'^[a-z ."\'åáàâäãæčçéèêëēíìîïñóòôöőðøšßúùûüž-]{2,}$', flags=re.IGNORECASE) # Roman alphabet
totranscount = 0 # Total successful transactions
langcount = 0 # Count of languages processed
# Global parameters
mainlang = os.getenv('LANG', 'nl')[:2] # Default description language
if verbose or debug:
print('Main language:\t%s' % mainlang)
print('Maximum delay:\t%d s' % maxdelay)
print('Minimum success rate:\t%f%%' % minsucrate)
# Get the source language
inlang = '-' # Adapt and hardcode the source language code when no shell
while len(sys.argv) > 0 and inlang.startswith('-'): # Get first non-qualifier
inlang = get_next_param().lower() # get P1 = Source language (mandatory parameter)
if not langre.search(inlang): # Source language required (fatal error)
fatal_error(2, 'Input language missing or invalid; use -h for help')
sys.exit(2) # Required parameter, must stop
# Connect to database
transcmt = 'Pwb copy ' + inlang + ' label' # Wikidata transaction comment
wikidata_site = pywikibot.Site("wikidata", "wikidata") # Login to Wikibase instance
# Main loop for list of target languages
while len(sys.argv) > 0:
# Get every ISO 639 language code in lowercase
outlang = get_next_param().lower() # Loop for all target languages (mandatory parameter)
if not outlang.startswith('-'): # Skip qualifiers
wd_proc_all_items_for_lang(inlang, outlang) # Execute all items for one language
# PAWS environment
if not shell:
wd_proc_all_languages() # (un)comment to process all target languages when no shell
# wd_proc_all_items_for_lang(inlang, 'nl') # (un)comment and hardcode one target language when no shell
"""
Print all sitelinks (base addresses)
PAWS is using tokens (passwords can't be used because Python scripts are public)
Shell is using passwords (from user-password.py file)
"""
if debug:
for site in sorted(pywikibot._sites.values()):
print(site, site.username(), site.is_oauth_token_available(), site.logged_in())
if langcount > 0:
print('\n%d languages processed for %s\n%d total successful transactions' % (langcount, inlang, totranscount) ) # All OK
else:
fatal_error(2, 'Input and/or output language missing; use -h for help')
sys.exit(exitstat)