Wikidata:Wikidata curricula/Activities/Pywikibot/Missing label in target language/code
#!/usr/bin/python3
# Global technical parameters
modnm = 'Pywikibot missing_person_label' # Module name (using the Pywikibot package)
pgmid = '2020-10-29 (gvp)' # Program ID and version
"""
Static definitions
"""
# Functional configuration flags
# Restrictions: cannot disable both labels and wikipedia. We need at least one of the options.
usealias = True # Allow using the language aliases (disable with -s)
fallback = True # Allow for English fallback (could possibly be embarrassing for local languages; disable with -t)
notability = True # Notability requirements (disable with -n; this is not encouraged, unless for "no"-cleanup)
safemode = False # Avoid label/description homonym conflicts (can be activated with -x when needed)
uselabels = True # Use the language labels (disable with -l)
wikipedia = True # Allow using Wikipedia article names (best, because Wikipedia is multilingual; disable with -w)
wdquery = '' # Wikidata query (generated from template by default)
itemlist = [] # Item list from input file (parameter -i)
"""
Language setup: 2 or 3 letter ISO 639 code (bad codes are skipped).
Restrictions: the list of languages must be completely changed when using alphabets other than Roman.
Block (and skip) incompatible alphabets or languages (setup other codes for non-Roman alphabets).
Prevent target languages from being processed repeatedly within the same command.
Disallow some languages, e.g. when there are too many errors for a certain language with -a.
Nonexistent codes are skipped, e.g. the cz language code does not exist (cz suggests the country Czechia) - the correct code is cs (Czech language).
In Czech and Polish there are no clear/fixed rules for (foreign) person naming (ova-problem).
Norwegian has 3 language codes: nb, nn, no (nb is the preferred one; no is only used for Wikipedia and is excluded for Wikidata).
"""
reflang = 'en' # Reference language code (English)
refwiki = 'enwiki' # Reference Wikipedia (English)
langdone = ['ar', 'arz', 'bg', 'cs', 'cz', 'fa', 'he', 'hu', 'no', 'pl', 'ru', 'zh'] # Languages done
"""
Default target list of languages; only add compatible languages.
The current values are valid for West-European languages (using the Roman alphabet).
Important: langdone and listlang must be adapted for other alphabets.
Do not use 'no' (Norwegian Wikipedia code).
"""
# List of target languages
listlang = ['nl', 'fr', 'en', 'de', 'it', 'es', 'pt', 'da', 'fi', 'nb', 'sv']
# Technical configuration flags
# Defaults: transparent and safe
debug = False # Can be activated with -d (errors and configuration changes are always shown)
errorstat = True # Show error statistics (disable with -e)
exitfatal = True # Exit on fatal error (can be disabled with -p; please take care)
shell = True # Shell available (command line parameters are available; automatically overruled by PAWS)
showcode = False # Show the generated SPARQL code (activate with -c)
verbose = True # Can be set with -q or -v (better keep verbose to monitor the bot progress)
# Technical parameters
"""
Default error penalty wait factor (can be overruled with -m).
Larger values ensure that maxlag errors are avoided, but temporarily delay processing.
It is advised not to overrule this value.
"""
exitstat = 0 # (default) Exit status
errwaitfactor = 4 # Extra delay after error; best to keep the default value (maximum delay of 4 x 150 = 600 s = 10 min)
maxdelay = 150 # Maximum error delay in seconds (overruling any extreme long processing delays)
minsucrate = 70.0 # Minimum success rate per target language (the script is stopped below this threshold)
# To be set in user-config.py (what parameters is PAWS using?)
"""
noisysleep = 60.0, to avoid the majority/all of the confusing sleep messages
maxlag = 5, to avoid overloading the servers
put_throttle = 1, for maximum transaction speed (bot account required)
"""
# The helptext is displayed with -h
codedoc = """
Add missing language labels to Wikidata items using Wikidata Query and Pywikibot.
It is typically used to amend "national" items, among other uses.
It is highly customizable, works for a large range of instances, and is easy to run.
Basically you only need a SPARQL query that delivers an item list.
Then pagegenerators loops through the item list to amend all missing target language labels.
It copies the target language label from the target or source language Wikipedia article name, or from the source language label (with English as a last resort). At least English should always have a label; Wikipedia is the preferred source, since it is multilingual.
Originally this program was written for human items, but it can be used for any valid instance. Note that for most instances it does not make sense to replicate the label untranslated, unless it is a person name (which is normally language independent).
Parameters:
The parameters are a list of valid ISO 639 language codes.
P1: source language (mandatory)
P2...: list of target languages (can be replaced by -a; more codes can easily be added in the script)
Validations:
No default languages. One source and (at least one) target language is explicitly required.
Target languages must always be different from the source language (languages referring to themselves are ignored).
Duplicate language codes are ignored.
The Wikidata Query system runs on a replicated database (there is a delay from the live database).
The transactions are validated at runtime to avoid conflicting or duplicate updates.
Source aliases are merged with existing aliases.
When the label is equal to any alias, conflicting aliases are removed.
Aliases with unrecognized alphabets are ignored.
Label suffixes are removed (parenthesis, comma); see the sketch after this list.
Pay attention that all language codes share the same alphabet.
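A minimal sketch of the suffix stripping, using the same regular expression (suffre) that the script compiles below:
import re
suffre = re.compile(r'\s*[(,]')        # The first ( or , starts the suffix
label = 'John Doe (politician)'
baselabel = suffre.search(label)       # Find the suffix, if any
if baselabel:
    label = label[:baselabel.start()]  # Canonical form: 'John Doe'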
Internal setup:
Additional parameters could be set in the script (e.g. language lists).
Qualifiers:
-a Process all default language targets (you must first set the source language)
-c Show generated code
-d Enable debug mode (show technical info)
-e Disable error statistics reporting
-f File containing a Wikidata query (SPARQL; not allowed together with -a)
-h Show help text (how to use this program; you see the same text when the command is wrong)
-i Item list file
-l Disable language labels (not allowed together with -w)
-m Fast mode (minimum maxlag wait; but this can increase the number of consecutive error transactions)
-n Disable the notability filter: items having a description, a Wikipedia article, Commons Category, etc.
-p Proceed after fatal error (process next item or language)
-q Set quiet mode (show minimum output; basically only errors)
-s Disable language aliases
-t Translation required (disable automatic English fallback)
-v Set verbose mode (show maximum output)
-V Show version (show the version of the program)
-w Disable sourcing from Wikipedia page names (not allowed together with -l)
-x Exclude target descriptions to avoid homonym conflicts (speeds up the processing)
Qualifiers and parameters are processed in the order of occurrence in the command line.
Some qualifiers are restricted or mutually exclusive.
Some flags have a cumulative effect.
Restrictions:
The chosen source and target languages should all share the same alphabet (P282).
The source language should preferably align with the country, e.g. France and French, to avoid label inconsistencies.
Some countries have multiple languages, which can be pretty confusing... process them after the unique combinations.
Some languages have a Wikipedia language code that is different from the Wikidata code (e.g. Norwegian); see the sketch below.
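A fragment of the main logic below shows how the Wikipedia site codes are derived:
inwiki = inlang + 'wiki'     # e.g. 'nl' -> 'nlwiki'
wpwiki = outlang + 'wiki'    # Default Wikipedia sitecode
if outlang == 'nb':          # Norwegian Bokmål
    wpwiki = 'nowiki'        # Wikipedia uses 'no' where Wikidata uses 'nb' (nn has its own nnwiki)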
Examples:
The log file (stdout) can be redirected.
# Transfer labels from nl to fr:
./missing_person_label.py nl fr
# Transfer labels from nl to a list of languages:
./missing_person_label.py nl en fr de es it pt da no fi sv cs hu pl
# Transfer labels from nl to a predefined list of languages:
./missing_person_label.py -n nl -a
# Copy label from Wikipedia page or source label, but ignore English fallback:
./missing_person_label.py -t xx -a
# Disable Wikipedia and avoid conflicting homonyms:
./missing_person_label.py -w -s en nl
# Generate logfile
./missing_person_label.py en -p -a |tee ena.log
# Run detached -> the program continues to run after a session disconnect, or after an "unsuccessful" language (a log file is created)
nohup ./missing_person_label.py en -p -a > ena.log &
# Move and merge Norwegian language labels, descriptions, aliases
./missing_person_label.py -p -n no -t -w nb
Return status:
The following status is returned to the shell:
0 Normal termination
1 Ctrl-c pressed, program interrupted (multiple Ctrl-c are required when in language update mode)
2 Invalid or missing source or target language (mandatory language pair)
3 Halting program due to huge error count (less than 70% success; use -p to proceed anyway)
4 Unknown, or conflicting qualifiers -l -w
5 SPARQL query is incompatible with -a flag
9 Help requested (-h)
Author:
Geert Van Pamel, 2020-08-03, CC BY-SA 4.0
Transaction status values:
in/out: Source and target language (transaction succeeded; only the label was updated)
Alias: Only the alias was updated/merged
Both: Both the label and the alias were updated
Name: Firstname and Lastname
Noted: Well-documented on Wikipedia (articles), Wikidata (statements) or Wikimedia Commons (media files)
Pict: Picture available (P18 property)
Safe: Avoid homonym conflicts (target description is filled in)
Skip: Duplicate update avoided (target label and alias were previously set, Wikidata query is possibly wrong)
Stop: Finish the current target language (ctrl-c pressed during update; continue with next language, if any)
Trivia: Not notable (no language description, no Wikipedia article, no picture, no important statements)
Error: A (general) error occurred (either data, network, or technical; see detailed error message)
For no/nb the no labels are transferred (moved/merged) from "no" to "nb".
Documentation:
https://www.wikidata.org/wiki/Wikidata:Wikidata_curricula/Activities/Pywikibot/Missing_label_in_target_language
https://www.wikidata.org/wiki/Help:Contents
https://www.wikidata.org/wiki/Help:Label
https://www.wikidata.org/wiki/Help:Alias
https://www.wikidata.org/wiki/Help:Description
https://www.wikidata.org/wiki/Help:Multilingual
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Wikidata_Query_Help
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial
https://www.wikidata.org/wiki/Special:PrefixIndex?prefix=Wikidata:Pywikibot_-_Python_3_Tutorial
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Big_Data
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Iterate_over_a_SPARQL_query
https://www.mediawiki.org/wiki/Manual:Pywikibot
https://www.mediawiki.org/wiki/Manual:Pywikibot/Wikidata
https://www.mediawiki.org/wiki/Manual:Pywikibot/Global_Options
https://www.mediawiki.org/wiki/Manual:Pywikibot/PAWS
https://www.mediawiki.org/wiki/Manual:Pywikibot/user-config.py
https://doc.wikimedia.org/pywikibot/stable/
https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial
https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes (language codes)
Algorithm:
A Wikidata query is executed to get a list of items that have a missing label in the target language.
Then each missing item label is replicated from the Wikipedia article name or the source label to the target language label.
When a Wikipedia page exists in the target or source language, this value is taken (when allowed).
If not, then the English page or label is used as a fallback (when allowed), as in the sketch below.
This program can also be used to correct data errors, e.g. "no" labels.
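A condensed sketch of the label priority chain (the full implementation below additionally builds labels from first/last names and strips suffixes):
if wikipedia and wpwiki in item.sitelinks:                    # 1. Target language Wikipedia page
    label = urlbre.search(str(item.sitelinks[wpwiki])).group(0)
elif wikipedia and inwiki in item.sitelinks:                  # 2. Source language Wikipedia page
    label = urlbre.search(str(item.sitelinks[inwiki])).group(0)
elif uselabels and inlabel != '':                             # 3. Source language label
    label = inlabel
elif wikipedia and fallback and refwiki in item.sitelinks:    # 4. English Wikipedia page
    label = urlbre.search(str(item.sitelinks[refwiki])).group(0)
elif uselabels and fallback and enlabel != '':                # 5. English label
    label = enlabel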
Error handling:
Exceptions are properly reported, and counted.
For each target language detailed statistics are reported.
Ctrl-c causes a return to the next higher callstack level (causing eventual program termination).
The same query might run more than once at the same time; duplicate updates are reported and properly skipped.
Transactions might be skipped when data is not available or has already been updated.
Manual interventions (whenever Error) are required to correct any data exceptions, unless caused by external technical factors.
Data errors generally require manual investigation and correction of the corresponding Wikidata items.
Some technical exceptions do not depend on the data (and are related to network, transactions, and server load).
Those transactions can be recovered by re-executing the same command, once again (no human intervention required).
When most transactions for one language fail, the program stops (a serious problem occurred; possible internal logical error).
At the end total transaction statistics are shown.
Components:
Shell (Linux)
Python
Regular expressions
pywikibot (prerequisite; to be installed on a private client, or PAWS)
pagegenerators (part of pywikibot)
Wikidata Query (SPARQL)
Wikibase (database)
PAWS Jupyter notebook (optional, no shell, no command line parameters)
Platforms:
Most of the (Linux) environments have a shell and support parameters.
Linux
A (headless) Raspberry Pi with Raspbian (ideal for home use: low-powered, small size, permanently connected -> "piwikibot").
On a Raspberry Pi 4 it uses around 5-10% CPU.
Linux on an Oracle VirtualBox (Windows host; best of both worlds; better than Cygwin).
PAWS Jupyter (shared environment; the best solution when you do not own a Linux system)
The PAWS Jupyter environment does not have a shell, so command line parameters are not available; edit the script manually instead.
User interventions:
(Plan Do Check Act)
Think about important missing Wikidata labels
Write a valid Wikidata Query to find those items
Verify the list of items (via Wikidata Query)
Transfer the query into the script
Choose the source language and the list of target languages
Run the script
Monitor the progress
Verify the results
Correct any exceptions manually
Major data selection properties:
P17: Country (company)
P27: Nationality (human)
P31: Instance of (mandatory)
P101: Activity domain
P106: Profession
Preferably include a Wikipedia linkcount condition (> 0 for notable items to avoid flooding the database)
Other properties for specific data selection:
P19: Birth place
P20: Death place
P21: Gender (to be avoided, unless to increase the presence of women...)
P39: Position held
P103: Native language
P282: Alphabet
P569: Birth date
P570: Death date
P571: Creation date
P937: Work location (city)
To further limit the number of replicated rows:
When the number of rows is too large, the query can return a timeout error. To return fewer rows (and avoid a query timeout), the following techniques can be used:
P18: Picture
P214: VIAF ID
P373: Commons category
P800: Notable work
P856: Web site
P990: Voice
P1442: Grave picture
P1472: Commons creator
P3919: Contributed to
Description (to be available, possibly in a specific language, to limit the number of items).
Remove the subclass property suffix (use wdt:P31 instead of wdt:P31/wdt:P279*).
Add a LIMIT statement with a low enough row count.
Environment variables:
LANG: System language
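The default description language is derived from LANG in the code below ('nl' is the hard-coded fallback):
mainlang = os.getenv('LANG', 'nl')[:2]  # e.g. 'fr_BE.UTF-8' -> 'fr'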
Transaction logging:
Sequence number
Timestamp
Elapsed time
Error/Status
Item number
Label
Commons category
Aliases between []
Description
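A hypothetical log line (tab-separated; the item number, label, and description here are invented for illustration):
25	2020-10-29 14:05:11	6.042318	nl/fr	Q12345678	Jan Peeters		['Jan F. Peeters']	Belgisch advocaat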
Execution speed:
The query should never return more than 50000 items (both for query performance and to limit the batch execution elapsed time).
If necessary add a LIMIT statement, or filter more items.
Typical execution speed is 10 items/min, about 14400 items/day (using a non-bot user account).
You may obtain higher execution speeds with a bot account (1 item/s, about 86400 items/day).
When the servers are loaded at full capacity, this might drop to 1 item/min, reducing the speed to only 1440 items/day.
The tool dynamically adapts its execution speed to the load of the Wikidata server.
When the elapsed time per transaction increases (tps decreases), every consecutive error adds additional sleep time.
The next successful transaction resets the additional delay back to 0; see the sketch below and the code for details.
It does not help to run multiple instances for the same Bot account concurrently; the transaction rate will drop pro rata.
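A minimal sketch of this error backoff, mirroring the error handler in the main loop:
deltasecs = int((datetime.now() - now).total_seconds())   # Duration of the failed transaction
if deltasecs >= 30:                                       # Only slow (technical) errors are penalised
    errsleep += errwaitfactor * min(maxdelay, deltasecs)  # Accumulate, at most 4 x 150 = 600 s per error
if errsleep > 0:
    time.sleep(errsleep)                                  # Slow down; a successful edit resets errsleep to 0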
Transaction tuning:
You can set the following global parameters in user-config.py to tune the transaction execution speed:
noisysleep = 60.0, to avoid the majority/all of the confusing sleep messages
maxlag = 5, to avoid overloading the servers
put_throttle = 1, to speed up a bit without overloading the server (bot account required)
max_retries = 4, avoid needless retry
retry_wait = 30, allow minimum wait time
retry_max = 320, allow maximum wait time
It is not advised to use the -m flag (which minimizes the slowdown after a technical error), since it may cause multiple consecutive error transactions.
When the servers are heavily loaded, it is better to wait longer after a technical error, to avoid multiple consecutive transaction errors; see the sketch below.
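A sketch of the corresponding user-config.py fragment (the values suggested above; adapt them to your own account and workload):
noisysleep = 60.0  # Hide sleep messages shorter than one minute
maxlag = 5         # Back off when the servers lag
put_throttle = 1   # One edit per second (bot account required)
max_retries = 4    # Avoid needless retries
retry_wait = 30    # Minimum wait time between retries (s)
retry_max = 320    # Maximum wait time between retries (s)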
Tips:
Keep the SPARQL query as simple as possible, returning only the items you need to process.
First try with one single language pair, and a limited number of items (try your query in Wikidata Query first).
Always include the English language (as source or target) because it makes it easy to exchange data amongst languages.
Use your own language as source for all the others, to "export" your culture internationally.
English can be considered as the central language for Wikidata.
Every Wikidata item should at least have an English label and description.
Filter for items that have at least one Wikipedia article, in any language (Notability; avoid flooding the database).
You might also filter for items having a source language description, for the same reason.
Do not use the -n flag, so that at least one language description (or another notability mark) is required, for the same reason.
Known problems:
Wait until the reporting database is in sync with the live database, to avoid inconsistencies.
Do not rerun the same command immediately again with the same languages.
When no data is obtained, conflicting conditions might be resolved via Wikidata Query.
To avoid a query timeout, try to select less data (extra WHERE clauses and filters).
To limit the number of rows add e.g. a LIMIT 1000, 5000, 10000, 15000, 20000, or 30000 statement.
Human names are mostly language independent, unless another alphabet is used.
Pay attention to non-human instances; their names are usually language dependent.
Wikipedia target language pages can usually resolve the translation problem.
Be careful with multilingual countries; labels could be replicated in the wrong language.
Pay attention to homonyms, which generally require manual intervention to correct the conflicting descriptions.
It might be necessary to amend descriptions in case of label conflicts; apply a suffix (birth year-death year).
Manually apply mutual P1889 (different from) statements to mark those conflicting items.
Or you might need to merge (confirmed) duplicate items.
WARNING: wikibase-form datatype is not supported yet.
When running a Pywikibot script from a PAWS Jupyter OAuth notebook:
* T168222/T252306 - For items having interwiki namespace pages:
WARNING: API error mwoauth-invalid-authorization-invalid-user:
The authorization headers in your request are for a user that does not exist here.
See https://phabricator.wikimedia.org/T168222 and T252306 for more details.
Possible improvements:
Know the number of items before starting the update loop.
Supporting tools:
https://www.wikidata.org (Wikidata)
https://query.wikidata.org (SPARQL)
https://hub.paws.wmcloud.org (PAWS)
Example queries:
There are different possibilities. Work out what is best for you. Be creative. Follow your domains of interest.
Add missing hospital labels:
SELECT DISTINCT ?item WHERE {
VALUES ?hascountry { wdt:P17 }
VALUES ?instance { wd:Q16917 }
VALUES ?country { wd:Q145 wd:Q30 }
?item wdt:P31/wdt:P279* ?instance;
?hascountry ?country;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'en') FILTER(?linkcount > 0) MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = 'nl') } }
Add missing organisation labels: (no subclass wdt:P279* statement)
SELECT DISTINCT ?item WHERE {
VALUES ?hascountry { wdt:P17 }
VALUES ?instance { wd:Q43229 }
VALUES ?country { wd:Q183 }
?item wdt:P31 ?instance;
?hascountry ?country; wdt:P571 ?createdate;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'de') FILTER(?linkcount > 0) MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = 'nl') } }
Add missing politician labels: (include only a single profession)
SELECT DISTINCT ?item WHERE {
VALUES ?hascountry { wdt:P27 }
VALUES ?instance { wd:Q5 }
VALUES ?country { wd:Q31 }
VALUES ?profession { wd:Q82955 }
?item wdt:P31 ?instance;
?hascountry ?country; wdt:P106/wdt:P279* ?profession;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'nl') FILTER(?linkcount > 0) MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = 'de') } }
Add missing activity domain labels: (add the subclass wdt:P279* statement)
SELECT DISTINCT ?item WHERE {
VALUES ?hascountry { wdt:P27 }
VALUES ?instance { wd:Q5 }
VALUES ?country { wd:Q29999 }
VALUES ?domain { wd:Q184485 }
?item wdt:P31 ?instance;
?hascountry ?country; wdt:P101/wdt:P279* ?domain;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'nl') FILTER(?linkcount > 0) MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = 'it') } }
Add missing lastname labels: (note the LIMIT statement)
SELECT DISTINCT ?item WHERE {
VALUES ?instance { wd:Q101352 }
?item wdt:P31 ?instance; wdt:P282 wd:Q8229;
schema:description ?itemDescription;
wikibase:sitelinks ?linkcount; rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = "nl") FILTER(?linkcount > 0)
FILTER((LANG(?itemDescription)) = 'nl')
MINUS { ?item wdt:P31 wd:Q4167410. } MINUS {
?item rdfs:label ?label. FILTER((LANG(?label)) = "fr") } } LIMIT 1000
Move Norwegian language labels:
SELECT DISTINCT ?item WHERE {
VALUES ?instance { wd:Q5 }
VALUES ?hascountry { wdt:P27 }
VALUES ?country { wd:Q38 }
?item wdt:P31 ?instance;
?hascountry ?country;
rdfs:label ?itemLabel.
FILTER((LANG(?itemLabel)) = 'no') }
SELECT DISTINCT ?item WHERE {
VALUES ?profession { wd:Q483501 }
?item wdt:P31 wd:Q5;
wdt:P106/wdt:P279* ?profession;
skos:altLabel ?itemAlias.
FILTER((LANG(?itemAlias)) = 'no') }
Add missing person labels:
Add missing family names:
Other ideas:
P373: Commons category not having a language label
P1472: Commons creator, not having a Commons category
P1612: Commons institution, not having a Commons category
P1889: Duplicate labels: add P1889 for any items having the same language label (homonyms, any language)
Technical documentation:
(technical configuration)
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Setting_up_Shop
https://www.wikidata.org/wiki/Wikidata:Creating_a_bot
(only for programmers)
https://docs.python.org (official documentation)
https://docs.python.org/3/library/re.html
https://docs.python.org/3/howto/sorting.html
https://www.w3schools.com/python/ (very nice site)
https://www.w3schools.com/python/python_datatypes.asp
https://www.w3schools.com/python/python_lists.asp
(caveat)
https://stackoverflow.com/questions/47986224/list-assignment-in-python (list assignment by reference; take a copy!)
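A minimal illustration of this caveat (the reason the script copies alias lists with [:]):
a = ['x']
b = a          # Assignment by reference: a and b are the same list object
b.append('y')  # a is now also ['x', 'y']!
c = a[:]       # Take a shallow copy instead
c.append('z')  # a is unchanged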
Configuration:
user-config.py (not required for PAWS because OAuth is used)
Adapt the Wikidata query in this source code
Design principles:
Keep it simple, generic, structured, extendible, strict, clean, well documented, modular, understandable, following standards and guidelines, user friendly, stable, safe, efficient, error-resistant, and collaborative for our human volunteers. Let the system and the servers do most of the work without overloading them; the programmer is lazy.
Origin:
A prototype (proof of concept) was first developed using Wikidata Query and QuickStatements,
with a manual intervention to create the transaction file with Excel, see
https://www.wikidata.org/wiki/User:Geertivp/training/Wikidata_Query/Missing_label_in_target_language
The current program is an MVP, see https://en.wikipedia.org/wiki/Minimum_viable_product, but it is still evolving.
Can also run on PAWS (unsolved problem T168222/T252306 for items having interwiki namespace pages).
Database:
Wikidata (Wikibase)
Programming this tool requires quite a lot of insight into how an RDF database works, more specifically Wikidata, and SPARQL.
It also requires decent knowledge of Python, Linux, and/or PAWS.
Automatic corrections:
Aliases being equal to the label will be removed from source and target languages.
"""
# List the required modules
import os # Operating system: getenv
import re # Regular expressions (very handy!)
import sys # System: argv, exit (get the parameters, terminate the program)
import time # sleep
import pywikibot # API interface to Wikidata
import urllib.parse # URL encoding/decoding (e.g. Wikidata Query URL)
from datetime import datetime # now, strftime, delta time, total_seconds
from pywikibot import pagegenerators as pg # Wikidata Query interface
def fatal_error(errcode, errtext):
"""
A fatal error has occurred; print the error message and exit with an error code
"""
global exitstat
exitstat = errcode
print(errtext)
if exitfatal: # unless we ignore fatal errors
sys.exit(errcode)
else:
print('Proceed after fatal error')
def wd_proc_all_items_for_lang(inlang, outlang):
"""
Main logic: Executed for each of the target languages.
A Wikidata query gets the initial list of items; the function then adds the missing labels.
It is generated at runtime.
Main parameters: inlang, outlang.
Craft your query using the template below: (un)comment the respective statements, and add additional properties and filters.
You should limit the number of selected Q-numbers.
Depending on the instance, additional query statements should be added.
For e.g. wd:Q5 (human), nationality and either profession or activity domain should be included.
Tip: prepare, verify, and troubleshoot the query using https://query.wikidata.org.
"""
global exitstat
global langcount
global langdone
global totranscount
"""
Here you can design your own query by (un)commenting and changing property values.
Only (DISTINCT) item is required/allowed in the results.
The source and target language filter the resulting item list.
"""
# Can be overruled with wdquery
querytxt = """# Search for Belgian citizens with a missing language label
SELECT DISTINCT ?item WHERE {
VALUES ?instance { # One single instance
#wd:Q166118 # archive
#wd:Q4830453 # company
#wd:Q101352 # last name
wd:Q5 # human
#wd:Q215380 # music band
#wd:Q43229 # organisation
#wd:Q16917 # hospital
}
VALUES ?hascountry { # Choose one single property
#wdt:P17 # country
#wdt:P495 # country of origin
wdt:P27 # nationality
}
VALUES ?country { # Please process in sequence to avoid language conflicts
# You can group countries that share the same source language
#wd:Q145 # UK
#wd:Q30 # USA
#wd:Q142 # France
#wd:Q32 # Luxembourg
#wd:Q183 # Germany
wd:Q39 # Switzerland
#wd:Q40 # Austria
#wd:Q31 # Belgium
#wd:Q29999 # Netherlands
#wd:Q38 # Italy
#wd:Q29 # Spain
#wd:Q20 # Norway
#wd:Q34 # Sweden
#wd:Q33 # Finland
#wd:Q35 # Denmark
#wd:Q36 # Poland
#wd:Q298 # Chile
#wd:Q155 # Brazil
#wd:Q928 # Philippines
#wd:Q15180 # Soviet Union
#wd:Q129286 # British India
#wd:Q70972 # Kingdom of France
#wd:Q1747689 # Ancient Rome
#wd:Q11768 # Ancient Egypt
#wd:Q179293 # Castile
#wd:Q948 # Tunisia
#wd:Q668 # India
#wd:Q851 # Saudi Arabia
#wd:Q93180 # Seleucid Empire
#wd:Q9683 # Tang dynasty
#wd:Q408 # Australia
#wd:Q172579 # Kingdom of Italy
}
VALUES ?profession { # Multiple values are allowed
wd:Q40348 # lawyer
#wd:Q12961474 # anarchist
#wd:Q11513337 # athlete
#wd:Q33999 # actor
#wd:Q212238 # civil servant
#wd:Q482980 # author
#wd:Q1281618 # sculptor
#wd:Q15214752 # comedian
#wd:Q5716684 # dancer
#wd:Q193391 # diplomat
#wd:Q39631 # physician
#wd:Q188094 # economist
#wd:Q4964182 # philosopher
#wd:Q2526255 # film director
#wd:Q169470 # physicist
#wd:Q33231 # photographer
#wd:Q81096 # engineer
#wd:Q1930187 # journalist
#wd:Q483501 # artist
#wd:Q1028181 # painter
#wd:Q47064 # military personnel
#wd:Q5322166 # designer
#wd:Q82955 # politician
#wd:Q42603 # priest
#wd:Q121594 # professor
#wd:Q16533 # judge
#wd:Q1028181 # painter (duplicate)
#wd:Q2066131 # sportsperson
#wd:Q186360 # nurse
#wd:Q937857 # football player
#wd:Q901 # scientist
#wd:Q2309784 # cyclist
#wd:Q170790 # mathematician
#wd:Q177220 # singer
#wd:Q55645123 # health professional
}
VALUES ?domain { # One single value
#wd:Q11023 # engineering
#wd:Q11190 # medicine
#wd:Q21198 # computer science
#wd:Q15980804 # media
#wd:Q184485 # performing arts
#wd:Q336 # science
#wd:Q395 # mathematics
#wd:Q349 # sport
wd:Q2695280 # technique
}
?item wdt:P31 ?instance; # Main query options
#?item wdt:P31/wdt:P279* ?instance # Many more rows; might impact performance
?hascountry ?country; # Filter on country
#wdt:P19 ?birthplace;
#wdt:P20 ?deathplace;
wdt:P39 [];
#wdt:P101/wdt:P279* ?domain;
#wdt:P103 ?nativelang;
#wdt:P106 ?profession;
#wdt:P106/wdt:P279* ?profession;
#wdt:P131 wd:Q456544; # located in a specific city
#wdt:P214 ?viafid;
#wdt:P282 wd:Q8229; # Roman alphabet (e.g. last name)
#wdt:P373 ?commonscat;
#wdt:P569 ?birthdate;
#wdt:P570 ?deathdate;
#wdt:P571 ?createdate;
#wdt:P646 ?freebaseid;
#wdt:P734 ?lastname;
#wdt:P735 ?firstname;
#wdt:P1705 ?labelofflang;
wikibase:sitelinks ?linkcount;
#schema:description ?itemDescription;
#skos:altLabel ?itemAlias;
rdfs:label ?itemLabel.
#?article schema:about ?item; # Copy the label from the target language Wikipedia page
#schema:inLanguage '""" + outlang + """';
#schema:isPartOf <https://""" + outlang + """.wikipedia.org/>.
FILTER(?linkcount > 0) # increasing notability (recommended)
FILTER((LANG(?itemLabel)) = '""" + inlang + """')
#FILTER((LANG(?itemAlias)) = '""" + inlang + """')
#FILTER((LANG(?itemDescription)) = '""" + inlang + """')
#FILTER(?birthdate >= "1920-01-01T00:00:00Z"^^xsd:dateTime) # limit number of rows
#FILTER(?createdate >= "2000-01-01T00:00:00Z"^^xsd:dateTime) # limit number of rows
MINUS { ?item wdt:P31 wd:Q4167410. } # Wikipedia redirect page
MINUS { ?item wdt:P31 wd:Q4167836. } # Wikimedia category
#MINUS { ?item wdt:P18 ?picture. } # Skip picture with PAWS, see T168222
# Search items with missing label for the target language (mandatory)
MINUS { ?item rdfs:label ?label. FILTER((LANG(?label)) = '""" + outlang + """') }
}
LIMIT 50000 # take lower values to avoid any query timeout
# Possibly add more properties or filters
"""
# Verify that the target language is OK
if outlang == inlang or outlang in langdone or not langre.search(outlang): # Skip input language, duplicate and bad language codes (generated by PAWS first run)
print('Skipping language %s' % outlang)
return
# Print preferences
if verbose or debug:
print('\nUse labels:\t%s' % uselabels)
print('Avoid homonym:\t%s' % safemode)
print('Use aliases:\t%s' % usealias)
print('Fallback on English:\t%s' % fallback)
print('Use Wikipedia:\t%s' % wikipedia)
print('Notability:\t%s' % notability)
print('\nShow code:\t%s' % showcode)
print('Verbose mode:\t%s' % verbose)
print('Debug mode:\t%s' % debug)
print('Exit on fatal error:\t%s' % exitfatal)
print('Error wait factor:\t%d' % errwaitfactor)
langcount += 1 # Process the next target language
"""
Build the SPARQL query.
You can paste the resulting code into Wikidata Query to troubleshoot.
Tip: click on the diamond to reformat.
"""
squery = querytxt # Generate or overrule the query
if wdquery != '':
squery = wdquery
squery = humsqlre.sub(' ', squery) # Convert to human readable (remove comments)
if verbose:
print('\nQuery %d for %s %s:' % (langcount, inlang, outlang))
print(squery) # Show human readable formatted query (useful to debug)
squery = comsqlre.sub(' ', squery) # Convert to computer readable (remove duplicate whitespace)
if showcode or debug:
print('\nSPARQL query for %s %s:' % (inlang, outlang))
print(squery) # Show computer readable query (useful to debug)
# Execute the SPARQL query to get the item list
# The list could possibly be empty
generator = pg.WikidataSPARQLPageGenerator(squery, site=wikidata_site)
# Loop initialisation
transcount = 0 # Total transaction counter
notecount = 0 # Notability count
pictcount = 0 # Picture count
unotecount = 0 # Notability problem
safecount = 0 # Safe transaction
dupcount = 0 # Duplicate counter
errcount = 0 # Error counter
errsleep = 0 # Technical error penalty (sleep delay in seconds)
# Let the user know what is happening while the data is being queried
if verbose:
print('\nReplicating language labels from %s to %s' % (inlang, outlang))
# Set the Wikipedia instance
inwiki = inlang + 'wiki'
wpwiki = outlang + 'wiki' # Set default Wikipedia sitecode
if outlang == 'nb': # nn has separate nnwiki
wpwiki = 'nowiki' # The Norwegian Wikipedia code is different from the Wikidata language code
# Transaction timing
now = datetime.now() # Start the main transaction timer
status = 'Start' # Force loop entry
# Process all items in the list
for item in generator: # Main loop for all DISTINCT items
if status != 'Stop': # Ctrl-c pressed -> stop this target language in a proper way
if debug:
print(item) # Format: [[wikidata:Q...]]
# Transaction initialisation required for status reporting (mind the error handling)
transcount += 1 # New transaction for this language pair
status = inlang + '/' + outlang # Default status OK, unless an error occurs
label = '' # Item labels
alias = [] # No alias yet (note the list datatype)
descr = '' # No description yet
wpart = '' # No Wikipedia page yet
commonscat = '' # Commons category
try: # Error trapping (prevents premature exit on transaction error)
item.get() # Get the data item (problem T168222/T252306 with interwiki namespace pages with PAWS OAuth)
"""
Get the labels and aliases for the source language and English
"""
inlabel = ''
if inlang in item.labels:
inlabel = item.labels[inlang]
enlabel = ''
if reflang in item.labels:
enlabel = item.labels[reflang]
"""
The target label is missing; the script will search for the best one.
Priority is given to Wikipedia page names (by language, target language first).
Then the source language label.
Then the English Wikipedia page name, or the English label, as a fallback - if allowed.
Any suffixes are stripped from the label, as required by the label guidelines.
"""
if 'P735' in item.claims and 'P734' in item.claims: # Firstname and lastname
label = item.claims['P735'][0].getTarget() + ' ' + item.claims['P734'][0].getTarget()
status = 'Name'
elif wikipedia and wpwiki in item.sitelinks: # Get target sitelink
sitelink = item.sitelinks[wpwiki]
linklabel = urlbre.search(str(sitelink)) # Output URL supersedes source label
wpart = linklabel.group(0)
label = wpart # Get Wikipedia page
elif wikipedia and inwiki in item.sitelinks: # Get source sitelink
sitelink = item.sitelinks[inwiki]
linklabel = urlbre.search(str(sitelink)) # Input URL supersedes source label
wpart = linklabel.group(0)
label = wpart
elif uselabels and inlabel != '': # Get source label, if no URL
label = inlabel
elif wikipedia and fallback and refwiki in item.sitelinks: # Get English sitelink
sitelink = item.sitelinks[refwiki]
linklabel = urlbre.search(str(sitelink)) # English URL supersedes source label
wpart = linklabel.group(0)
label = wpart
elif uselabels and fallback and enlabel != '':
label = enlabel # Default English label
if label != '':
baselabel = suffre.search(label) # Remove () suffix, if any (we do the same for a comma)
if baselabel:
label = label[:baselabel.start()] # Get canonical form
"""
Try to get any useful alias; note that a copy of the list is made (lists are assigned by reference).
"""
# Get the aliases; remark the list
inalias = []
if inlang in item.aliases:
inalias = item.aliases[inlang]
enalias = []
if reflang in item.aliases:
enalias = item.aliases[reflang]
if outlang in item.aliases:
alias = item.aliases[outlang]
elif usealias and inalias != []: # Get source alias
alias = inalias[:] # list datatype (take a copy!)
elif usealias and fallback and enalias != []: # Get source alias, if no URL
alias = enalias[:] # list datatype (take a copy!)
# Insert missing alias from label
if uselabels and inlabel != '' and label != inlabel and not inlabel in alias:
alias.insert(0, inlabel)
elif uselabels and fallback and enlabel != '' and label != enlabel and not enlabel in alias:
alias.insert(0, enlabel)
# Cleanup duplicate aliases and aliases in non-native alphabets
if inlabel in inalias:
inalias.remove(inlabel) # Filter label (alias <> label)
# (We do not touch enalias => shared resource)
if label in alias:
alias.remove(label) # Filter label (alias <> label)
for ai in alias[:]: # Iterate over a copy, because elements are removed
if not romanre.search(ai): # Filter non-Roman strings
alias.remove(ai)
"""
The user should see the description in any of the available languages.
Get optional data before the update transaction.
Perform this first because we need it in case of an update error.
We take the target, source, or English description in that order.
"""
indescr = ''
if inlang in item.descriptions:
indescr = item.descriptions[inlang]
outdescr = ''
if outlang in item.descriptions:
outdescr = item.descriptions[outlang]
if outdescr == '':
outdescr = indescr
if 'P18' in item.claims:
status = 'Pict'
pictcount += 1 # Possible duplicate counting...
if outdescr != '': # Get item description in target language
descr = outdescr
if safemode: # Avoid homonym label/description conflicts
status = 'Safe'
elif indescr != '': # input item description
descr = indescr
elif mainlang in item.descriptions: # Main language item description
descr = item.descriptions[mainlang]
elif 'nl' in item.descriptions: # Dutch item description
descr = item.descriptions['nl']
elif 'fr' in item.descriptions: # French item description
descr = item.descriptions['fr']
elif 'en' in item.descriptions: # English item description
descr = item.descriptions['en']
elif 'de' in item.descriptions: # German item description
descr = item.descriptions['de']
elif 'es' in item.descriptions: # Spanish item description
descr = item.descriptions['es']
elif 'it' in item.descriptions: # Italian item description
descr = item.descriptions['it']
elif wpart != '' or status in ['Pict'] or 'P373' in item.claims or 'P39' in item.claims or 'P214' in item.claims or 'P800' in item.claims or 'P856' in item.claims or 'P990' in item.claims or 'P1442' in item.claims or 'P1472' in item.claims or 'P3919' in item.claims: # Avoid items that have no description, unless they have a Wikipedia page, picture, Commons category, function, VIAF, notable work, website, voice, grave picture, Commons creator, contributed to
status = 'Noted'
notecount += 1
elif notability:
status = 'Trivia'
"""
For additional/optional information.
"""
# Get Commons category/creator
if 'P373' in item.claims: # Wikimedia Commons Category
commonscat = item.claims['P373'][0].getTarget()
elif 'P1472' in item.claims: # Wikimedia Commons Creator
commonscat = item.claims['P1472'][0].getTarget()
"""
This is the main functionality. We have gathered all required data now.
Now we verify which type of update can safely be performed, without overwriting any existing label/alias values.
Note that there can be a replication delay between the live and the reporting database, so you should take care of unexpected transactions (possibly caused by concurrent users).
We remove duplicate labels from source aliases. We do not add duplicate labels to the target language aliases.
errsleep must only be reset when the update transaction was successful.
We prefer editEntity because it is generic (avoiding the specific editLabels and editAliases, which would have been more complex).
"""
if status == 'Trivia': # Not notable - do not replicate
unotecount += 1
elif status == 'Safe': # Avoid homonym naming/description conflicts
safecount += 1
elif outlang in item.labels and item.labels[outlang] != label or label == '': # Target label was already updated by other user
if label != '' and not label in alias: # Add label in alias
alias.append(label)
if item.labels[outlang] in alias: # Remove duplicate alias
alias.remove(item.labels[outlang])
if inlang == 'no' and outlang in ['nb', 'nn']: # Remove language labels
for ai in inalias: # Merge aliases
if not ai in alias:
alias.append(ai)
item.editEntity( {'labels': {inlang: ''}, 'aliases': {outlang: alias, inlang: []}, 'descriptions': {outlang: outdescr, inlang: ''}}, summary=transcmt ) # Update only the alias (rare)
status = 'Alias'
errsleep = 0 # Successful transaction -> Reset any pending penalty wait
elif not outlang in item.aliases and alias != []:
item.editAliases(aliases={outlang: alias, inlang: inalias}, summary=transcmt ) # Update only the alias
status = 'Alias'
errsleep = 0 # Successful transaction -> Reset any pending penalty wait
else:
status = 'Skip' # Both label and alias already set (nothing to do here)
dupcount += 1
elif outlang in item.aliases or alias == []: # Update only the label
if inlang == 'no' and outlang in ['nb', 'nn']: # Remove language labels
for ai in inalias: # Merge aliases
if not ai in alias:
alias.append(ai)
item.editEntity( {'labels': {outlang: label, inlang: ''}, 'aliases': {outlang: alias, inlang: []}, 'descriptions': {outlang: outdescr, inlang: ''}}, summary=transcmt ) # Move the label from inlang to outlang
else:
item.editEntity( {'labels': {outlang: label}, 'aliases': {inlang: inalias}}, summary=transcmt )
errsleep = 0 # Successful transaction -> Reset any pending penalty wait
else: # Update both the target item language label and alias
if inlang == 'no' and outlang in ['nb', 'nn']: # Remove language labels
item.editEntity( {'labels': {outlang: label, inlang: ''}, 'aliases': {outlang: alias, inlang: []}, 'descriptions': {outlang: outdescr, inlang: ''}}, summary=transcmt )
else:
item.editEntity( {'labels': {outlang: label}, 'aliases': {outlang: alias, inlang: inalias}}, summary=transcmt )
status = 'Both'
errsleep = 0 # Successful transaction -> Reset any pending penalty wait
"""
Error handling section
"""
except KeyboardInterrupt:
status = 'Stop' # Ctrl-c trap; process next language, if any
dupcount += 1 # Press Ctrl-c a second time to stop the script completely
exitstat = 1
except: # Attempt error recovery
if exitfatal: # Stop on first error
raise
status = 'Error' # Handle any generic error
errcount += 1
deltasecs = int((datetime.now() - now).total_seconds()) # Calculate technical error penalty
if deltasecs >= 30: # Technical error; for transactional errors there is no wait time increase
errsleep += errwaitfactor * min(maxdelay, deltasecs)
# Technical errors get additional penalty wait
# Consecutive technical errors accumulate the wait time, until the first successful transaction
# We limit the delay to a multiple of maxdelay seconds
if errsleep > 0: # Allow the servers to catch up; slowdown the transaction rate
print('%d seconds maxlag wait' % errsleep)
time.sleep(errsleep)
"""
The transaction was either executed correctly, or an error occurred.
Possibly a system error message was already issued.
We will report the results here, as much as we can, one line per item.
"""
# Get the elapsed time in seconds and the timestamp in string format
prevnow = now # Transaction status reporting
now = datetime.now() # Refresh the timestamp to time the following transaction
if verbose or status in ['Error', 'Stop']: # Print transaction results
isotime = now.strftime("%Y-%m-%d %H:%M:%S") # Only needed to format output
totsecs = (now - prevnow).total_seconds() # Elapsed time for this transaction
print('%d\t%s\t%f\t%s\t%s\t%s\t%s\t%s\t%s' % (transcount, isotime, totsecs, status, item.getID(), label, commonscat, alias, descr))
"""
One language has been completed.
Print the transaction success rate for the current target language.
When the success rate is < 100%, the reason for the errors should be investigated.
"""
# Report aggregate statistics for this language pair
suctranscount = transcount - unotecount - safecount - dupcount - errcount
totranscount += suctranscount
if transcount > 0: # Avoid division by zero error
sucrate = 100.0 * suctranscount / transcount
if errorstat:
print('\nStatistics for language %s %s:\n%d skipped\n%d unsafe\n%d errors\n%d unnotable\n%d notable\n%d pictures\n%d done\n%d transactions\n%f%% successful' % ( inlang, outlang, dupcount, safecount, errcount, unotecount, notecount, pictcount, suctranscount, transcount, sucrate) )
if sucrate < minsucrate:
fatal_error(3, 'Halting program due to huge error count') # Can be overruled with -p
else:
print('No transactions for language %s' % outlang)
# Prevent the same target language from being processed repeatedly (avoid duplicate transactions)
langdone.append(outlang)
if debug:
print('Languages done: %s' % sorted(langdone))
def wd_proc_all_languages():
"""
Process a standard list of languages.
You need to always give one mandatory source language as P1.
Not compatible with a SPARQL query.
"""
if not langre.search(inlang): # Source language required (fatal error)
fatal_error(2, 'Input language missing or invalid; use -h for help')
sys.exit(2) # Required parameter, must stop
if wdquery != '':
fatal_error(5, 'SPARQL query incompatible with -a flag; use -h for help')
sys.exit(5) # Fatal error, must stop
if debug:
print('Process target languages %s' % listlang)
for outlang in listlang:
wd_proc_all_items_for_lang(inlang, outlang) # Execute all items for one language
def show_help_text():
# Show program help and exit (only show head text)
helptxt = re.search(r'^(.*\n)+\nDocumentation:\n\n.+\n', codedoc)
if helptxt:
print(helptxt.group(0)) # Show helptext
sys.exit(9) # Must stop
def show_prog_version():
# Show program version
print('%s version %s' % (modnm, pgmid))
def get_next_param():
"""
Get the next command parameter, and handle any qualifiers
"""
global showcode
global debug
global errorstat
global errwaitfactor
global exitfatal
global fallback
global notability
global safemode
global usealias
global uselabels
global verbose
global wikipedia
global wdquery
cpar = sys.argv.pop(0) # Get next command parameter
if debug:
print('Parameter %s' % cpar)
if cpar.startswith('-a'): # all languages
wd_proc_all_languages()
elif cpar.startswith('-c'): # code check
showcode = True
print('Show generated code')
elif cpar.startswith('-d'): # debug mode
debug = True
print('Setting debug mode')
elif cpar.startswith('-e'): # error stat
errorstat = False
print('Disable error statistics')
elif cpar.startswith('-f'): # file Wikidata query
fname = sys.argv.pop(0)
print('Reading Wikidata query from "%s"' % fname)
with open(fname, 'r') as query_file:
wdquery = query_file.read()
wdquery = urllib.parse.unquote(wdquery) # Decode URL
if showcode:
print(wdquery)
elif cpar.startswith('-h'): # help
show_help_text()
elif cpar.startswith('-l'): # language labels
if not wikipedia:
fatal_error(4, 'Conflicting qualifiers -l -w')
uselabels = False
print('Disable label reuse')
elif cpar.startswith('-m'): # fast mode
errwaitfactor = 1
print('Setting fast mode')
elif cpar.startswith('-n'): # notability
notability = False
print('Disable notability mode')
elif cpar.startswith('-p'): # proceed after fatal error
exitfatal = False
print('Setting proceed after fatal error')
elif cpar.startswith('-q'): # quiet mode
verbose = False
print('Setting quiet mode')
elif cpar.startswith('-s'): # alias (synonym) usage
usealias = False
print('Disable alias reuse')
elif cpar.startswith('-t'): # translation required (no English)
fallback = False
print('Disable translation fallback')
elif cpar.startswith('-v'): # verbose mode
verbose = True
print('Setting verbose mode')
elif cpar.startswith('-V'): # Version
show_prog_version()
elif cpar.startswith('-w'): # disallow Wikipedia
if not uselabels:
fatal_error(4, 'Conflicting qualifiers -l -w')
wikipedia = False
print('Disable Wikipedia reuse')
elif cpar.startswith('-x'): # safe mode
safemode = True
print('Setting safe mode')
elif cpar.startswith('-'): # unrecognized qualifier (fatal error)
fatal_error(4, 'Unrecognized qualifier; use -h for help')
return cpar # Return the parameter or the qualifier to the caller
# Main program entry
# First identify the program
if verbose:
show_prog_version() # Print the module name
try:
pgmnm = sys.argv.pop(0) # Get the name of the executable
if debug:
print('%s version %s' % (pgmnm, pgmid)) # Physical program
except:
shell = False
print('No shell available') # Most probably running on PAWS Jupyter
"""
Start main program logic
Precompile the Regular expressions, once (for efficiency reasons; they will be used in loops)
"""
humsqlre = re.compile(r'\s*#.*\n') # Human readable query, remove all comments including LF
comsqlre = re.compile(r'\s+') # Computer readable query, remove duplicate whitespace
urlbre = re.compile(r'[^\[\]]+') # Remove URL square brackets (keep the article page name)
suffre = re.compile(r'\s*[(,]') # Remove () and , suffix (keep only the base label)
langre = re.compile(r'^[a-z]{2,3}$') # Verify for valid ISO 639 language codes (2 or 3 lowercase letters)
romanre = re.compile(r'^[a-z ."\'åáàâäãæčçéèêëēíìîïñóòôöőðøšßúùûüž-]{2,}$', flags=re.IGNORECASE) # Roman alphabet
totranscount = 0 # Total successful transactions
langcount = 0 # Count of languages processed
# Global parameters
mainlang = os.getenv('LANG', 'nl')[:2] # Default description language
if verbose or debug:
print('Main language:\t%s' % mainlang)
print('Maximum delay:\t%d s' % maxdelay)
print('Minimum success rate:\t%f%%' % minsucrate)
# Get the source language
inlang = '-' # Adapt and hardcode the source language code when no shell
while len(sys.argv) > 0 and inlang.startswith('-'): # Get first non-qualifier
inlang = get_next_param().lower() # get P1 = Source language (mandatory parameter)
if not langre.search(inlang): # Source language required (fatal error)
fatal_error(2, 'Input language missing or invalid; use -h for help')
sys.exit(2) # Required parameter, must stop
# Connect to database
transcmt = 'Pwb copy ' + inlang + ' label' # Wikidata transaction comment
wikidata_site = pywikibot.Site("wikidata", "wikidata") # Login to Wikibase instance
# Main loop for list of target languages
while len(sys.argv) > 0:
# Get every ISO 639 language code in lowercase
outlang = get_next_param().lower() # Loop for all target languages (mandatory parameter)
if not outlang.startswith('-'): # Skip qualifiers
wd_proc_all_items_for_lang(inlang, outlang) # Execute all items for one language
# PAWS environment
if not shell:
wd_proc_all_languages() # (un)comment to process all target languages when no shell
# wd_proc_all_items_for_lang(inlang, 'nl') # (un)comment and hardcode one target language when no shell
"""
Print all sitelinks (base addresses)
PAWS is using tokens (passwords can't be used because Python scripts are public)
Shell is using passwords (from user-password.py file)
"""
if debug:
for site in sorted(pywikibot._sites.values()):
print(site, site.username(), site.is_oauth_token_available(), site.logged_in())
if langcount > 0:
print('\n%d languages processed for %s\n%d total successful transactions' % (langcount, inlang, totranscount) ) # All OK
else:
fatal_error(2, 'Input and/or output language missing; use -h for help')
sys.exit(exitstat)