Wikidata:Pywikibot - Python 3 Tutorial/Templates, Generators, Tables


This tutorial shows how you can iterate over template usage on Wikidata using a generator function. The generator itself was already explained in Wikidata:Pywikibot - Python 3 Tutorial/Big Data, where we iterated over pages that were using a specific infobox on Wikipedia. In this example we will do the same for a template used on Wikidata (Template:Constraint:Units). We will gather the usage and then write a table to a page on Wikidata.

First of all we need to write the generator. With what you learned in the previous chapters this shouldn't be too difficult:

import pywikibot
from pywikibot import pagegenerators as pg

def list_template_usage(site_obj, tmpl_name):
    """
    Takes Site object and template name and returns a generator.

    The function expects a Site object (pywikibot.Site()) and
    a template name (String). It creates a list of all
    pages using that template and returns them as a generator.
    It only returns pages in the 121-namespace (property-talk-pages).
    The generator will load 50 pages at a time for iteration.
    """
    # Use the site_obj parameter rather than a global "site" variable.
    name = "{}:{}".format(site_obj.namespace(10), tmpl_name)
    tmpl_page = pywikibot.Page(site_obj, name)
    ref_gen = pg.ReferringPageGenerator(tmpl_page, onlyTemplateInclusion=True)
    filter_gen = pg.NamespaceFilterPageGenerator(ref_gen, namespaces=[121])
    generator = site_obj.preloadpages(filter_gen, pageprops=True)
    return generator

template_name = "Constraint:Units"
site = pywikibot.Site("wikidata", 'wikidata')
tmpl_gen = list_template_usage(site, template_name)

In this example we use namespaces=[121], which corresponds to property talk pages. (You can use this link to find the namespaces on any Wikimedia site: https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces).
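
The namespace numbers can also be read from the API response itself. The following is a minimal sketch, assuming the response has the shape of the default JSON output of the siteinfo query linked above (local name stored under "*"). The helper name namespace_names and the sample dictionary are made up for illustration; the sample is a hand-written excerpt, not a full response.

```python
def namespace_names(siteinfo_response):
    """Map each namespace id (int) to its local name."""
    namespaces = siteinfo_response["query"]["namespaces"]
    # JSON keys are strings; the local name is assumed to be under "*",
    # as in the default JSON output of the siteinfo query.
    return {int(ns_id): info["*"] for ns_id, info in namespaces.items()}


# Hand-written excerpt, for illustration only.
sample_response = {
    "query": {
        "namespaces": {
            "10": {"id": 10, "canonical": "Template", "*": "Template"},
            "120": {"id": 120, "canonical": "Property", "*": "Property"},
            "121": {"id": 121, "canonical": "Property talk",
                    "*": "Property talk"},
        }
    }
}

names = namespace_names(sample_response)
print(names[121])  # Property talk
```

With a live response you would pass the decoded JSON of the URL above (with format=json added) instead of the sample dictionary.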

Now that we have stored the generator in a variable, we can pass it to another function that iterates over the returned pages, finds the templates, and stores the data we are interested in:

def gather_template_usage(generator, template_name, header=None):
    """
    Takes a generator and a template name and returns usage list.

    The function can also take a header (list of strings) that will be
    the headers of the table (Needs to be the same dimension as the table).
    The first column needs to be a link to a property (it will be made into
    a link). In this example the second column is a list of links to Q-items.
    """
    tmpl_usage = []
    if header is not None:
        tmpl_usage.append(header)

    for page in generator:
        page_str = page.get()
        tmpl_list = pywikibot.textlib.extract_templates_and_params(page_str)

        for tmpl in tmpl_list:
            if template_name in tmpl:
                page_title = page.title().split(":")[1]
                property_link = "{{{{P|{}}}}}".format(page_title)
                tmpl_usage.append([property_link, tmpl[1]["list"]])
    return tmpl_usage

template_name = "Constraint:Units"
header = ["Property", "Constraint:Units"]
tmpl_usage = gather_template_usage(tmpl_gen, template_name, header)

As you can see, we pass the second function the name of the template we are looking for and a header, which becomes the header row of the wikitable we will generate. The first for-loop iterates over all returned pages, while the second for-loop iterates over the templates on each page. When it finds the "Constraint:Units" template, it appends the value of the template's list parameter to the result, together with the title of the page, which is reformatted into a {{P|...}} property link. Note that in a .format() string "{{" is an escaped bracket that produces a single literal "{" (and "}}" a single "}"), so "{{{{" ends up as "{{" in the output. This escaping is required because .format() otherwise treats braces as placeholders.
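
To make the brace escaping and the shape of the template data concrete, here is a small offline sketch. It assumes, as the code above does, that each extracted template is a (name, parameters) pair. The helpers property_link and find_template and the sample values are hypothetical and only serve as illustration.

```python
def property_link(talk_page_title):
    """Turn a title such as 'Property talk:P31' into '{{P|P31}}'."""
    property_id = talk_page_title.split(":")[1]
    # '{{{{' yields '{{', '{}' is the placeholder, '}}}}' yields '}}'.
    return "{{{{P|{}}}}}".format(property_id)


def find_template(templates, wanted_name):
    """Return the parameters of the first template named wanted_name."""
    for name, params in templates:
        if name == wanted_name:
            return params
    return None


# Hand-written stand-in for the extracted (name, parameters) pairs.
templates = [
    ("Constraint:Units", {"list": "{{Q|11573}}, {{Q|174728}}"}),
]

print(property_link("Property talk:P31"))  # {{P|P31}}
print(find_template(templates, "Constraint:Units")["list"])
```

Comparing the template name explicitly, as find_template does, is one possible alternative to the membership test used in the function above.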

Next we have to write a function that generates a wikitable. Pywikibot currently doesn't include a convenience function for that. The following function creates a standard wikitable and derives its dimensions from the nested list it receives:

def create_table_string(data, highlight=(True, False, False, False),
                        table_class='wikitable', style=''):
    """
    Takes a list and returns a wikitable.

    @param data: The list that is converted to a wikitable.
    @type data: List (Nested)
    @param highlight: Tuple of rows and columns that should be highlighted.
                      (first row, last row, left column, right column)
    @type highlight: Tuple
    @param table_class: A string containing the class description.
                        See wikitable help.
    @type table_class: String
    @param style: A string containing the style description.
                  See wikitable help.
    @type style: String
    """
    last_row = len(data) - 1
    last_cell = len(data[0]) - 1

    table = '{{| class="{}" style="{}"\n'.format(table_class, style)
    for key, row in enumerate(data):
        if (key == 0 and highlight[0]) or (key == last_row and highlight[1]):
            row_string = '|-\n! ' + '\n! '.join(row)
        else:
            row_string = '|-'
            cells = ''
            for ckey, cell in enumerate(row):
                if ckey == 0 and highlight[2]:
                    cells += '\n! ' + cell
                elif ckey == last_cell and highlight[3]:
                    cells += '\n! ' + cell
                else:
                    cells += '\n| ' + cell
            row_string += cells

        table += row_string + '\n'
    table += '|}'
    return table

table = create_table_string(tmpl_usage)
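
To see what kind of wikitext comes out, the following condensed sketch builds a table for two hand-written rows. The function header_row_table is a simplified stand-in written for this illustration (it only highlights the first row and omits the style attribute); it is not the function above, and the sample rows are made up.

```python
def header_row_table(data, table_class="wikitable"):
    """Condensed variant: first row as header cells, the rest as data cells."""
    # '{{|' is the escaped form of the table opening '{|'.
    lines = ['{{| class="{}"'.format(table_class)]
    for index, row in enumerate(data):
        marker = "! " if index == 0 else "| "
        lines.append("|-")
        lines.extend(marker + cell for cell in row)
    lines.append("|}")
    return "\n".join(lines)


# Made-up sample rows in the same layout as tmpl_usage.
sample = [
    ["Property", "Constraint:Units"],
    ["{{P|P2048}}", "{{Q|11573}}"],
]

print(header_row_table(sample))
# {| class="wikitable"
# |-
# ! Property
# ! Constraint:Units
# |-
# | {{P|P2048}}
# | {{Q|11573}}
# |}
```

Header cells start with "! " and ordinary cells with "| ", which is the distinction the highlight tuple controls.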

Finally we can write the page: we create a Page object for the page we would like to update (in fact overwrite) and call its save method. The new text is passed as the text= keyword argument, which page.save accepts through its **kwargs.
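
Because saving overwrites the page, it can be useful to skip the call when nothing has changed. The sketch below is one possible way to do that. It assigns the new text to the page's text attribute instead of passing text=, and relies only on the text attribute and the save(summary=...) method. The helper save_if_changed and the FakePage class are hypothetical; FakePage exists only so the example runs offline.

```python
def save_if_changed(page, new_text, summary):
    """Save new_text to page only when it differs from the current text.

    Returns True if save() was called, False otherwise.
    """
    if page.text == new_text:
        return False
    page.text = new_text
    page.save(summary=summary)
    return True


class FakePage:
    """Offline stand-in with the two members the helper relies on."""

    def __init__(self, text=""):
        self.text = text
        self.saved_with = None

    def save(self, summary=None):
        # Record the summary instead of contacting a wiki.
        self.saved_with = summary


page = FakePage("old table")
print(save_if_changed(page, "new table", "Updating Table"))  # True
print(save_if_changed(page, "new table", "Updating Table"))  # False
```

With Pywikibot you would pass a pywikibot.Page object instead of FakePage.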

Whole Example

The whole code should look like this. If you run it, you will update the page Wikidata:Database reports/List of properties with Constraint:Unit templates and their values.

import pywikibot
from pywikibot import pagegenerators as pg

def list_template_usage(site_obj, tmpl_name):
    """
    Takes Site object and template name and returns a generator.

    The function expects a Site object (pywikibot.Site()) and
    a template name (String). It creates a list of all
    pages using that template and returns them as a generator.
    It only returns pages in the 121-namespace (property-talk-pages).
    The generator will load 50 pages at a time for iteration.
    """
    # Use the site_obj parameter rather than a global "site" variable.
    name = "{}:{}".format(site_obj.namespace(10), tmpl_name)
    tmpl_page = pywikibot.Page(site_obj, name)
    ref_gen = pg.ReferringPageGenerator(tmpl_page, onlyTemplateInclusion=True)
    filter_gen = pg.NamespaceFilterPageGenerator(ref_gen, namespaces=[121])
    generator = site_obj.preloadpages(filter_gen, pageprops=True)
    return generator

def gather_template_usage(generator, template_name, header=None):
    """
    Takes a generator and a template name and returns usage list.

    The function can also take a header (list of strings) that will be
    the headers of the table (Needs to be the same dimension as the table).
    The first column needs to be a link to a property (it will be made into
    a link). In this example the second column is a list of links to Q-items.
    """
    tmpl_usage = []
    if header is not None:
        tmpl_usage.append(header)

    for page in generator:
        page_str = page.get()
        tmpl_list = pywikibot.textlib.extract_templates_and_params(page_str)

        for tmpl in tmpl_list:
            if template_name in tmpl:
                page_title = page.title().split(":")[1]
                property_link = "{{{{P|{}}}}}".format(page_title)
                tmpl_usage.append([property_link, tmpl[1]["list"]])
    return tmpl_usage

def create_table_string(data, highlight=(True, False, False, False),
                        table_class='wikitable', style=''):
    """
    Takes a list and returns a wikitable.

    @param data: The list that is converted to a wikitable.
    @type data: List (Nested)
    @param highlight: Tuple of rows and columns that should be highlighted.
                      (first row, last row, left column, right column)
    @type highlight: Tuple
    @param table_class: A string containing the class description.
                        See wikitable help.
    @type table_class: String
    @param style: A string containing the style description.
                  See wikitable help.
    @type style: String
    """
    last_row = len(data) - 1
    last_cell = len(data[0]) - 1

    table = '{{| class="{}" style="{}"\n'.format(table_class, style)
    for key, row in enumerate(data):
        if (key == 0 and highlight[0]) or (key == last_row and highlight[1]):
            row_string = '|-\n! ' + '\n! '.join(row)
        else:
            row_string = '|-'
            cells = ''
            for ckey, cell in enumerate(row):
                if ckey == 0 and highlight[2]:
                    cells += '\n! ' + cell
                elif ckey == last_cell and highlight[3]:
                    cells += '\n! ' + cell
                else:
                    cells += '\n| ' + cell
            row_string += cells

        table += row_string + '\n'
    table += '|}'
    return table


site = pywikibot.Site("wikidata", 'wikidata')

template_name = "Constraint:Units"
tmpl_gen = list_template_usage(site, template_name)

header = ["Property", "Constraint:Units"]
tmpl_usage = gather_template_usage(tmpl_gen, template_name, header)

table = create_table_string(tmpl_usage)

write_page = pywikibot.Page(site, "Wikidata:Database reports/List of properties with Constraint:Unit templates and their values")
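# "async" is a reserved word in Python 3.7+; Pywikibot names this
# keyword argument "asynchronous".
write_page.save(summary="Updating Table", watch=None, minor=False,
                botflag=False, force=False, asynchronous=False, callback=None,
                apply_cosmetic_changes=None, text=table)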