User:Geertivp/training/QuickStatements

With QuickStatements you can automatically edit Wikidata in batch mode.

  • You can prepare a (large) transaction file using Excel, LibreOffice Calc, or OpenRefine
  • Possibly from a Wikidata Query
  • Then execute transactions via https://quickstatements.toolforge.org (copy/paste; format V1 or V2)
  • You can create items, and amend statements
  • Support for OpenRefine

Transactions

You can either generate transactions via:

  • Wikidata Query
  • OpenRefine
  • CLI tools
  • any tool generating a list of items, and statements (property/value pairs)

or by using other sources or tools. Manual input is also possible, but then you might as well edit Wikidata directly, unless you want to create a lot of new items.
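
As an illustration, the sketch below turns a list of item/value pairs (for example, rows copied from a query result) into V1 lines, one statement per line. The function name `build_v1_lines` and the sample IDs are placeholders and not part of QuickStatements; the values are assumed to be already formatted the way the tool expects.

```python
def build_v1_lines(rows, prop):
    """Return one tab-separated V1 line (item, property, value) per row."""
    return [f"{qid}\t{prop}\t{value}" for qid, value in rows]

# Placeholder IDs, for illustration only.
rows = [("Q111", "Q222"), ("Q333", "Q222")]
print("\n".join(build_v1_lines(rows, "P31")))
```

The printed text can be pasted into the tool as V1 input.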

After loading the input file, you can choose between online mode (slower, but with more control) and offline (batch) mode (faster, because there are no round-trip delays).

Formats

There are two formats:

  • V1: TAB file with one statement per line
  • V2: CSV file with pivot statements (qid in first column, P-numbers in the header, Q-numbers or values in the cells); the loading process translates the pivot into linear transactions, one statement after the other.
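
To make the pivot idea concrete, here is a minimal Python sketch of how such a table can be unrolled into one line per statement. It only shows the reshaping; it is not the tool's own code and it does not reproduce the tool's value quoting. The function name `pivot_to_linear` is hypothetical, and skipping empty cells is a choice made for this sketch.

```python
import csv
import io

def pivot_to_linear(csv_text):
    """Unpivot a V2-style CSV into tab-separated lines, one statement per line."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)              # e.g. ["qid", "Lfr", "Dnl"]
    lines = []
    for row in reader:
        if not row:
            continue                   # ignore blank lines
        qid = row[0]
        for command, cell in zip(header[1:], row[1:]):
            if cell == "":
                continue               # choice made for this sketch only
            lines.append("\t".join([qid, command, cell]))
    return lines

sample = (
    "qid,Lfr,Dnl\n"
    "Q17277055,Église du Sacré-Cœur d'Arlon,\"kerkgebouw in Aarlen, België\"\n"
)
for line in pivot_to_linear(sample):
    print(line)
```

Each data row yields as many lines as it has filled cells, in header order.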

V2 string formats

Double quotes: when using Excel input files, you must double every double quote inside the string value and add one extra " before and after it.

  • "single double quotes" for Lxx, Dxx, and Axx
  • "xx:""triple double quotes""" for language dependent properties
  • """triple double quotes""" for other strings
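
These quoting rules are the ordinary CSV escaping of cells that contain quotes or commas. A minimal Python sketch, assuming the standard `csv` module with its default settings; `to_csv_line` is a hypothetical helper name and the cell values are made up:

```python
import csv
import io

def to_csv_line(cells):
    """Write one CSV row and return it as text, using default CSV quoting."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(cells)
    return buffer.getvalue().rstrip("\n")

# A label with a comma is wrapped in single double quotes.
print(to_csv_line(["kerkgebouw in Aarlen, België"]))  # "kerkgebouw in Aarlen, België"
# A quoted string value ends up with triple double quotes.
print(to_csv_line(['"some string"']))                 # """some string"""
# A language-prefixed quoted value gets doubled inner quotes.
print(to_csv_line(['fr:"some text"']))                # "fr:""some text"""
```

Generating the file with a CSV library instead of typing the quotes by hand avoids miscounting them.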

Techniques

  • Run a Wikidata query
  • Detect and add missing data
  • Detect and correct wrong data
  • Detect and resolve constraints
  • Generate a transaction file
  • Verify the transaction file
  • Import the transaction file
  • Run QuickStatements (interactive or batch)
  • Review errors
  • Correct errors manually
  • Amend any remaining problems manually

Authentication

  • To use QuickStatements the user account needs to be autoconfirmed.
  • You need WiDaR to authorize your QuickStatements session (Wikimedia account)
  • Transactions are logged under your userID (username is visible in the application)
  • You are responsible for any damage you cause to Wikidata, and for cleaning it up afterwards

Attention points

Caveat

  • First try with one example; verify the results before executing 1000s of transactions
  • You can pause, stop, and resume the script
  • The order of execution in QuickStatements is extremely important, because within each language the combination of label (Lxx) and description (Dxx) must be unique across items

Error handling/post processing

  • Handle any left-over errors, inconsistencies, or conflicts manually in the interactive Wikidata editor (verify the history of transactions)
  • If you have only a few transactions, you might use the standard Wikidata edit functionality instead of the tool
  • OpenRefine might give you better control over selective execution

Pitfalls

  • This is a (very) dangerous tool: you are responsible for correcting any errors caused by your batch transactions
  • Take care; avoid mistakes; double-check your transactions
    • Pay attention to proper use of Properties
    • Do not create duplicate items/statements
  • Activities are logged on your account
  • Run one command of the list, then interrupt, verify, and resume if OK
  • Be prepared for negative feedback
  • When creating a new item, add at least an "instance of" statement and some other statements; otherwise it is considered an empty item and risks subsequent deletion
  • When creating items, it is better to do it interactively, so that you can immediately amend the new Q-numbers
  • Otherwise prefer batch mode: it is faster, and you do not have to keep your laptop connected to the network

Versions

There is a new version of the tool; see Q29032512. This version is easier to use, allows CSV import, and allows deleting statements by prefixing them with a "-".

Known problems

It is better to use V2 of the application instead of the obsolete V1.

For version 1:

  • Click away the HOWTO to see the log file
  • Use the Lxx and Dxx separately (otherwise only the first operation is executed...)
  • The on-screen log does not scroll down automatically; use Ctrl-End to see the current transaction
  • Network problems could stop the processing; once the network connection is re-established, process only the rest of the file

For version 2:

  • The labels are in English only (not translated into user language)
  • TAB is not automatically converted to a comma for the CSV input format (although this should be transparent) ⇒ use a text editor such as Notepad to change TAB to comma
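
A plain search-and-replace of TAB by comma breaks as soon as a cell itself contains a comma. As an alternative, a minimal Python sketch that re-writes the file through the standard `csv` module is shown here. It assumes the tab-separated text uses the same quoting conventions as CSV (as spreadsheet exports usually do); `tsv_to_csv` is a hypothetical helper name.

```python
import csv
import io

def tsv_to_csv(tsv_text):
    """Convert tab-separated text to comma-separated text with CSV quoting."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(reader)
    return out.getvalue()

print(tsv_to_csv("qid\tDnl\nQ17277055\tkerkgebouw in Aarlen, België\n"))
```

The description contains a comma, so the writer wraps that cell in double quotes instead of splitting it into two columns.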

For both versions:

  • Some edits might result in a (constraint) error
  • Some manual corrections might be needed
  • Wikidata Query runs on a replica of the live database, so it can be a couple of minutes behind the live Wikidata edits made with QuickStatements (before verifying your results with Wikidata Query you might have to wait up to 5 minutes). Verify with "View history" to be sure.

You should see "All done!" at the end.

  • Under certain circumstances LibreOffice Calc generates “” instead of "", which causes verbatim “” to be inserted in Wikidata text columns; please be very careful...
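
One way to catch this before importing is to scan the transaction file for typographic quotes. The following Python sketch is only an illustration; the helper names are hypothetical, and blindly replacing the characters would also alter any label that legitimately contains them, so review the reported lines first.

```python
# Typographic double quotes that spreadsheets may substitute for ".
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})

def lines_with_curly_quotes(text):
    """Return the 1-based numbers of lines that contain typographic quotes."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if any(ch in line for ch in "\u201c\u201d\u201e")
    ]

def straighten_quotes(text):
    """Replace typographic double quotes with plain ASCII double quotes."""
    return text.translate(CURLY_TO_STRAIGHT)

batch = 'Q17277055\tDnl\t\u201ckerkgebouw in Aarlen, België\u201d'
print(lines_with_curly_quotes(batch))
print(straighten_quotes(batch))
```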

Invalid entity ID

The command letter must be uppercase (Dnl, Lfr). With lowercase commands (dnl, lfr) you get an error:

Q17277055	Dnl	"kerkgebouw in Aarlen, België"
Q17277055	Lfr	"Église du Sacré-Cœur d'Arlon"
Processing Q3581386 (Q3581386 dnl "kerkgebouw in Aarlen, België")
ERROR (set_string) : Invalid entity ID.
Processing Q17277055 (Q17277055 lfr "Église du Sacré-Cœur d'Arlon")
ERROR (set_string) : Invalid entity ID.
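
A quick pre-check of a V1 file can flag such lines before you run them. The sketch below is a rough illustration only: the regular expression is an assumption that covers just the Lxx, Dxx, and Axx commands mentioned on this page, and `lowercase_commands` is a hypothetical helper name.

```python
import re

# Second column looks like a label/description/alias command typed in lowercase.
LOWERCASE_COMMAND = re.compile(r"^[lda][a-z]{2,3}(-[a-z]+)*$")

def lowercase_commands(v1_text):
    """Return the 1-based numbers of V1 lines whose command is lowercase."""
    bad = []
    for number, line in enumerate(v1_text.splitlines(), start=1):
        columns = line.split("\t")
        if len(columns) >= 2 and LOWERCASE_COMMAND.match(columns[1]):
            bad.append(number)
    return bad

batch = (
    'Q17277055\tDnl\t"kerkgebouw in Aarlen, België"\n'
    'Q17277055\tlfr\t"Église du Sacré-Cœur d\'Arlon"'
)
print(lowercase_commands(batch))
```

Here only the second line is reported, because its command is written as lfr instead of Lfr.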

Duplicate Description

Within one language, no two items may share the same combination of label (Lxx) and description (Dxx). When assigning a description you might therefore get a duplicate error.

Processing Q27959405 (Q27959405 Dfr "édifice religieux belge")
ERROR (set_desc) : Item Q22668173 already has label "église Sint-Martinus de Zaventem" associated with language code fr, using the same description text.
  • Correct one of the Labels, or Descriptions
  • Merge the two items when they describe the same subject
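
Collisions inside your own batch can be spotted before running it. The following Python sketch groups the Lxx and Dxx lines of a V1 file per item and language and reports label/description pairs used by more than one item. It is an illustration under simple assumptions (three tab-separated columns per line; `duplicate_pairs` is a hypothetical name), and it cannot see labels or descriptions that already exist in Wikidata.

```python
from collections import defaultdict

def duplicate_pairs(v1_text):
    """Map (language, label, description) to the items sharing that pair."""
    terms = defaultdict(dict)      # (qid, language) -> {"L": label, "D": description}
    for line in v1_text.splitlines():
        columns = line.split("\t")
        if len(columns) != 3:
            continue
        qid, command, value = columns
        if len(command) > 1 and command[0] in ("L", "D"):
            terms[(qid, command[1:])][command[0]] = value
    seen = defaultdict(list)       # (language, label, description) -> [qid, ...]
    for (qid, language), term in terms.items():
        if "L" in term and "D" in term:
            seen[(language, term["L"], term["D"])].append(qid)
    return {key: qids for key, qids in seen.items() if len(qids) > 1}

# Placeholder IDs and texts, for illustration only.
batch = "\n".join([
    'Q111\tLfr\t"église"', 'Q111\tDfr\t"édifice religieux belge"',
    'Q222\tLfr\t"église"', 'Q222\tDfr\t"édifice religieux belge"',
])
print(duplicate_pairs(batch))
```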

Replication delay with Query

Query runs on a replicated database. After updating the live database with QuickStatements, it can take minutes before your updates are visible in Query.

Do not re-execute your updates in the meantime, as this would cause duplicate updates.

Excel drops leading zeros

Pay attention when loading a TSV or CSV file into Excel. Reinstate the leading zeros as necessary.
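
If the affected identifier has a known, fixed number of digits, the zeros can be restored with a small script. This is a sketch under that assumption only: the width of 6 and the name `restore_leading_zeros` are made up for the example, and identifiers of variable length cannot be repaired this way.

```python
def restore_leading_zeros(value, width):
    """Pad a numeric identifier with leading zeros up to a fixed width."""
    return value.strip().zfill(width)

print(restore_leading_zeros("1234", 6))
```

Values that already have the full width are returned unchanged.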

Geographic coordinates do not load properly

Reason?

Empty cells are not skipped

You must execute your transactions in separate batches, which is inconvenient.

False errors

You can ignore the following error:

No success flag set in API result
The transaction was executed, but the status was lost. Do not re-execute the statement, because it got executed anyway.

Support

See also
