Shortcuts: WD:LOD, w.wiki/87CA
Wikidata:Linked open data workflow
There are many considerations when contributing data, media, or other assets to Wikimedia projects. This chart lists the tools and scripts used in the linked data workflow and is especially useful to GLAM institutions. It is based on the data and media partnerships chart on Outreach Wiki. Public shortcut to this page: w.wiki/87CA
PREPARE and normalize source data and media | RECONCILE with Wikimedia modeling and coverage | INGEST data, media, and free content | ANALYZE, correct, and enrich | RE-USE content intra-wiki and externally | REPORT and measure impact |
---|---|---|---|---|---|
Notes
Try finding a similar project or collection set on Wikidata or Commons to see how it has been done in the past.
Ask questions at the main Project chat on Wikidata or the Village Pump on Commons.
Those donating content should ensure assets are released under a free license or that copyright has expired. An easy way to prepare images for Commons is to upload collections to Flickr and set the proper license for the images (CC0, CC-BY, CC-BY-SA). Do not use non-commercial (NC) licenses. Wikidata uses a CC0 license: any contributed data must be dedicated as CC0 or public domain. |
Notes
For Wikidata, a "crosswalk database" is usually needed to map terms from the source data set (a CSV file or records from an API) to Wikidata terms. This can be achieved with OpenRefine, a custom mapping in Google Spreadsheets, or both.
Check to see what entities and properties already exist in Wikidata and what categories and templates are used for Commons. Find out how items are modeled in Wikidata, in order to set the proper "instance of" (P31) and "subclass of" (P279) properties for new items. Need case study here. |
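The crosswalk idea in the notes above can be sketched in a few lines of Python. This is only an illustration, not the method used by OpenRefine or any particular institution: the column names, the sample rows, and the source terms are assumptions, while Q3305213 (painting) and Q860861 (sculpture) are existing Wikidata items.

```python
import csv
import io

# Hypothetical crosswalk: source vocabulary term -> Wikidata item ID.
# The source terms are invented; the QIDs are existing Wikidata items.
CROSSWALK = {
    "paintings": "Q3305213",  # painting
    "sculpture": "Q860861",   # sculpture
}

def reconcile(rows, crosswalk):
    """Split rows into (matched, unmatched) by looking up the object type."""
    matched, unmatched = [], []
    for row in rows:
        key = row["object_type"].strip().lower()
        qid = crosswalk.get(key)
        if qid is None:
            # No mapping yet: keep the row for manual review.
            unmatched.append(row)
        else:
            # Record the "instance of" (P31) value for the new item.
            matched.append({**row, "P31": qid})
    return matched, unmatched

# Invented sample data standing in for an institution's CSV export.
SAMPLE_CSV = """inventory_number,title,object_type
1975.1.1,Still Life with Flowers,Paintings
1975.1.2,Untitled,Sculpture
1975.1.3,Untitled,Textile
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matched, unmatched = reconcile(rows, CROSSWALK)
```

Rows that end up in `unmatched` are the terms that still need a mapping decision before any upload.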
Notes
Try uploading small test batches before doing large data sets. When ingesting collection metadata and media files to Wikidata and Commons, you need a way to make sure they are correlated. Inventory or accession number (P217) is often used for objects, with a qualifier for collection (P195) and institution. A Commons best practice for filenames is to incorporate the institution/source, inventory number and possibly a descriptive title.
Including the inventory number (P217) in a Wikidata item description may help distinguish items with very similar names (e.g., Untitled or Still Life with Flowers). Need case study here. |
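The filename practice described above could be automated along these lines. This is a minimal sketch under stated assumptions: the function name, the " - " separator, and the default extension are hypothetical choices, and the character list covers characters that are invalid or troublesome in MediaWiki file titles rather than every Commons naming rule.

```python
import re

def commons_filename(institution, inventory_number, title="", extension="jpg"):
    """Build a filename from institution, inventory number and optional title."""
    parts = [institution, inventory_number, title]
    # Skip empty parts so a missing title does not leave a dangling separator.
    name = " - ".join(p.strip() for p in parts if p and p.strip())
    # Replace characters that are invalid or troublesome in file titles.
    name = re.sub(r"[#<>\[\]|{}:/]", "-", name)
    # Collapse runs of whitespace into a single space.
    name = re.sub(r"\s+", " ", name)
    return f"{name}.{extension}"

# "Example Museum" and the inventory number are invented for illustration.
example = commons_filename("Example Museum", "1975.1.1", "Still Life with Flowers")
```

Keeping the inventory number in the filename gives a second way to match a Commons file to its Wikidata item if the structured metadata is ever lost.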
Notes
Depending on the success of the import and uploading process, you may need to deal with duplicates or conflicts with other editors.
For Commons, you may need to move files around or add additional categories.
You may want to create special custom maintenance queries to keep track of your contributed content over time, or to keep adding properties and metadata beyond the initial contribution. |
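A maintenance query of the kind mentioned above might list items in a collection that still lack a given property. The sketch below only builds the SPARQL text, which can then be pasted into the Wikidata Query Service. The function name is hypothetical, and Q4115189 (the Wikidata Sandbox item) stands in for a real collection item.

```python
import re

def missing_property_query(collection_qid, property_id, limit=100):
    """Return SPARQL listing items in a collection (P195) that lack a property."""
    # Validate identifiers so that malformed input cannot alter the query.
    if not re.fullmatch(r"Q\d+", collection_qid):
        raise ValueError(f"not an item ID: {collection_qid!r}")
    if not re.fullmatch(r"P\d+", property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P195 wd:{collection_qid} .\n"
        f"  FILTER NOT EXISTS {{ ?item wdt:{property_id} [] }}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )

# Items in the placeholder collection that have no image (P18) yet.
query = missing_property_query("Q4115189", "P18")
```

Running the same query periodically gives a simple worklist of items that still need enrichment.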
Notes
Scripts and templates can generate on-wiki content such as tables and infoboxes from Wikidata.
If identifiers/authority control records have been imported, then Wikidata can act as a crosswalk database to explore mappings among many different databases. |
Notes
Show the impact of contributions by tracking metrics on files used or impressions over time. For partnerships, this can help validate the work being done or encourage more collaboration.
Some tools are on-demand (GLAMorgan) and some are regularly reported based on Commons categories of GLAM institutions.
You may also want to use Wikidata Query to make some custom reports on coverage or usage. |
Tools and scripts
Convert PDF files to structured data. If your source data is not well formatted, try a scraping tool like Tabula (https://tabula.technology/). |
Tools and scripts
OpenRefine video tutorial from the GLAM WIKI 2018 conference with Sandra Fauconnier |
Tools and scripts
Pattypan is the most popular way to do batch media uploads using a spreadsheet to gather needed metadata for each file. Find the correct template for artwork, photos or other media and identify the proper categories for organizing files.
QuickStatements takes spreadsheet-generated CSV directives and creates Wikidata statements from them.
The MediaWiki API can be used from Python (Pywikibot or PAWS) for more advanced work. |
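As an illustration of turning spreadsheet rows into QuickStatements input, the sketch below emits tab-separated commands in what is assumed here to be the QuickStatements V1 text syntax (`CREATE`, `LAST`, `Len` for an English label), rather than the CSV variant. The function name and record keys are hypothetical, and Q4115189 (the Wikidata Sandbox item) is a placeholder for a real collection item. Output should be checked against the QuickStatements help page and tried on a small test batch first.

```python
def quickstatements_v1(record, collection_qid):
    """Return V1-style commands that create one item for a collection object."""
    # Double quotes delimit strings in the command text, so replace any in the label.
    label = record["title"].replace('"', "'")
    lines = [
        "CREATE",
        f'LAST\tLen\t"{label}"',
        f'LAST\tP31\t{record["P31"]}',
        # Inventory number (P217) with a collection (P195) qualifier.
        f'LAST\tP217\t"{record["inventory_number"]}"\tP195\t{collection_qid}',
        f"LAST\tP195\t{collection_qid}",
    ]
    return "\n".join(lines)

# Invented record; Q3305213 is the Wikidata item for "painting".
record = {
    "title": "Still Life with Flowers",
    "inventory_number": "1975.1.1",
    "P31": "Q3305213",
}
commands = quickstatements_v1(record, "Q4115189")
```

Generating the commands from code rather than by hand keeps the inventory number and its collection qualifier consistent across a batch.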
Tools and scripts
Tracking property completeness: InteGraality, a script by User:Jean-Frédéric that generates custom dashboards of property coverage for a given part of Wikidata.
Properties dashboard for the Metropolitan Museum of Art
Wikimedia Commons Data Roundtripping project and report |
Tools and scripts
Infobox tutorials: Wikidata:Infobox_Tutorial, on how to create Wikidata-powered infoboxes or other templates for Wikipedia and other projects connected to Wikidata
Wikidata-driven infoboxes on Commons categories: Template:Wikidata Infobox, created by User:Mike Peel |
Tools and scripts
Wikidata queries showing statistics on Met Museum open access contributions to Wikidata: PAWS notebook by User:Fuzheado |
Case studies
- Add yours here
Links
- Data and media partnerships workflow - General considerations for data and media partnerships, including a series of tools for Wikidata and Wikimedia Commons.
- Content Partnerships Hub/Software/Tool prioritization survey end 2022