Wikidata:Partnerships and data imports/Archive/2016/02

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Page name and description

If you have suggestions for improving the name or description of this page, please do contribute.

Thanks

John Cummings (talk) 17:09, 10 February 2016 (UTC)

Creating guides for importing data and for potential partner organisations

Hi

I've started two guides: one on the logistics of importing data into Wikidata, and one for potential partner organisations to find out more about Wikidata and why they may want to add their data.

Guide to import data into Wikidata.
Guide to Wikidata for potential partner organisations.

Thanks

John Cummings (talk) 17:00, 10 February 2016 (UTC)

National datasets on IUCN listed protected places provided by Protected Planet

I'm importing data from http://www.protectedplanet.net/ which is a UNEP and IUCN project to catalogue all protected natural sites in the world. I'm going to try to import just the relevant data to Wikidata from the UK sites (the full database is 72MB). I thought it would be useful to do this because the information can be used on Wikipedia and also could form part of the sites list for Wiki Loves Earth.

I started off by downloading the data set and uploading it to Google Docs and deleting some of the columns I knew wouldn't be needed.

Google sheet

I then looked at all the columns to see which would be relevant to Wikidata and which extra properties might need to be created. No new properties were needed.

I then looked at all the 'instance of' items that would need to be created, made a list here, and created the extra items here.

Google docs

I then realised that the URL structure of the website would easily let me reference each claim, because the ID number is the same as the second half of the URL. So I've made two columns in the spreadsheet that combine to create a reference for each entry.
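As a rough illustration of the two-column approach described above, the sketch below joins a constant URL prefix with each site's ID to produce a reference URL. The `BASE_URL` value, the row layout, and the ID values are assumptions made for the example; the real path on the website and the real spreadsheet columns may differ.

```python
# Assumption: each site's page lives at BASE_URL + <ID>. The real path
# segment on the website may differ; adjust the prefix accordingly.
BASE_URL = "http://www.protectedplanet.net/"


def reference_url(site_id):
    """Join the constant URL prefix with a site's ID number."""
    return BASE_URL + str(site_id).strip()


# Hypothetical rows standing in for the spreadsheet; the IDs are made up.
rows = [
    {"name": "Example site A", "id": "12345"},
    {"name": "Example site B", "id": " 67890 "},  # stray spaces are trimmed
]

for row in rows:
    row["reference_url"] = reference_url(row["id"])
```

In a spreadsheet the same thing can be done by concatenating the prefix column with the ID column.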

I would still like to import data on which county each place is in, as this data was included for only some sites.

The next step is to import the dataset into Mix n' Match for matching.

--John Cummings (talk) 10:40, 15 February 2016 (UTC)

Importing species interaction data from GloBI

I just met with jhpoelen, who's interested in importing the Encyclopedia of Life's Global Biotic Interactions dataset into Wikidata. The easiest way to visualize this data is to look at their Darwin Core Archive file, whose 'association.csv' file contains 1,344,425 sourced statements of predation, parasitism and other food-related relationships between organisms. Would this be a good time in Wikidata's evolution to try importing this data? I could write a bot, but if there are better ways to import CSV-type data, please let me know! -- Gaurav (talk) 17:17, 18 February 2016 (UTC)

Do the properties exist that you would need to describe these interactions? That would probably be where to start. If the properties are there, and if you can automatically match the data you have to Wikidata item IDs, then a bot would likely be best, unless your data is simple enough for the QuickStatements tool (you would pull that from a spreadsheet). QuickStatements probably isn't appropriate for millions of statements, though. If automatic matching to Wikidata items will be difficult, the data might be suited to the Mix'n'Match tool; see the mix-n-match importer for what you would need there. Good luck! ArthurPSmith (talk) 21:47, 18 February 2016 (UTC)

Case studies

Are there any case studies of organisations using Wikidata to release and then curate data? Richard Nevell (WMUK) (talk) 13:52, 18 February 2016 (UTC)

Wikidata:BEIC? Nemo 10:18, 22 February 2016 (UTC)

Does anyone know a way we could import all the item names and ID numbers from protectedplanet.net?

Hi all

Protected Planet is the most comprehensive global database on terrestrial and marine protected areas, so it would be very useful information to have on Wikidata. Each site on their website has area information, and I think it would be a very good base for any countries wanting to start Wiki Loves Earth. Additionally, they currently offer a link to the English Wikipedia article for every site; if we could index every site, they could perhaps use Wikidata as an index to offer people Wikipedia articles in other languages.

Thanks

John Cummings (talk) 11:02, 26 February 2016 (UTC)

We already have WDPA ID (P809), which is used on 4800 items. However, copying the rest of their database would be clearly against their strict terms of use ((a) the WDPA Data are not downloadable and (b) the proper attribution is clearly visible). --Pasleim (talk) 12:05, 26 February 2016 (UTC)

Importing ship data from National Library of Wales

Jason Evans, Wikipedian in residence at NLW, has proposed importing some of the data that has been made available as part of the NLW Data initiative. The data proposed for import relates to 19th century commercial shipping from Aberystwyth (raw data is available here).

I have started a discussion in WikiProject ships, where some initial decisions have been made regarding what data is notable enough for import. These are outlined below:

What data do we need?

✓ Good for import

  • Every ship in the dataset can have a Wikidata item (involves matching them to Wikidata items where they exist already, then creating the missing items). For all ships we can get, the following data should be added where possible.
  • Ship Official Number
  • Port of registry ("Aberystwyth" in each case, but ideally with start and end dates for the registration)
  • Any other basic specs that are available for the ships, e.g. length overall, gross tonnage, the shipyard which built it, and vessel type (of some sort, e.g. sail, steamer, etc.).


 Out of scope

  • Crew members (although we can link to the data on the Library website as a source so people can find it if they need)

Action plan

  1. All of the data needs to be compiled into a single spreadsheet, with each row representing a single unique ship. Columns should be added for each type of data that is available (e.g. "official number", "port of registry", "date of registry", "ship type" etc). Cells can be left blank when the data is not available for a particular ship.
  2. Match the ships to Wikidata items and add the Q numbers as a new column in the spreadsheet.
  3. Add statements from data in the spreadsheet using the QuickStatements tool. Missing Wikidata items can be created at the same time.
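Step 1 of the plan above (one row per unique ship, blanks allowed) could be sketched roughly as follows. This assumes the raw records can be keyed on the ship's official number; the field names and sample values are hypothetical placeholders, not taken from the NLW data.

```python
def compile_ships(records):
    """Merge raw records into one row per ship, keyed by official number.

    For each field, the first non-empty value seen is kept; fields with
    no value anywhere are left missing, which corresponds to a blank cell.
    """
    ships = {}
    for rec in records:
        key = rec["official_number"]
        row = ships.setdefault(key, {"official_number": key})
        for field, value in rec.items():
            if value and not row.get(field):
                row[field] = value
    return list(ships.values())


# Hypothetical sample records; names and numbers are placeholders.
records = [
    {"official_number": "0001", "name": "Example", "ship_type": ""},
    {"official_number": "0001", "name": "", "ship_type": "schooner"},
    {"official_number": "0002", "name": "Sample", "ship_type": "steamer"},
]
ships = compile_ships(records)
```

Keeping the first non-empty value is only one possible policy; conflicting values for the same ship would need manual review before import.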

NavinoEvans (talk) 13:02, 12 February 2016 (UTC)

Comments / questions

Brilliant, thanks Jason. Let me know if any questions pop up. NavinoEvans (talk) 17:28, 13 February 2016 (UTC)
@Jason.nlw: Did you get the 'single spreadsheet'? P.S. I've also left a message below, but not a thing is happening! Very slow on WD! Llywelyn2000 (talk) 08:58, 14 March 2016 (UTC)
@Llywelyn2000: The shipping records spreadsheet should be coming to me soon... It's quite slow at the National Library too! Jason.nlw (talk) 12:08, 14 March 2016 (UTC)

What next?

OK, I have created a spreadsheet for the collection and matched up the fields I want to include with Wikidata statements and qualifiers, but I am confused about how I can create Wikidata items for each ship without manually adding each one. Is there a tool to do this? Any help would be greatly appreciated. Jason.nlw (talk) 15:17, 27 April 2016 (UTC)

If you use QuickStatements (Q20084080), you need to convert the spreadsheet to something like:
CREATE		
LAST	Len	"Nautilus"
LAST	P31	Q2811
For existing items, LAST would be replaced by its QID and CREATE isn't needed.
--- Jura 15:40, 27 April 2016 (UTC)
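The conversion described in the reply above could be sketched along these lines. It is a minimal illustration only: the row keys (`qid`, `label`, `instance_of`) are hypothetical spreadsheet columns, the QID used for the existing item is a placeholder, and only the label and P31 lines shown in the reply are generated. Labels containing double quotes are not handled.

```python
def to_quickstatements(rows):
    """Turn spreadsheet rows into tab-separated QuickStatements commands.

    Rows without a QID get a CREATE line and use LAST as the subject;
    rows with a QID use that QID directly and need no CREATE.
    """
    lines = []
    for row in rows:
        subject = row.get("qid") or "LAST"
        if subject == "LAST":
            lines.append("CREATE")
        # English label, in the same form as the example in the reply.
        lines.append(f'{subject}\tLen\t"{row["label"]}"')
        if row.get("instance_of"):
            lines.append(f'{subject}\tP31\t{row["instance_of"]}')
    return "\n".join(lines)


# Hypothetical rows; the second QID is a placeholder for illustration.
rows = [
    {"qid": "", "label": "Nautilus", "instance_of": "Q2811"},
    {"qid": "Q4115189", "label": "Example ship", "instance_of": "Q2811"},
]
commands = to_quickstatements(rows)
```

The resulting text can then be pasted into the tool; it is worth trying a handful of rows first to check the output before running the full spreadsheet.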
Hi Jura, I will give this a try and get back to you. Thanks for your help.