Wikidata:Data collaborators/historical


This page is for collecting information about potential partners for Wikidata who want to provide their data. The ultimate decision about which data goes into Wikidata and which doesn't rests with the community that is forming around Wikidata. This page is meant to help inform that decision and to show what is available and who is interested in collaborating with Wikidata.

Are you interested in adding data to Wikidata? Please see Wikidata:Data donation.

How to add a new partner[edit]

If you want to add a new partner, just copy the following lines and fill out the sections.

== Title ==
=== Summary ===
...
=== Contact person ===
...
=== License ===
...
=== Link ===
...

Astrochemistry (radio wavelength observations)[edit]

Calibrated molecular line survey observations[edit]

Summaries of observations and calibrated astrochemical line surveys taken with the Robert C. Byrd Green Bank Telescope and the Arecibo radio telescope are provided. The data format is described, along with Python and IDL procedures for accessing the data.

Contact person[edit]

Glen Langston, Ph. D.

License[edit]

These data are available for free public use if the data source is properly cited. A new citation will be available shortly; in the interim, please cite:

http://adsabs.harvard.edu/abs/2007ApJ...658..455L Langston, G. and Turner, B. 2007, "Detection of 13C Isotopomers of the Molecule HC7N", Astrophysical Journal, Volume 658, Issue 1, pp. 455-461

Link[edit]

https://science.nrao.edu/facilities/gbt/
http://www.naic.edu

GBT-Taurus Molecular Cloud: Q-band

BlueForge (business software wiki)[edit]

Description of data[edit]

BlueForge is a software wiki. Its mission is to map the business software world. For each piece of software, we also record metadata such as versions, programming language, main developers and so on. We are also researching the features of the programs in BlueForge. Currently, we are working on a solution for representing the features of a piece of software in order to make them comparable. The backend for this will be based on Wikidata. We are setting up our own instance at the moment to be able to experiment with a data scheme. I hope, though, that most of this will end up in Wikidata.

Contact person[edit]

Markus Glaser

License[edit]

Creative Commons Attribution/Share-Alike License 3.0 (unported)

Link[edit]

http://www.blueforge.de

Documenting crony capitalism[edit]

Description of data[edit]

v:Documenting crony capitalism is an initiative to crowdsource research and investigative journalism in order to expose more of the details of the corrupt political system in the United States, in the hope that this can help citizen activists work more effectively to rein it in.

The project is still getting started, and suggestions are eagerly sought on the best way to achieve this goal. An outline of the current concept is given in v:Documenting crony capitalism#Documentation: How. The basic idea is to describe specific examples of apparent corruption using standard data elements provided in tables. These tables would typically list, for each of several years, the dollar value of governmental favors, the money spent on lobbying and campaign contributions in those years, and the money spent on advertising that gives the commercial media an incentive to underreport what is happening.

The current idea is to document each example in a separate Wikiversity article included in "[[Category:Documenting crony capitalism]]". It should then be fairly easy to crawl through the list of all articles under that category, look for the standard data elements, and create an inventory and summaries.
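As a rough illustration of how such a crawl might work, the sketch below uses the standard MediaWiki API on English Wikiversity to list every page in that category; how the standard data elements are then parsed out of each page is left open, since that convention has not been settled yet.

# Minimal sketch: list all Wikiversity pages in the project category via the
# standard MediaWiki API (list=categorymembers), following continuation tokens.
import requests

API = "https://en.wikiversity.org/w/api.php"

def category_members(category):
    """Yield the titles of all pages in the given category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

for title in category_members("Documenting crony capitalism"):
    print(title)  # each page would then be fetched and scanned for data elements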

However, I'm not married to this and would eagerly consider other suggestions. I currently have two questions:

  1. MAJOR: How can I specify common data elements?
  2. Secondary: Does the standard MediaWiki software support any kind of tables other than w:Help:table? These tables seem quite clumsy to me. I would feel more comfortable with a WYSIWYG table like MS Excel.

Thanks.

Contact person[edit]

User:DavidMCEddy

License[edit]

CC-BY-SA 3.0

Link[edit]

http://en.wikiversity.org/wiki/Documenting_crony_capitalism

Europeana[edit]

Description of data[edit]

Europeana gives people access to millions of digitised objects from Europe's museums, libraries, archives and audiovisual collections. It acts as a single point of access to cultural objects (books, museum objects, paintings, films and archival records) that have been digitised throughout Europe. Europeana offers descriptive metadata for more than 20 million cultural objects under the terms of the Creative Commons CC0 1.0 Universal Public Domain Dedication (CC0). The data is represented using a Semantic Web-inspired data model, the Europeana Data Model (EDM). The data can be accessed through Europeana's public APIs and data downloads.
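As a hedged sketch of programmatic access (the endpoint and parameter names below are assumptions based on the version 2 Search API, and YOUR_API_KEY is a placeholder for a registered key; check the current Europeana API documentation for the exact URL and parameters):

# Hedged sketch: query Europeana's Search API for object metadata.
import requests

SEARCH_URL = "https://api.europeana.eu/record/v2/search.json"  # assumed endpoint

params = {
    "wskey": "YOUR_API_KEY",  # placeholder: register with Europeana for a key
    "query": "Rembrandt",     # free-text query
    "rows": 5,                # number of results to return
}

response = requests.get(SEARCH_URL, params=params)
response.raise_for_status()
for item in response.json().get("items", []):
    # EDM records typically expose title and data provider fields, among others
    print(item.get("title"), "-", item.get("dataProvider"))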

Contact person[edit]

Valentine Charles (valentine.charles(at)kb.nl); David Haskiya (david.haskiya(at)kb.nl).
Vladimir.Alexiev at ontotext.com

License[edit]

Creative Commons CC0 1.0 Universal Public Domain Dedication (CC0)

Link[edit]

Gene Wiki[edit]

Description of data[edit]

The Gene Wiki is an existing Wikipedia initiative; see Portal:Gene_Wiki. As part of this project, bots maintain the structured data in the infoboxes of each gene page (see, for example, the infobox on the right of this gene article). We would like to add this and some additional structured data about human genes to Wikidata.

Contact person[edit]

Andrew Su (talk)

Ben Good

License[edit]

Unrestricted

Link[edit]

IMSLP[edit]

Description of data[edit]

From Wikipedia: The International Music Score Library Project (IMSLP), also known as the Petrucci Music Library after publisher Ottaviano Petrucci, is a project for the creation of a virtual library of public domain music scores, based on the wiki principle. Since its launch on February 16, 2006, over 220,000 scores and 21,000 recordings for over 61,000 works by over 7,800 composers have been uploaded. The project uses MediaWiki software to provide contributors with a familiar interface. Since 6 June 2010, IMSLP has also included public domain and licensed recordings in its scope, to allow for study by ear.

In my opinion it would be interesting to have an entity for each composer and each work, and to cross-link them with Wikipedia data.

Contact person[edit]

Edward W. Guo (User:Feldmahler)

License[edit]

Public domain.

Link[edit]

Imslp.org

Machine-readable Wiktionary[edit]

Description of data[edit]

The wikokit project provides a machine-readable version of the English Wiktionary and the Russian Wiktionary (MySQL and SQLite formats). The texts of Wiktionary entries were converted into tables and relations in a relational database schema (see paper). You can now work with the machine-readable Wiktionary database:

  1. directly via SQL queries (see examples, and the sketch below)
  2. via the provided Java API.

The modular architecture of the Wiktionary parser allows new software modules to be added and data to be extracted from other Wiktionaries (see architecture).
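As a rough illustration of the SQL route, the sketch below opens the SQLite dump and runs a plain query; the database file name and the table and column names ("page", "page_title") are assumptions for illustration only, since the real schema is defined in the wikokit paper and examples.

# Hedged sketch: query the machine-readable Wiktionary SQLite database directly.
# The file name and the table/column names are illustrative assumptions;
# consult the wikokit schema documentation for the real ones.
import sqlite3

conn = sqlite3.connect("enwikt.sqlite")  # path to the downloaded SQLite dump (assumed name)
cursor = conn.cursor()

# Example query: fetch a handful of entry titles from a hypothetical "page" table.
cursor.execute("SELECT page_title FROM page LIMIT 10")
for (title,) in cursor.fetchall():
    print(title)

conn.close()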

Contact person[edit]

Andrew Krizhanovsky (Andrew dot Krizhanovsky at Gmail)

License[edit]

Distributed under EPL/LGPL/GPL/AL/BSD multi-license.

Link[edit]


MegaJoule.org[edit]

Description of Data[edit]

MegaJoule.org is an online energy information-sharing resource with a decided emphasis on numbers. Our goal is to make it as easy as possible for experts, policymakers and the public to access transparent, democratically vetted, research-quality numerical estimates about the past, present and future of energy technology.

MegaJoule.org is home to a structured, searchable database of well-documented estimates, e.g. "The levelized cost of electricity of fixed-axis, utility-scale photovoltaics in 2020 = $0.10/kWh (NREL 2012 PV)". This statement is made up of several components. The value and units are "$0.10/kWh". To give this number meaning, we need plenty of qualitative information, much of which is specified in the rest of the statement: the technology in question is "photovoltaics", the application the technology is being used for is "fixed-axis, utility-scale", the metric is "levelized cost of electricity", the year in question is 2020, and the source is abbreviated "NREL 2012 PV".

In MegaJoule.org estimates, each of these individual bits of information, along with many others not contained in the statement above, is entered and stored as a separate field to make it easier to search through, compare and manipulate estimates. Technologies are categorized within a three-level class hierarchy. Contributing users are encouraged to specify uncertainty information and assumptions in the estimates they enter.

Data in MegaJoule.org is community-sourced, mainly from experts in the field. Users who wish to add, rate or comment on estimates, sources, technologies, etc. are asked to identify themselves to encourage accountability. Users can browse estimates in searchable tables or in a graphing engine, implemented in Lumina's Analytica software. MegaJoule.org's relational database was implemented as an EnterpriseWizard knowledgebase. You can explore the beta version of MegaJoule.org at www.megajoule.org. If you have any questions, comments, or relevant data, feel free to shoot us an email.
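Purely as an illustration of the field breakdown described above (not MegaJoule.org's actual database schema, which is implemented in EnterpriseWizard), a single estimate record might be modelled like this:

# Illustrative only: one way to model the estimate fields described above.
# Field names are assumptions, not MegaJoule.org's actual schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class EnergyEstimate:
    technology: str        # e.g. "photovoltaics"
    application: str       # e.g. "fixed-axis, utility-scale"
    metric: str            # e.g. "levelized cost of electricity"
    year: int              # e.g. 2020
    value: float           # e.g. 0.10
    units: str             # e.g. "$/kWh"
    source: str            # e.g. "NREL 2012 PV"
    uncertainty: Optional[str] = None  # optional uncertainty / assumptions note

example = EnergyEstimate(
    technology="photovoltaics",
    application="fixed-axis, utility-scale",
    metric="levelized cost of electricity",
    year=2020,
    value=0.10,
    units="$/kWh",
    source="NREL 2012 PV",
)
print(example)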

Contact Person[edit]

Evan Sherwin: evan@lumina.com

License[edit]

Not finalized, likely CC-BY

Link[edit]

www.megajoule.org

Monumenten Inventarisatie Project[edit]

Description of data[edit]

nl:Monumenten Inventarisatie Project

Between 1986 and 1995, the institute for cultural heritage of the Netherlands made an inventory of buildings that might be eligible for the status of national monument.

Contact person[edit]

RCE

License[edit]

CC-zero

Link[edit]

Years ago the government had it published somewhere online. This file has been circulating among Dutch Wikipedians:


MusicBrainz (recorded music metadata)[edit]

MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.

MusicBrainz aims to be[edit]

  • The ultimate source of music information by allowing anyone to contribute and releasing the data under open licenses.
  • The universal lingua franca for music by providing a reliable and unambiguous form of music identification, enabling both people and machines to have meaningful conversations about music.

Like Wikipedia, MusicBrainz is maintained by a global community of users and we want everyone to participate and contribute.

The MusicBrainz data includes core information such as a list of artists, their releases, and the tracks on those releases. We have barcodes, ISRCs, release dates, release countries and tons of other data for releases. We also have information about musical works and music labels. For all of these pieces of data, we also have Advanced Relationships that relate them to web resources or to other pieces of data in our database. For instance, take a look at our page on U2 and all the related data we have.

The data can be accessed via the web service API or downloaded to your own machine.
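As a small illustration of the web service route, the sketch below searches the MusicBrainz web service (version 2) for an artist and prints basic fields from the JSON response; the parameters follow the public XML/JSON web service, but check the current API documentation before relying on them, and send a descriptive User-Agent as MusicBrainz requests (the one below is a placeholder).

# Minimal sketch: search the MusicBrainz web service (v2) for an artist.
import requests

url = "https://musicbrainz.org/ws/2/artist"
params = {"query": "U2", "fmt": "json", "limit": 3}
# MusicBrainz asks clients for a descriptive User-Agent; this one is a placeholder.
headers = {"User-Agent": "ExampleDataImporter/0.1 (contact@example.org)"}

response = requests.get(url, params=params, headers=headers)
response.raise_for_status()
for artist in response.json().get("artists", []):
    print(artist["id"], "-", artist["name"])  # MBID and artist name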

Contact person[edit]

Robert Kaye (rob at musicbrainz org)

License[edit]

  • The core data (artists, releases, recordings, etc.) is CC0, effectively released into the public domain.
  • Derived data (annotations, tags, search indexes), along with the complete edit history and non-personal user data, is available under the CC BY-NC-SA 3.0 (Attribution-NonCommercial-ShareAlike) license.

(Details).

Link[edit]

musicbrainz.org

Open Food Facts[edit]

Description of data[edit]

A free, open, collaborative database of food facts from around the world, which aims to help consumers make better choices about what they put in their bodies, as well as motivating industry to take more care over the production of food.
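For illustration, individual products can be fetched over HTTP; the endpoint path below follows the commonly documented read API and should be treated as an assumption (check the current Open Food Facts API reference), and the barcode is just an example.

# Hedged sketch: look up a single product by barcode from Open Food Facts.
import requests

barcode = "3017620422003"  # example barcode; any EAN/UPC code would do
url = "https://world.openfoodfacts.org/api/v0/product/" + barcode + ".json"

data = requests.get(url, timeout=10).json()
if data.get("status") == 1:  # 1 means the product was found
    product = data["product"]
    print(product.get("product_name"), "-", product.get("brands"))
else:
    print("Product not found")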

Contact person[edit]

License[edit]

The Open Food Facts database is available under the Open Database License. The individual contents of the database are available under the Database Contents License. The product photos are available under the Creative Commons Attribution-ShareAlike licence. They may contain elements that are subject to copyright, trademark or other rights, and that may in certain cases be reproduced ("fair use", information or quotation rights).

Link[edit]

http://world.openfoodfacts.org/

Open Football Data - football.db[edit]

Description of data[edit]

Public domain football datasets in plain text for national team competitions (World Cups, Euro, Gold Cup, Copa América, etc.) and football club tournaments and leagues (English Premier League, Champions League, Bundesliga, etc.)

Contact person[edit]

Gerald Bauer

License[edit]

Public domain.

Link[edit]

football.db Project Site


OpenSeaMap - Watersport-Wiki[edit]

OpenSeaMap would like to build a worldwide, multilingual Watersport-Wiki, connected by geo-references to the nautical chart. Sailors, motor boaters, divers, surfers, fishermen, kayakers, rafters and canoeists will find there all the necessary and useful information about harbours, marinas, anchorages, sea areas, rivers, whitewater sections, dive spots, fishing grounds, etc., plus a lot of related detailed information.

  • data we would like to share with Wikidata
  • we already cooperate with Wikipedia:
    all geo-referenced Wikipedia articles are linked on the map, as markers or as a gallery
  • we expect >200 k articles
    there are already 5 k harbours and marinas, and 40 k navigational lights
  • the Watersport-Wiki uses Commons for sharing pictures

As we are newbies with SMW (Semantic MediaWiki), we hope for help with:

  • design (multilingual support, user groups)
  • technical aspects (interface to Wikidata, usability)

Contact[edit]

Markus Bärlocher markus@openseamap.org
Talk about this at WikiCon

License[edit]

Same as OSM (CC-BY-SA, moving to ODbL)

SubsterBot (DrTrigonBot)[edit]

Description of data[edit]

Any kind of data that is free and can be retrieved from a web page by URL, or from a mailing list by automatic e-mail distribution. The data can currently be provided in text (e.g. XML, CSV, ...), Excel, ODS or ZIP format (more can be included on request). The bot is currently running on several wikis in order to provide data to templates there; for example, look at SUL Info. The bot can be freely configured from the wiki to adapt to any data source format by use of regex and Python code stored in the wiki (not hard-coded in the bot!). For further information please see w:en:User:DrTrigonBot/Subster, and feel free to write an email or drop me a message on w:de:User talk:DrTrigon.
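As a generic illustration of the regex-driven approach described above (not the bot's actual configuration format, which lives on wiki pages), extracting one value from a fetched page might look like this:

# Generic illustration of regex-based extraction from a fetched web page.
# The URL and the pattern are hypothetical placeholders.
import re
import requests

url = "https://example.org/stats.html"            # hypothetical data source
pattern = re.compile(r"Total users:\s*([\d,]+)")  # hypothetical regex for one value

html = requests.get(url, timeout=10).text
match = pattern.search(html)
if match:
    total_users = match.group(1).replace(",", "")
    print("Extracted value:", total_users)  # the bot would write this into a template
else:
    print("Pattern not found; the page layout may have changed")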

Contact person[edit]

Dr. Trigon: Special:EmailUser/DrTrigon

License[edit]

The bot is public domain; the data license depends on the source used.

Link[edit]

m:User:DrTrigonBot, m:Wikidata/Bots#User:DrTrigonBot (SubsterBot), User:DrTrigonBot

Public Domain Project (pdproject.org)[edit]

Description of data[edit]

Public domain music files (OGG, FLAC) with description pages, plus a free music encyclopedia. Our archive includes over 50,000 78 rpm records and Edison and Pathé cylinders and discs that we still have to clean and digitize.

Contact person[edit]

Carl Flisch (fuchur(at)pdproject.org); Christoph Zimmermann (nuess0r(at)pdproject.org)

License[edit]

Public domain

Link[edit]

Schindlerjuden[edit]

Description of data[edit]

The two lists of Oskar Schindler include 1,098 persons (on separate male and female lists) who were saved during the Holocaust. Would it be OK to add 1,098 new person records with the attribute "is on Schindler's list" (any better suggestion?)?

Contact person[edit]

Do we have to ask

  1. Yad Vashem? (owner of 2 copies, one from April 18, 1945 at Brünnlitz[1])
  2. International Tracing Service (ITS)? (owner of a copy, written by Mieczyslaw Pemper upon the prisoners' arrival on October 21, 1944 at Schindler's Brünnlitz factory[2]; see ITS Data protection principles)
  3. United States Holocaust Memorial Museum? (one copy from April 18, 1945 at Brünnlitz[3]; data at the Holocaust Survivors and Victims Database)

License[edit]

To be clarified?

Link[edit]

Schindlerjuden in the English WP

VISS[edit]

Description of data[edit]

VISS (Water Information System Sweden) is a database containing all major lakes, rivers, groundwaters and coastal waters of Sweden. VISS's main purpose is to provide comprehensive information about water to people who work with water daily. The information is also available to the general public to encourage participation in the work to improve the status of our waters.

Contact person[edit]

Niklas Holmgren (niklas.holmgren@lansstyrelsen.se)

License[edit]

CC0

Link[edit]

The database: http://www.viss.lansstyrelsen.se
The Open API: http://www.viss.lansstyrelsen.se/api
Raw RDF is available as well.

WorldCat.org Open Linked Data[edit]

Description of data[edit]

Open, linked RDF bibliographic data for WorldCat.org entries, using schema.org and library extension vocabularies.
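As a rough sketch of how such linked data is often consumed (whether worldcat.org honours this content negotiation, and the example OCLC number, are assumptions; consult OCLC's current documentation for the supported formats and URL patterns):

# Hedged sketch: request an RDF serialization of a WorldCat record via HTTP
# content negotiation. Endpoint behaviour and the OCLC number are assumptions.
import requests

oclc_number = "41266045"  # placeholder OCLC record number
url = "https://www.worldcat.org/oclc/" + oclc_number
headers = {"Accept": "application/rdf+xml"}  # ask for an RDF serialization

response = requests.get(url, headers=headers, timeout=10)
print(response.status_code, response.headers.get("Content-Type"))
print(response.text[:500])  # first part of the returned document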

Contact person(s)[edit]

Max Klein, Merrilee Proffitt at OCLC (Online Computer Library Center).

License[edit]

ODC-BY

Link[edit]

http://www.oclc.org/data.html

World University and School[edit]

World University and School is like Wikipedia with MIT OpenCourseWare: people-to-people, open teaching and learning, planned for all 3,000-8,000 languages and for the world's roughly 200 countries. Free, online bachelor's, Ph.D., law and M.D. degrees, accrediting on MIT OCW and more in many, many countries, and engaging the conference method, are planned. Free, online, MIT OCW-High School-centric International Baccalaureate (I.B.) degrees in at least the six United Nations languages are also planned, as is an all-instrument, all-languages Music School for collaborative, real-time music-making.

Description of data[edit]

Open, linked, in all languages and countries; a universal translator, perhaps with a context focus; all symbols; a Music School with all instruments, each as a wiki subject page in all 3,000-8,000 languages; physics, virtual-world and brainwave-headset data, for example, are also planned for as the internet and these technologies develop.

Contact person[edit]

Scott MacLeod; worlduniversityandschool at gmail.com; worldunivandsch at scottmacleod.com

License[edit]

Already CC-BY, and likely to remain CC-BY.

Link[edit]

http://worlduniversity.wikia.com/wiki/World_University