Wikidata:WikiProject Wikidata for research/EINFRA-9-2015
Contributions to the "Wikidata for research" project (including Wikidata:WikiProject Wikidata for research and all its pages) are dual licensed under CC BY-SA 3.0 (the Wikimedia default) and the Creative Commons Attribution 4.0 license.
Contributions by the project to the item and project namespaces of Wikidata shall be under CC0.
About
This page hosts post-submission updates about a grant proposal that was drafted in public and submitted on January 14, 2015, in response to the Horizon 2020 (Q13583472) EINFRA-9-2015 call.
Submitted version
- Title: Enabling Open Science: Wikidata for Research (available via Zenodo: doi:10.5281/zenodo.13906 under a Creative Commons Attribution 4.0 License). Because many of the letters of support from the Associate partners carry copyrighted letterheads, they could not be made available under an open license, which precluded including them in the Zenodo repository or uploading them to Wikimedia Commons. They are available separately.
- Part B, sections 1-3
- Part B, sections 4-5
- The proposal Enabling Open Science: Wikidata for Research (Wiki4R) (Q26707522) as published in Research Ideas and Outcomes (Q20895800).
Background
Explanation in lay terms
The project proposes to create a "virtual research environment" (VRE) on Wikidata that supports both the science communities and the open-knowledge Wikidata community. Wikidata has already become a major focal point for openly sharing scientific information. However, the existing infrastructure for data lookup needs to be enriched to enable new kinds of mutually beneficial interaction with professional science organisations.
The proposal does not aim to develop a virtual research environment in the sense of an "information silo": feature-complete, secure, self-contained, but also isolated and typically a discipline-specific "remote-desktop" system. Rather, it is based on the realization that the web itself, in the form of globally interconnected data (linked open data) and services, is the VRE of the future. With limited resources, the proposal will therefore focus on investigating and developing the functionality of Wikidata for professional scientific research.
Professional scientists and researchers as well as citizen scientists (including "citizen data scientists") will be able to use this environment. A popular application is likely to be searching the intersections of data collections, e.g. linking public (governmental) data with research data. An example would be to combine epidemiology data for a disease (by country, by year) with public sales data for products like drugs or food, or with data on events like concerts or movies. Wikidata is designed for these forms of analysis, but the current service interfaces for humans and machines are not yet sufficient. With this virtual environment, one would be able to make any of these requests:
- Wikidata, please graph the relationship between...
- number of hospitals per population and the incidence of tuberculosis cases in cities in England from 2000-2010
- the obesity rates of people aged 25–40 against the number of schools within city limits
- incidence of annual flu cases versus the number of professional sports events in a city
There is no suggestion that the correlations Wikidata will be able to graph lead to conclusions about causation. However, putting this kind of power in public hands, and especially the power to tie everything that is already reported to everything else that is already reported, will become the basis of much future research.
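As a rough illustration of what such a request involves, the following minimal Python sketch joins two small data series on a shared (place, year) key and computes their Pearson correlation. It is only a sketch under stated assumptions: the function names `join_on_keys` and `pearson`, the city labels, and all numbers are hypothetical placeholders, not real statistics and not part of any existing Wikidata interface.

```python
from math import sqrt


def join_on_keys(left, right):
    """Pair up values from two datasets that share the same (place, year) key."""
    shared = sorted(set(left) & set(right))
    return [(left[key], right[key]) for key in shared]


def pearson(pairs):
    """Return the Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder values for illustration only, NOT real statistics.
hospitals_per_100k = {
    ("City A", 2000): 2.0,
    ("City B", 2000): 4.0,
    ("City C", 2000): 6.0,
}
tb_cases_per_100k = {
    ("City A", 2000): 30.0,
    ("City B", 2000): 20.0,
    ("City C", 2000): 10.0,
    ("City D", 2000): 5.0,  # no hospital figure, so it drops out of the join
}

pairs = join_on_keys(hospitals_per_100k, tb_cases_per_100k)
r = pearson(pairs)
print(f"{len(pairs)} paired observations, r = {r:.2f}")
```

The join step is where most of the practical difficulty lies: two collections can only be intersected if they identify places and time periods in the same way, which is the role shared identifiers are meant to play. As noted above, the resulting coefficient describes correlation only.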
This proposal is significant because no other open collaborative project, or "virtual research environment" in EU parlance, can connect the world's free databases across disciplinary and linguistic boundaries. With the inclusion of Freebase into Wikidata in 2015, the project will be capable of providing a unique open service that, for the first time, will allow both citizens and professional scientists from any research or language community to integrate their databases into an open global structure, to publicly annotate, verify, criticize and improve the quality of available data, to define its limits, to contribute to the evolution of its ontology, and to make all this available to everyone, without any restrictions on use and reuse.
Scientific content
Open-knowledge projects have created a remarkable knowledge infrastructure in the past years, consisting, e.g., of the Wikipedias, the structured DBpedia, and (gold) Open Access publication infrastructures. The EU is a leader in these activities.
Open-knowledge infrastructures are of great societal importance: they have become a basic utility in the discourse of an educated public with science, commerce, and politics. It is in the interest of society to facilitate access to such infrastructures for both humans and machines, and to use them to their full potential in research by involving the public much more in the creation and curation of knowledge than has been possible so far.
Information is often harvested from qualified and open scientific data sources and uploaded by automated processes ("bots") to Wikipedia in a one-way information flow. DBpedia, which harvests and aligns Wikipedia data, has made a large amount of data from the Wikipedias re-usable. Because of its breadth and its ability to connect many information sources, it is a central node in the Linked Open Data Cloud. However, the relation so far has been one-sided: it is difficult for scientific data providers to interact efficiently with traditional open-knowledge and citizen-science curated information systems. Furthermore, in the age of linked open data, where the web itself has become a globally integrated knowledge source, identifiers for concepts have become a basic form of language and a requirement for scientific discourse.
The advent of Wikidata is a game-changer in this respect. It allows data to be curated in a structured database in a service-oriented architecture, where humans and API-driven algorithms can interact efficiently. Its expanding collection is not an extraction of volunteers' work: it is tailor-made by and for a community that continuously improves its data model, infrastructure and content. Wikidata has since gradually emancipated itself from traditional triples (subject-predicate-object) and moved to a much richer information scheme that also includes references and qualifiers.
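The difference between a plain triple and a statement with qualifiers and references can be sketched in a few lines of Python. This is a simplified illustration, not Wikidata's actual serialization format: the class names `Triple` and `Statement` are hypothetical, and the population value and reference URL are placeholders. The identifiers Q64 (Berlin), P1082 (population), P585 (point in time) and P854 (reference URL) are used only as examples.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """A bare subject-predicate-object triple."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Statement:
    """A claim plus the context that a bare triple cannot carry."""
    subject: str
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)   # e.g. when the value applied
    references: list = field(default_factory=list)   # e.g. where the value came from

    def as_triple(self):
        """Flatten to a triple; qualifiers and references are lost."""
        return Triple(self.subject, self.prop, self.value)


# Placeholder value and URL for illustration only.
stmt = Statement(
    subject="Q64",                  # Berlin
    prop="P1082",                   # population
    value="1000000",                # placeholder, not a real figure
    qualifiers={"P585": "2015"},    # point in time
    references=[{"P854": "https://example.org/census"}],  # reference URL
)

print(stmt.as_triple())
```

Flattening the statement with `as_triple()` shows what the simpler model discards: the triple no longer says when the value applied or which source supports it.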
The goal of the proposal is to strengthen the interaction between citizen scientists working in the context of open knowledge projects and the professional scientific community. The focus will be on data about things or concepts that are relevant to societal dialogue, such as names of biological organisms or agents, features or traits, chemicals, historical facts and social interactions.
Timeline
2015
- On December 22, the H2020 proposal "Enabling Open Science: Wikidata for Research (Wiki4R)" was formally published.
- On December 4, "Natural Language Processing pipeline that harvests structured data from raw text and produces Wikidata statements with reference URLs" was announced as one of fourteen Individual Engagement Grants selected for funding by the Wikimedia Foundation.
- A series of Op-Eds in The Signpost from November 25 through December 9 brings the project to the attention of the wider Wikimedia community: Wikidata: the new Rosetta Stone by Kippelboy, Whither Wikidata? by Andreas Kolbe, and Wikidata: Knowledge from different points of view by Lydia Pintscher.
- On November 12 and 15, the preprints Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes (Q21503281) and Wikidata: A platform for data integration and dissemination for the life sciences and beyond (Q21503284) were posted on bioRxiv (Q19835482).
- On September 2, drafting started on a follow-up proposal. It did not make it beyond the early draft stage, though.
- On May 11, the rejection letter for the EINFRA-9-2015 proposal was received.
- On May 5, the paper Utilizing the Wikidata system to improve the quality of medical content in Wikipedia in diverse languages: a pilot study (Q21503276) was published (cf. accompanying blog post).
- March 27, 2015: VIAF now crosslinks with Wikidata (instead of the English Wikipedia)
- A meetup in Bern on February 25-28 on taxonomic information in Wikidata, as part of the Open Cultural Data Hackathon
- A meetup in Berlin on February 23 on a follow-up to the EU proposal
- February 13, 2015: Wikidata is mentioned prominently in a Europeana strategy paper
- A commentary from January 15, focusing on the potential of the project for citizen science
- The proposal is submitted on January 14.
- A special story in the GLAM newsletter for December presents the project to the cultural heritage community.
2014
- A News and notes article in The Signpost from December 31 brings the project to the attention of the wider Wikimedia community
- A commentary from January 3 on what it means for Wikidata to scale up
- A commentary from December 21 puts the proposal into a wider perspective.
- A follow-up from December 19 introduces the project partners and the main elements of the workplan.
- A blog post from December 5 introduces the idea.
- A blog post from October 22 introduces the idea of Wikidata as a research hub, which this project builds on
Related information
- Denny Vrandečić, Markus Krötzsch (2014) Wikidata: A Free Collaborative Knowledgebase. Communications of the ACM, Vol. 57 No. 10, Pages 78-85. DOI:10.1145/2629489
- File:Gene Wiki - Expanding the ecosystem of community intelligence resources (R01 GM089820).pdf
- File:Entwicklung eines Verfahrens zur automatischen Sammlung, Erschließung und Bereitstellung multimedialer Open-Access-Objekte mittels der Infrastruktur von Wikimedia Commons und Wikidata.pdf
- WikiBrainTools
- Opening up research proposals
- A list of publicly available grant proposals in the biological sciences
- http://blogs.plos.org/biologue/2014/12/30/media-response-forecasting-diseases-using-wikipedia/
- Wikidata:Wiki Loves Open Data