Wikidata:Schema.org


Intro

Copied here from Project Chat

Schema.org provides a vocabulary for structured data markup on the Web. Markup using this vocabulary can be found on more than 30% of pages in a sample of 10 billion. Schema.org is used by a wide range of organizations, from the New York Times to the WWF, and from global organizations like Greenpeace or UNESCO to local establishments such as cinemas and pubs. In total, more than 10 million organizations worldwide use Schema.org to publish data on the Web. You can read more about Schema.org in CACM or on Wikipedia.

A major cost factor for applications using this data is aggregating data about a given entity from different sources. Whereas the vocabulary is standardized (Schema.org defines properties and types), the identifiers for individual items are not. This was done by design, to make publishing easier (see the aforementioned CACM paper for details).

To reduce the cost of aggregating fragments of Schema.org markup from different sources, in particular for smaller organizations and individual developers who consume that markup, Schema.org is considering encouraging the use of Wikidata as a common entity base for the target of the schema:sameAs relation (not to be confused with owl:sameAs).
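
To make this concrete, here is a minimal sketch of what such markup could look like when generated in Python. The helper name with_wikidata_same_as is made up for this illustration and is not part of Schema.org or Wikidata; the sketch only assumes that the sameAs target is the usual Wikidata entity URI (Q42 is the Wikidata item for Douglas Adams).

```python
import json

# Canonical prefix of Wikidata entity URIs.
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def with_wikidata_same_as(markup: dict, qid: str) -> dict:
    """Return a copy of a Schema.org JSON-LD object with a Wikidata sameAs link.

    Hypothetical helper for illustration only.
    """
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")

    result = dict(markup)
    existing = result.get("sameAs", [])
    if isinstance(existing, str):
        # sameAs may hold a single URL or a list of URLs.
        existing = [existing]

    uri = WIKIDATA_ENTITY_PREFIX + qid
    result["sameAs"] = list(existing) + ([uri] if uri not in existing else [])
    return result


person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Douglas Adams",
}
print(json.dumps(with_wikidata_same_as(person, "Q42"), indent=2))
```

A publisher would embed the resulting JSON-LD in a page as usual; the only addition compared to today's markup is the sameAs entry.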

There is also a class of entities that are intermediate in generality, between very general terms such as Person and birthDate and very specific concepts such as individual persons or movies, and that may be standardized. This includes lists such as the list of languages and the list of countries. The idea is to use SPARQL queries to produce and publish easy-to-use URIs for those items, e.g. https://lists.schema.org/Country/France. These would be published by Schema.org, with a mapping to Wikidata, as part of the normal Schema.org release process. The need for these URIs arises from the fact that they will be easier to use and reuse than the Q-ID based Wikidata URIs.
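
As a rough sketch of how such a list could be derived, the Python below holds a SPARQL query for countries and turns a result in the standard SPARQL JSON format into a mapping from list URIs to Wikidata URIs. Several things here are assumptions rather than anything decided by Schema.org: the choice of "instance of country" (P31, Q6256) as the selection criterion, the rule for turning a label into a URI path (spaces become underscores), and the function names. The query is not executed; a small hand-written sample stands in for the query service's response.

```python
from urllib.parse import quote

# Assumed selection criterion: items that are an instance of (P31) country (Q6256).
COUNTRY_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q6256 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

# URI pattern taken from the example in the proposal.
LIST_BASE = "https://lists.schema.org/Country/"


def list_uri(label: str) -> str:
    """Build a readable list URI from an English label (assumed naming rule)."""
    return LIST_BASE + quote(label.strip().replace(" ", "_"))


def build_mapping(sparql_json: dict) -> dict:
    """Map each list URI to the Wikidata entity URI it stands for."""
    mapping = {}
    for row in sparql_json["results"]["bindings"]:
        mapping[list_uri(row["itemLabel"]["value"])] = row["item"]["value"]
    return mapping


# Hand-written sample in SPARQL JSON results format, not live query output.
sample_result = {
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q142"},
                "itemLabel": {"type": "literal", "value": "France"},
            },
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q183"},
                "itemLabel": {"type": "literal", "value": "Germany"},
            },
        ]
    }
}

mapping = build_mapping(sample_result)
for uri, entity in sorted(mapping.items()):
    print(uri, "->", entity)
```

Keeping the mapping as plain data means it could be regenerated and reviewed at each Schema.org release, while publishers only ever see the readable URIs.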

This will allow anyone to gather data from different sites and integrate it with much less effort than is currently needed. To name just one example: IMDB publishes data about movies using Schema.org, and Wikidata uses these pages as references. If IMDB used Wikidata identifiers, scripts like the one developed by Adam Shoreland would be able to compare the existing data in Wikidata with such external data sources much more easily, and on many more sites.
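
The kind of integration described here can be sketched in a few lines of Python. This is an illustration only, not the script mentioned above: the function names are invented, and the sample fragments are made up (the second one carries a deliberately wrong birth date so that the comparison has something to report).

```python
import json
from collections import defaultdict

WIKIDATA_PREFIX = "http://www.wikidata.org/entity/"


def wikidata_uris(fragment: dict) -> list:
    """Return the Wikidata entity URIs listed in a fragment's sameAs."""
    same_as = fragment.get("sameAs", [])
    if isinstance(same_as, str):
        same_as = [same_as]
    return [uri for uri in same_as if uri.startswith(WIKIDATA_PREFIX)]


def group_by_entity(fragments: list) -> dict:
    """Group markup fragments from different sites by shared Wikidata URI."""
    groups = defaultdict(list)
    for fragment in fragments:
        for uri in wikidata_uris(fragment):
            groups[uri].append(fragment)
    return dict(groups)


def conflicting_properties(fragments: list) -> dict:
    """Return the properties for which the fragments disagree."""
    seen = defaultdict(set)
    for fragment in fragments:
        for key, value in fragment.items():
            if key in ("@context", "sameAs"):
                continue
            # Serialize values so that lists and nested objects can be compared.
            seen[key].add(json.dumps(value, sort_keys=True))
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Made-up fragments standing in for markup collected from three sites.
fragments = [
    {"@type": "Person", "name": "Douglas Adams", "birthDate": "1952-03-11",
     "sameAs": "http://www.wikidata.org/entity/Q42"},
    {"@type": "Person", "name": "Douglas Adams", "birthDate": "1952-03-12",
     "sameAs": ["http://www.wikidata.org/entity/Q42"]},
    {"@type": "Person", "name": "Someone Else"},  # no Wikidata link, cannot be joined
]

for uri, group in group_by_entity(fragments).items():
    print(uri, len(group), "fragments, conflicts:", conflicting_properties(group))
```

Without the shared identifier, the first two fragments could only be joined by fuzzy matching on names; with it, the join is a dictionary lookup.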

Schema.org would like to discuss this step with the Wikidata community before implementing it, in order to identify potential issues early and prepare for them. So I am here to open this discussion. --Denny (talk) (wearing his Google hat) 17:57, 4 May 2017 (UTC)

Context

Issues, comments and feedback

Please see Wikidata talk:Schema.org