Wikidata:ProWD

ProWD (Profiling Wikidata) is an external tool for analyzing data completeness for sets of entities. It is still in development; the current prototype can be tested at http://prowd.id.

Motivation

Example completeness profiling for MoMA paintings in ProWD - overview

Suppose a MoMA employee wants to survey and improve the quality of the structured data about items in the museum's collection. The museum employee might proceed in the following order:

  1. The employee first wants to find out which kinds of items from the museum's collection are in Wikidata, and therefore designs a CFA-profile containing these entities (possibly without yet giving thought to attributes)
  2. The employee then drills down into the returned entities to select the specific slice to work on (e.g., "This week, Dutch paintings of the 18th century connected with the theme 'ocean'")
  3. The employee sees which properties are already present and which ones should be added
  4. The employee goes through the items one by one, adding the missing information


Example completeness profiling for MoMA paintings in ProWD - instances

Showcase profiles

  1. MoMA paintings by genre
  2. German judges by gender
  3. German state courts

How to use it

  1. Open the tool at http://prowd-prototype.herokuapp.com
  2. Choose Browse to open an existing profile (skip to step 3), or Create to construct a new one.
    1. Give the profile a meaningful name
    2. Select an entity class, e.g. human (Q5), or public university (Q875538)
    3. Set a filter, e.g., place of birth (P19) = Bolzano (Q6526). Filters are important: ProWD reads data from the live SPARQL endpoint and can therefore only process profiles with up to ~5000 entities
    4. Define facets, e.g., sex or gender (P21)
    5. Define attributes, e.g., country of citizenship (P27), languages spoken, written or signed (P1412), educated at (P69)
    6. Press Create
  3. Completeness information is computed from the live SPARQL endpoint (be patient)
  4. Once numbers show up, you can inspect the following fields:
    1. The entity completeness distribution in the first box (bar chart)
    2. The completeness per attribute in the second box (circle diagrams)
    3. The completeness of individual entities in the third box (tabular view)
  5. You can look at facets of the data by choosing a specific facet value in the first box, then clicking Post Query
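
To make the steps above more concrete, here is a minimal Python sketch of what a profile amounts to. It is not ProWD's actual implementation: the function names `build_profile_query` and `completeness` are hypothetical, the query matches the class through instance of (P31) only, and the entity IDs in `sample` are placeholders rather than real Wikidata items. The first function builds a SPARQL query for the example profile from step 2 (class, filter, attributes), relying on the `wd:` and `wdt:` prefixes that the Wikidata Query Service predefines. The second computes the three views from step 4 on in-memory data, so the sketch runs without network access.

```python
from collections import Counter


def build_profile_query(entity_class, filter_prop, filter_value, attributes, limit=5000):
    """Build a SPARQL query returning one row per entity, with a boolean per attribute.

    Assumption: class membership is tested with a direct P31 statement only.
    """
    flags = " ".join(f"?has_{p}" for p in attributes)
    binds = "\n".join(
        f"  BIND(EXISTS {{ ?item wdt:{p} [] }} AS ?has_{p})" for p in attributes
    )
    return (
        f"SELECT ?item {flags} WHERE {{\n"
        f"  ?item wdt:P31 wd:{entity_class} ;\n"
        f"        wdt:{filter_prop} wd:{filter_value} .\n"
        f"{binds}\n"
        f"}} LIMIT {limit}"
    )


def completeness(entities, attributes):
    """Compute completeness figures for a profile.

    entities maps an entity ID to the set of property IDs present on it.
    Returns (share of entities per attribute, share of attributes per entity,
    count of entities per rounded completeness percentage).
    """
    if not entities or not attributes:
        return {}, {}, Counter()
    n = len(entities)
    per_attribute = {
        p: sum(p in props for props in entities.values()) / n for p in attributes
    }
    per_entity = {
        e: sum(p in props for p in attributes) / len(attributes)
        for e, props in entities.items()
    }
    distribution = Counter(round(v * 100) for v in per_entity.values())
    return per_attribute, per_entity, distribution


# Example profile from step 2: humans (Q5) born in Bolzano (Q6526).
attributes = ["P27", "P1412", "P69"]
query = build_profile_query("Q5", "P19", "Q6526", attributes)

# Placeholder data standing in for query results (IDs are not real items).
sample = {
    "Q_A": {"P27", "P1412", "P69"},
    "Q_B": {"P27"},
    "Q_C": {"P27", "P69"},
    "Q_D": set(),
}
per_attribute, per_entity, distribution = completeness(sample, attributes)

print(query)
print(per_attribute)  # completeness per attribute (second box)
print(per_entity)     # completeness per entity (third box)
print(distribution)   # entity completeness distribution (first box)
```

The default `limit` of 5000 mirrors the approximate profile size mentioned in step 2. Restricting `sample` to the entities sharing one facet value, e.g. a given sex or gender (P21), before calling `completeness` would correspond to step 5.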

Contact


  • Avicenna Wisesa - avcwisesa@gmail.com
  • Fariz Darari - fadirra@gmail.com
  • Simon Razniewski - srazniew@mpi-inf.mpg.de
  • Werner Nutt - nutt@inf.unibz.it