Wikidata:WikiProject Reference Verification
This is a research and development project that helps Wikidata editors check the quality of external references using AI/ML models. The method is based on the approach described in the academic paper ProVe[1].
We are developing a backend that deploys the AI/ML models and serves as an inference server. On top of it, we aim to launch practical tools such as a Wikidata gadget, a dashboard, and a bot-maintained worklist update page.
Info: You can use the AutoEdit tool to quickly add labels and descriptions for WikiProject Reference Verification in many languages.
Introduction
Motivation
Wikidata is a repository of information that gathers data from many different sources and topics. It stores this data as semantic triples, which are used in various important applications on the modern web, including Wikipedia infoboxes and search engines.
Wikidata mainly serves as a secondary source of information. To be trustworthy and useful, it needs well-documented and verifiable sources for its data. However, checking the quality of these sources, especially whether they actually support the information in Wikidata, has mostly been done manually, and that manual process does not scale as Wikidata grows.
ProVe aims to solve this problem. It's an automated system that checks whether a piece of information (a triple) in Wikidata is supported by the text from its listed source. This approach can help ensure the quality of Wikidata's content more efficiently as it continues to expand.
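To make the idea concrete, the flow can be summarised in three steps: verbalize the triple into a plain-language sentence, fetch the text of the cited reference, and test whether that text entails the sentence. Below is a minimal, illustrative sketch of such a flow; the naive template verbalizer and the NLI model named in it are assumptions for illustration, not the actual components of the ProVe pipeline.

```python
# Illustrative sketch of a ProVe-style check (not the actual pipeline code).
# Assumptions: a naive template verbalizer and an off-the-shelf NLI model;
# the model name below is an example choice, not the one ProVe uses.
from transformers import pipeline

nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def verbalize(subject: str, predicate: str, obj: str) -> str:
    """Turn a Wikidata triple into a plain-English claim sentence."""
    return f"{subject} {predicate} {obj}."

def verify(triple: tuple, reference_text: str) -> str:
    """Check whether the reference text entails the verbalized claim."""
    claim = verbalize(*triple)
    result = nli([{"text": reference_text, "text_pair": claim}])[0]
    # Map NLI labels onto the SUPPORTS / REFUTES / NOT ENOUGH INFO verdicts.
    return {"entailment": "SUPPORTS",
            "contradiction": "REFUTES"}.get(result["label"].lower(),
                                            "NOT ENOUGH INFO")

print(verify(("Douglas Adams", "was educated at", "St John's College"),
             "Adams studied at St John's College, Cambridge."))  # SUPPORTS
```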
Challenges
Implementing ProVe to help Wikidata editors poses several challenges:
- How to best support Wikidata editors' workflow based on ProVe results
- How to design system architecture for ProVe to support AI/ML inference and integrate with Wikidata tools using Toolforge and gadgets
- What is the most effective method to present ProVe results for reusability
- How to handle claims or triples that lack references
Worklists (Under Development)
- These are worklists of priority items whose references need checking, for example with ProVe. They are based on the number of incoming and outgoing links of each item and on their relevance to various use cases.
- https://kclwqt.sites.er.kcl.ac.uk/page/worklist/generationBasics
This Wikidata Item Verification Table ranks Wikidata items (e.g., countries, concepts, or entities) by how well their claims are supported by external references. Each item has several associated claims, and the table shows how many of those claims are supported, refuted, or lack sufficient information, together with the number of external sites connected to the item. Ranking combines two signals: items with more supported claims rank higher, while many refuted or unverifiable claims push an item down; the Number of Connected Sites also counts, since items with more connections generally have more sources available for verification. Here is an example of how the table might look with different Wikidata items and their verification results (a hypothetical scoring sketch follows it):
Wikidata Item (QID) | Not Enough Info | Refutes | Supports | Number of Connected Sites
---|---|---|---|---
United States of America | 429 | 42 | 177 | 406
Turkey | 124 | 39 | 13 | 402
Japan | 243 | 12 | 9 | 401
Russia | 152 | 21 | 8 | 401
Italy | 152 | 34 | 9 | 392
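The exact ranking formula is not published on this page; reading the description together with the example rows suggests ordering primarily by connected sites and secondarily by supported claims. The sketch below is a hypothetical implementation of that reading, not the project's actual code.

```python
# Hypothetical ranking sketch for the item worklist above. The key order
# (connected sites first, then supported claims) is inferred from the
# example rows; the project's real formula may differ.
from dataclasses import dataclass

@dataclass
class ItemStats:
    label: str
    not_enough_info: int
    refutes: int
    supports: int
    connected_sites: int

def rank(items: list) -> list:
    # More connected sites means more sources available for verification;
    # among similarly connected items, more supported claims rank higher.
    return sorted(items,
                  key=lambda i: (i.connected_sites, i.supports),
                  reverse=True)

worklist = rank([
    ItemStats("Japan", 243, 12, 9, 401),
    ItemStats("United States of America", 429, 42, 177, 406),
    ItemStats("Russia", 152, 21, 8, 401),
])
print([i.label for i in worklist])
# ['United States of America', 'Japan', 'Russia'] (matches the table order)
```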
- https://kclwqt.sites.er.kcl.ac.uk/page/worklist/pagePileList
This Wikidata Claim Verification Table is designed to verify individual Wikidata claims against external sources using AI/ML models. Each claim is transformed into a natural language sentence (the Final Verbalization) and compared to its external reference to determine whether it is supported or refuted. Claims are ranked by the AI system's confidence in the text entailment result: claims with high SUPPORTS scores rank higher, while claims labelled REFUTES or NOT ENOUGH INFO rank lower; the number and relevance of the sentences extracted from the source also influence the ranking. Here is an example of how the table would look with claims and their verification details (a sketch of the aggregation step follows it):
Final Verbalization | Source URL | Extracted Sentences (NLP) | Sentence Scores (NLP) | Top Matching Sentence | Evidence TE Probabilities | Final Decision (Claim Label) | Wikidata Item (QID)
---|---|---|---|---|---|---|---
The semicircular canal is described by source in Gray's Anatomy (20th edition). | Gray's Anatomy Reference | ['The osseous labyrinth consists of three parts: the vestibule, semicircular canals, and cochlea.', 'Another sentence from the text...'] | [0.2726, 0.1543] | The osseous labyrinth consists of three parts: the vestibule, semicircular canals, and cochlea. | [0.85 SUPPORTS, 0.10 REFUTES, 0.05 NOT ENOUGH INFO] | NOT ENOUGH INFO | Semicircular canal |
The vestibular system is studied in the field of audiology. | British Academy of Audiology | ['Audiology professionals are involved in helping to diagnose problems with the vestibular system.', 'Another sentence from the text...'] | [0.3424, 0.1678] | Audiology professionals are involved in helping to diagnose problems with the vestibular system. | [0.88 SUPPORTS, 0.07 REFUTES, 0.05 NOT ENOUGH INFO] | SUPPORTS | Vestibular system |
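To connect the columns: each extracted sentence receives a relevance score, the top-scoring sentence supplies the entailment (TE) probabilities, and those are aggregated into the final claim label. A self-contained sketch of that aggregation step follows; the relevance threshold in it is a made-up value chosen only so the sketch reproduces the two example rows (note that the first row ends up NOT ENOUGH INFO despite a high SUPPORTS probability, because its sentence scores are low), and it is not ProVe's real rule.

```python
# Sketch of the aggregation behind the table columns: the top-scoring
# extracted sentence supplies the evidence, and the claim label is the
# highest TE probability, unless no sentence is relevant enough. The 0.3
# threshold is a made-up value chosen so the sketch reproduces the two
# example rows; it is not ProVe's real rule.

def label_claim(sentences: list, relevance: list, te_probs: dict,
                min_relevance: float = 0.3) -> tuple:
    """Return (Top Matching Sentence, Final Decision)."""
    top_score, top = max(zip(relevance, sentences))
    if top_score < min_relevance:
        # Weak evidence: no extracted sentence is relevant to the claim.
        return top, "NOT ENOUGH INFO"
    return top, max(te_probs, key=te_probs.get)

# First example row: high SUPPORTS probability, but low sentence scores.
print(label_claim(
    ["The osseous labyrinth consists of three parts: the vestibule, "
     "semicircular canals, and cochlea.",
     "Another sentence from the text..."],
    relevance=[0.2726, 0.1543],
    te_probs={"SUPPORTS": 0.85, "REFUTES": 0.10, "NOT ENOUGH INFO": 0.05},
))  # (..., 'NOT ENOUGH INFO')
```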
- We plan to publish a set of RDF triples derived from ProVe results.
- This will allow Wikidata editors and users to query, via SPARQL, the subset of Wikidata items whose ProVe results need attention (a hypothetical query sketch follows this list).
- Worklist pages
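Once these triples are published, such a worklist could be retrieved with a query along the following lines. The endpoint URL and the prove: vocabulary below are placeholders, since the actual RDF schema has not been released yet.

```python
# Hypothetical SPARQL lookup of published ProVe results. The endpoint URL
# and the prove: predicates are placeholders; the real vocabulary is TBD.
import requests

ENDPOINT = "https://example.org/prove/sparql"  # placeholder endpoint

QUERY = """
PREFIX prove: <https://example.org/prove/ns#>
SELECT ?item ?statement ?verdict WHERE {
  ?statement prove:item ?item ;
             prove:verdict ?verdict .
  FILTER (?verdict IN ("REFUTES", "NOT ENOUGH INFO"))
}
LIMIT 100
"""

resp = requests.get(ENDPOINT, params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json"})
for row in resp.json()["results"]["bindings"]:
    print(row["item"]["value"], row["verdict"]["value"])
```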
Participants
- Elena Simperl
- Albert Meroño
- Odinaldo Rodrigues
- Miriam Redi
- Yiwen Xing
- Yihang Zhao
- Jongmo Kim
- So9q (talk • contribs • logs)
- salgo60 (talk • contribs • logs)
References
1. Amaral, G., Rodrigues, O., & Simperl, E. (2022). ProVe: A pipeline for automated provenance verification of knowledge graphs against textual sources. Semantic Web (Preprint), 1–34.