Shortcut: WD:WPR

Wikidata:WikiProject Redundancy

From Wikidata

WikiProject Redundancy

The primary aim of WikiProject Redundancy is to reduce the amount of data in Wikidata (without reducing the amount of information in Wikidata!) for the well-being of Wikidata, its community, and its downstream users.

Join us!

Motivation

Wikidata's growth in recent years has raised concerns that its Query Service (WDQS) may collapse and that many of its larger items are becoming impossible to edit. Much of this growth comes from data that is stored unnecessarily. This includes data whose information is not actively used elsewhere on Wikidata, as well as data whose information can be readily and reliably computed in other ways.

This WikiProject seeks to keep the amount of information on Wikidata constant while reducing the overall size of its data, measured both by the length of item pages on wikidata.org and by the number of RDF triples in WDQS. It distinguishes between several types of action, including:

  1. actions that can in principle be taken right now without affecting existing workflows;
  2. actions that are also possible now but may require acceptable changes to queries; and
  3. actions that are not currently feasible because they require software changes and possibly entirely new storage units.

Some of the proposed actions are expected to be controversial, but we hope to foster discussion about them, taking into account Wikidata's site health, community health, and usability.

It is hoped that, depending on the type of action, participants will be inspired either to take these actions directly or to encourage the developers of Wikibase, its Lua interface, and WDQS to make the changes and improvements needed so that those actions can be taken later.

Data size aspects

There are two ways to measure Wikidata's size:

  • the number of RDF triples (of relevance to WDQS); and
  • the size of the Wikidata dump (whether in JSON or TTL; of relevance to external users).

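The main difference between these is that adding a reference, quantity, time, or coordinate that exactly duplicates one found elsewhere in Wikidata adds relatively more to the dump size than to the RDF triple count. In the RDF, each distinct reference and each full quantity, time, or coordinate value is stored once as a node named by a hash of its content, so an exact duplicate mostly adds the triple linking to that existing node; the dump, by contrast, repeats the full content every time.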
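The Python sketch below models this asymmetry for a duplicated reference. It is a simplified illustration under stated assumptions, not Wikibase's actual serializer: the helper names (`ref_hash`, `reference_triples`, `json_size`), the use of SHA-1 over sorted JSON as the content hash, the statement IDs, and the reference values are all hypothetical, and real WDQS output contains further triples (for example a type triple on each reference node).

```python
import hashlib
import json


def ref_hash(snaks: dict) -> str:
    """Hypothetical content hash standing in for the hash Wikibase gives a reference."""
    canonical = json.dumps(snaks, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def reference_triples(statement_id: str, snaks: dict, seen: set) -> list:
    """Return the (simplified) triples added when a statement cites a reference.

    The statement is always linked to the reference node, but the node's own
    property triples are emitted only the first time its hash is seen.
    """
    node = "wdref:" + ref_hash(snaks)
    triples = [(statement_id, "prov:wasDerivedFrom", node)]
    if node not in seen:
        seen.add(node)
        for prop, value in sorted(snaks.items()):
            triples.append((node, "pr:" + prop, value))
    return triples


def json_size(references: list) -> int:
    """Bytes needed to serialize references inline, as a dump does per statement."""
    return len(json.dumps(references).encode("utf-8"))


# Hypothetical reference: "stated in" (P248) plus "retrieved" (P813).
reference = {"P248": "wd:Q123", "P813": "2024-01-01"}

seen: set = set()
first = reference_triples("s:Q1-a", reference, seen)
second = reference_triples("s:Q2-b", reference, seen)
print(len(first), len(second))  # 3 1

# In the dump, each citing statement carries its own full copy of the reference.
print(json_size([reference]))
```

Under this model, the second citation costs a single triple in the triple store, while the dump pays for the whole serialized reference again.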

For statistics regarding the number of RDF triples in WDQS, cf. User:AKhatun/WDQS Triples Analysis (2021) and User:Mahir256/Triples (2022).

Actions that can be taken

Editable data on Wikidata

Ongoing and uncontroversial

Will likely require consensus

Deprecated statements

Possibility of moving data to external databases

Technical fixes needed

For Wikibase

References

For Wikidata

For Commons

Participants


The participants listed below can be notified using the following template in discussions:
{{Ping project|Redundancy}}
