Wikidata:Filtering statements


Filtering of statements is an attempt to describe some of the problems associated with filtering statements down to a sane minimum. This is a write-up of some observations from a subproject and may not reflect consensus on this project. (You have been warned!)

There are other attempts at creating groups, note for example ...

Filtering methods mostly differ in how they identify which entries to keep or reject. Often this is done by sorting the entries by priority and then applying a cut-off to decide which entries are retained. Often entries have not only a priority but also a cost. The priority is how important it is to include the entry, while the cost is how much of the limit the entry uses up, and so determines how many entries can be collected before the limit is reached.
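The priority and cost scheme described above can be sketched as follows. This is only an illustration in Python: the `Entry` type, the `filter_entries` function, the budget of 3 and the sample values are all hypothetical and not taken from any existing module.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    label: str
    priority: float  # higher means more important to include
    cost: float      # how much of the limit the entry uses up


def filter_entries(entries, budget):
    """Keep the highest-priority entries until the budget is spent."""
    kept = []
    spent = 0.0
    for entry in sorted(entries, key=lambda e: e.priority, reverse=True):
        if spent + entry.cost > budget:
            break  # cut-off: this entry and everything after it is rejected
        kept.append(entry)
        spent += entry.cost
    return kept


# Made-up sample values, for illustration only.
entries = [
    Entry("population", priority=0.9, cost=1),
    Entry("coat of arms", priority=0.4, cost=3),
    Entry("country", priority=1.0, cost=1),
    Entry("time zone", priority=0.2, cost=1),
]
kept_labels = [e.label for e in filter_entries(entries, budget=3)]
```

Here the cut-off is strict: the first entry that does not fit ends the list, even if a cheaper entry further down would still fit. Skipping the expensive entry and continuing is an equally plausible design.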

The cost depends on the grouping, which makes it difficult to calculate correctly. When entries are filtered for use in an infobox-like setup, some of them will trigger a new group with an additional title. This title adds to the overall cost, so the entry that triggers the group costs more than the other entries.

Often the additional cost of a new group is small compared to the cost of the entries themselves and can be neglected.
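A minimal sketch of how a group title could be charged to the entry that triggers it, assuming the entries are already in the order they will be rendered. The function name `cost_with_groups`, the cost values and the sample rows are hypothetical.

```python
def cost_with_groups(entries, entry_cost=1.0, title_cost=0.5):
    """Return (group, label, cost) triples for entries in render order.

    The first entry of each new group also pays for the group title.
    """
    costed = []
    previous_group = None
    for group, label in entries:
        cost = entry_cost
        if group != previous_group:
            cost += title_cost  # this entry triggers a new group title
            previous_group = group
        costed.append((group, label, cost))
    return costed


# Made-up sample rows, for illustration only.
rows = [
    ("Geography", "area"),
    ("Geography", "elevation"),
    ("Politics", "head of state"),
]
costs = [cost for _, _, cost in cost_with_groups(rows)]
```

Setting `title_cost` to zero gives the simplification where the cost of new groups is neglected.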

Truncated list

The simplest solution is to list the entries in sort order and then truncate the list. This is simple to implement, but of limited use to readers. More often than not they must open the full list to find what they are looking for.

The usual implementation is to sort the list and render it in a non-visible (collapsed) state. The reader can then make the list visible.
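As a rough sketch, truncation amounts to sorting and then splitting the list at a limit. The `truncate` function and the limit of 2 are hypothetical, and the sketch assumes the first few entries stay visible while the rest are rendered hidden.

```python
def truncate(entries, limit, key=None):
    """Sort the entries and split them into a visible and a hidden part."""
    ordered = sorted(entries, key=key)
    return ordered[:limit], ordered[limit:]


# Made-up sample values, sorted alphabetically by default.
visible, hidden = truncate(["Oslo", "Bergen", "Tromsø", "Trondheim"], limit=2)
```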

Prioritized list

The best way to filter down the list of entries is to prioritize them. Usually they are sorted by some measure of importance, and the list is then truncated. The sort order in this sense is not the order in which the entries would otherwise be rendered, such as chronological or alphabetical order.

The following are examples of how a prioritization could be done; the list is in no way complete.

Language intelligibility

One way to build a prioritized list of languages is to calculate a metric between two languages through subclass of (P279). Both the language under test and the target language are traversed upwards through their parent languages until a shared language is found, or the language chains are exhausted.
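The traversal could look something like the sketch below. It is written in Python for illustration and is not the code of any existing module. The `PARENT` table is a hand-written toy stand-in for subclass of (P279) statements, it assumes each language has at most one parent and that there are no cycles, and counting steps is just one possible metric.

```python
# Toy stand-in for subclass of (P279) statements; not real Wikidata data.
PARENT = {
    "Bokmål": "Norwegian",
    "Nynorsk": "Norwegian",
    "Norwegian": "North Germanic",
    "Swedish": "North Germanic",
    "North Germanic": "Germanic",
    "German": "West Germanic",
    "West Germanic": "Germanic",
}


def chain(language):
    """Return the language followed by all of its ancestors."""
    result = [language]
    while result[-1] in PARENT:
        result.append(PARENT[result[-1]])
    return result


def distance(language, target):
    """Count the steps from both languages up to their first shared ancestor.

    Returns None when the chains are exhausted without a match.
    """
    target_steps = {name: steps for steps, name in enumerate(chain(target))}
    for steps, name in enumerate(chain(language)):
        if name in target_steps:
            return steps + target_steps[name]
    return None


# Closest languages first, as seen from the target language.
ranked = sorted(["German", "Swedish", "Nynorsk"],
                key=lambda name: distance(name, "Bokmål"))
```

A language with no shared ancestor yields `None`, so it has to be filtered out or given a large default before sorting.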

The language chain for the target language, i.e. the content language of the client, can be precalculated or even entered manually. It is also possible to cache previously calculated values in Lua, but note that the cache must be rebuilt on each call through {{#invoke}}, as each call starts afresh and discards the state of the previous invocation.
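The combination of a manually entered target chain and a cache of earlier lookups can be illustrated as below. The sketch is in Python rather than Lua, so it shows only the idea, not how Scribunto behaves. `TARGET_CHAIN`, `PARENT`, `parent_of` and `steps_to_target` are hypothetical names, and the data is a toy stand-in.

```python
from functools import lru_cache

# Hypothetical, manually entered chain for the target (content) language.
TARGET_CHAIN = ("Bokmål", "Norwegian", "North Germanic", "Germanic")

# Toy stand-in for subclass of (P279) lookups; not real Wikidata data.
PARENT = {"Swedish": "North Germanic", "North Germanic": "Germanic"}

lookups = 0  # counts how often the expensive lookup actually runs


@lru_cache(maxsize=None)
def parent_of(language):
    """Memoized parent lookup; repeated calls are served from the cache."""
    global lookups
    lookups += 1
    return PARENT.get(language)


def steps_to_target(language):
    """Walk upwards until a language in the precalculated chain is found."""
    steps = 0
    while language is not None:
        if language in TARGET_CHAIN:
            return steps + TARGET_CHAIN.index(language)
        language = parent_of(language)
        steps += 1
    return None


first = steps_to_target("Swedish")
second = steps_to_target("Swedish")  # served from the cache, no new lookup
```

In this sketch the cache lives as long as the Python process does. In Lua the equivalent table would only survive within a single {{#invoke}} call.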

A simple measure of language distance as done for Norwegian can be found at Module:Language distance.

Language distance

Another way to build a prioritized list of languages is to calculate a metric between two countries through official language (P37). The country under test and the target country should then both share one or more languages or parent languages. Because we

See also