Wikidata:WikiProject Ontology/Issues

This is an overview of the main ontology issues found in Wikidata (the following classification is copied from File:WikidataCon 2021 - Overview of ontology issues.pdf).

Classification

  • semantic drift
  • structural bugs
    • "subclass of" cycles
    • mix-up of meta levels
    • redundant relations
      • redundant classification
      • redundant generalisation
    • exchanged sub-/superclasses
  • upper level ontology is messy
  • conceptual ambiguity
  • inconsistent modeling
  • overgeneralisation
  • conflicting real-world models
  • unclassified items

Semantic drift

  • subclass of (P279) is assumed to be transitive: if A is a subclass of B and B is a subclass of C, then A is also a subclass of C, so the relation holds across several levels of the class hierarchy
  • Semantic drift becomes visible when inferences drawn along such chains turn out to be wrong
  • Each individual subclass relation might be acceptable on its own, but their combination is not
  • Caused by concepts with different aspects being merged into a single item:
    • e.g. mason the person vs. mason the profession
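How transitivity carries a merged aspect to the wrong place can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy hierarchy stored as a dict from each class to its direct superclasses; the labels (`mason`, `person`, `profession`, `occupation`, `activity`) and the individual `Alice` are hypothetical and do not reflect actual Wikidata statements. Following the subclass of edges transitively lets the "profession" aspect of mason leak onto an individual who only has the "person" aspect.

```python
from collections import deque

# Hypothetical toy data, not real Wikidata statements.
# Each class maps to its direct superclasses ("subclass of", P279).
SUBCLASS_OF = {
    "mason": ["person", "profession"],  # two aspects merged into one item
    "person": ["agent"],
    "profession": ["occupation"],
    "occupation": ["activity"],
}
# Each item maps to its direct classes ("instance of", P31).
INSTANCE_OF = {"Alice": ["mason"]}


def superclasses(cls, subclass_of):
    """All classes reachable from cls by following subclass-of edges (transitive closure)."""
    seen = set()
    queue = deque(subclass_of.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(subclass_of.get(parent, []))
    return seen


def inferred_types(item):
    """Direct classes of item plus every superclass implied by transitivity."""
    types = set()
    for cls in INSTANCE_OF.get(item, []):
        types.add(cls)
        types |= superclasses(cls, SUBCLASS_OF)
    return types


# Alice ends up typed as an "activity": the kind of wrong inference
# that reveals semantic drift.
print(sorted(inferred_types("Alice")))
# → ['activity', 'agent', 'mason', 'occupation', 'person', 'profession']
```

Each edge in the toy data looks plausible in isolation; only the closure exposes the problem, which is why drift is hard to spot by inspecting single statements.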

Structural bugs

"Subclass of" cycles

  • Created when class B is (directly or indirectly) a subclass of class A while A is also a subclass of B
  • Make it impossible to determine which of the classes involved is meant to be more specific and which more general
  • Amount to declaring that all classes in the cycle are equivalent
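Cycles like this can be found with a depth-first search over the subclass of edges. The sketch below is a minimal illustration, assuming the hierarchy is available as a dict from each class to a list of its direct superclasses; the function name `find_cycle` and the classes `A`, `B`, `C` are hypothetical. Reaching a class that is still on the current search path means the walk has come back to it, which is exactly a cycle.

```python
def find_cycle(subclass_of):
    """Return one subclass-of cycle as a list of classes (first == last), or None.

    subclass_of maps each class to a list of its direct superclasses.
    """
    on_path = []          # classes on the current DFS path, in order
    in_progress = set()   # same classes, for fast membership tests
    done = set()          # classes whose superclasses are fully explored

    def visit(cls):
        on_path.append(cls)
        in_progress.add(cls)
        for parent in subclass_of.get(cls, []):
            if parent in in_progress:
                # Back edge: the cycle is the path suffix starting at parent.
                return on_path[on_path.index(parent):] + [parent]
            if parent not in done:
                cycle = visit(parent)
                if cycle:
                    return cycle
        on_path.pop()
        in_progress.discard(cls)
        done.add(cls)
        return None

    for cls in subclass_of:
        if cls not in done:
            cycle = visit(cls)
            if cycle:
                return cycle
    return None


# Hypothetical classes: A subclass of B, B subclass of C, C subclass of A.
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # → ['A', 'B', 'C', 'A']
print(find_cycle({"A": ["B"], "B": ["C"]}))              # → None
```

The search reports only the first cycle it meets; on a real dump one would repeat it after each fix, or switch to a strongly-connected-components algorithm to list every group of classes that collapse into one.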

Mix-up of meta levels

  • Occurs when, through inconsistent use of instance of (P31) vs. subclass of (P279), the same item is placed on two meta levels at once, e.g. treated simultaneously as a class and a metaclass
  • Patterns identified by Brasileiro et al. (2016):
    • Z is both instance of and subclass of A
    • C has direct superclasses A and B such that B is instance of A
    • C is instance of both A and B, where B is instance of A
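The three patterns above can be checked mechanically. This is a simplified sketch, assuming direct P31 and P279 statements are held in two dicts mapping an item to the set of its targets; it looks at direct statements only, so the exact definitions used by Brasileiro et al. (2016) may differ. The function name and the items `B`, `C2`, `C3`, `Z` are hypothetical.

```python
def meta_level_mixups(instance_of, subclass_of):
    """Flag the three meta-level patterns, looking at direct statements only.

    instance_of / subclass_of map an item to the set of its direct P31 / P279 targets.
    Returns (pattern, item, ...) tuples.
    """
    findings = []
    for item in sorted(set(instance_of) | set(subclass_of)):
        types = instance_of.get(item, set())
        supers = subclass_of.get(item, set())
        # Pattern 1: item is both instance of and subclass of the same class A
        for a in sorted(types & supers):
            findings.append(("instance and subclass of same class", item, a))
        # Pattern 2: item has direct superclasses A and B, and B is instance of A
        for b in sorted(supers):
            for a in sorted(supers & instance_of.get(b, set())):
                findings.append(("superclass B is instance of superclass A", item, b, a))
        # Pattern 3: item is instance of A and B, and B is instance of A
        for b in sorted(types):
            for a in sorted(types & instance_of.get(b, set())):
                findings.append(("class B is instance of class A", item, b, a))
    return findings


# Hypothetical statements reproducing each pattern once.
instance_of = {"B": {"A"}, "C3": {"A", "B"}, "Z": {"A"}}
subclass_of = {"Z": {"A"}, "C2": {"A", "B"}}
for finding in meta_level_mixups(instance_of, subclass_of):
    print(finding)
```

Each finding names the item and the classes involved, so an editor can decide which of the conflicting statements to change.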

Redundant relations

  • Redundant classification
    • an item is both an instance of a class and of one of its superclasses
    • If A is instance of B, which is subclass of C, then the statement "A instance of C" is redundant
  • Redundant generalisation
    • a class is both a subclass of a class and of one of its superclasses
    • If A is subclass of B, which is subclass of C, then the statement "A subclass of C" is redundant
  • Locality of editing: editors do not see all the consequences of their changes
  • Potentially competing needs: sometimes the “shortcut statement” may be needed
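Both kinds of redundancy reduce to one check: a direct target is redundant if it is already reachable through another direct target of the same item. The sketch below assumes the same toy representation as a dict from item to the set of its direct targets; the function names and the items `A`, `B`, `C`, `X` are hypothetical. Passing the instance of dict finds redundant classification, and passing the subclass of dict itself finds redundant generalisation.

```python
def superclasses(cls, subclass_of):
    """Transitive closure of subclass-of starting from cls."""
    seen, stack = set(), list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return seen


def redundant_statements(direct, subclass_of):
    """Return sorted (item, target) pairs whose target is implied by another direct target.

    Pass instance_of as `direct` to find redundant classification,
    or subclass_of itself to find redundant generalisation.
    """
    redundant = []
    for item, targets in direct.items():
        for t in targets:
            others = set(targets) - {t}
            if any(t in superclasses(o, subclass_of) for o in others):
                redundant.append((item, t))
    return sorted(redundant)


# Hypothetical example mirroring the bullets above.
subclass_of = {"A": {"B", "C"}, "B": {"C"}}
instance_of = {"X": {"B", "C"}}
print(redundant_statements(instance_of, subclass_of))  # → [('X', 'C')]
print(redundant_statements(subclass_of, subclass_of))  # → [('A', 'C')]
```

Because of the competing needs noted above, a report like this is better treated as a list of candidates for review than as a list of statements to delete automatically.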

Upper level ontology is messy

  • Designing an upper ontology is hard
  • The top-class entity (Q35120) has 59 direct subclasses (in 2021)
  • Messy connections in the upper ontology lead to:
    • issues with automated inferencing
    • nonsensical conclusions
  • People care more about the local ontologies of their own domains than about the upper levels

Conceptual ambiguity

  • Caused by conceptual overloading: a single item stands for several distinct concepts
  • Makes it hard to understand what statements refer to
  • Partly inherited from Wikipedia
  • Partly created to integrate viewpoints
  • Easier to keep overloading than to split (convenience)
  • The alternative would be worse: splitting would significantly increase the number of items

Inconsistent modeling

  • Occurs when similar kinds of data are modelled in different ways
  • Observable both across domains and within a single domain
    • e.g. mauve is an instance of color and at the same time a subclass of one of color's instances; so what are colors: classes or instances?
  • Lack of common domain understanding?
  • Several different ways to model the same data
  • Very different design decisions taken for different domains
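The mauve case follows a recognisable shape: an item is an instance of some class A while also being a subclass of some B that is itself an instance of A. A minimal sketch of a check for that shape, assuming direct statements held in dicts from item to a set of targets, could look like this; the function name is hypothetical, and "purple" only stands in for "one of its instances" and is not taken from actual Wikidata data.

```python
def class_instance_conflicts(instance_of, subclass_of):
    """Find items that are instance of A while being subclass of some B that is also instance of A.

    instance_of / subclass_of map an item to the set of its direct P31 / P279 targets.
    Returns (item, A, B) tuples.
    """
    conflicts = []
    for item, types in sorted(instance_of.items()):
        for parent in sorted(subclass_of.get(item, set())):
            for a in sorted(types & instance_of.get(parent, set())):
                conflicts.append((item, a, parent))
    return conflicts


# Hypothetical statements shaped like the mauve example.
instance_of = {"mauve": {"color"}, "purple": {"color"}}
subclass_of = {"mauve": {"purple"}}
print(class_instance_conflicts(instance_of, subclass_of))  # → [('mauve', 'color', 'purple')]
```

A hit does not say which statement is wrong; it only shows that "color" is being used both as a class of classes and as a class of individual colors within one domain.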

Overgeneralisation

  • Instances are too high in the class tree
  • Classification is too general
    • e.g. Club-Mate (Q53) is classified as a trademark, but it would be better classified as a "food brand", which is a kind of "brand", which is in turn a kind of "trademark", so the trademark classification would still follow

Conflicting real-world models

  • Real world is a mess
  • Different groups have different views on the world
  • May lead to overlapping and conflicting classifications
  • Qualifiers to the rescue?

Unclassified items

  • Items with no classifying statements (neither instance of (P31) nor subclass of (P279))
  • Not connected to existing ontology
  • Often occur when new items are created automatically for new articles in Wikimedia projects
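Detecting unclassified items amounts to filtering for items that lack both classifying properties. On live Wikidata such a check would typically be written as a SPARQL query against the Wikidata Query Service; the Python sketch below only illustrates the logic on in-memory toy data, and the function name and item IDs are hypothetical.

```python
def unclassified_items(items, instance_of, subclass_of):
    """Items with neither an instance of (P31) nor a subclass of (P279) statement."""
    return sorted(i for i in items if not instance_of.get(i) and not subclass_of.get(i))


# Hypothetical item IDs and classes, not real Wikidata items.
items = ["item1", "item2", "item3"]
instance_of = {"item1": {"human"}}
subclass_of = {"item2": {"artwork"}}
print(unclassified_items(items, instance_of, subclass_of))  # → ['item3']
```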

Presentations