Wikidata:WikidataCon 2017/Notes/Creating a collaborative ontology for Wikidata

Title: Creating a collaborative ontology for Wikidata

  • Note-taker(s): Thiemo

Speaker(s)

Abstract

Creating a collaborative ontology for Wikidata is not exactly an easy task. External ontologies have provided guidelines, but also conflicting views on modelling and on core concept trees, so Wikidata had to create its own ontology. Ontology problems have been constantly discussed in Project Chat and in Wikidata:WikiProject Ontology, so I would like a meeting to discuss how the creation process is going, how Wikidatans have collaborated (or not) to create one or several ontologies, how similar the process in Wikidata has been to that of other collaborative ontologies, and which are the most pressing issues that need to be addressed. Finally, it would be nice to hear as many opinions as possible on how to move forward.

Collaborative notes of the session

Dealing with a lot of problems modelling data.

There was no study related to this. Studies about Wikidata are rare. None about ontologies.

One study found about ontologies: Collaborative ontology engineering projects (Strohmaier 2012)

  • Findings from said study:

For all ontology datasets, only very few users contribute a lot (power law). Assumption: Looks like it's like this on Wikidata. Preservation rate averages at 80~90%. Again, looks like it fits Wikidata.

  • Another paper cited:

Cluster analysis on users done, found heavy specialization.

Turns out the author of the cited paper is in the room.

Markus is mentioned as author of another paper.

Wikidata is not a real ontology project. They work on the data, not on a model.

Wikidata vs. ontology: Wikidata makes it easy to work without much knowledge.

Problems
  • No experts.
  • No regulation or supervision
  • Open to everybody.
  • Users mostly come up with individual decisions.
  • Bots are messing things up a lot. A lot. :-(
  • Example: ProteinBoxBot keeps adding "instance of" to classes of diseases. :-(
  • Deleting doesn't help. :-(
  • Talking to bot author might help, until the next bot comes. :-(
  • "Concept" is not properly used.

There are items "Crew of 1", which is an instance of "Crew member", which is a subclass of "Social group". All made up. No references. Probably well meant, but hurts.

Later it turned out this was a Russian user, and if this makes sense or not might have something to do with the language.

Wikidata:WikiProject Ontology/Modelling is mentioned.

Problem is that we need to discuss and agree.

Question to the audience: What to do? Where to start? Is it already too late?

Advice from the audience: Create a WikiProject and document decisions. You can point later editors to this decision. It works. Slowly, but it works.

You must discuss. You cannot go it alone. Form a group that has the same goal.

Come to the IRC channel. We will talk and explain, and even block people if necessary.

You really need to explain to every single user why you are doing what you are doing.

I even have pre-recorded messages for specific mistakes people constantly make (e.g. misunderstandings about name properties).

Remark from the audience: A lot grew naturally.

People are in fact alone, or in very small groups.

Some properties are just there because a very few people asked for them, and nobody really discussed them.

Suggestion: Show the consequence of an edit (e.g. adding an instance of) more directly.
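A minimal sketch of what "showing the consequence" could mean, using the "Crew of 1" example from above. Assumptions: the statements are held in plain dictionaries keyed by label rather than by real item IDs, and the function names inferred_classes and preview_instance_of are made up for this illustration. This is not how the wikidata.org interface works.

```python
from collections import deque

# Toy statements taken from the "Crew of 1" example in these notes.
# Labels stand in for real item IDs (assumption for readability).
INSTANCE_OF = {"Crew of 1": {"Crew member"}}
SUBCLASS_OF = {"Crew member": {"Social group"}}


def inferred_classes(item, instance_of, subclass_of):
    """Return every class the item belongs to once subclass links are followed."""
    seen = set()
    queue = deque(instance_of.get(item, ()))
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue  # already handled; also protects against loops
        seen.add(cls)
        queue.extend(subclass_of.get(cls, ()))
    return seen


def preview_instance_of(item, new_class, instance_of, subclass_of):
    """Classes the item would newly belong to if 'instance of new_class' were added."""
    before = inferred_classes(item, instance_of, subclass_of)
    # Work on a copy so the preview never changes the real data.
    trial = {key: set(values) for key, values in instance_of.items()}
    trial.setdefault(item, set()).add(new_class)
    after = inferred_classes(item, trial, subclass_of)
    return after - before
```

Called as preview_instance_of("Some person", "Crew member", INSTANCE_OF, SUBCLASS_OF), this reports that the person would also become a "Social group", which is the kind of side effect an editor probably did not intend.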

Q: Why would somebody add "wrong" stuff? Because she truly believes it is correct.

Suggestion: Don't delete wrong stuff, mark it as deprecated instead, and even add a qualifier leaving a visible reason for the deprecation.
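A rough sketch of the deprecate-instead-of-delete suggestion, working on a statement held as a dictionary that loosely follows the Wikibase JSON layout. Assumptions: P2241 is taken to be the "reason for deprecated rank" qualifier property, the statement shape is simplified, the item IDs are placeholders, and the helper name deprecate is invented here. Nothing is written back to Wikidata.

```python
import copy

REASON_PROPERTY = "P2241"  # assumed: "reason for deprecated rank"


def deprecate(statement, reason_item_id):
    """Return a copy of the statement with deprecated rank and a reason qualifier."""
    updated = copy.deepcopy(statement)  # leave the caller's statement untouched
    updated["rank"] = "deprecated"
    reason_snak = {
        "snaktype": "value",
        "property": REASON_PROPERTY,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": reason_item_id},
        },
    }
    # Qualifiers are grouped by property, each property holding a list of snaks.
    qualifiers = updated.setdefault("qualifiers", {})
    qualifiers.setdefault(REASON_PROPERTY, []).append(reason_snak)
    order = updated.setdefault("qualifiers-order", [])
    if REASON_PROPERTY not in order:
        order.append(REASON_PROPERTY)
    return updated


# Placeholder statement: "instance of" (P31) pointing at a placeholder class.
example_statement = {
    "type": "statement",
    "rank": "normal",
    "mainsnak": {
        "snaktype": "value",
        "property": "P31",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": "Q_PLACEHOLDER_CLASS"},
        },
    },
}

deprecated_statement = deprecate(example_statement, "Q_PLACEHOLDER_REASON")
```

The point of the design is that the wrong claim stays visible together with the reason, so the next editor or bot author can see that it was already considered and rejected.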

Q: That user adding wrong stuff, was he using the wikidata.org interface? Yes. One user is already blocked. The other only adds "dubious" statements.

Q: What to do with the bots? Report and ask for a temporary block. Bots are only part of the problem. Users can use stuff like QuickStatements.

Q: My main issue is the complexity of ontologies on Wikidata. Heavy barrier for new users. Can we please keep the ontology simple?

Discussion about "instance of" and "subclass of" and how they currently form bad ontologies. There are cycles (circles) of "instance of".
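Loops like the ones mentioned above can be found mechanically. A small sketch, assuming the "instance of" / "subclass of" links have already been exported into a dictionary mapping each item to the items it points to. The function name find_cycle and the toy items A, B, C are made up for this illustration.

```python
def find_cycle(edges):
    """Return one loop as a list of nodes (first == last), or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in edges.get(node, ()):
            status = state.get(nxt, WHITE)
            if status == GREY:
                # nxt is already on the current path: we walked in a circle.
                return path[path.index(nxt):] + [nxt]
            if status == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for start in list(edges):
        if state.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Toy data: A points to B, B to C, and C back to A.
toy_links = {"A": ["B"], "B": ["C"], "C": ["A"]}
loop = find_cycle(toy_links)
```

This is a plain depth-first search and uses recursion, so for a graph the size of Wikidata an iterative version (or a query against the query service) would be needed. The sketch only shows the idea.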

Remark: Without a use case people cannot understand what the consequences of their modelling are.

Classical example: are colors instances or subclasses?