Wikidata talk:WikiProject Reasoning


Classes, taxonomy and inferences

A big case popped up in Project Chat recently concerning subclass of (P279), "parent taxon" and taxonomy. Taxa can easily be considered classes in modeling technologies such as OWL2 DL, yet taxonomists prefer the more specific "parent taxon" property. To deal with this:

  • A first idea was to use the "subproperty" feature to link the two properties, which could yield useful inferences that treat taxa as classes, but OWL2 DL is not really happy with subproperties of reserved vocabulary.
  • So questions arise: can Wikidata live with this anyway and use a rule of inference based on subproperty of subclass of (P279), considering that the semantics drawn by the inferences need not be OWL compatible (and that the exports could use preprocessing steps to produce useful OWL, such as substituting "parent taxon" with "OWL")? Is there a way of creating useful inferences, such as considering "parent taxon" and "P279" as synonyms? author  TomT0m / talk page 09:20, 28 August 2015 (UTC)
Wikidata can just go with 'parent taxon' <subproperty of:subclass of> and just accept that OWL2 isn't very happy. There are good practical reasons to do this. We will just have to wait for OWL3 and hope it agrees with us. Joe Filceolaire (talk) 22:28, 2 September 2015 (UTC)
@Filceolaire: I don't really think this works that way :) author  TomT0m / talk page 08:37, 3 September 2015 (UTC)
  • Two general responses to two distinct parts of the question:
    1. The rules consider "subclass of" to be a normal Wikidata property just like "parent taxon". Anything you can do for one can also be done for the other, so one could realise a kind of hierarchical reasoning there. It is not necessary to relate "subclass of" and "parent taxon" to get this. One just has to write similar rules for both cases. This fits the decision to make "subclass of" a regular Wikidata property (the original design considered a special type of "subclass of statement" that does not support qualifiers but that can be mapped to OWL, but this was dropped).
    2. The relationship to OWL is not as straightforward as the question suggests. OWL is based on RDF-like data (triples), not on Wikidata statements. We can call a property "subclass of" but that does not make it OWL. To use OWL, one first needs to translate Wikidata content into RDF data. This has been done (for the upcoming SPARQL query service), but to do this, a single statement is translated into many triples. There is not even a single RDF property that corresponds to the Wikidata property "subclass of". Therefore, it is not possible to simply use OWL vocabulary here to give an OWL semantics to subclass of (P279). One could express the intended inferences with rules, but unfortunately these rules cannot be expressed in OWL 2 axioms. Even something as simple as symmetry of a Wikidata property cannot be expressed in OWL 2 based on the RDF translation. There are several RDF translations (see our recent paper), but none of them are well-supported by OWL.
Summary: OWL does not care about whether you make "parent taxon" a subproperty of "subclass of" or not, since neither of these Wikidata properties means anything to OWL, but if you have rule-based reasoning support for "subclass of", then you don't need to declare such a relationship to get the same reasoning for "parent taxon". --Markus Krötzsch (talk) 11:55, 3 September 2015 (UTC)
Thanks for the answer. I can't help being a little unsatisfied by it though :) Even if the RDF translation for Wikidata is not directly usable in an OWL framework, there is a lot of conceptual overlap, and I'm pretty sure that in a number of cases subsets of the Wikidata ontology could be used almost as is in OWL. For example, subclass and instance of statements qualified with periods of time could be flattened by considering a certain point in time for some applications. On the other hand, when using tools like Protégé to write an ontology (the property proposal process is imho far from being optimal for this), OWL could be a pretty useful tool if connected in some way to Wikidata, so that the statements related to an ontology are added automatically at some point ... So I'm under the impression that it should not be an "all or nothing" kind of relationship between Wikidata and OWL, and I'd like to see how they could be interconnected. author  TomT0m / talk page 13:52, 3 September 2015 (UTC)
Yes, you are right. The conceptual relations are very important. I think that Wikidata can also still learn a lot from the experiences of OWL users when it comes to taxonomic modelling. In semantic terms, the core of OWL is a first-order, open world semantics, which we should also aim for on the statement level. As you say, some applications of Wikidata could also consume simplified versions of our data as if they had been OWL (we are creating such simplified exports for subclass-of, for example: see e.g. the Wikidata RDF exports from 20150817). I am just saying that within Wikidata, where we have to deal with the statements in their full complexity, we will need a solution that is beyond pure OWL (but which can also be simpler in other respects!). We should aim at something that is compatible enough to support the kinds of exchange you mention. --Markus Krötzsch (talk) 16:34, 3 September 2015 (UTC)
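
To illustrate the point made above that the same kind of hierarchical rule can be written for "subclass of" and "parent taxon" without relating the two properties, here is a minimal sketch in Python. It assumes statements are reduced to plain (subject, property, value) triples, which ignores qualifiers, ranks and references; the item names and the `ancestors` helper are hypothetical and not part of any Wikidata tool. P171 is the ID of "parent taxon".

```python
from collections import defaultdict

# Toy statements as (subject, property, value) triples.
# P279 = subclass of, P171 = parent taxon; item names are placeholders.
STATEMENTS = [
    ("lion", "P171", "Panthera"),
    ("Panthera", "P171", "Felidae"),
    ("house", "P279", "building"),
    ("building", "P279", "architectural structure"),
]


def ancestors(statements, prop, start):
    """Return every item reachable from `start` by following `prop` links."""
    edges = defaultdict(set)
    for subject, p, value in statements:
        if p == prop:
            edges[subject].add(value)
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in edges[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

The same function serves both properties: `ancestors(STATEMENTS, "P171", "lion")` walks the taxon hierarchy and `ancestors(STATEMENTS, "P279", "house")` walks the class hierarchy, with no subproperty declaration between them.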

Inferred sources

Let's say we have the following inference rule:

If item ?A has a statement ?S1 with property P1 and value ?B,
then item ?B has a statement ?S2 with property P2 and value ?A.

If statement ?S1 is sourced with reference ?R1, can we infer a reference ?R2 for statement ?S2? I currently see multiple ways this is handled, e.g. by using Q20651139 [1] or by using ?A as the source of ?S2 [2]. Neither of these ways convinces me; can we find a better solution? --Pasleim (talk) 12:30, 28 August 2015 (UTC)

In this simple case, where one statement S2 follows from one other statement S1, one could copy the references given for S1 to become references of S2. However, this breaks down as soon as a statement follows from more than one premise (the use cases gave examples for this).
Therefore, I would prefer references that indicate how a statement was derived. So the reference should specify which inference rule and which premise statements were used. A technical problem is that one would need to refer to specific statements, which is something that we do not have a good facility for. Alternatively, one could just point to the items from which the statements were taken (this would usually be enough). The time/revision number of the inference could also be recorded. Of course, all of this only matters when we add inferred statements to the wiki; there are probably also cases where this is not planned. --Markus Krötzsch (talk) 12:59, 3 September 2015 (UTC)
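
As a rough sketch of what such a derivation record could look like for the inverse rule in this section, the Python below attaches the rule identifier and the source item to each inferred statement. The `Statement` class, its field names and the `apply_inverse_rule` helper are all assumptions made for illustration; nothing here corresponds to an existing Wikidata data structure, and the premise is recorded only by item, following the "point to the items" alternative mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    value: str
    # Provenance of an inferred statement; left empty for ordinary statements.
    derived_by: str = ""
    premises: tuple = ()


def apply_inverse_rule(statements, p1, p2, rule_id):
    """For each (?A, p1, ?B), produce (?B, p2, ?A) with a derivation record."""
    inferred = []
    for s in statements:
        if s.prop == p1:
            inferred.append(
                Statement(
                    subject=s.value,
                    prop=p2,
                    value=s.subject,
                    derived_by=rule_id,       # which rule produced the statement
                    premises=(s.subject,),    # item the premise was taken from
                )
            )
    return inferred
```

A rule with several premises would simply list more than one item in `premises`, which is the case where copying the references of a single premise breaks down.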

Reasoning conflict

Proposal: if there's a conflict between a sourced statement and a different, non-sourced statement, the sourced one should trump the non-sourced one. Pikolas (talk) 13:07, 31 August 2015 (UTC)

Is this related to "reasoning" or is this maybe a general proposal on how to deal with conflicting statements? We could use rules to detect conflicts, but we could also have other ways to detect them (for example, a bot may "know" that certain statements are in conflict without using any rules). When we use reasoning, we have to consider that there can be more than two statements involved. If a rule finds that three statements are in conflict (being an Anglican priest, being female, being ordained before 1970), then only one of them would need to be removed to solve the conflict. But it might not be wise to remove the one without references blindly. For example, "being female" rarely has a reference. Moreover, it can be that a particular statement causes many conflicts, while another causes only one or two; then maybe we should remove the one that seems to be responsible for more conflicts. You see it is a tricky topic. A lot of people have studied this (see: reasoning under inconsistencies, knowledge base repair, paraconsistent logics, ...). I think we will need human approval before deleting any statement in any case. --Markus Krötzsch (talk) 16:12, 3 September 2015 (UTC)
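
The "responsible for more conflicts" heuristic mentioned above can be sketched in a few lines of Python. This assumes each detected conflict is given as a group of statement identifiers; the identifiers and the `rank_conflict_candidates` helper are hypothetical. The output is only a ranking for a human reviewer, not a deletion decision.

```python
from collections import Counter


def rank_conflict_candidates(conflicts):
    """Count how many conflicts each statement appears in, most frequent first.

    `conflicts` is an iterable of groups of statement identifiers, where each
    group is one set of statements that cannot all hold together.
    """
    counts = Counter()
    for conflict in conflicts:
        # Use a set so a statement is counted once per conflict.
        counts.update(set(conflict))
    # Sort by descending count, then by identifier for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

A statement that appears at the top of this list is a candidate for review first, but as noted above, whether it has references is a separate signal and neither signal alone justifies removal.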

Using properties instead of templates

I've outlined on Wikidata:WikiProject_Reasoning/Use_cases#Subclass_of_is_transitive and Wikidata:WikiProject_Reasoning/Use_cases#Offices_of_heads_of_government how we could use statements to define rules for reasoning. This gives us something that is

  1. internationalised
  2. machine readable
  3. familiar to users

with no extra work.

Does anyone object to me rewriting the proposal in this way? Markus?

Joe Filceolaire (talk) 22:22, 2 September 2015 (UTC)
Hi. I just moved your proposals from Use cases to a new page Rule format. I think it is easier to explain and discuss them in one connected section. Please check if I have copied everything in a logical order that preserves the intended meaning. On Use cases, we should just collect use cases (if we mix in the possible formats, it will become too crowded there ;-). I have also added some more structure to Rule format, so that we can discuss the general idea of how to encode rules in statements or templates, respectively. I have already added some generic comments, but I will add more detailed questions (I don't fully understand the proposal yet). --Markus Krötzsch (talk) 16:03, 3 September 2015 (UTC)

Reasoning in wikidata vs reasoning in RDF

From the above discussion it occurs to me that maybe we ought to distinguish where we expect inferencing to happen. If (some) Wikidata inferencing is done outside of or along with any export to RDF, I think it resolves some of the above issues. For example, if OWL can't use "subproperty of" with "subclass of", then why not apply the inference X <property> Y && property <subproperty of> <subclass of> => X <subclass of> Y before exporting to RDF? The <subclass of> claim doesn't have to be instantiated anywhere in Wikidata itself, but should be inferred before any RDF export. Yes, that means Wikidata can't use OWL directly for reasoning, but I don't see how it can anyway with the existence of qualifiers etc. ArthurPSmith (talk) 15:22, 4 September 2015 (UTC)

This would make the RDF export an inexact representation of Wikidata. It would rather be a transformation Wikidata -> RDF export -> OWL compatible export, the last step being done with tools such as, perhaps, Wikidata Toolkit. author  TomT0m / talk page 15:28, 4 September 2015 (UTC)
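
The pre-export inference proposed above (X <property> Y && property <subproperty of> <subclass of> => X <subclass of> Y) could be sketched as follows. This is only an illustration under simplifying assumptions: statements are plain triples without qualifiers, only direct subproperty declarations are followed (not chains), and the `materialise_subclass_claims` helper is hypothetical rather than part of Wikidata Toolkit or any export pipeline. P1647 is the ID of "subproperty of".

```python
SUBCLASS_OF = "P279"
SUBPROPERTY_OF = "P1647"


def materialise_subclass_claims(statements, property_statements):
    """Apply: X <p> Y and p <subproperty of> <subclass of> => X <subclass of> Y.

    `statements` holds item-level triples, `property_statements` holds triples
    whose subject is a property. Returns only the newly inferred claims.
    """
    # Properties directly declared as subproperties of "subclass of".
    sub_props = {
        p
        for p, rel, target in property_statements
        if rel == SUBPROPERTY_OF and target == SUBCLASS_OF
    }
    inferred = {(x, SUBCLASS_OF, y) for x, p, y in statements if p in sub_props}
    # Drop claims that are already stated explicitly.
    return inferred - set(statements)
```

The inferred claims would be added to the export stream only, so nothing needs to be written back to Wikidata, at the cost of the export no longer being an exact copy of the stored statements.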

As an experiment, I have created a new Wikidata:Classification noticeboard page for people to be able to discuss particular issues with the classification tree for particular items, and specific subclass chains that appear to break transitivity and thus lead to nonsense query results -- which may be of interest here, given that subclass transitivity is such a key example of a rule to enable reasoning. A longer, chattier announcement can be found posted at Project Chat. Jheald (talk) 16:40, 6 September 2015 (UTC)

Wikimania 2016

Only this week left for comments: Wikidata:Wikimania 2016 (Thank you for translating this message). --Tobias1984 (talk) 12:08, 25 November 2015 (UTC)

A little nitpicking

Inference happens when you have the rules; you traverse from value to value until you have a result. Reasoning happens when you have a value and a proposed result, but lack the rules, and you then try to find them.

You may also say that inference is what you do, and reasoning is why you do it.

Yet another description is that inference is operations on information, while reasoning is operations on knowledge.

Just me ranting. Jeblad (talk) 21:51, 17 December 2019 (UTC)

One more: the page seems to be more about constraints on inference than about reasoning. Perhaps that is just me, coming from another background. Jeblad (talk) 11:51, 5 January 2020 (UTC)

Vector representation

For some uses, namely neural nets, it might be interesting to have a vector representation of the subject and value. Learning such a representation is pretty hard, and in some cases (most of them) a property even creates vectors with rank >> 1. We might hope that most of the properties can be represented by the datatype, and adapted during training, but I'm not quite sure that would be the case.

Given that it is a big undertaking to recreate Wikidata as a vector map, it could be wise to do this with some official support. It consists of at least two parts: the first is to create the necessary sparse representation (a programming problem), and the second is to train the representation so it can be expressed in vector form (a processing problem).

As a property will more often than not have a rank >> 1, we will end up with vectors with rank >> 20k, thus the vector map must be partitioned somehow. We should probably make different partitions available for download, partitioned both on vector length (aka included properties) and on map location (aka subjects). Jeblad (talk) 22:14, 17 December 2019 (UTC)

Wrote a little at Wikidata:WikiProject Reasoning/Vector representation. Jeblad (talk) 01:04, 18 December 2019 (UTC)
Removed the page. It is unlikely to spark any interest in its current form. If anyone is interested, then contact me. Jeblad (talk) 10:59, 5 January 2020 (UTC)

bad example of non-symmetric qualifier

From the project:

Moreover, there are cases of symmetric relationships where some qualifiers are not symmetric (should not be copied to the inferred statement), as in the case of diplomatic relation (P530), which uses a non-symmetric qualifier diplomatic mission sent (P531) to specify the embassy of the source item in the country of the target item. Clearly, just copying all qualifiers for symmetric properties would not work either.

But this doesn't follow. There is no reason why a diplomatic relation from Russia to America couldn't have a diplomatic mission sent being the US embassy in Moscow, either along with the Russian embassy in Washington, or even on its own. -- Peter F. Patel-Schneider 18:55 UTC, 6 August 2020

The 'on its own' part is why we say there is asymmetry; the existence of a delegation from state A to state B doesn't imply the existence of a delegation from state B to state A. And it certainly doesn't point us to the item page for said delegation, which we probably want in order to see who was on it at what times, and so on. Arlo Barnes (talk) 14:39, 5 October 2020 (UTC)
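
For readers following the asymmetry argument, here is a small Python sketch of the behaviour the project page describes: when inferring the reverse statement for a symmetric property, only qualifiers on an explicit allow-list are copied. The tuple layout, the placeholder qualifier values and the `symmetric_counterpart` helper are assumptions for illustration; which qualifiers belong on the allow-list is exactly what is under discussion here. P580 is the ID of "start time".

```python
def symmetric_counterpart(statement, symmetric_qualifiers):
    """Build the reverse statement, keeping only allow-listed qualifiers.

    `statement` is (subject, property, value, qualifiers), where `qualifiers`
    maps qualifier property IDs to values.
    """
    subject, prop, value, qualifiers = statement
    kept = {q: v for q, v in qualifiers.items() if q in symmetric_qualifiers}
    return (value, prop, subject, kept)


# Placeholder values only; these are not real item IDs or dates.
example = (
    "Russia",
    "P530",
    "United States",
    {"P531": "mission-item-placeholder", "P580": "date-placeholder"},
)
reverse = symmetric_counterpart(example, symmetric_qualifiers={"P580"})
```

Here `reverse` keeps the P580 qualifier and drops P531, so the inferred statement makes no claim about which mission was sent in the opposite direction.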

measuring language dependence in modelling

I'm not sure how to go about it, but it would be interesting to see if a property (for example) is used slightly differently by editors who primarily edit Wikidata in different languages, which one could gauge based on which label fields are edited. I've often thought that a weak point of the data model is that the items and relations between them are supposed to be represented structurally, through the link network and the 'flavour' of links (which properties are doing the linking, statement ranks and qualifiers, and so on), but most editors go by the linguistic descriptions of items and properties to understand them in order to edit them, rather than by the structural relations (which might not exist yet, since the editors are editing for a reason). Arlo Barnes (talk) 14:45, 5 October 2020 (UTC)

This is and always will be a problem. It can be ameliorated by:
  1. Making labels and descriptions as unambiguous as possible
  2. When new items and properties are created, adding statements and constraints to limit differences of interpretation before lots of language labels and descriptions are added
  3. If there is any change to intended use, e.g. a change of scope of a property, making sure it is propagated to all languages for which there is a label or description. When in doubt, IMO, a blank label or description is better than one that doesn't align with the other languages; it invites replacement. Swpb (talk) 18:25, 6 October 2020 (UTC)