Wikidata talk:WikiProject Chemistry/Archive/2014


Collaboration with PubChem

While visiting NCBI recently to discuss ways in which they could collaborate with the Wikimedia community (see my notes), the idea came up to explore specifically how their database PubChem might fit with Wikidata. This was discussed in an initial meeting with PubChem yesterday, in which they did indeed express an interest in finding out what Wikidata might offer them, what kind of information we might wish to get from their site, and possibly how well the information in their database matches what we have (including on Wikipedia). They are working on exposing their data via RDF (scheduled release is in January; preliminary site is here) and are open to inquiries, suggestions or other forms of feedback from the Wikidata community, including on the vocabulary they used and why. For a start, I'd suggest collecting such feedback here. I have also posted to the Wikidata mailing list. --Daniel Mietchen (talk) 06:18, 5 December 2013 (UTC)

@Daniel Mietchen: Thank you for your proposition. I was just thinking about an initiative to import the PubChem data into Wikidata; see the ChemID initiative. The main purpose is to collect data from the different free databases and to match the corresponding chemicals between the databases in order to create a unique list of all data available from those databases.
Right now I am afraid we can't propose anything to PubChem: we first have to match the Q items of our chemicals with PubChem IDs. Then we can offer this list to PubChem so that they can create a link from their chemical pages to the corresponding item in Wikidata: this will give them access to the future data for each chemical in Wikidata. Snipre (talk) 18:34, 5 December 2013 (UTC)
@Snipre: Sorry if this is a dumb question, I am new to Wikidata. How do you get the Q-numbers from the Wikipedia articles with chembox templates? I just viewed the source of w:Methane, for example, and don't see any cross-reference from there to wikidata. Klortho (talk) 04:23, 9 December 2013 (UTC)
@Klortho: There is no direct way to find all those articles. There are 9348 transclusions (https://en.wikipedia.org/w/index.php?title=Template:Chembox&action=info#mw-pageinfo-transclusions). Once we gather all the identifiers from those infoboxes, a query (e.g. http://208.80.153.172/wdq/?q=claim[662]) would show all the q-items. --Tobias1984 (talk) 08:53, 9 December 2013 (UTC)
@Klortho:@Snipre:@Tobias1984: Wait, there _must_ be a way to get the Wikipedia-to-Wikidata mappings, right? (Embarrassed, I should know the answer from our similar effort on human genes and proteins...) But from w:Methane, the left-hand nav bar --> Languages --> "Edit links" clearly links to methane (Q37129), right? Despite my incredulity, in my experience Tobias1984 usually ends up being right about these things... Andrew Su (talk) 02:05, 10 December 2013 (UTC)
https://www.wikidata.org/wiki/Special:ItemByTitle?site=enwiki&page=Methane&submit=Search redirects to methane (Q37129). Klortho (talk) 07:15, 23 December 2013 (UTC)
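The title-to-item lookup discussed above can also be done programmatically. The following is a minimal sketch, assuming the `wbgetentities` action of the Wikidata web API is used with its `sites` and `titles` parameters; the helper names and the sample response are illustrative and not part of the original thread, and no network request is made here.

```python
from urllib.parse import urlencode

# Public MediaWiki API endpoint of Wikidata.
API_URL = "https://www.wikidata.org/w/api.php"


def item_lookup_url(title, site="enwiki"):
    """Build a wbgetentities request that resolves a wiki page title to its item."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "info",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_qid(response):
    """Return the first Q-number in a decoded response, or None if no item exists."""
    for key in response.get("entities", {}):
        if key.startswith("Q"):
            return key
    return None


# Hand-written sample shaped like a successful response (not fetched live).
sample_response = {"entities": {"Q37129": {"type": "item", "id": "Q37129"}}, "success": 1}

print(item_lookup_url("Methane"))
print(extract_qid(sample_response))
```

Fetching the URL and decoding the JSON is left out on purpose; any HTTP client could be used for that step.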
@Daniel Mietchen: I saw that you have a bot. Perhaps you could have a look at that request, which is the first step toward collaborating with other databases. Snipre (talk) 14:06, 6 December 2013 (UTC)
This would be a great idea. I know some of the PubChem people personally, and although we've talked about working together I've usually had other things keeping me away. If we have a group of people committed to working on this, we should seize this opportunity now! I'm very busy with final exams right now, but in 10 days or so I'll be able to commit some serious time to it - let me know how I can best help. Thanks for taking the initiative! Walkerma (talk) 01:25, 8 December 2013 (UTC)
@Snipre: I would be interested in helping with the bot, but since we do not have a Wikidata Toolkit yet, someone else would have to take the technical lead. --Daniel Mietchen (talk) 01:38, 10 December 2013 (UTC)
@Daniel Mietchen: Hi, you don't need to do that directly in Wikidata; just extract the data like here and we will work from that. By the way, if you have contact with the PubChem guys, perhaps you can ask them how they got agreement from the ChEBI, ChEMBL and KEGG databases to import some of their data into the PubChem database. There are some incompatibilities between the licences. Snipre (talk) 17:58, 23 December 2013 (UTC)
@Daniel Mietchen: I enthusiastically support this idea. Scanning the PubChem record on methane, I think the identifier and descriptor mappings are no-brainers, as are the physicochemical properties. If we can figure out the links to other Wikidata entries based on the "Biomolecular Interactions and Pathways" section, I think that would be awesome. However, we should _not_ attempt to import all of the data in the "Biological Test Results" section. That is beyond the scope of what I think Wikidata should be (but obviously that's up to the community to decide). More generally, I think the rate-limiting factor in getting this done is developer time. There's probably some relevant code in our WikiDataGeneBot repository, but we're still looking for someone to maintain/develop it full time as well... Cheers, Andrew Su (talk) 02:05, 10 December 2013 (UTC)
@Andrew Su: Due to licence compatibility we can't import third-party data from PubChem. Right now we can only import data like SMILES, InChI, InChIKey, formula and CID. Snipre (talk) 18:01, 23 December 2013 (UTC)
I just had a good Skype discussion with someone from PubChem about working together, in a similar way to how w:WP:CHEM worked with CAS and ChemSpider to check IDs, and then to look at what data can be shared. I agree that the licence compatibility is an issue, but it seems PubChem keeps good track of provenance so we could perhaps select from sources that share data openly, or perhaps use data as part of a validation program. His concern was that PubChem just has so much data - changes run to terabytes per week - so we need to be able to be selective for just the data we need.
I think we also need to feed data INTO PubChem - the data should flow both ways. We've discussed how this might be done, and I'll be sending over a template Excel-type file for him to look at. We'll start with comparing identifiers, and maybe we can grow things from there to include physical properties. During this transition period it's going to involve people from both Wikidata and PubChem. Please let me know your thoughts. I'll also cross post on WP:CHEM on the English Wikipedia. Thanks, Walkerma (talk) 17:04, 25 March 2014 (UTC)
@Walkerma: I already started something like that: see Wikidata:WikiProject_Chemistry/ChemID and for the Excel file see that. The list of chemicals in the Excel file corresponds to the chemicals in WP:fr. Perhaps you can do the same for WP:en. I know that WP:en has an Excel file with identifiers. The only thing to do is to add the Wikidata Q number to each chemical on that list. Snipre (talk) 19:49, 25 March 2014 (UTC)
As for what we can import from PubChem: the InChI, InChIKey, SMILES, PubChem CID and IUPAC name. If you can provide that data for all chemicals in WD, we will have reached a good objective. Snipre (talk) 19:53, 25 March 2014 (UTC)
Thanks! I was following the ChemID project, and was hoping it would form part of that, but I hadn't seen your Excel sheet! That's perfect! What I think we need to do is to combine the English WP data with this one, then we can share it with PubChem. On the English WP we had a validation project to ensure that the above were correct (all except PubChemID and maybe SMILES), and you can see it is patrolled by bot (if someone vandalises data we indicate it with a red X). Many thanks! Walkerma (talk) 04:35, 26 March 2014 (UTC)
The best thing would be to have a third list of chemicals with PubChem CID and Q number from a third WP (typically the German one); then we can perform a comparison analysis to obtain a final list.
If you can get the English list of chemicals and put it on a public server, please put the address on the ChemID initiative page under the "Progress" paragraph. Snipre (talk) 10:26, 26 March 2014 (UTC)
I got in touch with the German WP to obtain the list of PubChem IDs with the Q numbers for chemicals (a bot request was created). I got in touch with Beetstra on WP:en to see if it is possible to get the English list of identifiers. Snipre (talk) 06:52, 27 March 2014 (UTC)
@Walkerma: I got the list of articles with CAS number, PubChem CID and Q number for WP:de and WP:en, see fr:Utilisateur:Snipre/Infobox Chimie/en and fr:Utilisateur:Snipre/Infobox Chimie/de. The French list is here. I have no time to start the analysis now, but if someone wants to work on them, feel free. Snipre (talk) 19:24, 9 April 2014 (UTC)
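The comparison analysis across the three lists could look roughly like the sketch below. It assumes each list has already been reduced to a mapping from Q number to PubChem CID; the function name and the sample values are illustrative only (in particular, the disagreeing value is made up to show a conflict), and the real lists also carry CAS numbers.

```python
def compare_lists(lists):
    """Split items into those whose CID agrees in every list and those that conflict.

    lists maps a wiki name to a {q_number: pubchem_cid} dictionary.
    """
    agreed, conflicts = {}, {}
    all_qids = set().union(*(mapping.keys() for mapping in lists.values()))
    for qid in sorted(all_qids):
        # Collect the value each wiki reports for this item, skipping wikis without it.
        values = {wiki: mapping[qid] for wiki, mapping in lists.items() if qid in mapping}
        distinct = set(values.values())
        if len(distinct) == 1:
            agreed[qid] = distinct.pop()
        else:
            conflicts[qid] = values
    return agreed, conflicts


# Illustrative sample data; "999999" is a deliberately wrong placeholder value.
sample_lists = {
    "fr": {"Q1808882": "100585", "Q37129": "297"},
    "en": {"Q1808882": "100585", "Q37129": "297"},
    "de": {"Q1808882": "100585", "Q37129": "999999"},
}

agreed, conflicts = compare_lists(sample_lists)
print(agreed)
print(conflicts)
```

Note that an item present in only one list counts as agreed in this sketch; a stricter version might require at least two sources before accepting a value.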

Constraint violations

Finally we have some good constraint violations for chemicals. I already went through part of the list:

Hi, Tobias1984, can you help me figure out what the table of "unique value" constraint violations means? I would think that "unique value" means that at most one item on Wikidata is allowed to have a particular value for the PubChem CID (P662) property. So, that would mean that for any of the items on this list, the value for this property must be duplicated somewhere, right? But consider, for example, lavendamycin (Q1808882), which is given a value of "100585". If my understanding is correct, then there must be at least one other Wikidata item with this same value for this property? But I'd expect that other item to also show up on this list, and searching for "100585", it only shows up once. So, I am confused. What am I missing? Thanks! Klortho (talk) 20:28, 25 January 2014 (UTC)
Hi @Klortho:. You are right. Unique value means only one item can have the same string. The reason why your example doesn't have a second item is because I already merged that pair. But I didn't merge this pair yet: benzoyl peroxide (Q411424) and benzoyl peroxide (Q15633266). The important thing is that we merge into the lower Q-number and list the other item for deletion. See Help:Merge. --Tobias1984 (talk) 23:26, 25 January 2014 (UTC)
Ah, I missed this in the header, "Some may already be fixed since the last update". Thanks! Klortho (talk) 03:16, 26 January 2014 (UTC)
Wikidata:Database_reports/Constraint_violations/P231 I went through the CAS IDs. Lots of Russian pages that are not connected to the rest of the wiki-world. Some duplicates are also from copy-pasted infoboxes where one of the IDs wasn't updated. We should make a habit of also correcting the value on the respective Wikipedia, at least until a bot can do that on a regular basis. --Tobias1984 (talk) 09:59, 27 January 2014 (UTC)
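The "unique value" check and the merge rule described in this thread can be sketched as follows. This is only an illustration of the logic, not the code behind the constraint reports; the function names are hypothetical and the shared identifier value is a placeholder, since the thread does not give the actual value for the benzoyl peroxide pair.

```python
from collections import defaultdict


def qid_number(qid):
    """Numeric part of a Q-number, e.g. 'Q411424' -> 411424."""
    return int(qid[1:])


def unique_value_violations(claims):
    """Return {value: [items]} for every identifier value used on more than one item.

    claims is an iterable of (q_number, value) pairs for a single property.
    """
    items_by_value = defaultdict(set)
    for qid, value in claims:
        items_by_value[value].add(qid)
    return {
        value: sorted(qids, key=qid_number)
        for value, qids in items_by_value.items()
        if len(qids) > 1
    }


def merge_target(qids):
    """Pick the item to merge into: the one with the lowest Q-number."""
    return min(qids, key=qid_number)


# Placeholder identifier value for the duplicate pair named in the thread.
sample_claims = [
    ("Q411424", "placeholder-cid"),
    ("Q15633266", "placeholder-cid"),
    ("Q1808882", "100585"),
]

violations = unique_value_violations(sample_claims)
for value, qids in violations.items():
    print(value, qids, "merge into", merge_target(qids))
```

An item whose duplicate has already been merged, like the lavendamycin example, no longer shows up once the report is regenerated.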

Germanium subclass tree

I was working on the classification of Germanium compounds and isotopes a bit. What do you think of this structure:

Most of the subdivisions are also present in the Wikipedia categories. If you find this satisfactory we could model the rest of the chemical compounds in a similar way. --Tobias1984 (talk) 22:50, 14 February 2014 (UTC)

I'm hesitant to agree that compounds are subclasses of a substance. It seems to me that a compound would better be modeled as having the components (relation: has part) of that element.
Maybe I'm crazy though. :) --Izno (talk) 00:50, 15 February 2014 (UTC)
You're not crazy :) I'm still thinking if I made the right choice with the isotopes being a subclass of the element. - The tree also splits in germanium compound (Q15727447) and goes to Germanium and to "chemical compound". We could also remove the link to Germanium. The tree for "chemical compound" looks like this:

http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q11173&rp=279&lang=en&method=list

Currently it is 99 % incomplete though, because it only has the minerals and the germanium compounds. --Tobias1984 (talk) 10:26, 15 February 2014 (UTC)

I would say no: if this structure cannot be applied to all chemicals, it is not interesting from a classification point of view. Instead of this imported classification, we should create new properties to describe element composition. And we can do the same for functions. All other classifications will be too complex to be used by contributors without a deep knowledge of them.
But for isotope I agree. Snipre (talk) 10:46, 15 February 2014 (UTC)
The subdivisions of germanium compounds into germanes, organogermanium-compounds and germanates are pretty standard. There might be some more obscure classes for germanium-compounds which we can still debate. - Why do you think that we can't apply this to all compounds? --Tobias1984 (talk) 11:36, 15 February 2014 (UTC)

(editconflict)

⟨ isotope of germanium (Q2288723) ⟩ subclass of (P279) ⟨ germanium (Q867) ⟩

Not sure I agree; I would rather see germanium (Q867) as a class of classes. germanium-73 (Q2437511) is for sure a subclass of germanium (Q867), as a germanium-73 atom is for sure a germanium atom; it's more dubious that it is (all alone) a germanium isotope. It makes more sense to mark germanium (Q867) as

⟨ isotope of germanium (Q2288723) ⟩ subclass of (P279) ⟨ germanium (Q867) ⟩

 : <germanium isotopes> is the class of all classes which regroups isotopes atoms which have the same numbers of neutrons. TomT0m (talk) 12:14, 15 February 2014 (UTC)

Sounds good. Please make the changes. We should try to find a good tree for this element and model the other elements accordingly. --Tobias1984 (talk) 12:23, 15 February 2014 (UTC)
OK, here is my attempt :
  • My bad, I got mixed up in those <censorded>*–$"</censorded> Q numbers. So, I leave it (
I would also add
⟨ germanium (Q867) ⟩ subclass of (P279) ⟨ atom (Q9121) ⟩
  • The subdivisions in a textbook are not aligned with the meaning of the subclass property. The nuclide (Q15730548) item is maybe a little overkill, but I created it for element to be an instance of. We have several levels: the individual atom level (0). We regroup atoms in classes like hydrogen (1): the hydrogen class is, like the germanium class, a class of atoms with the same atomic number, which we call elements (3). We also regroup atoms in another kind of class we call isotopes. Finally, isotopes and elements are units we use to classify interesting sets of atoms (4)
plus as obviously the isotopes of germanium are a special kind of isotopes that share a property. TomT0m (talk) 16:34, 15 February 2014 (UTC)
We have to exclude "isotope of XXX" from the classification tree: it is a useless concept. It is better to create a general item for each element and to consider all isotopes of the element as subclasses. We have to be more systematic than Wikipedia articles, and even if articles exist for some isotopes, we don't need to follow that classification. Each isotope item has to be defined as "subclass of": "isotope", and all general element items as "subclass of": "element". I don't understand the need for nuclide (Q15730548): we can define "chemical element" as "subclass of": "atom". As we are speaking about concepts and never about a specific atom, "instance of" is not relevant. Snipre (talk) 17:09, 15 February 2014 (UTC)
Your tree is included in my suggested ontology. Apart from that, I don't see how expressing a little more is useless; we're in a project whose purpose is to structure data, so let's structure data. Plus it's difficult to make something more systematic than this. The „isotope of X” items are, for example, useful with simple queries. I'm not aware of other kinds of atom subclasses, but we can't exclude that there are some; let's build something robust, a little bit of redundancy does no harm. For the most abstract item, I think it's not a bad habit to try to put an instance of property on every item. Wikidata, in general, will have to classify things according to several points of view, and this item can help with that as it's an entry point for querying the standard classes of atom in chemistry. But I went a little far, we're still in experiment time :) Concept classification is imho a very important feature in a project that aims to represent the sum of all knowledge (no less than that … :) ) TomT0m (talk) 17:39, 15 February 2014 (UTC)
For "And each isotope item has to be defined as"
⟨ isotope item ⟩ subclass of (P279) ⟨ "isotope" ⟩
No, this would mean that an atom item (an instance of an isotope item) is also an isotope item (hence a class of atoms), which does not make sense.
⟨ isotope item ⟩ instance of (P31) ⟨ isotope ⟩
, which means it is a member of the set of all isotopes. TomT0m (talk) 18:07, 15 February 2014 (UTC)
@TomT0m: The "isotope of X" can be expressed as a double query: items defined as "subclass of": "isotope" and as "subclass of": "X". So if the query can do the job, why do we want to make our classification tree more complex? For me the "isotope of X" item is the same as the "X" item because all atoms of an element are isotopes. So your proposal mixes classification and query. And why do we need extra levels in the classification when no need is defined? For me nuclide (Q15730548) and isotope of germanium (Q2288723) are good examples of useless levels in classification: we have to create levels and branches when needed, not because we don't know. And experiments are not a good idea, because an experiment means someone will have to clean up, and cleaning is not always done well. If you really want to experiment, use the test server. Snipre (talk) 18:25, 15 February 2014 (UTC)
For the instance of vs. subclass of debate, I don't care, but if you really want a clear definition of the difference, speak with user:Emw about the semantic standards: according to Emw, instance of should be used only for a specific isotope you can trace at position x at time t, that is, one atom you follow and for which you can give the position at any time. A labelled atom, like your aunt's dog, which has a name and is clearly identified among many other dogs of the same species. Snipre (talk) 18:32, 15 February 2014 (UTC)
(edit conflict) Mixing classification and query does not make sense. As said in another place, if a class has the same instances (claims) that the result of a query is supposed to return, it's an opportunity to add a consistency check, so not necessarily a bad thing. For the definition, I'll quote French Wikipedia: "Un élément chimique désigne l'ensemble des atomes caractérisés par un nombre défini de protons dans leur noyau atomique." (A chemical element denotes the set of atoms characterised by a defined number of protons in their atomic nucleus.) It appears to exactly match the definition we have. We can define isotopes the same way, and we should, to keep things consistent and rigorous. And sourceable; otherwise this is a POV. TomT0m (talk) 18:41, 15 February 2014 (UTC)
Emw changed his mind: OWL2 allows a class item to be an instance of another class through punning; he was referring to an old version of the standard in which this was possible but would have made queries undecidable. This allows us to classify classes cleanly, which is fortunate. TomT0m (talk) 18:47, 15 February 2014 (UTC)
So in summary the two classifications according to instance of/subclass relations between germanium-73 and atom are:
1) germanium-73 -> germanium -> chemical element -> atom
2) germanium-73 -> isotope of germanium -> germanium -> chemical element -> atom kind class -> atom
For simplicity it is clear which version is the best; redundancy and checking are not necessary if the data import is well organized, with a bot doing the classification in one step and in a very short time. For me, including some checking structure already now is stupid, because who is doing that checking and according to which format? If we have to do something, it should be according to the current state of the tools and of the Wikidata organization, because nobody can say how a check system will work in the future. Perhaps all the things proposed here won't match the future specifications, so this is again useless at that point. Snipre (talk) 19:23, 15 February 2014 (UTC)
I don't understand your arrows. The relations make sense independently from the need of redundancies or not and are really no big deal in this case. Robustness is important far after the initial import as it can help to spot errors or vandalism in editions. I don't understand which specifications you are talking about. TomT0m (talk) 19:53, 15 February 2014 (UTC)
One example of something the isotope item is already useful for right now, and I did not plan this: reasonator on this item
Personally, I would reject use of anything related to something like "atom kind class". We need to have a centralized discussion about things like that, as the only person I've seen pushing the point of view that that would be useful is TomTom. (For better or worse.) It still seems evident to me that it duplicates information already implicit in the P279 claim that something is a subclass of an element/atom. Additionally, I find it highly unlikely that we would find literature to use to specify those claims. I really think we should hold off on doing anything like that for now where it doesn't make sense, and right now, it doesn't feel like it makes sense here because it simply sounds wrong. --Izno (talk) 03:43, 16 February 2014 (UTC)
(For better or worse.) :) I did not look much in that direction, but I would not be surprised if we are touching here on some kind of upper ontology concept. There already were some discussions about that on project chat; I'll dig into this a little. TomT0m (talk) 10:19, 16 February 2014 (UTC)
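To make the two options from this thread concrete, here is a small sketch. It assumes each arrow in the summary is treated as a plain "parent" link (the thread itself leaves open whether a given arrow is instance of or subclass of), and it shows the "double query" idea as a simple filter over direct parents. All names and the data layout are illustrative.

```python
def chain(item, parent_of):
    """Follow single-parent links upward from item and list the items passed through."""
    path = []
    while item in parent_of:
        item = parent_of[item]
        path.append(item)
    return path


# Option 1: no intermediate "isotope of X" or "atom kind class" level.
SIMPLE = {
    "germanium-73": "germanium",
    "germanium": "chemical element",
    "chemical element": "atom",
}

# Option 2: with the extra levels.
EXTENDED = {
    "germanium-73": "isotope of germanium",
    "isotope of germanium": "germanium",
    "germanium": "chemical element",
    "chemical element": "atom kind class",
    "atom kind class": "atom",
}


def isotopes_of(element, direct_parents):
    """'Double query': items that are direct subclasses of both 'isotope' and the element."""
    return sorted(
        item for item, parents in direct_parents.items()
        if {"isotope", element} <= parents
    )


# Direct parents under the proposal where every isotope item has two subclass-of claims.
DIRECT_PARENTS = {
    "germanium-73": {"isotope", "germanium"},
    "oxygen-18": {"isotope", "oxygen"},
}

print(chain("germanium-73", SIMPLE))
print(chain("germanium-73", EXTENDED))
print(isotopes_of("germanium", DIRECT_PARENTS))
```

With the second layout, the result of `isotopes_of` could be compared against the members of an explicit "isotope of germanium" class, which is the kind of consistency check argued for above.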

Participants

Hi, I'd like to suggest that you add some more information to the Participants section, outlining the way the project works and how someone becomes a participant. I can guess that I could add my name to the list of participants, but that in itself would really only change the length of the list and wouldn't practically make me a participant. I'm happy to dive in and start discussions but others may not be. --The chemistds (talk) 16:47, 4 April 2014 (UTC)

@The chemistds: You are more than welcome to start any discussions about the chemistry data here. Adding your name to the list has the advantage that we can ping all the participants to alert them about discussions which are not on anybody's watchlist. Tobias1984 (talk) 17:29, 4 April 2014 (UTC)
@The chemistds: Done Snipre (talk) 18:10, 4 April 2014 (UTC)

Atomic composition

The description of the atomic composition of a molecule can be done using has part(s) (P527). See ethanol (Q153) as example. Snipre (talk) 08:55, 17 April 2014 (UTC)
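As an illustration of this modelling, the sketch below represents ethanol with has part(s) (P527) claims and derives a formula string from them. The dictionary layout, the `symbol` field and the `quantity` field are assumptions made for the example; whether and how counts are stored on the real item is not stated here.

```python
# Hypothetical in-memory representation of an item with has part(s) (P527) claims.
ethanol = {
    "id": "Q153",
    "label": "ethanol",
    "claims": {
        "P527": [
            {"value": "carbon", "symbol": "C", "quantity": 2},
            {"value": "hydrogen", "symbol": "H", "quantity": 6},
            {"value": "oxygen", "symbol": "O", "quantity": 1},
        ]
    },
}


def formula_from_parts(item):
    """Concatenate element symbols and counts in the order the claims are listed."""
    pieces = []
    for part in item["claims"].get("P527", []):
        count = part["quantity"]
        # A count of 1 is conventionally left implicit in a formula.
        pieces.append(part["symbol"] + (str(count) if count > 1 else ""))
    return "".join(pieces)


print(formula_from_parts(ethanol))
```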

Salt classification

Saehrimnir
Leyo
Snipre
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Adriano Rutz
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
Peter Murray-Rust
Erfurth
TiagoLubiana

Notified participants of WikiProject Chemistry: How can we classify salts ?

Two PubChem CIDs in cobalt(II) cyanide (Q2620039)

Which is correct?--GZWDer (talk) 09:41, 6 July 2014 (UTC)

For cobalt cyanide, both are correct in the PubChem database. The best thing is to contact the database to find out what the difference is. Snipre (talk) 14:29, 7 July 2014 (UTC)

To-Do for elements

toollabs:ricordisamoa/period has been hugely updated, now it shows periods and groups correctly, with labels in the user's language. But, for many items, the software couldn't find the correct position: they are reported on the top of the page. Some of them are missing atomic number (P1086), some others subclass of (P279) with a valid item for a group. Please add correct and sourced statements. Thanks in advance, --Ricordisamoa 02:06, 12 August 2014 (UTC)

Thanks for the report: it helps to see the missing data. Snipre (talk) 08:04, 13 August 2014 (UTC)
Basic support for lanthanides etc. has also been implemented. --Ricordisamoa 20:14, 18 August 2014 (UTC)
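The kind of check behind that report can be sketched as below. This is not the tool's code; it only assumes that an element can be placed when it has an atomic number (P1086) statement and a subclass of (P279) statement pointing at a known group item. Group items are written as plain labels here, and the sample deliberately omits statements to trigger the report.

```python
def unplaceable_items(items, valid_groups):
    """Return {item: [missing properties]} for items the table could not place."""
    report = {}
    for qid, claims in items.items():
        problems = []
        if not claims.get("P1086"):
            problems.append("P1086")
        # P279 only helps if at least one of its values is a recognised group.
        if not any(value in valid_groups for value in claims.get("P279", [])):
            problems.append("P279")
        if problems:
            report[qid] = problems
    return report


# Illustrative data: labels stand in for group items; gaps are intentional.
VALID_GROUPS = {"group 14", "group 16"}
sample_items = {
    "Q867": {"P1086": [32], "P279": ["group 14"]},   # germanium, complete
    "Q629": {"P279": ["group 16"]},                  # atomic number left out
    "Q_example": {"P1086": [1], "P279": ["atom"]},   # hypothetical item, no group
}

print(unplaceable_items(sample_items, VALID_GROUPS))
```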

Royal Society of Chemistry - Wikimedian in Residence

Hi folks,

I've just started work as w:Wikimedian in Residence at the w:Royal Society of Chemistry. Over the coming year, I'll be working with RSC staff and members, to help them to improve the coverage of chemistry-related topics in Wikipedia and sister projects.

You can keep track of progress at w:Wikipedia:GLAM/Royal Society of Chemistry, and use the talk page if you have any questions or suggestions.

How can I and the RSC support your work to improve Wikipedia? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:17, 24 September 2014 (UTC)

@Pigsonthewing: Hello, thanks for your proposition. Some ideas:

  1. create the links between the ChemSpider IDs and the Wikidata Q numbers. I think it is a good thing for ChemSpider not to focus on WP:en only but to offer a link to all WPs through the WD Q number instead.
  2. offer support to create an ontology about chemistry based on the instance of, subclass, part of properties in order to generate a structure between the chemical definitions.
  3. share some data about chemical properties (when the numeric datatype with units becomes available)
  4. take part in the matching of all the different IDs used to identify chemicals.
  5. for WP articles, propose a generic template for chemical articles.
  6. help to develop fundamental articles like chemistry, IUPAC nomenclature,...
  7. export biographic data from the RSC database to WD.

We need the expertise of people working in the chemistry field who can provide data or information beyond the access or knowledge of Mister X. I know that the WP system is not the best one for experts, because you can see your contributions modified by any user without being able to assert your expertise. Perhaps start by creating a global account for the RSC staff, to allow better recognition by other contributors. Snipre (talk) 16:57, 25 September 2014 (UTC)

Thanks, Snipre - I've numbered your points for ease of response. #1 I've already suggested, and #4 is under discussion. #5 is en:Template:Chembox, surely? Or have I misunderstood? #7 would probably not be allowed, under UK data protection law. I'll discuss the rest with my new colleagues. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:24, 25 September 2014 (UTC)
For #5 I am thinking more of an evaluation of the current structure of the article, but also of the chembox, from a data visualization point of view: can a scientist use the data presented in WP articles, or is some important information missing (conditions of measurement, references,...)?
From my perspective #2 is the most challenging and interesting, and as some scientists are working in the field of ontology, I think this will help us to find examples in the literature or to develop them through discussions. Snipre (talk) 08:52, 26 September 2014 (UTC)


2. offer support to create an ontology about chemistry based on the instance of, subclass, part of properties in order to generate a structure between the chemical definitions.
Andy, are you familiar with ChEBI? It is the most widely used chemistry ontology. Background papers are here, here and here. ChEBI is developed by the European Bioinformatics Institute and, to my knowledge, endorsed and used by The Royal Society of Chemistry. I think any proper Wikidata chemistry ontology development should have as a requirement straightforward interoperability with ChEBI.
Colin Batchelor at RSC does a lot of ontology development and comments on ChEBI mailing lists, so I imagine he would be helpful to consult on developing a Wikidata chemistry ontology that's interoperable with major existing ontologies in this domain. Emw (talk) 12:38, 27 September 2014 (UTC)

Ontology

@Pigsonthewing, Emw: The ChEBI ontology is a specialized one: we have to keep the relations as simple as possible to ensure the use of a common set of properties for the whole of Wikidata. Currently we have only the "instance of", "subclass of" and "part of" relations, and even if we can create more relations/properties, too complex a structure will be difficult to maintain and to use for new/occasional contributors. Snipre (talk) 12:00, 29 September 2014 (UTC)

I have filled in the table with some ChEBI-Wikidata mappings. I'll elaborate tomorrow. Emw (talk) 04:03, 30 September 2014 (UTC)
ChEBI relation | Wikidata property | Example | In ChEBI | Note
- | instance of (P31) | None in ChEBI | | ChEBI follows the practice of BFO- and RO-based ontologies and does not include instances, e.g. "this molecule of ethanol in a bottle" -- just like Wikidata w.r.t. chemical entities. See 'Background' in Relations in Biomedical Ontologies (RO), Smith et al. 2005.
is_a | subclass of (P279) | oxygen-18 (Q662269) subclass of oxygen (Q629) | http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3a33815 | "A has_subclass B = [definition] B is_a A.", 'Discussion' in RO. Note how all occurrences of 'is_a' are replaced by rdfs:subClassOf in chebi.owl (warning, big). P279 is mapped to rdfs:subClassOf on Wikidata.
part_of | part of (P361) | water (Q283) part of hydrate (Q462174) | http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A35505 | See 'Part_of' section in RO
has_part | has part(s) (P527) | Inverse of above. | See above. | 'Discussion' in RO
has_role | None yet, see Wikidata:Property_proposal/Archive/25#has_role | | |
is_conjugate_acid_of (and is_conjugate_base_of) | ? | | |
has_functional_parent | ? | | |
is_substituent_group_from | part of | | |
Regarding the notion that ChEBI is "specialized", it's worth noting that ChEBI's domain is chemical entities of biological interest. This includes many types of subatomic particles, minerals, mixtures, pharmaceutical compounds, biological macromolecules, and more. So it is a domain ontology, but that domain is fairly general. ChEBI also has far fewer properties (i.e. relations) than Wikidata, even if we consider only chemistry properties. Notably, ChEBI's main properties -- is_a, part_of and has_part -- already exist on Wikidata as subclass of (P279), part of (P361) and has part(s) (P527).
So I don't think extra complexity will be a major issue in the effort to ensure straightforward compatibility with ChEBI. The issue will be adjusting usage of instance of (P31) and subclass of (P279) on Wikidata to be compatible with that in ChEBI and other scientific Semantic Web ontologies, e.g. by replacing virtually all statements like "instance of (P31) chemical compound" with statements like "subclass of (P279) chemical compound" (or some subclass of chemical compound). Emw (talk) 01:12, 2 October 2014 (UTC)
@Emw: By specialized I meant that ChEBI works for chemistry but not for persons, plants,... Wikidata is larger than ChEBI, so the ontology of WD should be applicable to the different fields of WD in order to avoid particular classifications according to field. Contributors should be able to work in any field without having to learn different systems. So the classification of chemicals and other chemistry subjects in WD should be related more to other classification schemes used on WD than to ChEBI. WD is not a mirror of ChEBI, so we have no interest in just copying the ChEBI structure, even if it is an authority in its field. WD is more a result of the popularization of chemistry than an advanced classification system in a specific science. So ChEBI is not a reference, just an example. Snipre (talk) 11:42, 10 October 2014 (UTC)
Snipre, I encourage you to read Relations in Biomedical Ontologies. The ontological relations described there -- instance_of, is_a and part_of -- align with generic properties widely used on Wikidata as outlined in the table above. They apply to persons, plants and beyond -- and also to chemical compounds. Thus, while ChEBI is specialized to the domain of chemistry, the foundational properties (i.e. relations) it uses are applicable to all domains of knowledge.
The problem here is that WikiProject Chemistry is using an idiosyncratic definition of "instance" which makes this project incompatible with not only the world's most widely used chemistry ontology, but also other reference domain ontologies based on the Relation Ontology -- e.g. Gene Ontology, Disease Ontology, Plant Ontology, etc. Emw (talk) 22:18, 27 October 2014 (UTC)

Relevant discussion on wikidata-l

Please see the discussion at https://lists.wikimedia.org/pipermail/wikidata-l/2014-September/004641.html and https://lists.wikimedia.org/pipermail/wikidata-l/2014-October/004682.html. It will likely affect how chemical compounds are classified on Wikidata. Thanks, Emw (talk) 12:54, 8 October 2014 (UTC)

Again, I find no reason to compare the Porsche 356 and ethanol: you can always define an instance of a Porsche 356 by some unique characteristics (chassis number, events where the car was used, famous owners or not, ...), but you can't do the same for a molecule of ethanol. You can describe a molecule's energy state or its position, but if you stop following it for a moment and then try to find it again, you won't be able to, because a molecule can't have its own specific properties; that's a scientific rule. So, again, comparing a car model with a chemical is not correct. That is why we don't have a "chemical model" or "chemical compound type": this concept is not necessary in chemistry. Distinguishing between ethanol as a chemical and ethanol as a molecule is nonsense from a properties point of view.
Here we reach the heart of the problem: what is the definition of an "instance"? For me, an instance is an entity which has its own characteristics over time. A Porsche 356 can have some unique characteristics over time, like its chassis number. But not a molecule of ethanol: you can specify its position or its energy level, but this is not characteristic of this molecule (other molecules can have the same energy level and can have the same position at a different time). It is like creating different items for a country because its population changes over time. The specific population is not a characteristic of the country, just as the position or the energy level is not a characteristic of a molecule of ethanol. So even if it is possible to create an item for one molecule, it does not make sense, because you can't provide unique characteristics for this molecule. Snipre (talk) 13:12, 10 October 2014 (UTC)
ChEBI subscribes to the Relation Ontology, which uses the definition of instance from the Basic Formal Ontology:
Instances are individuals (particulars, tokens) of special sorts. Thus each is a simply located entity, bound to a specific (normally topologically connected) location in space and time.
Thus, even if they are identical in every respect except location, two molecules of ethanol are indeed instances. The fact that they are merely spatiotemporally distinguishable is sufficient to make them instances of the class (aka universal, type) ethanol. This is an established practice in not only ChEBI, the world's most popular chemistry ontology, but also ontologies in many other domains of knowledge. You make rather bold claims ("...it's a scientific rule") without citing any scientific or ontological sources. Please provide some relevant literature or, even better, existing Semantic Web ontologies that support your idiosyncratic definition of instance.
Even though spatiotemporal distinguishability alone is enough to make something an instance per ChEBI, RO, BFO and a wide raft of established philosophy, there are other properties that can make two entities of ethanol distinguishable. For example, one ethanol molecule could have a different isotopic composition -- one molecule could have its hydrogens in the form of deuterium (hydrogen-2, i.e. be deuterated ethanol (Q1101193)) and the other could have its hydrogens in the more familiar form of protium (hydrogen-1). This is a prima facie argument that ethanol is a subclass of, not an instance of, chemical compound. Emw (talk) 12:51, 27 October 2014 (UTC)
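The class/instance distinction argued for above can be sketched loosely with Python classes. This is only an analogy: Python's object model is not the Basic Formal Ontology, and the class names, the `position` field and the `hydrogen_isotope` field are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


class ChemicalCompound:
    """Stands in for the class (universal, type) 'chemical compound'."""


@dataclass(frozen=True)
class Ethanol(ChemicalCompound):
    """Stands in for the class 'ethanol'; each object is one molecule."""
    position: tuple                     # location in space (illustrative)
    hydrogen_isotope: str = "protium"   # isotopic composition (illustrative)


# Two molecules identical in every respect except location.
a = Ethanol(position=(0.0, 0.0, 0.0))
b = Ethanol(position=(1.0, 0.0, 0.0))
# A molecule that also differs in isotopic composition.
c = Ethanol(position=(2.0, 0.0, 0.0), hydrogen_isotope="deuterium")

print(issubclass(Ethanol, ChemicalCompound))  # ethanol: subclass of chemical compound
print(isinstance(a, Ethanol))                 # a single molecule: instance of ethanol
print(isinstance(Ethanol, ChemicalCompound))  # ethanol itself is not an instance
print(a != b)                                 # distinguishable by location alone
```

In this analogy, `Ethanol` plays the role of the Wikidata item for ethanol, linked to chemical compound by subclass of (P279), while `a`, `b` and `c` play the role of individual molecules.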

Two new properties

@Pigsonthewing: Gmelin number (P1578) and Reaxys registry number (P1579) are ready. -Tobias1984 (talk) 17:00, 26 October 2014 (UTC)

Thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:54, 26 October 2014 (UTC)

Launch of WikiProject Wikidata for research

Hi, this is to let you know that we've launched WikiProject Wikidata for research in order to stimulate a closer interaction between Wikidata and research, both on a technical and a community level. As a first activity, we are drafting a research proposal on the matter (cf. blog post). It would be great if you would see room for interaction! Thanks, --Daniel Mietchen (talk) 01:39, 9 December 2014 (UTC)

Precision in identifier mapping

The following is from an email discussion with Gang Fu of PubChem: "The challenge would be that different data sources have different chemical structural representation for the same drug ingredient. For instance, the following PubChem compounds were mapped to drug ingredient 'ETHAMBUTOL (NDFRT: N0000147838)':

My question now is how we should map that. To get things going, I have added all of these PubChem IDs to ethambutol (Q412318). I think most of these should be split off into separate items eventually, so I'd like to invite comments as to the granularity and thus precision we should be aiming at. --Daniel Mietchen (talk) 08:56, 17 December 2014 (UTC)

@Daniel Mietchen: The project has not been active enough to have defined the data structure in much detail. But in my opinion we have to create one item for each stereoisomer and each salt. We even have to create specific items for mixtures of stereoisomers. So for dichloroethene we will have 3 items: one for the mixture, one for the Z form and one for the E form. This represents a huge number of items, but we don't need to import all possible molecules at once, and we should concentrate on the most important molecules first. For stereoisomers, though, once one molecule is added, we should add the whole family, in order to be sure that data can be added in the right place. Snipre (talk) 09:19, 17 December 2014 (UTC)
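The granularity suggested above (one item for the mixture plus one per stereoisomer, created together as a family) could be planned along these lines. This is a minimal sketch: the function name `plan_items` and the label format are hypothetical, and nothing here talks to Wikidata.

```python
def plan_items(base_name, stereo_forms):
    """List the item labels to create for a compound family:
    one for the mixture of stereoisomers, then one per stereoisomer."""
    items = [f"{base_name} (mixture of stereoisomers)"]
    items += [f"({form})-{base_name}" for form in stereo_forms]
    return items


# The dichloroethene example from the comment above: three items in total.
family = plan_items("dichloroethene", ["Z", "E"])
for label in family:
    print(label)
```

Planning the whole family in one step reflects the point that, once one stereoisomer exists, the others should exist too, so that data can be attached to the right item.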

Free 'RSC Gold' accounts

I am pleased to announce, as Wikimedian in Residence at Royal Society of Chemistry (Q905549), the donation of 100 "RSC Gold" accounts, for use by editors wishing to use RSC journal content to expand articles/ items on chemistry-related topics. Please visit en:Wikipedia:RSC Gold for details, to check your eligibility, and to request an account. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:28, 18 December 2014 (UTC)