Wikidata talk:WikiProject Chemistry/Archive/2020
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Dagstuhl 2020
Hi all, next week I'm attending a Dagstuhl meeting (see https://en.wikipedia.org/wiki/Dagstuhl#Seminar_series) around metabolomics. Several people from chemistry databases will be there, like Evan E. Bolton (Q28194918) (PubChem (Q278487)) and David Wishart (Q27887604) (Human Metabolome Database (Q5937262)). Do we have open questions we want to ask them? Please add them here as comments.
Notified participants of WikiProject Chemistry --Egon Willighagen (talk) 13:48, 25 January 2020 (UTC)
Introduction round
Dear all, User:Wiljes and I would like to introduce ourselves, our backgrounds, and our interests in contributing to Wikidata in the context of WikiProject Chemistry. Would you all be fine with us doing that on a separate page (e.g. Wikidata:WikiProject_Chemistry/Who we are, linked from the main page, i.e. Wikidata:WikiProject_Chemistry)? Would you also be interested in that? Thanks for your opinion! --Robert Giessmann (talk) 11:20, 17 June 2020 (UTC)
Notified participants of WikiProject Chemistry
- @Rgiessmann: Well, it's probably best to introduce your background etc. on your user page here - I see User:Wiljes has done that nicely. You could also link to your Scholia page there, if you have one, which would provide links to your publications etc. As to "interests on how to contribute (in this context)", I think just another post right here would be fine, no? I don't think we need another page to check! ArthurPSmith (talk) 18:59, 17 June 2020 (UTC)
- @Rgiessmann: and you can also add your ORCID to your profile page, as I have done on my user page. --Egon Willighagen (talk) 10:03, 26 July 2020 (UTC)
Difference between CAS numbers (bis)
@Snipre: I answered the 7 cases you highlighted. Do you agree with my solutions? --SCIdude (talk) 14:03, 17 July 2020 (UTC)
- @SCIdude: Thank you for the information, but I don't have time to check your answers now. Give me one week. Regards, Snipre (talk) 20:20, 20 July 2020 (UTC)
- Thanks for looking and doing the work. I think we mostly agree on how to resolve things, for which there are sometimes different ways. --SCIdude (talk) 04:24, 29 July 2020 (UTC)
CAS 28519-04-2 vs. CAS 7134-06-7
We have two items (2-hydroxy-5-methylbenzenesulfonic acid (Q27285095) and 2-hydroxy-5-methylbenzenesulfonic acid (Q72461715)) with the same InChIKey but with different CAS numbers. Any idea about the difference? Snipre (talk) 19:31, 14 July 2020 (UTC)
- Not a different compound. Presumably a different source in CAS, added without checking for a duplicate substance. --SCIdude (talk) 14:12, 15 July 2020 (UTC)
- Done Merge. Snipre (talk) 19:43, 28 July 2020 (UTC)
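Every case in this section was found by the same criterion: two items sharing an InChIKey but carrying different CAS numbers. The check can be sketched in Python as below, assuming the data has already been exported as (item ID, InChIKey, CAS number) tuples; that input format and the function name shared_key_conflicts are hypothetical, and the example values are placeholders rather than real items. A hit is only a prompt for manual review: in the cases below, the outcome was sometimes a merge and sometimes a moved CAS number or a deleted InChIKey.

```python
from collections import defaultdict


def shared_key_conflicts(rows):
    """Group (qid, inchikey, cas) rows by InChIKey and return the keys used by
    more than one item where those items carry more than one distinct CAS number.

    The row format is a hypothetical export, not a Wikidata API structure.
    """
    by_key = defaultdict(list)
    for qid, inchikey, cas in rows:
        by_key[inchikey].append((qid, cas))

    conflicts = {}
    for inchikey, entries in by_key.items():
        items = {qid for qid, _ in entries}
        cas_numbers = {cas for _, cas in entries}
        # Same key on several items, and those items disagree on the CAS number.
        if len(items) > 1 and len(cas_numbers) > 1:
            conflicts[inchikey] = sorted(entries)
    return conflicts


if __name__ == "__main__":
    # Placeholder values only; not real items, keys, or CAS numbers.
    rows = [
        ("Q1", "KEY-A", "CAS-1"),
        ("Q2", "KEY-A", "CAS-2"),
        ("Q3", "KEY-B", "CAS-3"),
    ]
    for key, entries in shared_key_conflicts(rows).items():
        print(key, entries)
```

Items that share both the key and the CAS number are plain duplicates and are deliberately not reported by this particular check.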
CAS 40102-60-1 vs. CAS 1439-07-2
We have two items (S-trans-stilbene oxide (Q27121652) and trans-stilbene oxide (Q72508941)) with the same InChIKey but with different CAS numbers. Any idea about the difference? Snipre (talk) 19:31, 14 July 2020 (UTC)
- Not a different compound. Presumably a different source in CAS, added without checking for a duplicate substance. P.S.: trans-stilbene oxide would be a pair of enantiomers (InChIKey with unspecified stereo). Interestingly, the 2D structure in PubChem is wrong: it shows cis-. One can debate whether to change the second item to the pair. In any case, the CAS structure does not match the name/synonyms. --SCIdude (talk) 14:21, 15 July 2020 (UTC)
- stilbene oxide (Q27121647), CAS 17619-97-5
  - trans-stilbene oxide (Q72508941), CAS 1439-07-2
    - S-trans-stilbene oxide (Q27121652), CAS 40102-60-1
    - (2R,3R)-stilbene oxide (Q97787600), CAS 25144-18-7
  - cis-stilbene oxide (Q27121646), CAS 1689-71-0
- Snipre (talk) 20:56, 28 July 2020 (UTC)
- Done Delete InChIKey Snipre (talk) 21:12, 28 July 2020 (UTC)
CAS 64047-16-1 vs. CAS 6588-17-6
We have two items (sodium 5-heptyl-5-methyl-2-oxo-2,5-dihydro-1,3-oxazol-4-olate (Q82968869) and sodium 5-heptyl-5-methyl-2-oxo-2,5-dihydro-1,3-oxazol-4-olate (Q82543970)) with the same InChIKey but with different CAS numbers. Any idea about the difference? Snipre (talk) 19:31, 14 July 2020 (UTC)
- The second CAS was obsoleted. Easy merge. --SCIdude (talk) 14:25, 15 July 2020 (UTC)
CAS 13455-34-0 vs. CAS 60459-08-7
We have two items (cobalt sulfate monohydrate (Q27263112) and cobalt sulfate x hydrate (Q72509228)) with the same InChIKey but with different CAS numbers. Any idea about the difference? Snipre (talk) 19:31, 14 July 2020 (UTC)
- The second CAS was obsoleted. Easy merge. --SCIdude (talk) 14:25, 15 July 2020 (UTC)
- Done Snipre (talk) 19:27, 28 July 2020 (UTC)
CAS 103-26-4 vs. CAS 1754-62-7
We have two items (methyl cinnamate (Q204178) and (E)-methyl cinnamate (Q72460898)) with the same InChIKey but with different CAS numbers. Any idea about the difference? Snipre (talk) 20:28, 14 July 2020 (UTC)
- The first does not specify cis-/trans- anywhere, not even in the synonyms. So again, two possibilities: a pair of cis,trans isomers with a wrong structure (then the CAS would have to be moved to a different item), or a duplicate. I tend to merge these cases because I think they (CAS) are too stupid to define a pair of cis,trans isomers. --SCIdude (talk) 14:31, 15 July 2020 (UTC)
- Done Snipre (talk) 21:52, 29 July 2020 (UTC)
CAS 1701-77-5 vs. CAS 7021-09-2
We have two items (alpha-methoxyphenylacetic acid (Q72517465) and alpha-methoxyphenylacetic acid (Q27283784)) with the same InChIKey but with different CAS numbers. Any idea about the difference? Snipre (talk) 20:32, 14 July 2020 (UTC)
- The first CAS redirects to the second, i.e. the first CAS was obsoleted. Easy merge. --SCIdude (talk) 14:25, 15 July 2020 (UTC)
- @SCIdude: Be careful: the Dortmund Data Bank considers 1701-77-5 to be the racemate. See here. Snipre (talk) 19:20, 3 August 2020 (UTC)
CAS 36393-56-3 vs. CAS 37577-07-4
We have two items ((±)-norpseudoephedrin (Q59628358) and (–)-norpseudoephedrine (Q6456100)) with the same InChIKey but with different CAS numbers. Any idea about the difference? Snipre (talk) 20:39, 14 July 2020 (UTC)
- The two CAS numbers have different keys, and the key of the first CAS does not match the key of the item. So either the CAS needs to be moved to a different item, or the item needs a different key. In fact, there is Q423797, to which the CAS should be moved. --SCIdude (talk) 14:47, 15 July 2020 (UTC)
- @SCIdude: Sorry, I don't understand why you propose to move 36393-56-3 to Q423797, as the CAS number for that item is 492-39-7.
- We have the following structure:
  - phenylpropanolamine (Q97786582), CAS ?
    - (±)-norephedrine (Q59627745), CAS 14838-15-4
      - (-)-norephedrine (Q26840801), CAS 492-41-1
      - (+)-norephedrin (Q413147), CAS 37577-28-9
    - (±)-norpseudoephedrin (Q59628358), CAS 36393-56-3
      - (+)-norpseudoephedrine (Q423797), CAS 492-39-7
      - (–)-norpseudoephedrine (Q6456100), CAS 37577-07-4
- So I propose to delete the InChIKey of Q59628358 and to define it as a mixture of stereoisomers of norpseudoephedrine. Snipre (talk) 20:20, 28 July 2020 (UTC)
- After the CAS is moved, we still have two items with the same InChIKey. The only further information in (±)-norpseudoephedrin (Q59628358) is the German label (±)-Norpseudoephedrin and the ECHA identifier (which refers to CAS 36393-56-3, which we just moved, so the ECHA identifier can be moved there as well). It looks like there is no item for norpseudoephedrine (the pair of isomers), so this item could serve. If we do this, the InChI and InChIKey should be removed. --SCIdude (talk) 15:08, 15 July 2020 (UTC)
- Done Move InChIKey. Snipre (talk) 20:42, 28 July 2020 (UTC)
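The norephedrine/norpseudoephedrine family listed above (like the stilbene oxides earlier) consists of stereoisomers of one skeleton. In a standard InChIKey, the first 14-letter block is a hash of the InChI main layer (formula and connectivity), while stereochemistry is encoded in the second block, so such stereoisomers normally share the first block and differ in the second. A minimal Python sketch for gathering related items this way follows; it assumes a hypothetical {item ID: InChIKey} mapping, and the function names are illustrative only.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def connectivity_block(inchikey: str) -> str:
    """Return the first block of an InChIKey (the hash of the main layer)."""
    if not INCHIKEY_RE.fullmatch(inchikey):
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return inchikey.split("-", 1)[0]


def group_by_skeleton(keys_by_item: dict) -> dict:
    """Map each first block to the sorted item IDs whose keys start with it.

    `keys_by_item` is a hypothetical {qid: inchikey} mapping.
    """
    groups = defaultdict(list)
    for qid, inchikey in keys_by_item.items():
        groups[connectivity_block(inchikey)].append(qid)
    return {block: sorted(qids) for block, qids in groups.items()}
```

Groups with more than one item are candidates for a parent/stereoisomer hierarchy like the one above; whether an item stands for a single isomer, a racemate, or an unspecified mixture still has to be decided by hand.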
Beware, GZWDer flooding DSSTox compound IDs
Following the creation of a property that User GZWDer requested, he felt obliged to play with his bot and add this ID to chemicals. How he identifies them is anyone's guess, so keep a lookout for what is to be expected. --SCIdude (talk) 05:44, 6 August 2020 (UTC)
- @SCIdude: It was discussed in the proposal that his data source provides a link to the existing DSSTox substance ID (P3117); this seems perfectly reasonable. Adding identifiers should be fine if they are from a curated source, and in this case it doesn't involve adding any new items. ArthurPSmith (talk) 19:44, 6 August 2020 (UTC)
Upcoming: ChEBI completions
Towards a more complete ChEBI coverage the following steps can be identified:
1) substances: add references to InChIKeys identical to those in ChEBI (85,715 references to add)
2) resolve conflicts where keys are not identical; add from ChEBI if missing (844 items with conflicts, 71 items without an InChIKey, e.g. peptides)
3) for any (substance P31 class) statement, add a reference if directly supported by ChEBI ("is_a")
4) for any (class P279 class) statement, add a reference if directly supported by ChEBI ("is_a")
5) ChEBI import completion: full class hierarchy
6) ChEBI import completion: all substances (pretty good already)
7) ChEBI check: all substances are in their classes
I'm ready to do 1) now, which I guess will not be controversial but, please, fire away with any arguments that come to mind! --SCIdude (talk) 17:39, 7 September 2020 (UTC)
- @SCIdude: No, we should take a stricter approach before performing mass data imports.
- Step 1: for all WD items having InChI, InChIKey and ChEBI ID properties, check whether the corresponding entry in the ChEBI database has the same InChI and InChIKey.
  - If yes, check whether all 3 properties have a reference to ChEBI, and add or update the reference to the ChEBI database (using retrieved date, ChEBI ID property, stated in = ChEBI (Q902623), title properties).
  - If no (InChI or InChIKey or both differ from the values in the ChEBI database), put the WD items in a list for further manual check.
- Step 2: for all WD items having InChI and ChEBI ID properties, check whether the corresponding entry in the ChEBI database has the same InChI.
  - If yes, check whether both properties have a reference to ChEBI, add the InChIKey from the ChEBI database, and add or update the reference to the ChEBI database (using retrieved date, ChEBI ID property, stated in = ChEBI (Q902623), title properties).
  - If no (InChI differs from the value in the ChEBI database), put the WD items in a list for further manual check.
- Step 3: for all WD items having InChIKey and ChEBI ID properties, check whether the corresponding entry in the ChEBI database has the same InChIKey.
  - If yes, check whether both properties have a reference to ChEBI, add the InChI from the ChEBI database, and add or update the reference to the ChEBI database (using retrieved date, ChEBI ID property, stated in = ChEBI (Q902623), title properties).
  - If no (InChIKey differs from the value in the ChEBI database), put the WD items in a list for further manual check.
- Then, no import of the ChEBI ontology using instance of (P31) or subclass of (P279): if we do the same with 2 or 3 other databases, it will be a mess to understand the definition of an item based on instance of (P31) and subclass of (P279). From my point of view, the ChEBI ontology should stay in the ChEBI database, first for copyright reasons (an ontology is definitely an original work), then because WD should be able to define its own ontology. The purpose of WD is not to integrate all information on the internet but rather to generate links between different information sources. Finally, the ChEBI ontology can change over time, leading to synchronization problems. Snipre (talk) 19:01, 8 September 2020 (UTC)
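Steps 1 to 3 above amount to a per-item decision. A minimal Python sketch of that decision follows, assuming the InChI and InChIKey of each item and of its ChEBI entry have already been fetched into the hypothetical Record type; the action names are made up for illustration, and the reference dictionary is a simplified stand-in for the real Wikibase JSON. Its property IDs are stated in (P248), ChEBI ID (P683), title (P1476) and retrieved (P813), matching the fields listed in the steps.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CHEBI_ITEM = "Q902623"  # ChEBI, the value for "stated in" named in the steps


@dataclass
class Record:
    """InChI/InChIKey pair, from a Wikidata item or from its ChEBI entry."""
    inchi: Optional[str] = None
    inchikey: Optional[str] = None


def plan_action(wd: Record, chebi: Record) -> str:
    """Classify one item according to Steps 1-3.

    Returns 'add_reference', 'add_inchikey', 'add_inchi', 'manual_check' or
    'out_of_scope'. Every 'add_*' action also means adding or updating the
    ChEBI reference on the matching statements.
    """
    if wd.inchi and wd.inchikey:
        # Step 1: item has both identifiers; both must agree with ChEBI.
        if wd.inchi == chebi.inchi and wd.inchikey == chebi.inchikey:
            return "add_reference"
        return "manual_check"
    if wd.inchi:
        # Step 2: item has an InChI but no InChIKey.
        if wd.inchi != chebi.inchi:
            return "manual_check"
        return "add_inchikey" if chebi.inchikey else "add_reference"
    if wd.inchikey:
        # Step 3: item has an InChIKey but no InChI.
        if wd.inchikey != chebi.inchikey:
            return "manual_check"
        return "add_inchi" if chebi.inchi else "add_reference"
    # Neither identifier on the item: not covered by Steps 1-3.
    return "out_of_scope"


def chebi_reference(chebi_id: str, title: str, retrieved: date) -> dict:
    """Simplified reference: stated in (P248), ChEBI ID (P683), title (P1476)
    and retrieved (P813). Not the full Wikibase JSON snak format."""
    return {
        "P248": CHEBI_ITEM,
        "P683": chebi_id,
        "P1476": title,
        "P813": retrieved.isoformat(),
    }
```

Keeping the decision separate from any editing code means the "manual_check" list can be reviewed before a single edit is made, which is the point of the stricter approach.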
- Query of all WD items about chemicals having InChI, InChIKey and ChEBI ID properties: here
- Query of all WD items about chemicals having InChI and ChEBI ID properties but no InChIKey property: here
- Query of all WD items about chemicals having InChIKey and ChEBI ID properties but no InChI property: here
- Snipre (talk) 19:11, 8 September 2020 (UTC)
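The three selections above can also be written as SPARQL for the Wikidata Query Service. The following Python sketch builds such queries from the direct-claim properties InChI (P234), InChIKey (P235) and ChEBI ID (P683); it is not a copy of the linked queries, it does not restrict results to chemical compounds, and the helper names are hypothetical.

```python
# Direct-claim properties: InChI (P234), InChIKey (P235), ChEBI ID (P683).
INCHI, INCHIKEY, CHEBI_ID = "wdt:P234", "wdt:P235", "wdt:P683"

PREFIX = "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"


def build_query(require, exclude=()):
    """Build a SPARQL query selecting items that have every property in
    `require` and none of the properties in `exclude`."""
    lines = [PREFIX + "SELECT ?item WHERE {"]
    for prop in require:
        lines.append(f"  ?item {prop} [] .")
    for prop in exclude:
        lines.append(f"  FILTER NOT EXISTS {{ ?item {prop} [] . }}")
    lines.append("}")
    return "\n".join(lines)


QUERIES = {
    "all_three": build_query([INCHI, INCHIKEY, CHEBI_ID]),
    "no_inchikey": build_query([INCHI, CHEBI_ID], exclude=[INCHIKEY]),
    "no_inchi": build_query([INCHIKEY, CHEBI_ID], exclude=[INCHI]),
}

if __name__ == "__main__":
    for name, query in QUERIES.items():
        print(f"# {name}\n{query}\n")
```

These queries only list item IDs; for the comparison with ChEBI, the InChI, InChIKey and ChEBI ID values would have to be selected as variables as well.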
- @SCIdude: Please follow the recommendations for references: Help:Sources#Databases. If everyone uses their own structure for reference data, there will be no way to extract the reference data with a common tool, for example for display in WP. Snipre (talk) 19:52, 8 September 2020 (UTC)
- The copyright issue is unresolved. We are not importing a whole database, and the point where a "substantial part" begins needs to be specified legally. The same applies to common class terms that have been used in the literature for decades. What if I just import the hierarchy but put a reference to an article under each subclass statement, which makes it common knowledge?
- I appreciate the steps given. Here is the list of items where the ChEBI InChIKey differs from the InChIKey given. I'm going through these right now, and, as you can see, I'm also filing issues with ChEBI on GitHub, based on information from articles. --SCIdude (talk) 07:29, 9 September 2020 (UTC)
Edits from University of Cambridge
I have noticed many chemistry-related edits from IP addresses which belong to the University of Cambridge. 131.111.225.4 (talk • contribs • logs) and 131.111.114.157 (talk • contribs • logs) are a couple of examples. Most of the edits involve creating new items for various polyketides. Presumably, this is some type of ongoing class project. There are also quite a few new polyketide items created by new accounts - they create the account, start one new item, then never edit again. These are probably also students involved in the same class project. The reason I'm bringing this up is that many (maybe most?) of the newly created items are poorly formed. Q59295080 is a recent example. In particular, many are conflating data for chemicals with data for the scientific publications in which they are mentioned. They could definitely use some training and/or guidance. Any suggestions on how to handle this? Edgar181 (talk) 14:09, 4 December 2018 (UTC)
- I've noticed some items like this one and corrected them (niuhinone A (Q58118804), smenopyrone (Q57391881), (5R,7R,9R)-7,9-dihydroxy-5-decanolide (Q57513843)), but I did not think that this might be some sort of class project. You are probably right, though, and it may be connected to [1], [2] (cf. the last page). Honestly, I'm not a fan of class projects involving Wikimedia, but we could try to contact Professor Goodman and offer his students a help page (a subpage of this WikiProject) with editing info related only to this field (i.e. how to properly add statements, which properties should be used, and that scientific articles and chemical compounds should be kept in separate items). I can also create better SVG structures for these new items. Wostr (talk) 14:40, 4 December 2018 (UTC)
- I think you have correctly identified the class project that is involved. Maybe we can ask them, at the very least, to provide Wikidata with a list of items that they have already created and to update it with new ones as they are created so that they may be reviewed. Edgar181 (talk) 15:22, 4 December 2018 (UTC)
- I sent an email, I will see if I get an answer. Snipre (talk) 20:45, 4 December 2018 (UTC)
- If anyone wants to have a look, it appears that all of the last several thousand edits from the IP range 131.111.0.0/16 (search results) are related to this polyketide classwork. Edgar181 (talk) 15:42, 5 December 2018 (UTC)
- I'll be happy to help reformat these items if you wish, later in the month when I have more time. I think these data are a valuable addition to Wikidata, as they represent manually curated, real information direct from the literature; as such, they are probably the only independent source of open data on these compounds on the Web. I'll work with Dr. Goodman as needed. Walkerma (talk) 11:24, 6 December 2018 (UTC)
- I'd be very happy to meet any of the people involved. This could be a good way of adding data to Wikidata. Petermr (talk) 13:05, 3 January 2019 (UTC)
- Copied from Archive/2019 as this is not done yet. Wostr (talk) 18:28, 25 September 2020 (UTC)
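Separating edits from the 131.111.0.0/16 range mentioned above from those of other editors can be done mechanically. A small Python sketch using the standard-library ipaddress module follows; the function name is hypothetical, and the input is assumed to be a plain list of editor names and IP addresses, where anything that is not a valid address (such as an account name) is skipped.

```python
import ipaddress

# The /16 range mentioned above for the polyketide class edits.
CAMBRIDGE_RANGE = ipaddress.ip_network("131.111.0.0/16")


def in_range(editors, network=CAMBRIDGE_RANGE):
    """Return the editors (as given) that are IP addresses inside `network`.

    Entries that are not valid IP addresses, such as account names, are
    skipped; IPv6 addresses are never inside an IPv4 network.
    """
    hits = []
    for editor in editors:
        try:
            ip = ipaddress.ip_address(editor)
        except ValueError:
            continue  # an account name, not an IP address
        if ip in network:
            hits.append(editor)
    return hits
```

Account-based edits still have to be collected by hand, as in the list of editors below.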
List of items
This is a list of chemistry-related items edited from this /16 IP subnet (edit: and from many other accounts/IPs), excluding items about scientific papers but including redirects, because the target items may need some clean-up. I'll try to check and correct these items.
List of editors
- Accounts
- Ac2061
- Ac995
- Acannie (Commons)
- Aced125
- AdamBartlett (Commons only)
- AdPiatt
- Ahj27
- Akbarijesus
- Al783
- Alexmayorov
- Apogeanaces
- ATB4444
- Bekimills
- BenDennes (Commons)
- Blitzkrieg 666
- Cetu (Commons)
- Charliemabbutt
- Danielandrews441
- Danieldrazen (Commons)
- Danielwjchin
- Davidmthomp (Commons only)
- Drageson
- Dsyw
- Eloisekidner
- Emb85 (Commons)
- Feh27
- Ffinian
- Franciscollie
- Grh37
- Gsj25 (Commons)
- Haynesmatt27
- Hjh48
- Hmg38
- IAmNotBernieSanders
- Iamtrelawney
- Ibcl2
- Izzys19
- J.c1 (Commons)
- Jab2791997
- JackLangan
- Jae42
- Jemima G
- Jl97877
- Jlcquarry
- Jlf1357 (Commons only)
- Jt641
- Jwg21009 (Commons)
- Jz431 (Commons)
- Kate zator (Commons only)
- Kelvinwu94 (Commons only)
- Kl443
- Lab24
- Lalanyx
- Lb656 (Commons only)
- LDelaunay58
- Lp200303845793830 (Commons only)
- Lsketeris
- Marrc2 (Commons only)
- Mc944 (Commons)
- Meg31415926 (Commons)
- Mjp201
- Mjs287
- Mkj34Wiki
- Mtoholland
- Navodittedas (Commons only)
- Nigelthatchercameron (Commons only)
- Nms42 (Commons)
- Odgm2
- Oldham97
- Palpchem (Commons)
- Partiichem
- Partiidpat
- Polyketide123 (Commons)
- Polyketide1234
- Radu Bizga
- RPCambridge
- Rtewungwa
- Rw573
- SamByng
- Samueltmckee
- Sb2169.
- Sf573
- Sjdp2
- Sw19th
- Tadavidson97 (Commons only)
- Tc470 (Commons only)
- Tc475 (Commons only)
- Tgrub (Commons only)
- TJeeee
- Tk468
- Tlnewlove
- Tnbyadfi (Commons)
- Tristanliu123
- Wwl24
- Xl403 (Commons)
- Znt22 (Commons only)
- IPs