Wikidata talk:WikiProject Chemistry

From Wikidata
Old discussions are archived in Archive 2013, Archive 2014, Archive 2015, Archive 2016, Archive 2017, Archive 2018, Archive 2019, Archive 2020, Archive 2021.

Items with IUPAC Gold Book ID?

IUPAC Gold Book ID (P4732) is (to me) an important property: it links to a very important chemistry compendium (glossary/dictionary), Compendium of Chemical Terminology (Q902163). I don't remember the details, which is why I am starting this discussion, but I seem to remember a discussion on how to model this: should P4732 be used on items that represent a compendium entry (as in something like "described by encyclopedia page")? Indeed, compendium entries started to appear as images at https://commons.wikimedia.org/wiki/Category:Files_provided_by_IUPAC (I need to talk with Martin about this).

So basically, I see two options. The first is that IUPAC Gold Book ID (P4732) is linked to the Wikidata item about the concept, as the property description says (and then the images from Wikimedia Commons would be linked to the term directly, e.g. like for glass transition (Q825643)). The second option is that each compendium entry has a separate Wikidata item, which is linked to the concept via "main subject" and/or "described by" (and then the images are linked to the item about the compendium entry). These are alternative ways to model this. I would like to capture the preferred model as an EntitySchema. What do you think? Egon Willighagen (talk) 05:57, 10 October 2023 (UTC)

My impression is that the current usage of IUPAC Gold Book IDs as identifiers suffices, very much like IEV identifiers. Sorry if I'm missing your point. Fgnievinski (talk) 07:00, 10 October 2023 (UTC)
I agree. It should stay as an external identifier to be used in Template:GoldBookRef (Q7204578). Regards Matthias M. (talk) 11:40, 10 October 2023 (UTC)
To be honest, converting the text into images is bad for accessibility. I think commons:Category:Files provided by IUPAC should not be used in Wikipedia. Matthias M. (talk) 11:59, 10 October 2023 (UTC)
There was a discussion somewhere (I think on this discussion page) about this problem as there were (and sometimes still are) entries about GoldBook definitions that were imported based on DOI. The result of this discussion was to delete such entries and use IUPAC Gold Book ID (P4732) only in items about the concepts.
c:Category:Files provided by IUPAC, however, seems very problematic due to the accessibility issues mentioned above. These files should be included in Commons, but I'm not sure that they should be used anywhere in Wikimedia projects, and I'd be against using these files in WD. Wostr (talk) 18:11, 10 October 2023 (UTC)
In my opinion we should continue to use the property as proposed, on the concepts themselves. These are the key hubs for linking the data across domains. I would not favour making a mass of items about the Goldbook entries themselves. Since we have the images, I have no problem with the ID also being used as structured data on those images in Commons. --99of9 (talk) 23:07, 10 October 2023 (UTC)
Thank you, everyone, for the great feedback. I will pass this on. --Egon Willighagen (talk) 09:48, 15 October 2023 (UTC)

@Egon Willighagen, @Matthias M. and @Wostr: Sorry I missed this discussion at the time, but I'd like to follow up because we are currently working on this, and in fact some images were recently uploaded (I believe in the wrong format). I'm mostly focused on Wikipedia for now, but it could affect Wikidata too. Concerns were raised above about the use of images in Wikipedia because of accessibility, and I'd like to try to address this if we can. Unfortunately, the use of images was due to a copyright concern forced upon us by the Wikipedia community and IUPAC. WP editors deleted dozens of IUPAC definitions, stating they were plagiarised from the IUPAC website (which, in effect, they were, with the full blessing of IUPAC!). The IUPAC chemists then persuaded the IUPAC leadership to clarify the license for sharing, but IUPAC naturally didn't want their definitions altered, so they insisted on a CC-ND license. That was of course unacceptable to WP, which led to an impasse. The solution we came up with, acceptable to both sides, was to have an image containing the Gold Book definition released under CC-BY-SA, linked directly to the Gold Book. I think the assumption was that anyone visually impaired would click through to the Gold Book site, which I presume is fully accessible. The only concern raised by WP chemists was that there should be a link to the image file, not just to the Gold Book entry.

I worked with the Gold Book web guru to compare different formats for the image (please see the examples on my sandbox page, w:User:Walkerma/sandbox), and we agreed to use the "Image with caption link" format. Is this the best that is possible, given our constraints? If not, can you suggest a viable option that improves accessibility, and maybe add it to my sandbox page? We hope to add a lot more definitions over the coming years, and we want to get the format right. Your input is most welcome. Thanks, Walkerma (talk) 19:03, 24 February 2024 (UTC)

You fundamentally misunderstood how Wikipedia works. We write articles citing sources, with IUPAC Gold Book being one of the most valuable ones because it is an authority source, and it has stable DOI links. We don't copy the sources in inadequate image formats and paste that into related articles. This only creates additional work to clean up using mass deletion requests, which then also need to be discussed. Matthias M. (talk) 19:17, 24 February 2024 (UTC)
I'm not sure how this problem is related to Wikidata. I see no use of such images in Wikidata; in fact, I see no use of them even in Wikipedia, but I'm not an editor of en.wiki, so I won't question this – although I would be strongly opposed to this on my home pl.wiki. My experience is that quotes should be avoided if possible. So far, we have easily managed to write articles in pl.wiki in such a way that it was not necessary to quote IUPAC (or our Polish translations of their publications) in the articles. Writing the definitions in the article appropriately, based on the IUPAC definition, was completely sufficient, and IUPAC definitions are always accessible through references. Wostr (talk) 19:38, 24 February 2024 (UTC)
I disagree. I do agree that Wikidata should work this way, but en:Wikipedia certainly, and even pl:Wikipedia, are full of short quotations of this type, such as in w:First_Amendment_to_the_United_States_Constitution. Right from the early days of the English Chemistry WikiProject, there were chemists actively adding IUPAC definitions to articles, and in fact there was a task force devoted to it. I also don't understand why it must be "inadequate" if designed properly, and that's what I was really asking for help with. But maybe I should just limit my query to Wikipedia, because I know the discussion is of lesser interest to the Wikidata community. Walkerma (talk)
That probably results from a different approach to copyright law in the US and in Poland, from the fact that fair use is not allowed in pl.wiki, and from the view that even legal use of the right to quote can sometimes cause problems with further use of the content. Of course legal acts are not subject to copyright, so your example is not adequate here, but we have had many discussions in pl.wiki concluding that quotes should be used only when necessary and to a very limited extent – and from my point of view, this is not the case at all here. IUPAC definitions are published under an incompatible license, and the images that already have a compatible license will not help at all – they have accessibility problems and are inconvenient to use (for the text to be comparable in size to the text of the article, they must occupy more than half of the width in the new Vector skin). I would say that since IUPAC defends its definitions so much, you should do what every other wiki in the world does – simply create article definitions based on IUPAC definitions, and not try to include them in some strange way as a quote. Wostr (talk) 21:34, 24 February 2024 (UTC)
Thanks - that's helpful. The dilemma is that if we create our own definitions, or allow free editing of definitions derived from IUPAC definitions, we end up with original research (at best) and completely incorrect definitions (at worst). I came across a citation last month where someone had edited the text to say the opposite of the citation, and that is not uncommon. But if we use these official definitions verbatim in pure text, we fall afoul of plagiarism rules. I know images can be made accessible, and I can ask for the images to be made smaller and less bright, if you think that would be preferable, to look more like the First Amendment article I cited. Cheers, Walkerma (talk) 22:27, 24 February 2024 (UTC)
I don't think I follow, because creating article text (so also definitions) based on sources is the core foundation of Wikipedia and it's a long way from OR. In pl.wiki we have FlaggedRevs so it is easier to catch any vandalism, but as far as I remember, there were boxes with IUPAC definitions in the articles in en.wiki that could be vandalised as easily as any other text. But leaving that aside, if anything is to come out of this discussion here, I have a few points:
1. I am using the new Vector skin, in my case e.g. in en:aerogel this image takes up about 60% of the width – maybe it should be placed in the middle? or in a separate section? maybe there should be a template that would expand this image after clicking on "official IUPAC definition [show]"?
2. accessibility – I do not think that alt text in the form of "IUPAC definition for aerogel" is sufficient, because it in no way replaces the image, which is the purpose of alt text. En.wiki is one of the few Wikipedias that have an Accessibility WikiProject – maybe they can advise how best to handle such images?
3. in many projects, not linking to the image page is only allowed for photos in the public domain. Similarly, it seems to me that external links should not appear here either. Wouldn't it be better to leave a link to the image page (where the proper license information is) instead of a link to the IUPAC website, and put "Official IUPAC definition" as the caption with a footnote (with DOI and other info)? Wostr (talk) 00:37, 25 February 2024 (UTC)
You actually put File:IUPAC definition for aerogel.png under a Creative Commons license under which anybody can edit that definition. Then you are no longer an authoritative source. I don't know which problem you are trying to solve. The IUPAC Gold Book is already very accessible and widely cited, even without the image spam in English Wikipedia. Matthias M. (talk) 10:41, 26 February 2024 (UTC)

I have now filed Wikidata:Requests for deletions#Bulk deletion request: Entry in IUPAC's Gold_Book. Regards Matthias M. (talk) 13:52, 3 March 2024 (UTC)

After three attempts, I think we got all of them: Wikidata:Requests for deletions/Archive/2024/05/01#Bulk deletion request IUPAC Gold Book. Purging mixnmatch:908 again. Matthias M. (talk) 16:05, 2 May 2024 (UTC)

Wikidata items for chemicals where Wikipedia may have more information

@AdrianoRutz and I worked out a federated SPARQL query with DBpedia to find Wikidata items whose English Wikipedia article has a Chembox but which have no SMILES (can/iso/cx). The query (https://w.wiki/8iUp) currently still lists over 500 Wikidata items that could have more information. I have started adding the missing information, but despite my https://github.com/egonw/ons-wikidata/blob/main/Wikidata/createWDitemsFromSMILES.groovy script, it is still manual work for each item. So, feel free to help out. Egon Willighagen (talk) 08:34, 4 January 2024 (UTC)

We have, e.g., 3 items: hydrogen (Q556) as a chemical element (whatever that may be), dihydrogen (Q3027893) as one of the simple substances consisting of it, and hydrogen molecule (Q19822725) as one of the types of molecules of those substances. To model physical properties (temperatures, elastic moduli and so on) consistently, I propose to use them only on simple substance items, not on chemical elements (which have e.g. atomic number (P1086) and electron configuration (P8000)) or molecules (which have ionization energy (P2260) and electric dipole moment (P2201)). As one of the steps, I propose to change the constraints on melting point (P2101)/boiling point (P2102) from

⟨ subject type constraint (Q21503250) ⟩ instance of (P31) ⟨ type of chemical entity (Q113145171) ⟩

to

⟨ subject type constraint (Q21503250) ⟩ subclass of (P279) ⟨ substance (Q10683158) ⟩

Is this OK? Infovarius (talk) 13:57, 9 January 2024 (UTC)

Having a quick look at what the subclasses of `substance` are (see https://w.wiki/8njM), this does not look like a very good idea. AdrianoRutz (talk) 14:08, 9 January 2024 (UTC)
Somewhere in the archives of this discussion page there are entries about this problem. There was a time when we had more items describing simple substances, but many of them were merged into items describing chemical elements. It is true that we should have these concepts described in different items; however, this would require a thorough discussion on how to do it, which statements should be put in which items, etc. Wostr (talk) 15:21, 9 January 2024 (UTC)
Well, dihydrogen is the hydrogen molecule. Trihydrogen also exists as a rarity, and there are ions: the dihydrogen cation and the trihydrogen cation. "hydrogen" could be the more general item, with narrower items distinguishing atomic hydrogen from the other ionic or molecular forms. Graeme Bartlett (talk) 07:58, 17 February 2024 (UTC)
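The proposed switch from "instance of (P31) type of chemical entity" to "subclass of (P279) substance" changes which part of the class hierarchy the constraint walks. A minimal Python sketch of such a type check, assuming the check follows P279 transitively (P279* for the "subclass of" relation, P31/P279* for "instance of"); the class graph and every `ex:` name in it are hypothetical placeholders, not real Wikidata items or the constraint engine's actual code:

```python
from collections import deque

# Hypothetical class graph; none of these names are real Wikidata items.
SUBCLASS_OF: dict[str, set[str]] = {
    "ex:dihydrogen": {"ex:simple_substance"},
    "ex:simple_substance": {"ex:substance"},
}
INSTANCE_OF: dict[str, set[str]] = {
    "ex:hydrogen_element": {"ex:chemical_element"},
}


def reaches(start: set[str], target: str, subclass_of: dict[str, set[str]]) -> bool:
    """Breadth-first walk up P279 edges; True if `target` is reachable from `start`."""
    seen: set[str] = set()
    queue = deque(start)
    while queue:
        cls = queue.popleft()
        if cls == target:
            return True
        if cls in seen:
            continue  # guards against subclass cycles
        seen.add(cls)
        queue.extend(subclass_of.get(cls, set()))
    return False


def satisfies_subject_type(item: str, target: str, relation: str) -> bool:
    """Assumed semantics: 'P279' means item P279* target; 'P31' means item P31/P279* target."""
    if relation == "P279":
        return reaches({item}, target, SUBCLASS_OF)
    if relation == "P31":
        return reaches(INSTANCE_OF.get(item, set()), target, SUBCLASS_OF)
    raise ValueError(f"unsupported relation: {relation}")


if __name__ == "__main__":
    # A simple-substance item passes the proposed P279 check; an element item does not.
    print(satisfies_subject_type("ex:dihydrogen", "ex:substance", "P279"))
    print(satisfies_subject_type("ex:hydrogen_element", "ex:substance", "P279"))
```

A P279-based check accepts every item anywhere below the target class, so the breadth of that class matters, which is the concern raised in the first reply.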

Physical properties

A related problem is that I can't build a table of physical properties for all simple substances, as they are mixed up with elements:

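SELECT distinct * {
  {
    # Subquery: up to 1000 items that are instances or subclasses of wd:Q2512777
    SELECT * {
      ?item wdt:P279|wdt:P31 wd:Q2512777 .
    } LIMIT 1000
  }
  OPTIONAL {?item wdt:P2101 ?melt.}  # melting point
  OPTIONAL {?item wdt:P2102 ?gas.}   # boiling point
  # Atomic number, either on the item itself or on one of its parts (P527 = has part(s))
  OPTIONAL {{?item wdt:P1086 ?number.} UNION {?item wdt:P527/wdt:P1086 ?number.}}
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ru" .
    ?item rdfs:label ?label . ?item schema:description ?description
  }
}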

I got fewer than 50 results. --Infovarius (talk) 14:03, 9 January 2024 (UTC)

I am not sure I understand what you are trying to achieve, but if the issue is that simple substances are mixed up with their corresponding elements, wouldn't it be better to work on clearly distinguishing the two and modelling them correctly, rather than modifying the current constraint? AdrianoRutz (talk) 14:16, 9 January 2024 (UTC)
I just want to see a comparison of the melting temperatures of all "chemical elements" (colloquially speaking). As I interpret it, these values should be given for simple substances. For example, when I want to know the phase transition temperatures of xenon (Q1106), where should I look? Yes, this problem is a consequence of the post above. --Infovarius (talk) 15:48, 10 January 2024 (UTC)
There is the concept of allotropes, which allows linking the chemical element to the simple substances consisting of it. I agree we should not have physchem properties on elements. Egon Willighagen (talk) 13:05, 13 January 2024 (UTC)
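To see which rows of the physical-properties query come back without a melting point or atomic number (for example, element items mixed in with simple substances), the results can be post-processed. A minimal Python sketch, assuming the response is in the standard SPARQL 1.1 Query Results JSON format, in which variables left unbound by OPTIONAL are simply absent from a binding; the sample response, its item IDs, labels and values are hypothetical placeholders, not real Wikidata data:

```python
import json
from typing import Optional

# Hypothetical response in the SPARQL 1.1 Query Results JSON format.
# Item IDs, labels and values are placeholders, not real Wikidata data.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["item", "melt", "gas", "number", "label"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_EXAMPLE_1"},
     "melt": {"type": "literal", "value": "100"},
     "label": {"type": "literal", "value": "example A"}},
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_EXAMPLE_2"},
     "number": {"type": "literal", "value": "1"},
     "label": {"type": "literal", "value": "example B"}}
  ]}
}
"""


def bindings_to_rows(response_text: str) -> list[dict[str, Optional[str]]]:
    """Flatten SPARQL JSON bindings into one dict per row; unbound variables become None."""
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Variables left unbound by OPTIONAL are absent from the binding.
        rows.append({var: binding[var]["value"] if var in binding else None for var in variables})
    return rows


def labels_missing(rows: list[dict[str, Optional[str]]], var: str) -> list[Optional[str]]:
    """Labels of rows where `var` (e.g. 'melt' or 'number') is unbound."""
    return [row["label"] for row in rows if row[var] is None]


if __name__ == "__main__":
    rows = bindings_to_rows(SAMPLE_RESPONSE)
    print("no melting point:", labels_missing(rows, "melt"))
    print("no atomic number:", labels_missing(rows, "number"))
```

Rows that have an atomic number but no melting point are good candidates for element items that ended up in a list meant for simple substances.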

Chemical Reactions

Saehrimnir
Leyo
Snipre
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Adriano Rutz
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
Peter Murray-Rust
Erfurth
TiagoLubiana

Notified participants of WikiProject Chemistry

Dear all, I was trying to see whether people have started linking chemicals to chemical reactions... So I started by looking at https://w.wiki/8uGb, which gives some results, but it looks like there is quite a mess between P31 and P279 (as is often the case), so it is hard to "easily" retrieve "true" chemical reactions. Furthermore, it looks like many of them are not used at all (https://w.wiki/8uGj).

Is there someone experienced that could guide me?

Someone that could help cleaning up things?

Any good examples to follow to map such ideas, like the chemical reaction(s) leading to the dimerization of some dimeric alkaloids? AdrianoRutz (talk) 10:27, 20 January 2024 (UTC)

It would indeed be great to better document chemical reactions here! Encoding them could be a first step. I have created an item for SMIRKS (Q124450357) and proposed the corresponding property: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Natural_science#SMIRKS. Could this be a way to describe the chemical reactions listed in your previous queries in more detail? GrndStt (talk) 13:14, 7 February 2024 (UTC)
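SMIRKS, like reaction SMILES, writes a transformation as reactants>agents>products, with "." separating molecules within each part. A minimal Python sketch of splitting such a string, assuming no SMARTS component-level grouping (parenthesised groups such as "(C.C)"), which a naive split on "." would break; `split_reaction` and `ReactionParts` are hypothetical helpers, not part of any chemistry library:

```python
from dataclasses import dataclass


@dataclass
class ReactionParts:
    reactants: list[str]
    agents: list[str]
    products: list[str]


def split_reaction(smirks: str) -> ReactionParts:
    """Split 'reactants>agents>products' into lists of molecule strings."""
    sections = smirks.split(">")
    if len(sections) != 3:
        raise ValueError(f"expected reactants>agents>products, got {smirks!r}")

    def molecules(section: str) -> list[str]:
        # Naive split on '.'; component-level grouping like "(C.C)" is not handled.
        return [m for m in section.split(".") if m]

    reactants, agents, products = (molecules(s) for s in sections)
    return ReactionParts(reactants, agents, products)


if __name__ == "__main__":
    # Hypothetical ester-formation example with a proton as the agent.
    print(split_reaction("CCO.CC(=O)O>[H+]>CCOC(C)=O.O"))
    # Atom-mapped transformation with no agents ("a>>b").
    print(split_reaction("[C:1]=[O:2]>>[C:1][O:2]"))
```

Serious handling (atom maps, grouping, validation) belongs in a cheminformatics toolkit; the sketch only shows the three-part structure a SMIRKS property value would carry.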

Saturation (chemistry)

Last night I was pointed to saturation (chemistry) (Q1766412), which mixes up a disambiguation page, subproperties, etc. It's not an easy fix, and I will not get to cleaning it up in the next three weeks. I think it will take a bit of time, as there is some modelling involved and plenty of lookups of what the sitelinks refer to. For example, the English WP sitelink is a redirect (should it be removed?), and the French WP seems to list a number of subproperties. Etc. --Egon Willighagen (talk) 14:54, 16 February 2024 (UTC)

I see at least 3 or 4 different concepts here (including disambiguation) and any attempt to solve this will probably be met with complaints from Wikipedia users that we are messing with sitelinks. Nevertheless, I'll try to clean this up. Wostr (talk) 17:37, 21 February 2024 (UTC)
I tried to clean this up a bit; all the different concepts are now listed in different from (P1889), but it would be good to check these items after my edits. Wostr (talk) 18:39, 21 February 2024 (UTC)

Modelling: refine "group of stereoisomers"?

Notified participants of WikiProject Chemistry

Shall we split group of stereoisomers (Q59199015): set of several stereoisomers into two items, namely "Group of fully undefined stereoisomers" and "Group of partially undefined stereoisomers"?

An example would be 20-Hydroxy-6,10,23-trimethyl-4-azahexacyclo[12.11.0.02,11.04,9.015,24.018,23]pentacosan-17-one (Q105173494): group of stereoisomers with the chemical formula C₂₇H₄₃NO₂ and (1R,2S,6R,9S,11S,14S,15S,18S,20S,23R,24S)-20-hydroxy-6,10,23-trimethyl-4-azahexacyclo[12.11.0.02,11.04,9.015,24.018,23]pentacosan-17-one (Q105173489): group of stereoisomers with the chemical formula C₂₇H₄₃NO₂ AdrianoRutz (talk) 08:31, 21 February 2024 (UTC)

I don't have an opinion right now. Some time ago I created, e.g., the pair of enantiomers metaclass (a submetaclass of group of stereoisomers), thinking that it would help organise these classes better, but it did not work at all. I also thought about a qualifier on subclass of (P279) group of stereoisomers (Q59199015) giving the number of defined/undefined stereocentres. However, I did not even propose that because of a number of issues: (1) we still have thousands of incorrectly classified items, because we lack a tool to properly analyse their structures (at least I don't know of such a tool), (2) even existing tools (which are not used here in WD) have potential problems properly analysing some structures as chiral/achiral ([1], [2]), (3) in the past I tried to add some group of stereoisomers metaclasses based on InChI, but this could be done only for entities with partially defined stereochemistry (i.e. with a /b or /t layer containing at least one ?), and it proved much more complicated than I anticipated, with a bunch of errors that I had to correct manually. That's why I can't support or oppose this right now – we still don't have a proper solution to the existing problem of how to automatically distinguish structures that are isomerically defined from those that have at least one undefined stereocentre. Wostr (talk) 17:34, 21 February 2024 (UTC)
If we do split this, I would call the latter item "Group of partially defined stereoisomers". --99of9 (talk) 22:44, 21 February 2024 (UTC)
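Point (3) above describes detecting partially defined stereochemistry from the InChI /b (double bond) and /t (tetrahedral) layers, where ? marks an undefined centre. A minimal Python sketch of that check, assuming standard InChI strings and looking only at the main layers (everything from the isotopic /i, fixed-H /f or reconnected /r layer onwards is ignored). A string with no such layer is reported as undetermined, since the layers alone do not say why stereo information is missing. Non-standard parity marks such as u are not handled, and the test strings used with it are illustrative, not taken from real items:

```python
def stereo_layers(inchi: str) -> dict[str, str]:
    """Return the main /b and /t layer contents of an InChI string, keyed by 'b' and 't'."""
    if not inchi.startswith("InChI="):
        raise ValueError(f"not an InChI string: {inchi!r}")
    layers: dict[str, str] = {}
    # parts[0] is the version ("InChI=1S"), parts[1] the formula; later layers start with a lowercase prefix.
    for part in inchi.split("/")[2:]:
        if not part:
            continue
        prefix = part[0]
        if prefix in ("i", "f", "r"):
            break  # isotopic / fixed-H / reconnected layers repeat stereo sublayers; ignored here
        if prefix in ("b", "t"):
            layers[prefix] = part[1:]
    return layers


def classify_stereo(inchi: str) -> str:
    """Classify stereo definition from the parity marks (+, -, ?) in the /b and /t layers."""
    layers = stereo_layers(inchi)
    # Components are separated by ';', stereo elements within a component by ','.
    entries = [
        entry
        for content in layers.values()
        for entry in content.replace(";", ",").split(",")
        if entry
    ]
    if not entries:
        return "undetermined"  # achiral, or all stereo undefined; the layers alone cannot tell
    undefined = sum(1 for entry in entries if entry.endswith("?"))
    if undefined == 0:
        return "fully defined"
    if undefined == len(entries):
        return "fully undefined"
    return "partially defined"


if __name__ == "__main__":
    print(classify_stereo("InChI=1S/C4H10O/c1-3-4(2)5/h4-5H,3H2,1-2H3/t4-/m0/s1"))  # fully defined
    print(classify_stereo("InChI=1S/CH4/h1H4"))  # undetermined
```

The "undetermined" bucket is exactly the gap described above: it cannot separate achiral structures from fully undefined ones without analysing the structure itself.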

How to use P2874 identifier?

Please check out this thread. Thank you! Horcrux (talk) 09:36, 13 March 2024 (UTC)

The original discussion suggests two ways: one as an external identifier for an assay and the other as a reference. How do you want to use it? Egon Willighagen (talk) 18:59, 13 March 2024 (UTC)

Modelling of canonical/isomeric SMILES on isotopes/isotopocules

Notified participants of WikiProject Chemistry

Currently, canonical SMILES (P233) and isomeric SMILES (P2017) are mapped inconsistently on isotope (Q25276) and Isotopocule (Q115801582).

For examples of the different versions currently co-existing, see protium (Q12830437), deuterium (Q102296), tritium (Q54389), carbon-14 (Q840660) or CTK8F2337 (Q82300470). Should we coordinate on a single mapping?

I am happy to do some cleanup once we have agreed on the best way to do it. AdrianoRutz (talk) 14:22, 22 April 2024 (UTC)
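One way to make any agreed mapping checkable is to test whether a SMILES string carries isomeric information at all. A minimal Python sketch, assuming the PubChem-style convention in which canonical SMILES omits isotope and stereo information while isomeric SMILES keeps it; this is only one candidate convention for the discussion, not an agreed rule, and `isomeric_features` and `suggested_property` are hypothetical helpers:

```python
import re

# A bracket atom that starts with digits carries an isotope mass number, e.g. "[2H]" or "[14CH4]".
ISOTOPE_RE = re.compile(r"\[\d")
# '@' marks tetrahedral chirality; '/' and '\' mark double-bond geometry.
STEREO_RE = re.compile(r"[@/\\]")


def isomeric_features(smiles: str) -> set[str]:
    """Return which kinds of isomeric information ('isotope', 'stereo') a SMILES string contains."""
    features: set[str] = set()
    if ISOTOPE_RE.search(smiles):
        features.add("isotope")
    if STEREO_RE.search(smiles):
        features.add("stereo")
    return features


def suggested_property(smiles: str) -> str:
    """Under the assumed convention: isomeric information -> P2017, otherwise P233."""
    return "P2017" if isomeric_features(smiles) else "P233"


if __name__ == "__main__":
    for smiles in ["[2H][2H]", "[H][H]", "F/C=C\\F", "CCO"]:
        print(smiles, sorted(isomeric_features(smiles)), suggested_property(smiles))
```

Under this convention, a value such as "[2H][2H]" stored in canonical SMILES (P233) would be flagged, which is one way to spot the inconsistent items listed above.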

Maintenance query spotting Q113145171 P279 Q113145171

Notified participants of WikiProject Chemistry

https://w.wiki/9$ae: Query spotting inconsistencies where a type of chemical entity (Q113145171) is a subclass of (P279) a type of chemical entity (Q113145171). These cases should not exist. Depending on the cases, one of the two items might be wrongly instance of (P31) type of chemical entity (Q113145171) and should be instance of (P31) group of stereoisomers (Q59199015), or the other should simply not be a subclass of (P279).

There are fewer than 500 such cases, ideally to be sorted out manually or by some semi-automated means, which we do not necessarily have, as partially discussed in https://www.wikidata.org/w/index.php?title=Wikidata_talk%3AWikiProject_Chemistry&section=new&veaction=editsource#Modelling:_refine_%22group_of_stereoisomers%22? AdrianoRutz (talk) 21:10, 7 May 2024 (UTC)
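In SPARQL terms, the pattern the maintenance query looks for is roughly ?a wdt:P31 wd:Q113145171 ; wdt:P279 ?b . ?b wdt:P31 wd:Q113145171 . (the linked query may differ in detail). A minimal offline Python sketch of the same one-hop check over an exported slice of statements, which could support semi-automated triage; the `ex:` items are hypothetical placeholders, and Q59199015 is the group of stereoisomers class mentioned above:

```python
TYPE_OF_CHEMICAL_ENTITY = "Q113145171"
GROUP_OF_STEREOISOMERS = "Q59199015"


def inconsistent_pairs(
    instance_of: dict[str, set[str]], subclass_of: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """Return (child, parent) pairs where both items are P31 Q113145171 and child P279 parent."""
    typed = {item for item, classes in instance_of.items() if TYPE_OF_CHEMICAL_ENTITY in classes}
    pairs = []
    for child in sorted(typed):
        # Direct P279 only; transitive chains are not followed.
        for parent in sorted(subclass_of.get(child, set())):
            if parent in typed:
                pairs.append((child, parent))
    return pairs


if __name__ == "__main__":
    # Hypothetical statements: ex:A is a subclass of ex:B, and both are typed as chemical entities.
    instance_of = {
        "ex:A": {TYPE_OF_CHEMICAL_ENTITY},
        "ex:B": {TYPE_OF_CHEMICAL_ENTITY},
        "ex:C": {GROUP_OF_STEREOISOMERS},
    }
    subclass_of = {"ex:A": {"ex:B", "ex:C"}}
    for child, parent in inconsistent_pairs(instance_of, subclass_of):
        print(f"{child} P279 {parent}: both are P31 {TYPE_OF_CHEMICAL_ENTITY}")
```

Each reported pair still needs a human decision: either the parent should be an instance of group of stereoisomers, or the subclass of (P279) statement should go.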