Wikidata talk:WikiProject Chemistry/Archive/2022



Validation of using NDF-RT IDs

Hi all, last week I participated (together with Egon Willighagen and Andra Waagmeester) in the SWAT4HCLS conference, which ended Thursday with a hackathon. We worked on developing ShEx for chemistry data (see Github for a visual overview of potentially relevant properties, definitely not complete yet, and for some ShEx we started to write up for chemical modeling). While developing a ShEx for lipids (sterols specifically), I noticed the following entry: deoxycholic acid (Q425680), with duplicate info for the NDF-RT ID, both of which I actually believe are incorrect. The "National Drug File - Reference Terminology" ontology has different classes for chemical ingredients and for preparations thereof. In the example above, Q425680, the two IDs linked from the NDF-RT ontology were both for the preparation (N0000192570 and N0000147220 are both subclasses of "pharmaceutical preparations"), not for the chemical itself, which is located under "chemical ingredients" as N0000007073. I've now added the latter with preferred rank, and demoted the other two to deprecated rank. I have not worked with this ontology before, so I would value your opinion(s) on the following:

1. Which information from the NDF-RT ontology do we want to link to individual chemicals? E.g. link to the chemical itself (only), or also add terms related to preparation (or maybe model this with a different property)?

2. Which ID should be considered to have the preferred rank? Or should we allow adding more IDs (which is not in line with the property constraints for NDF-RT ID)?

3. I couldn't determine from the history of the example entry deoxycholic acid (Q425680) by whom the NDF-RT IDs were added. The metadata suggest that the IDs were added in 2015 and 2018; the entry's history goes back to 2012, but I can't locate the property or changes thereof (besides the ones I created manually). If this has been added by a bot, updates might be necessary depending on your ideas for questions 1 and 2.

4. There might be more entries with constraint violations for this property, so it might be relevant to add this ID to the how to contribute section.

Saehrimnir
Leyo
Snipre
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Adriano Rutz
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
Peter Murray-Rust
Erfurth
TiagoLubiana
NadirSH
Matthias M.
S8321414

Notified participants of WikiProject Chemistry --Denise Slenter (talk) 11:19, 14 January 2022 (UTC)

Ad 1. We have different items about chemical compounds and different items about pharmaceutical preparations; mapping between WD and other databases should be 1:1 if possible. However, I'm not sure why these three IDs exist in this database; they seem to me like duplicates.
Ad 2. None should be set to preferred if there is a 1:1 relationship between entries. If there is more than one entry, changing the constraint seems to me like a better option than the use of ranks.
Ad 3. [1], [2].
Ad 4. Constraints should be modified accordingly. If it turns out that all three IDs are in the correct item, the constraint can be modified with e.g. the use of separator (P4155). Wostr (talk) 14:41, 14 January 2022 (UTC)
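
To make the rank question concrete, here is a minimal Python sketch of how "best rank" selection behaves for the example above: if any statement has preferred rank, only preferred values are returned; otherwise all normal-rank values are returned; deprecated values are never returned. The dictionary layout and the function name `truthy_values` are illustrative assumptions, not a Wikidata API.

```python
from typing import Dict, List


def truthy_values(statements: List[Dict[str, str]]) -> List[str]:
    """Return the values a best-rank lookup would yield.

    Preferred statements win if any exist; otherwise all normal-rank
    statements are returned. Deprecated statements are always skipped.
    """
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]


# NDF-RT IDs on deoxycholic acid (Q425680) as described in the post above.
ndf_rt = [
    {"value": "N0000007073", "rank": "preferred"},   # chemical ingredient
    {"value": "N0000192570", "rank": "deprecated"},  # preparation
    {"value": "N0000147220", "rank": "deprecated"},  # preparation
]

print(truthy_values(ndf_rt))
```

With ranks set this way a best-rank lookup sees a single ID, although all three statements are still stored on the item.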

Mass (again) and metadata (new)

Hi all, regarding the validation of mass entries (previous discussion 2018 here), including their source of origin, through ShEx (Schema: E340), I found this entry, dehydroepiandrosterone sulfate (Q2505402), which included two masses: "368.165745 dalton, source PubChem" and "368.489 dalton, no source known". I've added ChEBI as a source for the latter, but got several warning statements for my qualifiers.

Apparently, metadata from PubChem and ChEBI are modeled differently; PubChem can be added as a qualifier, whereas ChEBI should only be added as a reference. Why is there this difference? Also, the mass from PubChem is the monoisotopic one, whereas the ChEBI one is the "average mass". In the 2018 discussion, I saw some ideas on how to model the mass property with more specificity by stating what type of mass was meant, monoisotopic mass (Q3297559) vs average mass (which still doesn't have an entry in Wikidata), and on how to link this information (determination method (P459) vs of (P642) vs criterion used (P1013); also not resolved, if I'm correct). Hope to get your opinion on this matter.

Notified participants of WikiProject Chemistry --Denise Slenter (talk) 11:56, 14 January 2022 (UTC)

Information about the source of data should always be included in the 'References' section, not as a regular qualifier (cf. [3]). Most masses in WD are monoisotopic, because of an import from PubChem. A few years back I wrote that masses should always have a qualifier to specify whether this is a monoisotopic mass, an average mass or something else, but there was really no response to that and I can't do it myself with thousands of items. Wostr (talk) 14:12, 14 January 2022 (UTC)
I can. What qualifier should it have? Egon Willighagen (talk) 14:24, 14 January 2022 (UTC)
I think only criterion used (P1013) can be used here, as determination method (P459) has a too restrictive value-type constraint (Q21510865). I think the first step should be to add criterion used (P1013): monoisotopic mass (Q3297559) to every PubChem-imported mass if the value in WD matches the monoisotopic mass in PubChem. Maybe other types of mass could be imported and properly qualified at the same time? Wostr (talk) 08:54, 15 January 2022 (UTC)
See also https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Chemistry/Archive/2018#(Molar?)_mass_of_compounds --SCIdude (talk) 15:12, 14 January 2022 (UTC)
I remember I wrote about this problem to a bot operator, probably the one responsible for the import of the PubChem data, but nothing has been done about it. Wostr (talk) 08:54, 15 January 2022 (UTC)
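
As an illustration of the matching step proposed above, the following Python sketch computes both mass types from a molecular formula and labels a stored value accordingly. The formula C19H28O5S for dehydroepiandrosterone sulfate, the rounded atomic masses, the tolerance and all function names are assumptions made for this example; a real bot would compare against the values published by PubChem or ChEBI rather than recompute them.

```python
import re
from typing import Dict, Optional

# Mass of the most abundant isotope of each element (rounded, in dalton).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "S": 31.97207117}
# Abridged standard atomic weights (in dalton).
AVERAGE = {"C": 12.011, "H": 1.008, "O": 15.999, "S": 32.06}


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a simple formula such as 'C19H28O5S' (no brackets or charges)."""
    counts: Dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def mass(formula: str, table: Dict[str, float]) -> float:
    """Sum the per-element masses from the given table."""
    return sum(table[sym] * n for sym, n in parse_formula(formula).items())


def classify_mass(value: float, formula: str, tol: float = 0.01) -> Optional[str]:
    """Return which mass type a stored value matches, or None if neither."""
    if abs(value - mass(formula, MONOISOTOPIC)) <= tol:
        return "monoisotopic mass"
    if abs(value - mass(formula, AVERAGE)) <= tol:
        return "average mass"
    return None


formula = "C19H28O5S"  # assumed formula for Q2505402
print(classify_mass(368.165745, formula))
print(classify_mass(368.489, formula))
```

Under these assumptions the PubChem value is labelled as monoisotopic and the ChEBI value as average, which is the distinction the qualifier is meant to record.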

RFD: delete IUPAC GOLDBOOK entities "scholarly article"

Notified participants of WikiProject Chemistry

Tobias1984
Snipre
Physikerwelt
Pamputt
Petermahlzahn
Jibe-b
Restu20
Daniel Mietchen
TomT0m
ArthurPSmith
Mu301
Sarilho1
SR5
DavRosen
Danmichaelo
Ptolusque
PhilMINT
Malore
Thibdx
Ranjithsiji
Niko.georgiev
Simon Villeneuve
Toni 001
Marc André Miron
DePiep
RShigapov
CarlFriedberg
Crocodilecoup
Mkomboti
Amorenobr (talk) 01:27, 3 August 2022 (UTC)
Valverde667 (talk) 16:07, 4 August 2022 (UTC)
fgnievinski

Notified participants of WikiProject Physics

Consider dalton (Q483261) also known as the atomic mass unit or unified atomic mass unit. It has IUPAC Gold Book ID (P4732) that's formatted as a DOI link https://dx.doi.org/10.1351/goldbook.D01514 -> https://goldbook.iupac.org/terms/view/D01514. All of this is useful and valid.

Now consider dalton (Q61068243): that's considered a "scholarly article" and has only a name and DOI (pointing to the same GOLDBOOK entry), not even a "main subject" link to the real "dalton".

--Vladimir Alexiev (talk) 15:27, 13 April 2022 (UTC)

  •  Support Makes sense to me; what about adding a DOI (P356) with the DOI value to the item too though? ArthurPSmith (talk) 17:12, 13 April 2022 (UTC) (i.e. that would hopefully prevent SourceMD from re-importing it again?)
    Arthur, but you should recognize the privileged status of these DOIs: "physical and chemical concepts have DOI in GOLDBOOK" and "English people have DOI in ODNB".
    There may well be encyclopedic or biographical articles about these domain concepts in other encyclopedias or biographical dictionaries.
    So it's much better to have specific external-IDs like IUPAC Gold Book ID (P4732) and Oxford Dictionary of National Biography ID (P1415) about these domain concepts.
    At the same time, in practical terms I support relaxing the constraint that only creative works should have DOI, and applying DOIs to domain concepts directly. Vladimir Alexiev (talk) 06:17, 14 April 2022 (UTC)
  • I've merged many such items with items describing specific concepts in the past. Items about GoldBook 'articles' are really useless. Wostr (talk) 17:35, 13 April 2022 (UTC)
    @Wostr excellent! This query finds 1906 items to be deleted:
      SELECT ?goodItem ?goodItemLabel ?badItem ?badItemLabel ?doi ?goldbook WHERE {
        ?goodItem wdt:P4732 ?goldbook .
        BIND(CONCAT("10.1351/GOLDBOOK.", ?goldbook) AS ?doi)
        ?badItem wdt:P356 ?doi .
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
      }
    This wider query unfortunately times out (because there are probably 50M things with DOI):
      SELECT ?badItem ?badItemLabel ?doi WHERE {
        ?badItem wdt:P356 ?doi .
        FILTER(REGEX(?doi, "goldbook", "i"))
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
      }
    Vladimir Alexiev (talk) 06:27, 14 April 2022 (UTC)
  • I support merging too. Personally, I'd include the DOI in the merged item. --99of9 (talk) 07:29, 14 April 2022 (UTC)
  • +1 Snipre (talk) 19:49, 14 April 2022 (UTC)
  •  Support a cleaning + merging bot is no problem, as long as there is a list of items to merge into. So this would apply to the first SPARQL query above.- --SCIdude (talk) 16:08, 28 July 2022 (UTC)
    Bot merged nearly all 1,912 items from first query in about 55 minutes. I kept the DOIs for now, they also help with the failed merges which need to be done manually. Thanks to project members for the consensus. --SCIdude (talk) 18:23, 28 July 2022 (UTC)
    @SCIdude: if you keep DOI in these items, soon we will have a bigger problem as some bot will populate these items with crappy data about 'scientific article'... Wostr (talk) 09:44, 29 July 2022 (UTC)
    They are gone now. --SCIdude (talk) 15:10, 29 July 2022 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. SCIdude (talk) 18:23, 28 July 2022 (UTC)
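
For readers who want to reproduce the pairing logic of the first query offline, here is a minimal Python sketch. It builds the DOI from a Gold Book ID the same way the query's CONCAT does and pairs each concept item with any other item carrying that DOI. The input dictionaries, the exact casing of the stored DOI and the function names are assumptions for illustration; the two QIDs are the dalton items named at the start of this thread.

```python
from typing import Dict, List, Tuple


def goldbook_doi(goldbook_id: str) -> str:
    """Build the DOI string that corresponds to an IUPAC Gold Book ID."""
    return "10.1351/GOLDBOOK." + goldbook_id


def find_article_duplicates(
    concepts: Dict[str, str], doi_items: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair concept items with 'article' items sharing the Gold Book DOI.

    concepts maps QID -> Gold Book ID; doi_items maps QID -> DOI.
    DOIs are compared case-insensitively.
    """
    by_doi = {doi.upper(): qid for qid, doi in doi_items.items()}
    pairs = []
    for qid, goldbook_id in concepts.items():
        other = by_doi.get(goldbook_doi(goldbook_id).upper())
        if other is not None and other != qid:
            pairs.append((qid, other))
    return pairs


concepts = {"Q483261": "D01514"}                      # dalton, the unit
doi_items = {"Q61068243": "10.1351/GOLDBOOK.D01514"}  # dalton, the "article"
print(find_article_duplicates(concepts, doi_items))
```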

pKa values

Hello, I’m active on the Wikipedia WP:Chem but this is my first time on the WikiData equivalent. I'm trying to figure out why so few pKa’s for strong acids are listed here. When I use the query tool to find pKa's <-1 I only get 9 results. These include only one of the examples from superacids.

I had assumed that wikidata automatically harvested all data from ChemBox – is this not the case? --Project Osprey (talk) 12:29, 30 March 2022 (UTC)

Data harvested from Wikipedia infoboxes are worth nothing, the example of ru.wiki import shows it (every statement imported from this project will have to be manually curated, right now such data have very limited reusability). In Wikidata pKa data should have a reference (not the indication that it was imported from a Wikipedia), temperature, optionally the determination method and uncertainty qualifier. There is not much pKa data, because no one added it yet. Wostr (talk) 08:03, 31 March 2022 (UTC)
That risks turning one question into many. Some pKa's on en.wiki are referenced. Can they be selectively extracted and would that be worthwhile to do? Some existing values returned by my search are clearly wrong (Q242627, Q899683, Q26963) these are pKa's for the protonated form of the compound, not the compound itself. More generally, what would help? --Project Osprey (talk) 13:22, 31 March 2022 (UTC)
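
One way to picture the requirements listed above (a real reference, a temperature, optionally a determination method and an uncertainty) is a small completeness check. This is only a sketch: the dictionary keys, the marker string for Wikipedia imports and the function name are hypothetical and do not correspond to any Wikidata data format.

```python
from typing import Dict, List, Optional

REQUIRED = ("reference", "temperature")
OPTIONAL = ("determination_method", "uncertainty")
# Hypothetical marker for "imported from a Wikipedia", which does not
# count as a proper reference according to the reply above.
WIKIPEDIA_IMPORT = "imported from Wikipedia"


def missing_pka_metadata(statement: Dict[str, Optional[str]]) -> List[str]:
    """List the required fields a pKa statement still lacks."""
    missing = []
    for key in REQUIRED:
        value = statement.get(key)
        if not value or value == WIKIPEDIA_IMPORT:
            missing.append(key)
    return missing


example = {"value": "-3.0", "reference": WIKIPEDIA_IMPORT, "temperature": None}
print(missing_pka_metadata(example))
```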

CXSMILES

Hi all, since more and more people are using types (instance of (P31)) for groups and classes of chemicals and we have polymers, etc., I think it's time to start using CXSMILES to represent them. CXSMILES is supported by several open source tools, though it is not, strictly speaking, an open specification. I made a property proposal: https://www.wikidata.org/wiki/Wikidata:Property_proposal/CXSMILES Your thoughts are welcome! Egon Willighagen (talk) 08:26, 20 April 2022 (UTC)

PubChem deposit completed

Over the past four weeks I picked up depositing the Wikidata chemicals in PubChem again. I'm working with the PubChem team and the upload went live yesterday. Of course, it struggles with inorganics, but it also gives a good bit of information during the standardization process. For example, I found about 26 thousand incorrect SMILES strings. At least some of those came from my hand, and I fixed these. The problem was trivial: instead of a single \ the SMILES had a \\. Likely an escaping issue, but it was also easy to fix. Other problems include R groups, which reinstated my interest in CXSMILES (see above). There is a long tail of other warnings and errors that I get back, but I still need to update my script from 2019 to process the files with this information into Wikisource so that I can upload that somewhere here. All in all, PubChem now has 941,013 live substances from Wikidata, up from around 150 thousand in 2019 :) -- Egon Willighagen (talk) 07:07, 30 April 2022 (UTC)

@Egon Willighagen: Great work! --99of9 (talk) 02:24, 5 May 2022 (UTC)
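
The double-backslash problem described above can be illustrated with a short Python sketch. It is not the script that was actually used; the function name and the example SMILES are assumptions. In SMILES, / and \ mark bond direction around double bonds, so a doubled backslash produced by over-escaping makes the string invalid.

```python
def fix_double_backslash(smiles: str) -> str:
    """Collapse each doubled backslash in a SMILES string to a single one."""
    return smiles.replace("\\\\", "\\")


# Raw string: the text really contains two consecutive backslashes.
broken = r"F/C=C\\F"
fixed = fix_double_backslash(broken)
print(broken, "->", fixed)
```

Strings without a doubled backslash pass through unchanged, so the function can be applied to a whole export without first selecting the affected entries.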

Documenting the model about elements

I think this project needs a page explaining the modeling choices, and the different possible models, about chemical elements, the concepts and their relationships. This would help settle edit wars if everybody can reach an understanding on the content, and point newbies to a place where everything

Something along the lines of what I wrote in the previous section, but probably clarified, could be a start.

What do you think ? author  TomT0m / talk page 11:07, 14 June 2022 (UTC)

ECHA Substance Infocard: datamodel check

 Resolved
I think this ECHA datamodel at WD can use a check. Involved:

ECHA Substance Infocard ID (P2566) -- the ID property
Currently: Wikidata item of this property (P1629) is ECHA Substance Infocard database (Q59911453) (now: ECHA Substance Infocard (Q65325050) -DePiep (talk) 13:20, 15 June 2022 (UTC))

About Items and Properties: ECHA Substance Infocard (Q65325050) has little tracking or maintenance. Only one lang-wiki article (scowiki). The ID property ECHA Substance Infocard ID (P2566) is defined to be tied to the database. Though, I think it should be tied to the Infocard. (Incidentally: also add P31 Wikidata property to identify substances (Q19833835)? Or does a "Data Sheet ID" not qualify?).

About Labels: ECHA names the sheet "Substance Infocard" (including this titlecasing), consistently AFAIK: see what_is_an_infocard_en (pdf, Sep 2021). That is not "ECHA InfoCard" (any more?). Shall we label the ID Property "ECHA Substance Infocard ID"?

FYI, en:Template:Chembox and en:Template:Infobox drug, in 19K articles together, directly read this property from WD (as opposed to local parameter input e.g., CAS RN).

Proposal
1. For ECHA Substance Infocard ID (P2566), change Wikidata item of this property (P1629) into ECHA Substance Infocard (Q65325050)
2. To ECHA Substance Infocard ID (P2566), add P31 Wikidata property to identify substances (Q19833835)
3. Use label "ECHA Substance Infocard [ID]" primarily

-DePiep (talk) 05:46, 21 May 2022 (UTC)

I support DePiep's proposals, including that the name in proposal #1 should be "ECHA Substance Infocard" for consistency. The direct link to the latest ECHA leaflet is this link (pdf). A brief write-up about this database is the section I wrote for en:European Chemicals Agency#Substance infocard. Michael D. Turnbull (talk) 10:17, 21 May 2022 (UTC)
✓ Done. See the items listed in top. -DePiep (talk) 13:20, 15 June 2022 (UTC)

ChemSpider IDs: ionic vs covalent structure

Today's case with monosodium acetylide (Q17417384) motivated me to write about this issue. We have about 1.7k 'single value' constraint violations with ChemSpider ID (P661); at least some of them are caused by the fact that ChemSpider has different entries about the same chemical entity, one with covalent bonding and one with an ionic structure (some examples: nickel potassium cyanide (Q4456691), rhenium(III) bromide (Q4096889), sodium methanethiolate (Q6553913), stannic sulfide (Q205021), potassium ferricyanide (Q408810), tungsten hexachloride (Q421188), titanium tetrabromide (Q411616)). In some cases it can be easily fixed by deprecating one statement, because we are sure that a chemical entity is an ionic compound (or at least that ionic bonding is the best way to describe this structure). However, in some cases we may not be sure (or maybe that's just me?) which ChemSpider ID would be better to have as the main ID (= not deprecated, or preferred). What would be the best solution here? I see some options: (1) deprecate one statement, (2) make one statement preferred, (3) use separator (P4155) with a proper value and keep both IDs with normal rank. Wostr (talk) 17:15, 27 June 2022 (UTC)

This problem is not specific to ChemSpider. PubChem has a similar problem: see stannic sulfide (Q205021): 15238661, 73977. Snipre (talk) 18:57, 18 July 2022 (UTC)
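
A 'single value' violation simply means one item carrying more than one value for the same identifier property. The Python sketch below finds such items in a list of (item, ID) pairs; the input format, the function name and the second item `Q_EXAMPLE` are hypothetical, while the two PubChem IDs for stannic sulfide (Q205021) are the ones quoted above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def single_value_violations(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return items that have more than one distinct ID value."""
    ids_by_item = defaultdict(set)
    for item, ext_id in pairs:
        ids_by_item[item].add(ext_id)
    return {
        item: sorted(ids)
        for item, ids in ids_by_item.items()
        if len(ids) > 1
    }


pairs = [
    ("Q205021", "15238661"),  # stannic sulfide, first PubChem ID
    ("Q205021", "73977"),     # stannic sulfide, second PubChem ID
    ("Q_EXAMPLE", "1"),       # hypothetical item with a single ID
]
print(single_value_violations(pairs))
```

Which of the listed values should then be deprecated, preferred or kept with a separator is the modeling decision discussed in this thread; the sketch only produces the worklist.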

Charting the isotopes

Researching and curating the isotopes in Wikidata. -DePiep (talk) 06:50, 28 July 2022 (UTC)

WD isotopes 4765 isotopes (as of 31 Jul 2022)
COUNT '4765' per the '4765' query
  • AME2020 (II)

Queries and listings

(talk moved to this new subthread) 06:50, 28 July 2022 (UTC)
  • Egon Willighagen can you tell me how to contact the Scholia creators? Github? I am missing some isotopes... and even with HL=1s, they don't disappear completely ;-) -DePiep (talk) 12:31, 27 July 2022 (UTC)
    Yeah, just tell me :) I think the SPARQL query was written when people started working on the allotropes. I may need to update it. So, let's use GitHub. --Egon Willighagen (talk) 04:59, 28 July 2022 (UTC)
    Background. BTW, I have shifted in Scholia from allotropes to isotopes topic here.
(1) For oganesson (Q1307), Scholia lists zero isotopes: [4]. But Wikidata has three.
(2) The same Scholia query made generic (=for all elements) [5] lists 2995 isotopes (still zero for Og). There is no query-overflow warning.
(3) Same query stripped down to more basic output [6] lists 4765 isotopes (3 for Og). This number is more like the NUMBASE number, but not checked for such completeness yet.
(4) Conclusion: Looks like the Scholia query is incorrect, most likely by overflowing (without message).
FYI, my laboratory is at en:Template:Infobox element/symbol-to--overview-isotopes. DePiep (talk) 06:19, 28 July 2022 (UTC)
Scholia, Bi lists 43 isotopes: no isomers (m), only ground states. Systematic? (WD has ca 1322 m-isotopes in the 4765 hits). -DePiep (talk) 11:07, 1 August 2022 (UTC)
Scholia, Bi lists & counts same isotope double eg bismuth-214 (Q18888217); bismuth-216 (Q18888228). -DePiep (talk) 18:45, 1 August 2022 (UTC)
Abandoning Scholia: For these three reasons, I won't use the Scholia queries any more for curating the ~5k isotopes. That is: Og: all isotopes missing; Bi-list m-isotopes missing (systematically though ;-)); Bi-214: same isotope repeated in the list. -DePiep (talk) 07:44, 2 August 2022 (UTC)
Wow! A lot to discover. DePiep (talk) 20:48, 28 July 2022 (UTC)
The '4765' query (Jul 2022)
SELECT ?isotope ?isotopeLabel ?element ?elementLabel ?protons ?neutrons
WHERE {
  ?element wdt:P1086 ?protons .    #P atomic number (Z)
  ?isotope wdt:P279 ?element ;     #P subclass of
           wdt:P1148 ?neutrons .   #P neutron number (N)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?protons ?neutrons
atomic number (P1086)
subclass of (P279)
neutron number (P1148)
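
The query returns the atomic number (Z) and neutron number (N) for each isotope, and the mass number follows as A = Z + N. A minimal Python sketch of turning one result row into a standard label such as "magnesium-28" is shown below; the function names and the lower-case "element-A" label pattern are assumptions for illustration.

```python
def mass_number(protons: int, neutrons: int) -> int:
    """Mass number A = Z + N."""
    return protons + neutrons


def standard_label(element_label: str, protons: int, neutrons: int) -> str:
    """Build a ground-state label of the form 'element-A'."""
    return f"{element_label}-{mass_number(protons, neutrons)}"


print(standard_label("magnesium", 12, 16))
print(standard_label("bismuth", 83, 131))
```

Nuclear isomers share Z and N with their ground state, so a label built this way still needs the "m" suffix added separately for those items.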

Incidental edits

Informative. Incidental edits in isotopes that work towards standardisation. Assumed trivial, i.e. non-controversial. -DePiep (talk) 06:38, 1 August 2022 (UTC)

--todo?: ch label into standard "magnesium-28"(but see duplicate note below), add alias
todo: duplicate of magnesium-28 (Q18844485)? also: Q..615 does not show in query "all 4765 isotopes"?! Mg has no -m isotopes (in WD).
Q27285615 now Redirects into magnesium-28 (Q18844485). -DePiep (talk) 07:48, 1 August 2022 (UTC)
  • nickel-54 (Q18845090) -- rm claim excitation energy (P9998)=0±0 electronvolt: ground state; also use P9998<>0 as excitation (m) flag.
-DePiep (talk) 06:59, 1 August 2022 (UTC)

Datamodeling: Identifying isomers

IST:

A simple isotope (Q25276) (in ground state (Q4480008)) can exist as a nuclear isomer (Q846110): a distinct isomer with the same Z+N=A mass number (Q101395), but a different (non-zero) excited state (Q215328), recorded with excitation energy (P9998) (concept item: excitation energy (Q568593)). In real life, formally marked (identified) by "m" in the isotope id: 99mTc.
The code "m", "m1","m2" can also show as ‘m’, ‘n’, ‘p’, ‘q’, ‘r’ and ‘x’ (The NUBASE2020 evaluation of nuclear physics properties (Q113353102), p17/181).
Example Tc: technetium (Q1054) has technetium-99m (Q2373354), with technetium-99m (Q2373354) excitation energy (P9998) 142.6836±0.0011 keV
Example Pu: plutonium-241 (Q7205705) has plutonium-241m2 (Q18888945) excitation energy (P9998) [missing], plutonium-241m2 (Q18888945) excitation energy (P9998) [missing]

Numbers involved (NUBASE 2020):

3340 ground state
1938 excited isomeric
 218 unobserved, estimated; ground state
  45 unobserved, estimated; excited state
————— +
5541 isotopes

SOLL in Wikidata:

(1) let's establish that identification "m", "m1","m2", .. is used. (question: does this cover the side-states for p, q, r that NUBASE mentions?)
(2) All isomers should have property excitation energy stated (eg, from NUBASE2020)
(3) Question: how do we identify (query for) isomers, as distinct from the ground state isomer? So far, only route is to check for statement excitation energy (P9998) being present (see (2) though); even when this indirect statement is completed it does not solve the ordering issue "m1, m2".

-DePiep (talk) 09:05, 30 July 2022 (UTC)
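
As a sketch of question (3), isomers can at least be recognised and ordered from their labels if the "m", "m1", "m2" convention from (1) is applied consistently. The Python below parses a label into element, mass number and isomer index; treating a bare "m" as index 1, the lower-case label pattern and the function name are assumptions, and the ‘n’, ‘p’, ‘q’, ‘r’, ‘x’ codes mentioned by NUBASE are deliberately not handled.

```python
import re
from typing import Optional, Tuple

LABEL = re.compile(r"^(?P<element>[a-z]+)-(?P<a>\d+)(?P<state>m\d*)?$")


def parse_isotope_label(label: str) -> Optional[Tuple[str, int, int]]:
    """Split 'plutonium-241m2' into ('plutonium', 241, 2).

    The last number is the isomer index: 0 for the ground state,
    1 for 'm' or 'm1', 2 for 'm2', and so on.
    Returns None if the label does not follow the pattern.
    """
    match = LABEL.match(label)
    if match is None:
        return None
    state = match.group("state")
    if state is None:
        index = 0
    elif state == "m":
        index = 1
    else:
        index = int(state[1:])
    return match.group("element"), int(match.group("a")), index


labels = ["plutonium-241m2", "plutonium-241", "technetium-99m", "plutonium-241m1"]
print(sorted(parse_isotope_label(label) for label in labels))
```

Sorting the parsed tuples puts the ground state before m1 and m2 for each isotope, which a check for excitation energy (P9998) alone cannot do.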

@DePiep: Yeah, I ran into this when I was doing some isotope imports - I seem to recall at least a few of our standard isotopes are actually not ground states but the m2 or some other state (because that's how they actually appear most often in nuclear decay chains or wherever they are generally found). Should we have a class "isomer", subclass of "isotope", perhaps? And I'd love to see standardization on the main labels for these. ArthurPSmith (talk) 19:49, 31 July 2022 (UTC)
@ArthurPSmith: I am curating the isotopes: checking current representation & datamodels applied in WD (on and offline). Now completing stable & complete? classifications (see below; #Incidental edits). Later to be checked against sources (like NUBASE2020 and AME2020). Meanwhile learning the ropes of WD-classification, SPARQL, isotopes, bots/automation. Also involves #Data structure for elemental substance and element. Ultimate test & goal: WD and (hometown) enwiki could be mutually complete & same... For now, I don't propose modelings.
If you find "standard isotopes are actually not ground states .. m-isomers": please post cases (or a sparql) for me to cure. -DePiep (talk) 07:37, 2 August 2022 (UTC)
As of 1 Aug 2022, useful, stable classifications I met are:
  1. The WD isomers are well classified by P31 nuclear isomer (Q846110); using m, m1, m2, .. m6 ID in labelsuffix. (~1323 items today)
  2. Generic isotopes are in P31 isotope of tin (Q687521) guidance system (P624) Kirchhundem (Q10906) (118+ <elements>; P31/P279 <element>, total ~4765 today including the 1300+ WD M-isotopes)
  3. Not crosschecked(/not fit for classification?) yet: N, Z, halflife, eV's, completeness vs. NUBASE/AME 2020/(2016!), ground state being an m-isotope actually (as APS noted above), P31 stable isotope (Q878130), excitation energy (P9998), ...
  4. Not usable for curation: Scholia listings.
DePiep (talk) 07:37, 2 August 2022 (UTC)

Pentitol

pentane-1,2,3,4,5-pentol (Q81977867) and pentitol (Q74705706) seem to me like duplicates? Or maybe there is a difference which I don't see? Wostr (talk) 17:23, 11 August 2022 (UTC)

According to [7], Q3090510 (L-fucitol) is a pentitol but it is not a pentane-1,2,3,4,5-pentol. SCIdude (talk) 09:27, 12 August 2022 (UTC)
Ok, I see the problem now, I will report it to ChEBI. Fucitol is not a pentitol, as it is a reduction product of fucose, which is not a pentose but a hexose. However, I now also see hexitol (Q71381967) plus hexane-1,2,3,4,5,6-hexol (Q27215890) and hexane-1,2,3,4,5-pentol (Q105254737), which in ChEBI have incomplete relations, so it has to be addressed as a wider issue. I'll look for other sources and then come back here. Wostr (talk) 13:47, 12 August 2022 (UTC)

Data structure for elemental substance and element

In the past I started to list the different items defining the chemical elements and the ones defining the elemental substances. This was a concept used to differentiate the elements from the allotropic forms.

Example:

oxygen (Q629) vs dioxygen (Q5203615) and ozone (Q36933)

Then I found that some contributors created different items for the metallic element and for the metal substance.

Example:

platinum (Q880) vs platinum (Q27882222)

So I listed all items I could find in one table, see Wikidata:WikiProject_Chemistry/Tools#Chemical_element_and_corresponding_simple_substance. However, now I see that other contributors are merging the items representing the substance with the ones representing the element, so I would like to know which is the right strategy in terms of data organization. Snipre (talk) 21:14, 13 June 2022 (UTC)

Items about chemical elements and about simple substances should not be merged. 'Chemical element' is an abstract term, while a simple substance is a class that has individual representations in reality. And without such differentiation, we'd have inconsistency between data in items about chemical elements that have different allotropic forms and those that don't. Wostr (talk) 23:45, 13 June 2022 (UTC)
There are two definitions of « chemical element » in the IUPAC Gold Book ( cf. https://goldbook.iupac.org/terms/view/C01022 )
They are distinct :
  1. A species of atoms
  2. A pure substance
Following the first model leads to a scheme like the one in this diagram :
Three levels
  • The concrete level, where atom instances live.
  • The « class » level, equivalently « species » in the IUPAC definition. Atoms can be instances of some species of atoms. A hydrogen atom is an instance of the hydrogen species. « Hydrogen » is a subclass of « atoms » because all hydrogen atoms are atoms. « Deuterium » is a subclass of « hydrogen » because all deuterium atoms are hydrogen atoms.
  • the « metaclass » level, which is used to « tag » species. Deuterium is an instance of isotope. Hydrogen is an instance of Element.
We need the « meta » level because no hydrogen atom is an example of « element » by itself. Hydrogen is an element; a hydrogen atom is not. So the « element » concept is indeed more abstract. I know there are more ways, maybe less relevant, such as classifying by the number of nuclides.
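
The three levels can be made concrete with a toy Python sketch. It stores « subclass of » and « instance of » links in two dictionaries and checks instance membership by following one « instance of » link and then any number of « subclass of » links. The item names (including `atom_in_my_bottle`) and function names are invented for this illustration and are not Wikidata items.

```python
from typing import List

# Class level: species of atoms and their subclass chain.
SUBCLASS_OF = {"deuterium": "hydrogen", "hydrogen": "atom"}

# Instance links: a concrete atom -> its species, and species -> metaclass.
INSTANCE_OF = {
    "atom_in_my_bottle": "deuterium",  # concrete level
    "deuterium": "isotope",            # metaclass level tags the species
    "hydrogen": "element",
}


def superclasses(cls: str) -> List[str]:
    """Return cls followed by all of its ancestors via subclass-of."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_instance(thing: str, cls: str) -> bool:
    """True if thing is an instance of cls, directly or via subclasses."""
    direct = INSTANCE_OF.get(thing)
    return direct is not None and cls in superclasses(direct)


print(is_instance("atom_in_my_bottle", "hydrogen"))  # a deuterium atom is a hydrogen atom
print(is_instance("atom_in_my_bottle", "element"))   # but no single atom is an element
print(is_instance("hydrogen", "element"))            # the species itself is
```

The second check is the reason the metaclass level is needed: « element » tags the species, so it is never reached by climbing the subclass chain from a concrete atom.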
For the definition 2 we can avoid the meta level. We have substances, like « water » and concrete example of substances, hence substance instance, includes, « what is in my bottle right now ».
  • « pure substance » is a subclass of substances :  substances can be either pure or composite. The content of my bottle is composite, we have
  • ⟨ elemental substance (= element) ⟩ subclass of (P279) ⟨ substance ⟩
    (elemental substance is referred to as « substance pure » in French; « pure substance » in English is something different, it refers to substances made of one type of atoms or one type of molecules, so we need to be careful)
  • ⟨ elemental substance ⟩ subclass of (P279) ⟨ substance ⟩
and even
⟨ substance ⟩ disjoint union of (P2738) ⟨ list of values as qualifiers (Q23766486) ⟩
list item (P11260) ⟨ elemental substance ⟩
list item (P11260) ⟨ composite substance ⟩
as a substance is either pure or composite.
This needs to be sorted out because the definition of « substance » can be different; it seems that « substance » in French = « pure substance » in English terminology ?
Nevertheless the two models are linked, of course we have
⟨ elemental hydrogen ⟩ has part ⟨ hydrogen ⟩
As the objects are distinct, they need different items, although there is some kind of parallelism through the « part of » statements.
One way may be to treat the « element » item as inherently ambiguous and make it even more distinct than both of these models. But this would duplicate even more by adding a third set of items …
The problem we have, I guess, is to commit to a model for the « main » items : is the item hydrogen (Q556) the first or the second model ? If we read frwiki, it’s clearly the first definition. If we read other language editions, I’m not sure. This would advocate for treating those items as inherently ambiguous. We would also have to decide what to do with items such as hydrogen atom (Q6643508), which seems to be a duplicate if we commit to the model of the first definition. author  TomT0m / talk page 10:25, 14 June 2022 (UTC)
In the graph, may I expect a relationship between "Element" and "Isotope"? DePiep (talk) 05:55, 16 June 2022 (UTC)
@DePiep It’s metasubclass of (P2445) on Wikidata, as any isotope-atom is also an atom of a specific element. It may not be worth explaining in that specific graph as it might be a lot of information for beginners.
It may be better to explain before why it can’t be « subclass of » in this model as someone would naively expect. author  TomT0m / talk page 06:22, 16 June 2022 (UTC)
@TomT0m: I think the IUPAC definition, with its two separate meanings, greatly confuses things for us unless we follow your suggestion to allow for a third set of items representing elements as defined by IUPAC (a more abstract generic meaning). There are only 120 or so elements so creating those additional items would not be a huge burden - especially compared to what we've committed to in the similar case of written works and FRBR. Then "IUPAC element" "has part" (or some other appropriate property) "atom class", but also "has part" "elemental substance", with separate items for those two, and (notable) allotropes of the elements then subclasses of the more generic "elemental substance" classes. That would also sort out the hydrogen atom (Q6643508) presumably, if we take hydrogen (Q556) and other existing elements to mean the IUPAC meaning, and add entries for atoms and substances as needed? ArthurPSmith (talk) 13:56, 14 June 2022 (UTC)
@ArthurPSmith It should be seen not as a single lexical definition but as a dictionary entry. Dictionaries quite often give different definitions for the same word; this does not mean that we have an item for the word. The biggest problem is not the IUPAC, I think, it’s that culturally different Wikipedias may attach articles with different definitions to the same item. The IUPAC is just noting that there are two usages (the « substance » one probably predates the atomic one, historically.)
I think we should do a review of the definitions used on Wikipedia articles to have a clearer view. author  TomT0m / talk page 14:04, 14 June 2022 (UTC)
Nevertheless, « has part » is not an appropriate property here. I suggested a very long time ago that we should have a property « vague notion which can be made precise as » for ambiguous items. author  TomT0m / talk page 14:06, 14 June 2022 (UTC)
@TomT0m: I'm going to add some notes here on some randomly selected elements from enwiki:
  • en:Mercury (element) - discusses pure substance (metallic liquid), compounds & ores (mercuric sulfide etc), uses and dangers. Then isotopes, history, abundance, oxidation states, more on compounds and details on other aspects, regulations. I would say this describes much more than the pure substance, though the pure substance does feature broadly. And it is clearly not only about the atoms, but does describe the atoms somewhat (particularly in the infobox with electron configuration).
  • en:Ruthenium - mostly discusses relation to platinum at first; substance (hard white metal), use as catalyst, location of deposits, compounds and oxidation states, details of the atom, isotopes, production. Again, seems to be covering all aspects of both substance and atoms of the element.
  • en:Lutetium - silvery white metal; discovery, radioactive isotope, relation to yttrium, hardness. One sentence at the top clearly discusses both atom and substance and their relationship: "The lutetium atom is the smallest among the lanthanide atoms, [...] and as a result lutetium has the highest density, melting point, and hardness of the lanthanides". Compounds, oxidation states, history, occurrence, applications, precautions. Mostly about the substance, but a lot of atom-related info as well.
  • en:Selenium - relations to other elements particularly sulfur, arsenic. Found in metal sulfide ores, replacing sulfur. Applications, including by organic life and related toxicity. Several allotropes and amorphous forms with different colors. Isotopes, history, production, etc. Primarily NOT about the pure substance forms here, though they are discussed.
  • en:Rutherfordium - radioactivity, synthesis, uncertainty about chemical properties. Discovery, naming, isotopes, predicted properties - "properties of rutherfordium metal remain unknown". Primarily about the atom and isotopes and predicted chemical properties, almost nothing about the "substance".
  • en:Vanadium - "hard, silvery-grey, malleable transition metal". Discovery, natural presence in many minerals. Characteristics of substance, isotopes, compounds, occurrence, production, applications, biological role and safety. Primarily about alloys and compounds, not the pure substance.
  • en:Sodium - free metal not naturally occurring, abundant in many minerals, sodium ion everywhere! Physical: about sodium metal, but also spectroscopy which is about the atomic states. Compounds: mostly sodium as an ion, but also intermetallics. Almost none of this is about the pure substance.
Does this give a clearer view? It seems at least in English the articles discuss both substance and atomic properties (and often focus on specific aspects like compounds or ions) but rarely are about just one or the other. ArthurPSmith (talk) 14:54, 15 June 2022 (UTC)
@ArthurPSmith I also noted there seems to be a kind of « long term » edit war on the « chemical element » article on enwiki. The current lead is purely « pure substance », but when you look at this 2019 version it is just « atom species ».
By contrast, the frwiki article seems to have totally abandoned the « pure substance » definition and mentions the substances as fr:Corps_simple. The aluminium article also talks about the substance but is clear on the terminology: « Le corps simple aluminium est un métal malléable … » (the simple substance aluminium is a malleable metal …)
One point I don’t like about the « substance » definition is that it does not play well with using « part of ». « Water » clearly has oxygen and hydrogen as parts, but examples of their simple substances are dioxygen and dihydrogen, and neither of them is really part of water, neither the substance nor the molecule. This does not seem really consistent. author  TomT0m / talk page 15:28, 15 June 2022 (UTC)
Note that I have now edited the enwiki article opening sentence to be clearer to follow the IUPAC definitions. It was mixing both together in the body of the article, confusingly and inconsistently in my view. There's still some cleanup needed there I think. ArthurPSmith (talk) 13:12, 16 June 2022 (UTC)
TL;DR: enwiki element articles do combine element concept & element substance. To keep, while encyclopedic treatment could be improved. No need for extra classes in the tree.
Long form: Speaking from my work on enwiki in this area. Yes, the en:element articles do explicitly describe both the element concept and its Real Life existence, notably the diatomic nonmetal (Q19753344)s. For example, en:fluorine handles difluorine (Q1963030), in its infobox too; though notable substances have their own article, like en:diamond. I would not call this a "long term edit war" ;-) but sure, a clearer and more systematic treatment is welcome. There is no need for such a split AFAIK ("en:F element" and "en:F substances"??). For now, the current setup looks like the best we can get: 1. Conceptual element, mention/describe RL substance, list/link "allotropes" (ouch, make that Simple Substances?), fork notable substances (like pure minerals). I appreciate the current en:Chemical element opening line (APS edit 'IUPAC', with D66 refinement de-philosophic-ised). Maybe we can transport this opening line setup to the individual articles (but not just cramming more info in there please).
Apart from the welcome concept/real-life-substance distinction (to keep), I see no arguments from enwiki into this modeling plan. IMO, enwiki does not call for extra classes. -DePiep (talk) 08:10, 3 August 2022 (UTC)
I stroke: at this data modeling level, I am not sure if a split (into element-concept and element-substance (.. atom)) on en:wiki is advisable/nonadvisable. I found en:Helium atom (helium atom (Q3877353)); complicating and confusing for the reader this way. Also: en:Category:Atoms. -DePiep (talk) 10:08, 3 September 2022 (UTC)
What about linking using facet of (P1269)? So we could have:
hydrogen atom (Q6643508) facet of (P1269) hydrogen (Q556) (currently these are linked with manifestation of (P1557) which seems wrong) and also
dihydrogen (Q3027893) facet of (P1269) hydrogen (Q556).
Similarly allotrope of carbon (Q622460) facet of (P1269) carbon (Q623),
though atomic carbon (Q866179) seems to treat the atom as a substance which is weird. So should deuterium (Q102296) subclass of (P279) hydrogen atom (Q6643508) or hydrogen (Q556) (which it does now)? The enwiki article there is also partially about D2 as a substance, though it's mainly discussing deuterium the isotope as a type of atom. ArthurPSmith (talk) 19:51, 15 June 2022 (UTC)
facet is pretty meaningless. We should be able to express easily that dihydrogen is made of two hydrogen atoms.
⟨ dihydrogen (molecule) ⟩ has part ⟨ hydrogen atom ⟩
quantity (P1114) ⟨ 2 ⟩
⟨ dihydrogen (substance) ⟩ has part ⟨ dihydrogen (molecule) ⟩
That should be pretty simple.
Maybe we should just link enwiki to « substance », frwiki to « atom », and sort out the interwiki links manually, by traditional interwiki or by an automated template. author  TomT0m / talk page 20:03, 15 June 2022 (UTC)
No, I don't think that's a good solution. The enwiki articles aren't mainly about the "substance" aspects of the elements though they do mostly discuss that part (see my notes above). Hydrogen is a bit special anyway; what would you propose for carbon or selenium, etc. ? ArthurPSmith (talk) 13:16, 16 June 2022 (UTC)
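The « has part » modelling proposed above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `HasPart` and the function `parts_of` are hypothetical names, the labels stand in for items that may not exist on Wikidata, and the part of the molecule is written as "hydrogen atom", following the sentence "dihydrogen is made of two hydrogen atoms".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HasPart:
    """One hypothetical 'has part' statement with an optional quantity qualifier."""
    whole: str
    part: str
    quantity: Optional[int] = None  # stands in for the quantity (P1114) qualifier


# The two statements from the comment above, written with plain labels.
STATEMENTS = [
    HasPart("dihydrogen (molecule)", "hydrogen atom", quantity=2),
    HasPart("dihydrogen (substance)", "dihydrogen (molecule)"),
]


def parts_of(whole: str, statements: List[HasPart]) -> List[str]:
    """Return every part reachable from `whole` by following 'has part' links."""
    found: List[str] = []
    queue = [whole]
    while queue:
        current = queue.pop(0)
        for statement in statements:
            if statement.whole == current and statement.part not in found:
                found.append(statement.part)
                queue.append(statement.part)
    return found
```

With these statements, the substance reaches the atom through the molecule, which is the chain the comment asks to be expressible. Whether « has part » is the right property for that chain is exactly what the thread disputes.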

List of disputed elements and their substances

Listed and maintained in /Tools now:

-DePiep (talk) 09:32, 15 August 2022 (UTC)

Simple substances and allotropes

Full table listed and maintained in /Tools now:

-DePiep (talk) 09:36, 15 August 2022 (UTC)
(talk on isotopes moved to #Queries and listings) -DePiep (talk) 06:50, 28 July 2022 (UTC)

Allotrope or allotropes?

There are 11 items labeled "allotrope(s) of <chemical element>":

  1. allotrope of boron (Q4733226)
  2. allotrope of carbon (Q622460) (eg diamond (Q5283), graphite (Q5309); from carbon (Q623))
  3. allotrope of nitrogen (Q56271974)
  4. allotrope of oxygen (Q428653)
  5. allotrope of silicon (Q107157182)
  6. allotrope of phosphorus (Q14714096)
  7. allotrope of sulphur (Q1094078)
  8. allotrope of iron (Q646981)
  9. allotrope of arsenic (Q103608080)
  10. allotrope of tin (Q112708119)
  11. allotrope of plutonium (Q4733227)
  12. allotrope of lithium (Q113195802) (added 23 July 2022)
  13. allotrope of tungsten (Q113293735) (added 26 July 2022: Nature 1955)

Related: allotropy (Q81915), allotrope (Q21198401).

Should the label be singular or plural, for all? DePiep (talk) 09:45, 10 July 2022 (UTC)

There were discussions in the past whether labels of classes (groups) in chemistry should be singular or plural. Since then classes in English, Polish and some other languages are in singular, plural is placed in aliases. So I think singular in labels, plural in aliases in this case too. Wostr (talk) 12:51, 10 July 2022 (UTC)
✓ Done for English. DePiep (talk) 09:41, 11 July 2022 (UTC)
Yes, the area needs more structure in this. Hope I can add something, also wrt your proposals above. Challenging, both chemically and WD-classifying. -DePiep (talk) 19:48, 23 July 2022 (UTC)
The situation is quite exemplary for the state of things in Wikidata and knowledge engineering in general. The question about allotrope or allotropes is fascinating: the second is the class of compounds where the first is an instance of the things in that class. Ontologists come back to the question about instance versus class a lot anyway. The ChEBI ontology models chemical structures as classes, while Wikidata models them as instances. There is something to be said for both. Wikidata needs some freedom; it does not have a single upper ontology and needs to balance the needs from many different research fields with different habits. Many discussions we have in this community are in the end just a clash of research communities, not of individuals (don't shoot the messenger). Some pragmatism is useful here. All that context said (with no particular purpose other than context), I am happy we are moving away from assigning physical properties like density to elements :) --Egon Willighagen (talk) 05:29, 25 July 2022 (UTC)

austenite (Q487286) alloy and allotrope split

austenite (Q487286) is both alloy and allotrope. I propose to have separate items for these two. Then what to do with this existing one? Discussed at talk:austenite (Q487286). -DePiep (talk) 04:49, 28 July 2022 (UTC)

Working with metaclasses

Pending the discussion on Metaclasses, P31 and P279 above, I have used (Pt)/created(Au, Ag, Pd) these four simple substance (Q2512777) items for being a formal currency (Q8142) too (1 troy ounce unit etc).

I understand, later on the classifications tree might need adjustment (as do the placement of melting point (Q15318) etc.). Assuming I did not break nor enforce anything. DePiep (talk) 11:12, 19 October 2022 (UTC)

I think these items are all instances of Wikipedia article covering multiple topics (Q21484471) and the objects of their instance of (P31) statements should all be objects of main subject (P921) statements instead. --SCIdude (talk) 10:08, 21 November 2022 (UTC)
@SCIdude I don’t think so; the objects seem clearly legally defined: what counts as « gold » as far as finance is concerned. A better model to link with « investment » would be https://www.wikidata.org/wiki/Property:P366 than a catch-all Wikipedia article covering multiple topics (Q21484471), which should in most cases be avoided and usually seems like « I don’t have a better idea ».
The fact that some ownership titles are traded is not really relevant, as it’s just the way to transfer ownership of real objects, I think. author  TomT0m / talk page 11:10, 30 November 2022 (UTC)

Formally incorrect CAS Registry Numbers

In methyl 3,4,6-tri-O-benzyl-α-D-glucopyranoside (Q113645596), I noticed that the CAS Registry Number provided there (40246-52-1) is formally incorrect. It would probably be good to have all CAS Registry Numbers checked. The incorrect cases could be reported on Wikidata:Database reports/Constraint violations/P231. Leyo 11:30, 16 November 2022 (UTC)

Wikidata:Database reports/Complex constraint violations/P231. However, I'm not sure if the query added on Property talk:P231 as a complex constraint is correct. Wostr (talk) 13:31, 16 November 2022 (UTC)
Are you referring to [1-9]\d{1,6}-\d{2}-\d? There, the check digit may be any digit. --Leyo 15:41, 16 November 2022 (UTC)
Nope, check the 'Invalid CAS number' complex constraint below with a query: SELECT ?item WHERE { ?item wdt:P231 ?cas . BIND(REGEX (str(?cas), '^[1-9][0-9]{1,6}-[0-9]{2}-[0-9]$') AS ?correct_pattern ) BIND(replace(str(?cas), "-","") AS ?c) BIND(STRLEN(?c) AS ?strlen) BIND(xsd:integer(substr(?c,?strlen,1)) AS ?val ) BIND(xsd:integer(substr(?c,?strlen-1,1)) AS ?x1 ) BIND(xsd:integer(substr(?c,?strlen-2,1)) AS ?x2 ) BIND(xsd:integer(substr(?c,?strlen-3,1)) AS ?x3 ) BIND(IF(?strlen>4,xsd:integer(substr(?c,?strlen-4,1)),0) AS ?x4 ) BIND(IF(?strlen>5,xsd:integer(substr(?c,?strlen-5,1)),0) AS ?x5 ) BIND(IF(?strlen>6,xsd:integer(substr(?c,?strlen-6,1)),0) AS ?x6 ) BIND(IF(?strlen>7,xsd:integer(substr(?c,?strlen-7,1)),0) AS ?x7 ) BIND(IF(?strlen>8,xsd:integer(substr(?c,?strlen-8,1)),0) AS ?x8 ) BIND(IF(?strlen>9,xsd:integer(substr(?c,?strlen-9,1)),0) AS ?x9 ) BIND(?x1+?x2*2+?x3*3+?x4*4+?x5*5+?x6*6+?x7*7+?x8*8+?x9*9 AS ?sum0) BIND(?sum0-(xsd:integer(?sum0/10)*10) AS ?sum ) BIND(?sum=?val AS ?correct_checksum) FILTER(!?correct_pattern) FILTER(!?correct_checksum) } Wostr (talk) 17:30, 16 November 2022 (UTC)
Okay, but why doesn't it have any hits then? --Leyo 21:37, 16 November 2022 (UTC)
I suppose there is some error in this query... Wostr (talk) 18:39, 18 November 2022 (UTC)
Who might help here? --Leyo 15:34, 22 November 2022 (UTC)
The check was broken. If the format was correct it would not check the checksum. With any luck it should work now. Infrastruktur (talk) 16:48, 22 November 2022 (UTC)
Seems like. Thanks a lot. --Leyo 08:44, 23 November 2022 (UTC)
Actually, while there is this concept of pattern, I am told there are true exceptions: valid CAS registry numbers that do not match the pattern. Sadly, I don't have a list. (just a side note) --Egon Willighagen (talk) 08:58, 30 November 2022 (UTC)
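For readers following the thread, the check that the query computes can be sketched outside SPARQL. This is a minimal sketch, not the constraint query itself: the function names `cas_check_digit` and `is_valid_cas` are hypothetical, the pattern is the one quoted above, and the check digit is the weighted sum (rightmost digit before the check digit times 1, the next times 2, and so on) modulo 10, as in the query. Following the fix described above, a value is rejected when either the pattern or the checksum fails.

```python
import re

# Pattern quoted in the discussion: 2 to 7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"[1-9][0-9]{1,6}-[0-9]{2}-[0-9]")


def cas_check_digit(cas: str) -> int:
    """Compute the expected check digit from all digits except the last one."""
    digits = cas.replace("-", "")
    body = digits[:-1]  # everything before the check digit
    # Weights 1, 2, 3, ... are applied from right to left.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(body), start=1))
    return total % 10


def is_valid_cas(cas: str) -> bool:
    """Reject the value if EITHER the pattern or the checksum fails."""
    if not CAS_PATTERN.fullmatch(cas):
        return False
    return cas_check_digit(cas) == int(cas[-1])
```

For the number reported at the top of this section, `cas_check_digit("40246-52-1")` gives 4 while the stated check digit is 1, so `is_valid_cas` rejects it. The sketch assumes every valid number matches the pattern, so any true exceptions of the kind mentioned above would be flagged incorrectly.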

CCDC Number and CSD Refcode

Hi all, I have been talking with the CCDC and we looked at CCDC Number and CSD Refcode. Currently, there is CCDC Number (P6852) but the examples for that property are actually CSD Refcodes. We want to propose to have both CCDC Number and CSD Refcode as properties as they are complementary. I will write up a property proposal for CSD Refcode asap. Egon Willighagen (talk) 09:03, 30 November 2022 (UTC)

Notified participants of WikiProject Chemistry Egon Willighagen (talk) 09:12, 30 November 2022 (UTC)
 Support, and one substance may have more than one CCDC No. corresponding to different temperature, pressure, etc. --Leiem (talk) 09:26, 30 November 2022 (UTC)
The proposal for the new property: https://www.wikidata.org/wiki/Wikidata:Property_proposal/CSD_Refcode Next stop, write down the fixes for CCDC Number (P6852). --Egon Willighagen (talk) 09:39, 30 November 2022 (UTC)
So, new CSD code, proposal link: Wikidata:Property proposal/CSD Refcode.
Existing property: CCDC Number (P6852) DePiep (talk) 12:02, 30 November 2022 (UTC)
 Support --Hugo (talk) 11:07, 30 November 2022 (UTC)
Both identifiers are useful, and improving their use and representation here is a good idea. --Daniel Mietchen (talk) 11:38, 2 December 2022 (UTC)

Update: CSD Refcode (P11375) has been accepted and I applied the needed fixes to CCDC Number (P6852). I moved the statements for 79 compounds from CCDC Number to CSD Refcode, see https://quickstatements.toolforge.org/#/batch/107654 The next step is to get more data in, and perhaps discuss if we also want the iCSD Refcode (if someone has an immediate use case in mind); if not, we can return to that later. Thanks, all! --Egon Willighagen (talk) 10:23, 29 December 2022 (UTC)
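A rough way to spot values filed under the wrong property is to classify them by shape. The sketch below is not the procedure used for the batch above, and both formats are assumptions rather than the actual property constraints: a CSD Refcode is taken to be six capital letters optionally followed by two digits, and a CCDC Number to be a plain positive integer. The function name `classify_identifier` is hypothetical and the sample values are made up.

```python
import re

# Assumed formats, not taken from the property definitions:
REFCODE_PATTERN = re.compile(r"[A-Z]{6}([0-9]{2})?")  # e.g. six letters, or six letters + 2 digits
CCDC_NUMBER_PATTERN = re.compile(r"[1-9][0-9]*")      # a deposition number as a positive integer


def classify_identifier(value: str) -> str:
    """Guess which property a crystallographic identifier string belongs to."""
    if REFCODE_PATTERN.fullmatch(value):
        return "CSD Refcode"
    if CCDC_NUMBER_PATTERN.fullmatch(value):
        return "CCDC Number"
    return "unknown"


# Made-up sample values, for illustration only.
for sample in ["ABCDEF", "ABCDEF01", "1234567", "abc-1"]:
    print(sample, "->", classify_identifier(sample))
```

Any value classified as "unknown" would need a manual look rather than an automated move.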