Wikidata talk:WikiProject Chemistry/Archive/2023
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Metaclasses for chemical entities and using instance of (P31)/subclass of (P279)
This discussion is not meant to resolve the classification problems of chemical entities (like how anions or tautomers should be classified regarding chemical classes or how to define 'chemical compound'), only to determine the basics in terms of using instance of (P31) and subclass of (P279) in items about chemical entities.
I. Proper metaclasses for all chemical entities
From the beginning of mass imports of chemical data to Wikidata, the basis for chemical entities was instance of (P31) chemical compound (Q11173). This statement was added to every item describing a chemical entity, regardless of the nature of the entity described, and served as a metaclass for all chemical entities. However, it is also a regular class in chemical classification (it was present in some items describing classes of compounds as subclass of (P279) chemical compound (Q11173)) and is not true for some chemical entities. The need for a metaclass for chemical entities is quite obvious:
- it helps to retrieve data about all chemical entities in Wikidata, check their completeness, create reports and statistics, make it easier to create appropriate constraints in properties;
- it also helps users to understand the concept that is described in a specific item, facilitating the selection of appropriate items and preventing some of the incorrect edits resulting from misunderstanding the content of an item.
The current chemical compound (Q11173) pseudo-metaclass fails to do so for the following reasons:
- it is not applicable for many chemical entities, like simple substances, some radicals, ions;
- it is a part of chemical classification, and in some situations it appears redundant when it is present alongside its subclasses; this also confuses users who do not understand why a superclass should be present in an item;
- there are many borderline cases regarding chemical compound (Q11173) where we are not sure whether some chemical entity can be properly classified as a chemical compound.
None of the items like chemical entity (Q43460564), molecular entity (Q2393187), chemical species (Q899336) or chemical substance (Q79529) seems to be an adequate metaclass for chemical entities, as all of them are a part of chemical classification at some point.
Proposition I
Introducing two groups of metaclasses for chemical entities:
- type of chemical entity (Q113145171): for all stereochemically or isotopically defined chemical entities; it would replace all instance of (P31) chemical compound (Q11173) statements in items about chemical entities and would be added to all entities that now lack such a statement (like some ions, or items in which users deleted such a statement due to the confusion described above).
- class of chemical entities metaclass: for all items that describe groups or classes of chemical entities; some metaclasses are already present under an ill-named group or class of chemical entities (Q72070508) that covers metaclasses for both open classes and closed classes.
In other words: an item describing chemical entity(ies) should be modelled either as a type of chemical entity or as a class of chemical entities.
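As a rough illustration of the retrieval argument above, the following Python sketch only builds SPARQL query strings; it does not contact any endpoint. The helper names are hypothetical, and the `wd:`/`wdt:` prefixes are assumed to be the ones predefined on the Wikidata Query Service.

```python
# Illustrative sketch only: these helpers are hypothetical and just build
# SPARQL text; nothing here is an agreed project query.
TYPE_OF_CHEMICAL_ENTITY = "Q113145171"  # proposed metaclass (Proposition I)
CHEMICAL_COMPOUND = "Q11173"            # current pseudo-metaclass


def items_with_metaclass(qid: str, limit: int = 100) -> str:
    """Return a SPARQL query selecting items whose P31 is the given QID."""
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31 wd:{qid} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def items_missing_new_metaclass(limit: int = 100) -> str:
    """Return a SPARQL query for items that still have P31 = chemical
    compound but do not yet carry the proposed metaclass."""
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31 wd:{CHEMICAL_COMPOUND} .\n"
        f"  FILTER NOT EXISTS {{ ?item wdt:P31 wd:{TYPE_OF_CHEMICAL_ENTITY} . }}\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(items_with_metaclass(TYPE_OF_CHEMICAL_ENTITY))
    print(items_missing_new_metaclass())
```

With a single metaclass in P31, the first query would be enough to list every defined chemical entity; the second shows how remaining items could be tracked during a transition.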
Open vs closed class
This distinction comes from the ChEBI ontology (described in their documentation), although it is present in some form in other databases. It helps in determining what kind of class is described in an item and how numerous this class can be. In short:
- open class: a class of chemical entities that has an infinite number of possible members
- closed class: a class of chemical entities that has a restricted number of members, usually limited to a few.
Example:
- trihydroxybenzene (Q56697523) is an open class as it describes any chemical compound having 'a benzene ring with three hydroxy groups attached to it' as part of its structure, therefore it has an instance of (P31) structural class of chemical entities (Q47154513) statement (which is one of the possible metaclasses for open classes)
- benzenetriol (Q411618) is a closed class as it describes any chemical compound whose structure consists of a benzene ring with three hydroxy groups attached to it, i.e. a group of three isomers (plus isotopically modified compounds), therefore it has instance of (P31) group of isomeric entities (Q15711994) (which is one of the possible metaclasses for closed classes)
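The two examples above could be expressed as a small lookup, sketched here in Python. The QIDs are the ones quoted in the examples; the 'open'/'closed' strings and the function name are this sketch's own shorthand, not labels used in Wikidata.

```python
# Metaclass QIDs taken from the examples above. The mapping is deliberately
# incomplete: other open/closed metaclasses exist but are not listed here.
METACLASS_KIND = {
    "Q47154513": "open",    # structural class of chemical entities
    "Q15711994": "closed",  # group of isomeric entities
}


def class_kind(p31_values):
    """Return 'open' or 'closed' if the P31 values point to exactly one kind,
    otherwise None (unknown metaclass or a borderline/ambiguous item)."""
    kinds = {METACLASS_KIND[qid] for qid in p31_values if qid in METACLASS_KIND}
    if len(kinds) == 1:
        return kinds.pop()
    return None


# P31 values as described in the examples above.
EXAMPLES = {
    "Q56697523": ["Q47154513"],  # trihydroxybenzene
    "Q411618": ["Q15711994"],    # benzenetriol
}

if __name__ == "__main__":
    for item, p31 in EXAMPLES.items():
        print(item, class_kind(p31))
```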
II. Using subclass of (P279) in items about chemical entities
Statements related to chemical, medical, biological, industrial etc. classification of chemical entities are now split between a number of properties. Sometimes the same class is added to items using different properties, which causes problems with querying and curating the data.
Proposition IIa
In items about chemical entities limit the use of subclass of (P279) to chemical classes only. For other classes use has use (P366), subject has role (P2868) or other more specific property.
In other words:
- classes like lactone (Q59078), alkane (Q41581), indoles (Q55698578), butanol (Q663902) or (RS)-2-methyl-1-butanol (Q209425) should be added using subclass of (P279)
- classes related to function or action of a chemical entity (e.g. pharmacological action), like anticoagulant (Q215118), antidepressant (Q76560), diuretic (Q200656), carcinogen (Q187661), enzyme inhibitor (Q427492) should be added using subject has role (P2868)
- classes related to the use of a chemical entity in a specific field, like medication (Q12140), insecticide (Q181322), flavour enhancer (Q898745), nerve agent (Q2612896), solvent (Q146505), drug (Q8386) should be added using has use (P366)
- some classes that are currently added using instance of (P31) or subclass of (P279) should be moved to safety classification and labelling (P4952), e.g. occupational carcinogen (Q21074597), Class IIIB combustible liquid (Q21009059).
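A minimal Python sketch of the routing described in Proposition IIa, using only a few of the classes named above. The table and the function name are hypothetical; in practice a real check would need a curated and much longer list.

```python
# Property IDs as named in Proposition IIa.
SUBCLASS_OF = "P279"
SUBJECT_HAS_ROLE = "P2868"
HAS_USE = "P366"
SAFETY_CLASSIFICATION = "P4952"

# A few classes from the proposal, mapped to the property it suggests.
PROPERTY_FOR_CLASS = {
    "Q59078": SUBCLASS_OF,               # lactone
    "Q41581": SUBCLASS_OF,               # alkane
    "Q215118": SUBJECT_HAS_ROLE,         # anticoagulant
    "Q76560": SUBJECT_HAS_ROLE,          # antidepressant
    "Q12140": HAS_USE,                   # medication
    "Q146505": HAS_USE,                  # solvent
    "Q21074597": SAFETY_CLASSIFICATION,  # occupational carcinogen
    "Q21009059": SAFETY_CLASSIFICATION,  # Class IIIB combustible liquid
}


def misplaced_statements(statements):
    """Given (property, class QID) pairs from one item, return
    (class QID, current property, suggested property) for every class that
    the table routes to a different property. Unknown classes are skipped."""
    misplaced = []
    for prop, class_qid in statements:
        expected = PROPERTY_FOR_CLASS.get(class_qid)
        if expected is not None and expected != prop:
            misplaced.append((class_qid, prop, expected))
    return misplaced


if __name__ == "__main__":
    # Hypothetical item with statements in the pre-cleanup style.
    example = [("P31", "Q12140"), ("P279", "Q59078"), ("P279", "Q215118")]
    print(misplaced_statements(example))
    # → [('Q12140', 'P31', 'P366'), ('Q215118', 'P279', 'P2868')]
```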
Proposition IIb
In items about chemical entities the use of subclass of (P279) should be abandoned. Other, more specific properties should be used instead.
Chemical classification should be added using a new property, like chemical classification or higher class in chemical classification etc. (chemical classification subproperty of (P1647) subclass of (P279)). Other classes should be added using properties mentioned in proposition IIa:
- classes related to function or action of a chemical entity (e.g. pharmacological action), like anticoagulant (Q215118), antidepressant (Q76560), diuretic (Q200656), carcinogen (Q187661), enzyme inhibitor (Q427492) should be added using subject has role (P2868)
- classes related to the use of a chemical entity in a specific field, like medication (Q12140), insecticide (Q181322), flavour enhancer (Q898745), nerve agent (Q2612896), solvent (Q146505), drug (Q8386) should be added using has use (P366)
- some classes that are currently added using instance of (P31) or subclass of (P279) should be moved to safety classification and labelling (P4952), e.g. occupational carcinogen (Q21074597), Class IIIB combustible liquid (Q21009059).
This proposal does not exclude the need to create additional properties for specific uses, e.g. based on MeSH or ChEBI relations (biological role for example).
Rationale, possible problems
- – This is already true for many statements related to pharmacology as usually subject has role (P2868) is used for such situations. The main reason for this change is to separate different groups of statements which are now added in a variety of ways, none of which are entirely wrong.
- – There may be some borderline cases in which both subject has role (P2868) and has use (P366) could be used. It would require an arbitrary decision in this case which one should be used.
- – Moving medication (Q12140) from instance of (P31) to other properties may require creating a separate metaclass for pharmacological entities.
III. Using has part(s) (P527) for atomic composition
Property has part(s) (P527) is used now for a variety of things, from elemental composition, functional group, type of bonds, rings etc., but in case of chemical composition it is redundant and IMO factually wrong.
- – There are already classes like carbon compound (Q2901852) which cannot be removed from the classification tree and are superclasses of many items.
- – Elemental composition can be also easily retrieved from the chemical formula (P274).
- – It's not true that chemical entity has part(s) (P527) chemical element – it 'has part' atom(s) of chemical element, sometimes it 'has part' ion(s) etc.
Proposition III
Accept the recommendation to eliminate over time all statements like chemical entity has part(s) (P527) chemical element in favour of proper superclasses like compound of X (chemical element) added either directly to the item or somewhere higher in the classification, and using chemical formula (P274) (regex) for this purpose.
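To illustrate the regex route mentioned in Proposition III, here is a small Python sketch that reads the elemental composition from a formula string. It assumes formulas written with Unicode subscript digits (e.g. C₂H₆O) and only handles flat formulas: parentheses, hydrates and charges are not interpreted. The function names are hypothetical.

```python
import re

# Map Unicode subscript digits to ASCII digits so the regex can read counts.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

# An element symbol is one uppercase letter optionally followed by one
# lowercase letter; an optional count follows it.
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula):
    """Return a dict of element symbol -> atom count for a flat formula."""
    counts = {}
    for symbol, digits in ELEMENT.findall(formula.translate(SUBSCRIPTS)):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def contains_element(formula, symbol):
    """True if the formula contains at least one atom of the element."""
    return symbol in element_counts(formula)


if __name__ == "__main__":
    print(element_counts("C₂H₆O"))         # → {'C': 2, 'H': 6, 'O': 1}
    print(contains_element("NaCl", "Cl"))  # → True
    print(contains_element("NaCl", "C"))   # → False
```

Matching whole symbols rather than searching for a bare letter avoids treating the 'C' in 'Cl' as carbon.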
Discussion
As English is not my first language, there may be some ambiguities for which I am sorry. Establishing clear rules, especially with regard to point I, seems to me very important as we don't have a way to retrieve all the needed data, as some items have instance of (P31) = 'chemical compound', in some this statement was modified (rightly or not) or deleted; in many situations this statement is wrong (e.g. for simple substances). As for the point II, I would prefer option IIa, it seems more consistent, it is already present in some items and it would not require the creation of another property. I hope we can resolve the above problems and work out the best solution through discussion here. Wostr (talk) 15:40, 8 July 2022 (UTC)
- Support for I, IIa and III. I'm intrigued by your open vs closed class issue - this is a useful distinction I hadn't had a way to think clearly about before, and I suspect we ought to think about it more widely in Wikidata. I still don't think this quite settles the issue of "molecule" vs "substance" or the implied context of a particular chemical entity - where would something like nitrate ion (Q182168) fit here? ArthurPSmith (talk) 17:58, 8 July 2022 (UTC)
- @ArthurPSmith: the distinction between open and closed class is not mine and it is not unambiguous in all cases (there may be some borderline cases in which both metaclasses seem to be correct), however, I think it does more good than harm, as we have classes like chlorobenzene (Q1075329) and chlorobenzene (Q72697380). Right now I don't use 'open' and 'closed' terms in labels, I chose 'class' vs 'group' (as in e.g. 'structural class of chemical compounds' vs 'group of isomers') – it seems quite intuitive in my language, however, I don't know if it is also intuitive in English. These propositions do not intend to settle any issue about the chemical classification of items – this is more complex issue and right now I'd like to focus on more basic issue that would allow us to cleanup the items a little bit, help us in organisation and curation of data. The non-existent yet metaclass 'type of chemical entity' is based on the ChEBI definition of 'chemical entity' (that is not equal to the definition of 'molecular entity') and covers both molecular entities, functional groups and chemical substances. So, distinction between molecular entity and chemical substance is not so important in this proposition; nitrate ion (Q182168) would have P31 = 'type of chemical entity' and all classes that are now added via P31 moved to P279 – these classes would define whether the item describes a molecular entity or chemical substance. Wostr (talk) 19:26, 9 July 2022 (UTC)
- Ok with IIa. For Proposition I, I don't like the proposition of classifying by creating abstract classes unknown by any casual contributors. I only support classification based on definition. Please provide the definition of chemical compound first before saying that we can't use this term as classification criterion. What is the definition of chemical entity (Q43460564), molecular entity (Q2393187), chemical species (Q899336) or chemical substance (Q79529) ? People are using those terms without any understanding of the concept because we propose no good, clear definition. So instead of providing better definition, we add new concepts which are creating more complexity and will fail to solve the classification problem.
- Suppose we accept the following definition: chemical substance (Q79529) = "Matter of constant composition best characterized by the entities (molecules, formula units, atoms) it is composed of. Physical properties such as density, refractive index, electric conductivity, melting point etc. characterize the chemical substance." (IUPAC), and that chemical compound (Q11173) is a subclass of chemical substance (Q79529), with the additional requirement that a chemical compound (Q11173) has to be composed of several different elements. Then, following from the definition of chemical substance (Q79529), a chemical compound (Q11173) has to have physical properties, so ions and radicals can't be defined as an instance or subclass of chemical compound (Q11173), because they can't be isolated in order to measure physical properties.
- We have concepts like chemical substance (Q79529), pure substance (Q578779), chemical compound (Q11173), simple substance (Q2512777); why can't we create a classification using them and based on their definitions?
- And any good classification should not be chosen based on one single example but by creating a coherent network between entities: how can we link water (Q283), heavy water (Q155890), tritiated water (Q424236), methanol (Q14982), diiodine (Q2064483), butan-1-ol (Q16391), (+)-2-butanol (Q27104553), (R)-2-Butanol (Q70731894), (E)-cinnamic acid (Q164785), cis-cinnamic acid (Q4062664),... Snipre (talk) 19:58, 18 July 2022 (UTC)
- @Snipre: As I wrote in the propositions: definitions of 'chemical compound' etc. are not needed here, because these concepts are to be used (or not) in chemical classification, not as metaclasses, and this is not a discussion at all about what their definitions should be; it is irrelevant. The problem is that right now 'chemical compound' is used both as a metaclass (P31) and as a regular chemical class (P279), it is used inconsistently and in fact it does not allow any meaningful operation to be performed on the data set. So the problem is not how to define these concepts, just to introduce a new metaclass to cover all the concepts that interest us. We already have chemical entity (Q43460564), which is a superclass of most concepts; the proposition is to introduce a new metaclass type of chemical entity (Q113145171) to be added as P31 instead of a mismatched and questionable 'chemical compound', as you can't have it both in P31 and P279. The chemical classification in P279, as in proposition II, can then include all sorts of concepts, including or not 'chemical compound', 'chemical substance', 'molecular entity' – but that is not a discussion about the chemical classification and the definitions of such concepts. Wostr (talk) 08:18, 19 July 2022 (UTC) PS As to "I don't like the proposition of classifying by creating abstract classes unknown by any casual contributors" – we already have a lot of metaclasses like this (astronomical object type (Q17444909), disease of a particular individual (Q112193769) vs class of disease (Q112193867)). In ChEBI we also have such metaclasses, but they are not a visible part of this database and given the scope of ChEBI, type of chemical entity (Q113145171) is not needed there, as all entries have chemical entity (Q43460564) as a superclass. Wostr (talk) 08:23, 19 July 2022 (UTC)
- Notified participants of WikiProject Chemistry Wostr (talk) 08:23, 19 July 2022 (UTC)
- As heavily related, I will post this page here: User:SCIdude/Modeling#Chemical ontology
- In my opinion, best way to model this would be to adopt the same mappings as for (biological) taxa.
- They need to be instance of (P31) taxon (Q16521), have a Parent taxon (P171) and a Taxon rank (P105).
- So having the specific isotope of a stereochemically defined chemical compound would be :
- specific isotope stereochemically defined chemical: parent chemical (P99999): stereochemically defined chemical
- stereochemically defined chemical: parent chemical (P99999): eventually partially stereochemically defined chemical
- eventually partially stereochemically defined chemical: parent chemical (P99999): stereochemically undefined chemical
- ...
- All subclasses of ... "chemical classification"? AdrianoRutz (talk) 09:16, 19 July 2022 (UTC)
- We can't have something similar to 'taxon rank' in chemistry, because chemical classification is not hierarchical in the way biological classification is. There are no universal lower or higher ranks in chemistry, the same concept can be on different levels in chemical classification depending on many structural or functional variables. However, your proposition is similar to option IIb, only with one additional property. I'm not sure however, that changing many metaclasses to a 'taxon rank' property would be useful in chemistry. Specific metaclasses in P31 would allow to distinguish different concepts ('type of chemical entity', 'open class of chemical entities' etc.) based on only one property, plus there is really no 'taxon'-like concept in chemistry (there is no concept that I know of that would cover 'class of chemical entities' and 'type of chemical entity' for example). Wostr (talk) 10:55, 19 July 2022 (UTC)
- I disagree with the "we can't". Biological classification is a true mess. Each concept can be at different levels, with different names; it is just fairly standardized (as an example, see one of the demo queries of WD: https://w.wiki/5UQt). It just has a head start of many decades. We could achieve the same for chemistry. And yes, very near to proposal IIb, which would already be a great step forward. AdrianoRutz (talk) 11:33, 19 July 2022 (UTC)
- Such classification can also be achieved without any special properties, with just using correct metaclasses as P31 and putting chemical classification in P279. This way we don't need 'taxon rank' as it is present in P31 ('type of chemical entity', 'group of chemical compounds', 'class of ions' etc.), we don't need 'taxon'-like concept (that doesn't exist in chemistry) and we don't need 'parent taxon', as it is present in P279. Wostr (talk) 13:17, 19 July 2022 (UTC)
- Support For proposal IIb AdrianoRutz (talk) 14:42, 28 July 2022 (UTC)
- Comment 1. I understand the issue, but I am not scholared enough to help. Indeed, more structure is helpful. -DePiep (talk) 10:24, 19 July 2022 (UTC)
- Moved my sideissue from here into dedicated section #Simple substances and allotropes. -DePiep (talk) 06:44, 23 July 2022 (UTC)
- To make it more clear I tried to draft my propositions + AdrianoRutz's proposition in Figma. Wostr (talk) 13:14, 19 July 2022 (UTC)
- Support Thanks for working this out. I have always liked the use of "role" in ChEBI, and adopting this in Wikidata makes sense. Chemistry in Wikidata has been one of my side projects, and some of the guiding principles I have tried to convert to shape expressions. We should aim at doing that for all rules. That way, we can continue to introduce structure and have these rules and shapes help us curate all the data (e.g. I have quite lost the overview of all types of classes we have). One question however (and no show stopper): what should we do when Wikipedia does not align well with these aspects? Because all these questions tend to come up and apply to Wikipedia too. How should we handle the sitelinks? --Egon Willighagen (talk) 05:42, 25 July 2022 (UTC)
- @Egon Willighagen: the question about sitelinks is much more related to the problem described here rather than to this discussion. Changing the main metaclass for chemical entities (like in Proposition I) does not affect Wikipedia sitelinks, nor the rest of the propositions. It may however allow a better display of chemical classification in Commons infoboxes, maybe some day it may allow to present chemical classification in Wikipedia articles or help with categorisation, because right now – with lack of any guidelines in this matter – we have quite a mess. And as to the problem described on the page linked by me before: I don't think it can be solved easily and never really solved. I always try to move the sitelinks to the item which is the closest equivalent of the Wikipedia articles, but even in the simplest example (two items describing each enantiomer, item describing the structure with undefined stereocenter i.e. 'group of stereoisomers' and an item describing a racemate; Wikipedias describing it all in one article) you always have to make a compromise. Some solution to this would be creating redirects in Wikipedias, but as I wrote, this would be a solution, but only a partial one. Wostr (talk) 08:35, 25 July 2022 (UTC)
- Exactly what I was thinking about: "this would be a solution, but only a partial one.". I do think it is related to the page you link to. As said, I am happy with this step, though could have made that more clear (sorry). I would also support the idea to accept that some Wikipedia pages for chemicals do not have a one-to-one page in Wikidata. I'm looking forward to seeing where this goes. --Egon Willighagen (talk) 06:29, 28 July 2022 (UTC)
- Comment @Wostr: Thank you for organizing this.
I mostly Support IIb, it is a clear organization. We may use subclass of (P279) for relations that are still missing dedicated properties (e.g. every real-world molecule of the class levomethadone (Q6535776) is also of the class (RS)-methadone (Q179996)). But for all other cases, I agree with Wostr.
But I have to Oppose Proposal I; the names "type" and "class" are often used interchangeably in ontology management. All chemical entities on Wikidata are groups or classes or types, as they describe several real-world, three-dimensional molecules. You can see "ethanol" is a very closed class, with only one member.
There are differences in the way we see concepts like alcohols (Q156) and ethanol (Q153), but they are both classes; and more importantly, they both have a series of properties in common.
I'd propose IV:
- IVa - One major class (merging type of chemical entity (Q113145171) and type of chemical entity (Q113681859)) as P31 for all chemical entities under ChEBI's "chemical entity" root (https://www.ebi.ac.uk/ols/ontologies/chebi/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FCHEBI_24431&lang=en&viewMode=All&siblings=false)
Why? ChEBI structures its ontology in that way; these entities share properties on Wikidata which currently have multiple domains (e.g. see https://www.wikidata.org/wiki/Property:P233#P2302); by having a shared P31 for all chemical entities we can easily pull out the chemical domain of Wikidata. It also simplifies the mapping.
- IVb - Additional P31 values for atoms such as osmium (Q751): one value highlighting their nature as a chemical entity and the other their distinction as an atom. Other equally distinctive typing values like functional group (Q170409) could also receive additional values
Why? Because we still want to query directly for the most specific types; because we don't want to break old systems.
- IVc - Additional P31 values, on top of IVa, for differentiating between what we see as "chemical compounds" and "chemical classes" (e.g. to split alcohols (Q156) and ethanol (Q153)).
Why? Because we still want to differentiate and search what we see as "chemical compound" from what we see as "chemical class", as chemical compounds have e.g. a defined molecular weight, as outlined in the original proposal
- IVd- Use subclass of (P279) in place of the "chemical classification" new property in the figma (https://www.figma.com/file/hIUncmZaaWh9gxpXjUAVBT/Propositions?node-id=0%3A1), as the relation is one of subclass.
Why? To keep using Wikidata standard infrastructure and simplify modelling. The community is not always receptive to new "subclass of"-like properties (https://www.wikidata.org/wiki/Wikidata:Property_proposal/part_of_molecular_family)
Generally, I believe that having single P31 values works very well for human (Q5), but might not be fit for all knowledge domains. Maybe we can unify chemical entities with some flexibility for additional types and having few, predefined, relevant P31 values. TiagoLubiana (talk) 15:31, 3 September 2022 (UTC)
- @TiagoLubiana: thank you for taking part in this discussion. While I think in many points our views are similar, I want to point out some things I would like you to consider.
- Merging your first paragraph and proposition IVd gives more or less proposition IIa, not IIb. In short: proposition IIa is to use subclass of (P279) for chemical classification and use other more-specific properties for any other classes if possible.
- Ad IVa: I cannot agree with this and my argument is... ChEBI ontology. In ChEBI we have not one type of entries, but four types. Distinction between them is not directly indicated, but is quite clear from the ChEBI documentation. There are (1) molecular entities, (2) part-molecular entities, (3) open classes, (4) closed classes. There is no one common class for every entry in ChEBI like you propose with type of chemical entity (Q113681859). And in fact, there is no need for such a class, especially as you are proposing several more specific classes in IVb and IVc (which eventually would have to be subclasses of type of chemical entity (Q113681859), so type of chemical entity (Q113681859) would be redundant in all items in which sub-metaclasses are present). We already have widespread metaclasses like structural class of chemical entities (Q47154513) which covers most of ChEBI 'open classes', so the goal here with proposition I is to complete this metaclassification with a metaclass which would cover all ChEBI 'molecular entities'-like entries (most of which now have instance of (P31) chemical compound (Q11173), some have 'ions' or 'radicals' etc.).
- What's more, this would not hamper querying: depending on the intended results, with one general metaclass like type of chemical entity (Q113681859) one has to limit the query by excluding some metaclasses; with distinct metaclasses like type of chemical entity (Q113145171) or type of chemical entity (Q113681859), one has to combine the results. Both actions require a similar effort; however, given our usual needs, we query either for type of chemical entity (Q113145171) or for classes of entities like type of chemical entity (Q113681859).
- Also, I don't know any place in which entries like alcohols (Q156) and ethanol (Q153) would be classified under the same metaclass. The first is a class of entities, the second is a class of classes of entities. I can't see how mixing these two under one metaclass would be beneficial and I can't find similar approach in other fields in Wikidata.
- In other words, you propose one general metaclass type of chemical entity (Q113681859) and many sub-metaclasses (like in proposition IVb and IVc). I'd say to skip the general metaclass, and use only metaclasses you mentioned in proposition IVb and IVc. What you're proposing in IVc for 'chemical compounds' is in fact mostly proposition I with type of chemical entity (Q113145171).
- Ad IVb: the problem with this is something outside of the topic itself. In regards to chemical elements we have (or at least should have) three types of entries: (1) about chemical element, (2) about atom of this element, (3) about pure substance/molecular entity composed of atom(s) of this element. Due to disagreement over many years, many of these items have been merged. I agree with you that items about these entities should be included in the classification, but not items like bromine (Q879), but items like dibromine (Q2685750), i.e. items about molecular entities.
- Beside that, I have a strong feeling that our views in this matter are very similar, but maybe my not very good knowledge of English leads to some misunderstandings. Wostr (talk) 19:15, 5 September 2022 (UTC)
- @Wostr I just thought about something... once the decision regarding mappings will be fixed, we should strongly consider https://github.com/rwst/yaccl from @SCIdude to batch edit the chemical classification AdrianoRutz (talk) 11:48, 9 September 2022 (UTC)
- @AdrianoRutz We? I don't see other ways than me doing the actual work. I would be willing to do it if I hadn't the impression you're offhandedly imposing this on me right now. --SCIdude (talk) 07:42, 10 September 2022 (UTC)
@SCIdude There is a clear misunderstanding then, my apologies. The goal of me mentioning your tool was not to force you at all. First, I think it is the best we have right now, so it is just pragmatic to want to use the best instead of investing hours in building yet another one. Second, I did not want you to be in charge of the edits; I really thought the people interested here would give it a look and come up with their own proposals for the batch editing. I am really sorry if I did not express myself correctly or gave you this impression, it was not intended at all. Hope we are aligned… AdrianoRutz (talk) 08:07, 10 September 2022 (UTC)
As I think most of the participants are inclined to options I and IIa (TiagoLubiana's propositions also seem to me very close to these two options, with only slight differences), I plan to prepare appropriate project subpages explaining these changes and then carefully and gradually implement them. After that we will be able to check to what extent the applied solutions still require specific corrections and discuss it further. Wostr (talk) 19:02, 29 September 2022 (UTC)
- Agree. Small well-planned steps are essential for this to be done right. SCIdude (talk) 07:06, 30 September 2022 (UTC)
- @Wostr Hi, it seems like you took the initiative to launch some mass editing. Thank you for that. Any info somewhere? AdrianoRutz (talk) 13:04, 1 May 2023 (UTC)
- @AdrianoRutz: about the mass editing: yes and no. The first mass editing was done much earlier this year, about 1.2M edits (adding a new metaclass to every item), I've also updated this and this page. Right now I'm doing small batches (whenever I have time for this) with changes to P31, P279, P366, P2868 and other statements (some changes are done manually, most using QS). I'm trying to achieve some consistency (there are sometimes the same values in P31/P279 and in P366 and/or P2868, so I'm trying to clean this up) – the main goal right now is to have only type of chemical entity (Q113145171) and chemical compound (Q11173) in P31, with other statements moved to P279/P366/P2868/... After this, I'd like to check whether we are ready to delete chemical compound (Q11173) from P31 – in some cases chemical compound (Q11173) would be moved to P279 (if there is no other class in P279 in the item), in others deleted. Wostr (talk) 18:28, 3 May 2023 (UTC)
- @Wostr Thank you for the pointers, highly appreciated. I'll also try to move some statements accordingly. AdrianoRutz (talk) 19:19, 3 May 2023 (UTC)
- I have cleaned up P31 statements in every item with an instance of (P31)type of chemical entity (Q113145171); in many of these items only chemical compound (Q11173) remained, which I plan to clean up in August/September (move it to P279 or delete, if there are subclasses of chemical compound (Q11173) present in an item).
- After moving many classes from P31 to P279/P2868/P366 or other more relevant properties, there are still some classes that are present in different properties in items. I have prepared lists of problematic classes in each of these properties and I intend to take care of this by the end of September as well.
- Items related to functional groups and other fragments of molecules also remained to be sorted out. This is a relatively small number of items that I will tidy up during other works.
- I have also noticed a number of issues that will need to be addressed in the longer term:
- Items about 'groups of stereoisomers' often mix up statements or external-ids with those about racemic mixtures. This usually applies to drugs. I changed the constraints of some medicine-related properties accordingly to show violations in some cases, in which most likely the item should be split into two separate items (one for 'stereoisomer group', the other for 'racemic mixture'). I plan to describe this problem and how to solve it on the appropriate subpage of the project.
- Items about polymers very often combine information about macromolecules, mixtures of macromolecules (polymers) and plastics. Then there are items related to prepolymers, resins, etc. After some initial changes, I got feedback that (1) my idea of classifying polymer items as generally 'mixtures of chemical entities' ('type of polymer' which is a subclass of 'mixture of chemical entities') is not always correct, (2) there are significant discrepancies between the terminology in different languages, and thus there may be problems with ordering this field, (3) there are some borderline cases with which there may be classification problems. This issue will need to be addressed in the future.
- User Zcp3000 (not inactive) made a significant number of edits, most of which appear to be correct. However, in dozens of items I have found that this account added an InChIKey to the wrong item (items about genes, scientific articles etc.), probably based on the label. Automatic tools then imported data from external databases to such items (example). A conflicts-with constraint (Q21502838) with DOI (P356) is present in InChI (P234) and InChIKey (P235), so at least some of these situations could be caught thanks to this (as well as some other issues that I'll clean up in the near future), but if time permits I will review all edits from this account.
- At the same time, I updated the information on the 'Guidelines' subpages. Some of the information is still being updated and expanded. In the course of all this work, I also created the 'Issues' subpage, which I will write about in separate threads. Wostr (talk) 14:25, 11 August 2023 (UTC)
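The clean-up rule discussed above for chemical compound (Q11173) in P31 (move it to P279 when the item has no other class there, otherwise simply drop it) can be sketched as follows. This is only an illustration under stated assumptions: the function name `plan_cleanup` and the plain lists of QIDs are hypothetical, nothing here talks to Wikidata, and the simpler variant of the rule ("no other class in P279") is the one modelled.

```python
# Hypothetical sketch of the P31/P279 clean-up rule; not an actual bot or QS batch.
CHEMICAL_COMPOUND = "Q11173"
TYPE_OF_CHEMICAL_ENTITY = "Q113145171"


def plan_cleanup(p31, p279):
    """Return the (new_p31, new_p279) lists of QIDs for one item.

    Assumption: chemical compound (Q11173) is removed from P31 and is
    added to P279 only when P279 holds no class at all.
    """
    if CHEMICAL_COMPOUND not in p31:
        # Nothing to clean up; return copies unchanged.
        return list(p31), list(p279)
    new_p31 = [qid for qid in p31 if qid != CHEMICAL_COMPOUND]
    new_p279 = list(p279)
    if not new_p279:
        # No other class present: keep the information by moving it to P279.
        new_p279.append(CHEMICAL_COMPOUND)
    return new_p31, new_p279


if __name__ == "__main__":
    print(plan_cleanup([TYPE_OF_CHEMICAL_ENTITY, CHEMICAL_COMPOUND], []))
    print(plan_cleanup([TYPE_OF_CHEMICAL_ENTITY, CHEMICAL_COMPOUND], ["Q156"]))
```

A real batch would also have to carry over references and qualifiers, which this sketch ignores.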
Here is a discussion that proposed to overturn a previous discussion (i.e. to restore all IUPAC GOLDBOOK entities), Wikidata_talk:WikiProject_Chemistry/Archive/2022#RFD:_delete_IUPAC_GOLDBOOK_entities_"scholarly_article". Please comment there.
@Vladimir Alexiev, ArthurPSmith, Wostr, 99of9, Snipre, SCIdude:. GZWDer (talk) 16:28, 5 December 2022 (UTC)
- I think the existing scholarly articles already junk up the search. They should be a different entity to exclude in searches. Until that is implemented, IUPAC Gold Book ID (P4732) is the much better solution in this realm. This way, the ID can also be re-used in the existing templates on wikipedia:de:Vorlage:Gold Book for example. Matthias M. (talk) 13:27, 24 July 2023 (UTC)
- PS: They were not properly deleted Q103857383 etc. still exist. Matthias M. (talk) 19:47, 24 July 2023 (UTC)
Use of P5008 and/or P6104
Notified participants of WikiProject Chemistry Should Wikidata:WikiProject Chemistry be using on focus list of Wikimedia project (P5008) and/or maintained by WikiProject (P6104)? I was not aware of these properties and I am tempted to think anything that is a type of chemical entity (Q113145171) is our focus list, not? What do you think? --Egon Willighagen (talk) 15:25, 9 July 2023 (UTC)
- I noticed these properties some time ago, but I'm not so sure about the difference between them. I've added this property to a few items (like type of chemical entity (Q113145171)) to indicate (or rather in the hope) that every major change should be discussed in the WikiProject, but I don't know what we would gain by adding this property to some million+ items. It would be possible to query all items that are related to chemistry, but does anyone need to query such a thing? Wostr (talk) 15:39, 9 July 2023 (UTC)
- Ah, now I still forget to add it, but this discussion triggered my question: https://www.wikidata.org/wiki/Wikidata_talk:WikiProjects#Consistency_in_tagging_WikiProjects_via_P5008_and/_or_P6104 --Egon Willighagen (talk) 15:41, 9 July 2023 (UTC)
- @Wostr: for chemicals I agree, but perhaps it would be useful for name reactions, famous/award-winning chemists, historical sites, etc? Egon Willighagen (talk) 05:53, 5 August 2023 (UTC)
- If this is added to the chemical entities, I fear that the whole set may become unqueryable. Even now (about 1.2M items for chemical entities) many queries give a timeout error. So the question is, in my opinion, what we might want or need to query other than chemical entities. I suspect that we can use this property for all chemistry-related items that are sort of scattered around without a uniform metaclass. Wostr (talk) 13:14, 11 August 2023 (UTC)
- A recent Scholia (Q45340488) extension shows info on WikiProjects and uses it. The Chemistry one lives at: https://scholia.toolforge.org/wikiproject/Q8487234 --Egon Willighagen (talk) 15:39, 9 July 2023 (UTC)
Translation of WikiProject Chemistry pages
Given that WikiProject pages are primarily a working space, and English remains the sole working language here anyway, I propose to abandon any attempt to translate these pages into other languages. Personally, English is not my native language and sometimes I have problems communicating at an appropriately understandable level, but the translation of subpages of the project does not make it easier.
Many subpages change over the course of weeks or months, which would require translation of certain changes on an ongoing basis; there are few of us here and, taking into account specific languages, sometimes there is only one user per language. Attempts to make this WikiProject multilingual are, in my opinion, a waste of time that could be spent on more necessary tasks. Even after so many years, this WikiProject's homepage is only partially translated into many languages, and those into which it is fully translated are complete only because the homepage has not changed for many years. Wostr (talk) 14:37, 11 August 2023 (UTC)
Issues related to chemistry-related properties or items
When cleaning up items regarding the work on metaclasses, I also described several issues on separate subpages of this WikiProject. Details in the sections below. These are suggestions on how to solve a given problem, so a discussion is highly recommended and any comments and remarks are most welcome. I know that there are still many other issues (like the issue with tautomers) that need to be addressed. Wostr (talk) 14:25, 11 August 2023 (UTC)
Right now I described only the problem with InChI (P234) and the 1500-character limit, which I moved and expanded from the property discussion page. For many months now I always deal with this by adding:
InChI: the value of this statement is set to 'some value' and the rank is set to deprecated, with a specific reason for deprecation.
and I propose to make this a (temporary) 'good practice' for this issue. The best option would be to increase the maximum number of characters, but it's probably not possible to the extent this property would need (at least 3–4 times the current limit). The other solution would be to split long InChIs and add them in fragments. However, this raises some problems: how to do it (separate statements with series ordinal (P1545), or in the form of qualifiers?) and whether it will be re-usable for users at all. Wostr (talk) 14:25, 11 August 2023 (UTC)
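To make the 'split into fragments' option more concrete, here is a minimal sketch of what splitting and re-joining could look like for a re-user. It is an assumption-laden illustration, not an agreed data model: the helper names `split_inchi` and `join_fragments` are hypothetical, the ordinal is simply a string counter standing in for series ordinal (P1545), and the 1500-character limit is taken from the discussion above.

```python
# Hypothetical sketch: splitting a long InChI into ordered fragments.
MAX_LEN = 1500  # character limit mentioned in the discussion


def split_inchi(inchi, max_len=MAX_LEN):
    """Split a string into (ordinal, fragment) pairs of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [
        (str(index + 1), inchi[start:start + max_len])
        for index, start in enumerate(range(0, len(inchi), max_len))
    ]


def join_fragments(fragments):
    """Rebuild the full string from (ordinal, fragment) pairs in any order."""
    ordered = sorted(fragments, key=lambda pair: int(pair[0]))
    return "".join(text for _, text in ordered)


if __name__ == "__main__":
    long_inchi = "InChI=1S/" + "C" * 4000  # artificial placeholder, not a real InChI
    parts = split_inchi(long_inchi)
    print([(ordinal, len(text)) for ordinal, text in parts])
    print(join_fragments(parts) == long_inchi)
```

The sketch shows the cost for re-users that the comment worries about: every consumer has to sort by ordinal and concatenate before the value is usable.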
Problem similar to above. There is currently a limit of 250 characters for labels and aliases. In many cases, the chemical entities described in WD have only systematic names that are well over 250 characters long. In these cases, most often the items: (1) have no name, (2) have the name in the form of InChIKey, (3) have the name in the form of a different identifier (e.g. CID, UNII), (4) have the wrong name.
There seem to be two solutions here:
- don't set any name at all and leave labels and aliases blank
- use InChIKey as a temporary name.
Of the two options, I would suggest using the latter. InChIKey is a short, non-proprietary identifier that uniquely identifies a chemical structure. In addition, it would then be possible to automatically check the number of such cases by comparing label and InChIKey (P235).
The problem here is also automatic and semi-automatic changing of labels. In some cases, there are short names in the databases, which, however, turn out to be incorrect and misleading (e.g. it is a correct name, but for a structure with a different spatial configuration). Therefore, along with the proposal to use InChIKey in such situations, I suggest that changing labels from InChIKey to other names should be done only manually. Wostr (talk) 14:25, 11 August 2023 (UTC)
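The check mentioned above (comparing the label with InChIKey (P235)) could look roughly like this. It is a sketch only: the function name `label_is_inchikey` is hypothetical, and the pattern assumes the usual InChIKey layout of 14 uppercase letters, 10 uppercase letters and one final letter, separated by hyphens.

```python
# Hypothetical sketch: detecting labels that are just an InChIKey.
import re

# Assumed InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def label_is_inchikey(label, inchikey=None):
    """Return True if the label has InChIKey shape.

    If the item's own InChIKey is given, the label must also be identical to it.
    """
    if INCHIKEY_RE.fullmatch(label) is None:
        return False
    return inchikey is None or label == inchikey


if __name__ == "__main__":
    key = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"  # InChIKey of ethanol, used as an example
    print(label_is_inchikey(key, key))
    print(label_is_inchikey("ethanol", key))
```

Items for which this returns True would be the ones whose labels should only be changed manually under the proposal.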
In ChemSpider there are entries for both 'undefined' and 'unknown' stereocenters generated using non-standard InChI. In WD we do not distinguish between such entries, as we mainly describe chemical entities based on standard InChI. From our point of view, both identifiers in ChemSpider are valid and refer to the same item.
I propose to solve this by adding a proper qualifier and mark one ID as preferred:
ChemSpider ID: two values, each with a qualifier; one of them marked as preferred.
As 'preferred' would be marked an ID that uses '?' symbol for stereocenter (just as in standard InChI), which usually (always?) has a lower ID. With 'normal' rank would be marked an ID that uses 'u' symbol for stereocenter (which is present in ChemSpider to deduplicate structures). Wostr (talk) 14:25, 11 August 2023 (UTC)
In some databases (PubChem, ChemSpider) chemical entities that exhibit predominantly ionic character may have more than one entry. This is due to the fact that it is difficult to show the ionic character of a bond in the form of a structural formula, thus the SMILES or InChI generation methods in principle allow to show either the ionic character or the covalent character of the bond. This leads to a situation where one chemical entity is described in databases in two ways. This therefore results in duplication of (1) identifiers, (2) structure-related properties (SMILES, InChI).
I consider it a mistake to describe such representations of chemical structures in separate items, and I believe that only one item should exist in WD in such situations, but with two sets of identifiers.
In the case of duplicated identifiers, I suggest adding appropriate qualifiers and marking one of the identifiers as 'preferred'.
ChemSpider ID: two values with qualifiers; one of them marked as preferred.
Why the 'normal' and 'preferred' rank instead of 'deprecated' rank of one of the identifiers? The entries in the database are not incorrect per se, one describes the chemical structure more accurately than the other.
In the case of duplicated structure-related properties (SMILES, InChI, InChIKey), I suggest to 'deprecate' one of the statements:
canonical SMILES: the first value shows a correct representation of an ionic compound, so there is no need to set its rank to preferred. The second value shows a covalent representation of a predominantly ionic compound, so it is set to deprecated with a proper reason stated.
In this case, one of the representations of the chemical structure is generally much less correct than the other, hence the 'deprecated' rank with the appropriate qualifier. Wostr (talk) 14:25, 11 August 2023 (UTC)
- In some cases, which structure (ionic vs. covalent) is more correct isn't totally clear, because the structure differs between states of the substance. E.g.: in the CoSO4 example, the anhydrous compound might actually be a coordination polymer with sulfate ligands. NaCl is ionic in condensed phases, but molecular as a gas. Also, sometimes there are multiple solid-state polymorphs, e.g. sulfur trioxide. In principle, we could create a separate item for each structure, but that would quickly become unwieldy, and most sources don't specify precisely which polymorph they're talking about. I'm not sure of a great solution in general.
- Separately, there are also some other kinds of chemical relationship that result in multiple entries. For example, in Talk:Q1792796#Bad data from Pubchem there seem to be two different entries with the correct atoms and linkages; if I'm reading them correctly, they're two legitimate resonance contributors to the same structure. 73.223.72.200 22:56, 11 August 2023 (UTC)
- While I can't agree that e.g. NaCl forms covalent molecules in the gas phase, I agree that the presented approach would be at best true for normal conditions. Maybe the better approach would be to set all such statements as 'normal' and add proper qualifiers (like entry in a database describing the character of a chemical entity as covalent (Q121136454)) to both external-IDs and to structure-related properties. I'm also not sure about creating multiple items for each structure. We are doing this for e.g. carbohydrates and some tautomeric forms, but apart from carbohydrates, other tautomeric forms cause a lot of problems in WD. It would be similar if we wanted to duplicate items for each form of representing a chemical structure, especially since, unlike tautomers, it is not even possible to isolate this type of structure here; their existence is only the result of problems with generating a structure by certain tools or software. copper(II) acetylacetonate (Q1792796) is an example where in PubChem the same chemical compound has three different entries (I think all three IDs are correct here and the differences are due to the imperfections of the tools responsible for generating chemical structures). Wostr (talk) 13:45, 12 August 2023 (UTC)
- If we want to go further with the above problems, we have to treat zwitterions and tautomers like ketone/enol, imine/enamine, lactam/lactim... in the same discussion. Most duplicated entries in WD are due to these two major origins. The correct way would be to choose the more stable form at ambient conditions (a covalent bond for NaCl in the gas phase is not the common state of the molecule, so this can't be used to represent this salt), but there are not sufficient references to confirm which is the more stable state. I would therefore prefer to fix one form in the case of zwitterions and tautomers as the reference form for InChI/InChIKey/SMILES representations.
- Then whatever is the chosen solution, we have to fix the constraint violations: there is no sense to keep that kind of tools if there is no consistency between constraints rules and practice. If we decide that InChIKey is a single value property (see InChIKey (P235)), then we have to respect that choice and delete multiple values even with deprecated status.
- As a general trend, I prefer to avoid multiple values with qualifiers as proposed by Wostr above. This is just a mess, especially when retrieving data from an external system like WP. Reality can't be modeled in all details and I prefer to simplify the data structure and work more on adding valuable information than on maintaining a complex structure of possible representations. Snipre (talk) 12:09, 14 August 2023 (UTC)
- Starting from the end: leaving some statements in the items with 'deprecated' rank does not pose any problem with retrieving the data. That is why the ranks are in WD: the statement is 'deprecated', so (1) there is no risk that a new item will be added somewhere in WD based on such a statement, (2) such a deprecated statement allows the item to be found by any user, but (3) any re-user knows (or should know, based on the general WD model) that only 'preferred'/'normal' rank statements should be used.
- I agree that the problem is broader, but the best solution in one area may not be the best in another. Even if we agree that some structural representations should not be placed in items in WD (InChI, SMILES), we cannot do the same with identifiers. At this stage, it seems to me inevitable that in some items we will have more than one external identifier due to the fact that not every database treats data in the same way, and on the other hand, in accordance with the general rules of WD, for each identifier, e.g. in PubChem, ChemSpider or ChEBI you can create an item and this item will be notable. So it seems impossible to have a 'single value' constraint in these situations, the only solution would be to go in the direction of 'one is preferred' (and establish which one we should mark as 'preferred'). In other databases the relation between entries is also not 1:1, 1:1 relationship between WD item and other databases is a nice idea, but not feasible.
- What's more, data consistency does not require to use 'single value' for identifiers. There are many options of simple constraints, we can always add some complex constraints and maintain the data consistency that way. Wostr (talk) 12:59, 14 August 2023 (UTC)
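The rank argument above can be illustrated with a short sketch of how a re-user would pick values when an item holds several statements for one property. The function name `usable_values` and the (value, rank) tuples are hypothetical simplifications; the selection rule (preferred statements if any exist, otherwise normal ones, never deprecated ones) follows the general Wikidata rank model.

```python
# Hypothetical sketch: selecting usable values according to statement ranks.
def usable_values(statements):
    """Return the values a re-user should take from (value, rank) pairs.

    Preferred statements win if present; otherwise normal-rank statements
    are used. Deprecated statements are never returned.
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [value for value, rank in statements if rank == "normal"]


if __name__ == "__main__":
    # Placeholder identifiers, not real database IDs.
    ids = [("ID-A", "preferred"), ("ID-B", "normal")]
    smiles = [("ionic form", "normal"), ("covalent form", "deprecated")]
    print(usable_values(ids))
    print(usable_values(smiles))
```

With this selection rule, a 'one is preferred' policy for duplicated identifiers still yields a single value for re-users, while the other identifier remains findable on the item.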
"Even if we agree that some structural representations should not be placed in items in WD (InChI, SMILES), we cannot do the same with identifiers"
Why? We have no obligation towards external databases to include all their entries. Instead of trying to merge all entries of external databases, we can, if we want, define what we as the WD community accept as possible items/values. If we decide that zwitterions are not valid for the creation of an item/statement, then we can exclude all external identifiers representing zwitterions. WD is not the phone book of external databases trying to connect everything. Having an external identifier is for me not sufficient if we have a clear policy. Accepting everything is the perfect example of no internal policy.
- I just read one article regarding tautomerism and there are 86 possible cases of tautomers. As described by the article, most databases are not consistent regarding the treatment of tautomers (see here). So accepting the existence of external identifiers as the only rule for item/statement creation will just import the mess of all databases into WD.
- Then I am still waiting to see the effect of your data maintenance, based on a simple indicator: the number of constraint violations on the following pages. See
- Wikidata:Database reports/Constraint violations/P662
- Wikidata:Database reports/Constraint violations/P235
- Wikidata:Database reports/Constraint violations/P231
- The quality of WD is not the capacity to integrate all identifiers of most databases; the quality of WD is to have a set of data that is well organized and follows a policy understandable by most external people. That's my opinion. But this is the value of a database in general. Snipre (talk) 14:47, 14 August 2023 (UTC)
- The exception from the general notability rules (p. 2) would probably require a full-project discussion. Without it, anyone can add an item about a zwitterion or about a tautomer and frankly, we can't do anything about it, as such items are notable enough to be included in WD.
- InChI V2, which you've mentioned, is likely to have better recognition of tautomeric structures, however, still in non-standard InChIs.
- Leaving a certain part of external chemical databases outside WD will only result in one thing: constantly importing more items about these records from external databases. I have not seen any attempts to control this procedure so far, having already over 1.2M items, manual control over it is impossible. The only solution I see is a proper policy in which cases we should have 'one-to-many' linking to external-databases and how to qualify these external-ids so that it is understandable also in an automatic way.
- I've seen many times where 'deprecated' or duplicated IDs have been removed from items. This does not lead to anything good, because these external-IDs will reappear at some point, it will only be months or years before someone discovers them among over a million other items. That's why I say, and I will always say, removing something like this is short-sighted and will only lead to more work. I have never seen in any other database that they remove, even duplicated, links to other databases. It works like redirects in WP – it's better to have more, even 'deprecated' ones, because it allows you to find these items and prevents them from being imported in the future. Wostr (talk) 16:41, 14 August 2023 (UTC)
- Speaking as someone who has grappled with these problems for many years before my retirement (from Syngenta), some issues can be "solved" by using StdInChI and StdInChIKey rather than InChI and InChIKey. The former pair render all possible tautomers of a compound into an identical string and in my opinion this is correct when discussing chemical substances: tautomerism is a property of samples that also depends on temperature/solvent etc. I can't see any situation where there would be multiple articles in Wikipedia to cover multiple possible tautomers — they would always be merged into one article. Likewise with Zwitterions: we don't have an article for glycine as H2NCH2COOH and H3N+CH2COO- despite the latter certainly being the reality for all glycine samples in aqueous solution. Going a step further, polymorphism is also a property of a sample, not a substance, although in rare cases (e.g. ice, sulfur, phosphorus) Wikipedia has multiple articles to cover these. That isn't the general case: I authored ROY, = Q27281324, with >= 13 polymorphs and I don't think it would be helpful to have a Wikidata item for each. Michael D. Turnbull (talk) 15:09, 14 August 2023 (UTC)
- But our items are not about substances only. Many, or even most of the statements refer to the molecular entities. And therefore, fortunately or unfortunately, our items are a combination of both definitions, substance and molecular entity. InChI/InChIKey has its limitations as even this identifier sometimes fails to properly describe a structure (there are situations in which the same substance has different InChIs as a result of incorrect recognition of tautomeric structures in the InChI software).
- The reference to Wikipedia, on the other hand, I think is fundamentally incorrect. By definition, Wikidata is intended to describe the world in much more detail than encyclopedic articles. Chemistry is no exception here. Since Wikipedia only lists isotopes, should we do the same in Wikidata and remove all items about them? Because Wikipedia describes racemic mixtures, and there are no separate articles on individual stereoisomers, should we remove items about stereoisomers? Wikipedia is not an indicator of how Wikidata is supposed to work, much less is Wikidata an information base for Wikipedia. Wostr (talk) 16:41, 14 August 2023 (UTC)
Can someone reimport chemical formula (P274)?
Notified participants of WikiProject Chemistry see Wikidata:Project_chat. Midleading (talk) 09:26, 19 October 2023 (UTC)
- I think I found the chat you referred to. Reimporting PubChem is for me not on the table right now. It sounds very complicated, and requires evaluating the history of an item, and see if edits have been made since the import. What I can help with, is create curation lists. -- Egon Willighagen (talk) 09:48, 19 October 2023 (UTC)
- I really don't see a problem here. Formulae imported from PubChem are correct, but the notation may not be the one you are looking for. There are many ways to write a chemical formula, Hill notation is the best to use in databases, but may not be preferred in other uses. So the problem here is the lack of other formulae you are looking for. Wostr (talk) 00:00, 20 October 2023 (UTC)
- PubChem is just an aggregator (like WD). If you notice valuable annotation in PubChem, it often comes from data sources we already have or want to import directly, instead of from PubChem. That said, data sources we cannot import directly are patents, scrapings from the literature, and SID submissions from the industry (if they have no other source). Concentrating on these cases would be valuable. --SCIdude (talk) 07:31, 21 October 2023 (UTC)
- Hi,
- I just started the deletion of 22,959 chemical formulas that were violating the constraints. (See https://quickstatements.toolforge.org/#/batch/215303, https://quickstatements.toolforge.org/#/batch/215304, https://quickstatements.toolforge.org/#/batch/215305, https://quickstatements.toolforge.org/#/batch/215306, and https://quickstatements.toolforge.org/#/batch/215307)
- I am also trying to complete the different masses/formulas from compounds where they are not present and can be easily calculated at the same time.
- The next step is to see if the formula is matching the SMILES but this will take more time. AdrianoRutz (talk) 14:33, 26 October 2023 (UTC)
- Many of these formulae seemed valid, but were imported with minor errors (like not using the subscript). It'd be good to check in which cases a formula can be reimported and which items will be left without a formula. Wostr (talk) 18:19, 26 October 2023 (UTC)
- I will do it when all the correctly formatted ones will be finished re-importing.
- Or I could post the list here so it can be fixed before re-import. Else I will simply re-import the violating ones, but representing probably only a few percent in comparison to the original amount before this operation. AdrianoRutz (talk) 10:05, 27 October 2023 (UTC)
- @AdrianoRutz: this doesn't seem like the right way to approach constraint violations. Also, I don't really see on what basis you gathered these 22,959 formulas/items. You removed many formulas that seem perfectly valid, including very simple ones like B(OH)₃ in sassolite (Q424769). E.g. in lanthanite-(La) (Q3826951) I restored the formula and as far as I can see there is no constraint violation. In some items a simple fix was needed, e.g. in walpurgite (Q1531254) it was probably only needed to replace * with ·. 2001:7D0:81DB:1480:45C:F54A:B289:F778 09:52, 27 October 2023 (UTC)
- Hi, I took the regexp of the property (https://www.wikidata.org/wiki/Property:P274#P2302). I am also re-importing multiple thousands of them formatted correctly in parallel.
- Some might still remain without a formula because of the inability to generate it, but this should remain minor. I have downloaded all formulas locally as a safety measure. Happy to re-import all the ones that are still missing after curation, if needed. AdrianoRutz (talk) 10:03, 27 October 2023 (UTC)
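The kind of 'simple fix' mentioned in this thread (replacing * with · and using subscript digits) can be sketched as a small normalisation step applied before deciding whether a formula really has to be removed. This is an illustration only: the function name `normalise_formula` is hypothetical, the actual format constraint of chemical formula (P274) is not reproduced here, and charges, isotopes and other notations are deliberately ignored.

```python
# Hypothetical sketch: repairing minor notation errors in a chemical formula.
import re

SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def normalise_formula(formula):
    """Replace '*' with '·' and subscript digits that follow an element or bracket.

    Digits that do not follow a letter or closing bracket (for example the
    coefficient in front of water of crystallisation) are left unchanged.
    """
    formula = formula.replace("*", "·")
    return re.sub(
        r"(?<=[A-Za-z)\]])\d+",
        lambda match: match.group().translate(SUBSCRIPTS),
        formula,
    )


if __name__ == "__main__":
    print(normalise_formula("CuSO4*5H2O"))
    print(normalise_formula("B(OH)3"))
```

Formulas that still violate the property's format constraint after such a step would be the ones worth listing for manual review.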
- It does not look like you used (only) this particular regex. See above an example that doesn't yield a constraint violation using the very same regex, many others in your batches probably also don't. In case of a reimport why it was needed to remove the statements in first place anyway? Also reimport from where? PubChem? Examples that I checked (minerals) are without PubChem links and are not expected to have ones. 2001:7D0:81DB:1480:45C:F54A:B289:F778 10:25, 27 October 2023 (UTC)
- Re-import the exact same formula I deleted, nothing else relying on external sources. I have them locally so do not worry, no item having a formula previously will be left behind.
- I agree some edits could have been avoided but the end result will be cleaner.
- For the rest, I am really happy to see that you care about those cases so much that you seem to forget the rest. AdrianoRutz (talk) 10:43, 27 October 2023 (UTC)
- Well, these are not only a few cases. So far e.g. 4038 mineral species[1] in particular lack a chemical formula due to your recent edits. 2001:7D0:81DB:1480:45C:F54A:B289:F778 11:04, 27 October 2023 (UTC)
- Well, this is 4038 out of 6049, which to me indicates an issue with the chemical formulas of mineral species not conforming to the actual constraints.
- To make clear that I never had any intention of deleting formulas without re-importing them properly, I prioritized their re-import: https://quickstatements.toolforge.org/#/batch/215387
- For the rest, I will still wait for my other imports to finish before re-importing them.
- I am happy my edits at least drew the attention to things that were left for years untouched. AdrianoRutz (talk) 11:42, 27 October 2023 (UTC)
- Alright, thanks for restoring. Just in case I mention that so far you didn't restore references, e.g. here. 2001:7D0:81DB:1480:F4F9:6B85:4F2E:EFED 08:25, 28 October 2023 (UTC)
- Many of these formulae seemed valid, but were imported with minor errors (like not using subscripts). It'd be good to check in which cases a formula can be reimported and which items will be left without a formula. Wostr (talk) 18:19, 26 October 2023 (UTC)
@AdrianoRutz: I assume your (re)imports are done now, but lost references are still a significant issue. I made some more queries and caught 1354 mineral species items where reference(s) got lost in your batches. In case you don't have the list, the items are the following:
1354 items where references got lost |
---|
Q13094 Q13097 Q13103 Q111200 Q126204 Q165254 Q167741 Q189703 Q220373 Q225550 Q239589 Q256865 Q273663 Q280913 Q304018 Q319661 Q320694 Q333593 Q333827 Q338106 Q344688 Q355210 Q380942 Q381133 Q384447 Q407251 Q408516 Q408544 Q409410 Q409433 Q409611 Q409733 Q410719 Q411891 Q413128 Q413272 Q413322 Q413391 Q413516 Q413750 Q414132 Q414840 Q414848 Q414924 Q415059 Q415544 Q416303 Q417278 Q417292 Q417443 Q417518 Q417730 Q418304 Q418652 Q418873 Q419091 Q419241 Q419960 Q420442 Q420547 Q420570 Q420924 Q420958 Q421357 Q421362 Q422895 Q423051 Q423458 Q423494 Q424127 Q424818 Q425132 Q429647 Q429671 Q429712 Q429813 Q429857 Q454926 Q478080 Q515826 Q523795 Q604784 Q616884 Q644495 Q657684 Q690924 Q749082 Q775839 Q783420 Q808228 Q932260 Q936060 Q947884 Q958786 Q967671 Q978346 Q1051438 Q1056068 Q1061880 Q1063305 Q1065505 Q1067020 Q1067912 Q1069446 Q1069926 Q1070576 Q1070595 Q1070618 Q1070842 Q1070871 Q1071116 Q1071128 Q1071172 Q1071200 Q1072887 Q1072893 Q1111376 Q1113914 Q1146033 Q1171888 Q1224219 Q1242102 Q1469279 Q1532374 Q1552708 Q1759051 Q1759880 Q1853054 Q1912594 Q1913046 Q1935056 Q1962093 Q1973079 Q2008960 Q2048382 Q2056968 Q2075342 Q2177829 Q2235552 Q2238642 Q2251499 Q2252069 Q2252074 Q2252328 Q2252626 Q2252659 Q2275899 Q2293541 Q2294982 Q2419574 Q2502214 Q2517404 Q2517823 Q2518702 Q2573999 Q2599838 Q2629082 Q2705947 Q2706284 Q2738204 Q2856361 Q3039376 Q3045746 Q3110835 Q3120528 Q3189938 Q3357526 Q3381506 Q3529395 Q3558991 Q3570566 Q3606118 Q3606357 Q3606358 Q3606883 Q3607314 Q3607372 Q3607389 Q3608887 Q3611784 Q3612346 Q3612408 Q3612651 Q3612730 Q3613055 Q3613187 Q3613396 Q3613399 Q3613400 Q3613401 Q3613402 Q3613403 Q3613404 Q3613409 Q3613411 Q3613540 Q3613655 Q3613934 Q3614259 Q3614342 Q3614463 Q3614464 Q3614465 Q3614468 Q3615272 Q3615274 Q3615369 Q3616576 Q3622405 Q3623443 Q3623985 Q3632834 Q3633600 Q3635489 Q3637338 Q3640278 Q3640841 Q3642691 Q3643919 Q3644168 Q3644991 Q3646776 Q3647244 Q3647605 Q3651361 Q3663615 Q3665119 Q3665120 Q3666867 Q3675284 Q3675285 Q3675914 Q3680789 
Q3683911 Q3693246 Q3693435 Q3693975 Q3697227 Q3699570 Q3700166 Q3700167 Q3700169 Q3700737 Q3701213 Q3702136 Q3703769 Q3704572 Q3705019 Q3705021 Q3705130 Q3705134 Q3705148 Q3705207 Q3705271 Q3705832 Q3705893 Q3705912 Q3706013 Q3706016 Q3706350 Q3707332 Q3712264 Q3712297 Q3712624 Q3712736 Q3713966 Q3714202 Q3714229 Q3714815 Q3714817 Q3715339 Q3716119 Q3716171 Q3716229 Q3716542 Q3716715 Q3716832 Q3716911 Q3716932 Q3718594 Q3720433 Q3730667 Q3736297 Q3742338 Q3743170 Q3743171 Q3743172 Q3743174 Q3743199 Q3743202 Q3743205 Q3743206 Q3743208 Q3743210 Q3743239 Q3743242 Q3743245 Q3743261 Q3743263 Q3743269 Q3746703 Q3746850 Q3746851 Q3746852 Q3746853 Q3746854 Q3757604 Q3760680 Q3764033 Q3764384 Q3764393 Q3764512 Q3764777 Q3765857 Q3768585 Q3769036 Q3771837 Q3771838 Q3771933 Q3772514 Q3772563 Q3772565 Q3772693 Q3772768 Q3772785 Q3773065 Q3773067 Q3773432 Q3773471 Q3773511 Q3774037 Q3774175 Q3774232 Q3775698 Q3775888 Q3775963 Q3776579 Q3776623 Q3776794 Q3776816 Q3776846 Q3777238 Q3778398 Q3778835 Q3779558 Q3779817 Q3780063 Q3780116 Q3780156 Q3780198 Q3780289 Q3780299 Q3782449 Q3783369 Q3784478 Q3784779 Q3786352 Q3787813 Q3787825 Q3796518 Q3798898 Q3807341 Q3808473 Q3808488 Q3808760 Q3812432 Q3813123 Q3814882 Q3815105 Q3815520 Q3816089 Q3816275 Q3816494 Q3817519 Q3817799 Q3826990 Q3829223 Q3831354 Q3834951 Q3836550 Q3839182 Q3843091 Q3843274 Q3843282 Q3843285 Q3843290 Q3843295 Q3843298 Q3843299 Q3855549 Q3859484 Q3861118 Q3861154 Q3865992 Q3867421 Q3868666 Q3869870 Q3869909 Q3873306 Q3877884 Q3879072 Q3879968 Q3881296 Q3881389 Q3885918 Q3886043 Q3886172 Q3886223 Q3886229 Q3886988 Q3887397 Q3887781 Q3888829 Q3896894 Q3899416 Q3900069 Q3901684 Q3905861 Q3907762 Q3909493 Q3909494 Q3909496 Q3909498 Q3909499 Q3911372 Q3924268 Q3924298 Q3924319 Q3924882 Q3925561 Q3926348 Q3926490 Q3926495 Q3926599 Q3926633 Q3927571 Q3927673 Q3927919 Q3928474 Q3929821 Q3932515 Q3932516 Q3932558 Q3935809 Q3936049 Q3941493 Q3941542 Q3941909 Q3945167 Q3950797 Q3952221 Q3954647 Q3959261 Q3959658 Q3959843 
Q3961443 Q3963058 Q3963205 Q3963908 Q3963909 Q3963913 Q3965844 Q3973129 Q3976205 Q3978161 Q3978313 Q3980181 Q3982575 Q3992140 Q4003514 Q4003542 Q4006297 Q4006299 Q4006301 Q4006303 Q4006324 Q4006326 Q4006327 Q4006328 Q4006329 Q4006331 Q4006415 Q4006570 Q4006616 Q4006642 Q4006661 Q4006816 Q4007287 Q4008213 Q4008480 Q4008481 Q4008527 Q4008605 Q4008646 Q4008651 Q4008652 Q4008654 Q4008703 Q4008817 Q4008931 Q4014773 Q4016650 Q4018336 Q4018561 Q4018805 Q4019681 Q4020324 Q4021784 Q4021811 Q4021833 Q4021842 Q4022367 Q4022369 Q4022386 Q4022532 Q4022558 Q4022565 Q4022600 Q4022643 Q4022644 Q4022759 Q4022803 Q4022804 Q4022907 Q4022911 Q4023130 Q4023132 Q4023133 Q4023135 Q4023136 Q4023164 Q4023217 Q4023220 Q4023276 Q4023315 Q4023961 Q4024474 Q4024569 Q4024965 Q4025796 Q4114062 Q4731094 Q4736676 Q4737323 Q5227830 Q6080840 Q6437009 Q6871266 Q6956734 Q6966368 Q7861711 Q9189864 Q9697436 Q10572536 Q10778633 Q10914750 Q11295066 Q11379798 Q11441564 Q11541052 Q11616137 Q12021105 Q12149907 Q12528769 Q13368799 Q14949771 Q15044201 Q15646565 Q15784322 Q15854907 Q15915327 Q16020532 Q16856553 Q16957362 Q17013743 Q17028639 Q17126824 Q17166284 Q17212987 Q17394623 Q17394649 Q17449767 Q17466462 Q17484746 Q17484966 Q17485402 Q17501680 Q17502836 Q17534863 Q17537589 Q18053943 Q18059147 Q18119151 Q18121661 Q18123026 Q18220110 Q18324782 Q18335730 Q18338615 Q18338754 Q18339504 Q18459050 Q18562333 Q18700442 Q18700617 Q18700961 Q19317477 Q19327826 Q19357351 Q19358411 Q19601089 Q19698245 Q19717059 Q19717832 Q19726921 Q19726954 Q19726993 Q19739943 Q19740293 Q19744022 Q19744243 Q19744295 Q19746672 Q19749052 Q19766945 Q19767346 Q19767400 Q19767695 Q19767699 Q19767703 Q19771539 Q19771787 Q19772041 Q19772088 Q19772333 Q19772404 Q19772507 Q19772728 Q19772818 Q19799626 Q19799637 Q19799640 Q19799641 Q19799657 Q19799659 Q19799661 Q19799664 Q19799756 Q19799758 Q19799759 Q19799760 Q19799761 Q19799762 Q19799763 Q19799765 Q19799766 Q19799770 Q19799771 Q19810631 Q19810650 Q19810653 Q19810655 Q19810657 Q19810658 
Q19810659 Q19810661 Q19810663 Q19810664 Q19810665 Q19810666 Q19810668 Q19810669 Q19810671 Q19810672 Q19810673 Q19810674 Q19810675 Q19810677 Q19810678 Q19810679 Q19810680 Q19810681 Q19810682 Q19810683 Q19810686 Q19810687 Q19810689 Q19810690 Q19810691 Q19810692 Q19810693 Q19810695 Q19810696 Q19810697 Q19810698 Q19810700 Q19810702 Q19810705 Q19810706 Q19810708 Q19810709 Q19833285 Q19833286 Q19833287 Q19833288 Q19833290 Q19833291 Q19833292 Q19833293 Q19833296 Q19833297 Q19833299 Q19833300 Q19833302 Q19833305 Q19833306 Q19833308 Q19833309 Q19833310 Q19833313 Q19833314 Q19833316 Q19833318 Q19833321 Q19833324 Q19833325 Q19833327 Q19833328 Q19833329 Q19833330 Q19833331 Q19833332 Q19833333 Q19833334 Q19833335 Q19833336 Q19833339 Q19833340 Q19833344 Q19833348 Q19833349 Q19833351 Q19833352 Q19833353 Q19833354 Q19833355 Q19833358 Q19833359 Q19833361 Q19833363 Q19833364 Q19833365 Q19833367 Q19833389 Q19833488 Q19833490 Q19833492 Q19833494 Q19833497 Q19833503 Q19833504 Q19833507 Q19833508 Q19833510 Q19833511 Q19833512 Q19833513 Q19833515 Q19833516 Q19833517 Q19833519 Q19833520 Q19833521 Q19833523 Q19833524 Q19833525 Q19833527 Q19833530 Q19833532 Q19833533 Q19833534 Q19833535 Q19833536 Q19833539 Q19833540 Q19833542 Q19833544 Q19833545 Q19833613 Q19833615 Q19833616 Q19833617 Q19833618 Q19833620 Q19833622 Q19833624 Q19833628 Q19833629 Q19833630 Q19833631 Q19833632 Q19833633 Q19833634 Q19833635 Q19833636 Q19833637 Q19833638 Q19833640 Q19833645 Q19833646 Q19833648 Q19833651 Q19833655 Q19833659 Q19833660 Q19833663 Q19833664 Q19833665 Q19833667 Q19833668 Q19833669 Q19833670 Q19833671 Q19833672 Q19833675 Q19833679 Q19833681 Q19833682 Q19833683 Q19833684 Q19833690 Q19833699 Q19833706 Q19833713 Q19833717 Q19833718 Q19833727 Q19833848 Q19841371 Q19841373 Q19841374 Q19841375 Q19841376 Q19841377 Q19841378 Q19841380 Q19841382 Q19841383 Q19841385 Q19841386 Q19841389 Q19841390 Q19841393 Q19841394 Q19841397 Q19841398 Q19841399 Q19841401 Q19841402 Q19841403 Q19841404 Q19841405 Q19841407 Q19841411 
Q19841414 Q19841415 Q19841416 Q19841418 Q19841419 Q19841421 Q19841425 Q19841437 Q19841442 Q19841443 Q19841444 Q19841446 Q19841449 Q19841452 Q19841453 Q19841454 Q19841458 Q19841459 Q19841460 Q19841461 Q19841465 Q19841467 Q19841469 Q19841470 Q19841478 Q19860829 Q19860830 Q19860832 Q19860834 Q19860837 Q19860838 Q19860839 Q19860842 Q19860843 Q19860844 Q19860847 Q19860849 Q19860851 Q19860852 Q19860855 Q19860857 Q19860858 Q19860859 Q19860860 Q19860862 Q19860863 Q19860864 Q19860866 Q19860868 Q19860872 Q19860873 Q19860874 Q19860875 Q19860876 Q19860877 Q19860878 Q19860879 Q19860880 Q19860881 Q19860884 Q19860886 Q19860888 Q19860889 Q19860890 Q19860891 Q19860896 Q19860897 Q19860898 Q19860899 Q19860901 Q19860902 Q19860903 Q19860906 Q19860910 Q19860911 Q19860912 Q19860913 Q19860914 Q19860915 Q19860918 Q19860919 Q19860920 Q19860923 Q19860925 Q19860928 Q19860930 Q19860931 Q19860933 Q19860934 Q19860936 Q19860937 Q19860938 Q19860939 Q19860940 Q19860941 Q19860943 Q19860944 Q19860945 Q19860949 Q19860951 Q19860955 Q19860959 Q19860961 Q19860963 Q19860967 Q19860968 Q19860969 Q19860972 Q19860973 Q19860977 Q19860978 Q19860981 Q19860985 Q19860987 Q19860988 Q19860990 Q19860991 Q19860993 Q19860995 Q19860996 Q19860997 Q19860998 Q19860999 Q19861000 Q19861002 Q19861004 Q19861006 Q19861007 Q19861009 Q19861010 Q19861011 Q19861014 Q19861020 Q19861021 Q19861022 Q19861024 Q19861025 Q19861027 Q19861028 Q19861029 Q19861031 Q19861032 Q19861034 Q19861035 Q19861036 Q19861039 Q19861041 Q19861042 Q19861044 Q19861045 Q19861052 Q19861053 Q19861055 Q19861057 Q19861058 Q19861059 Q19861060 Q19861061 Q19861062 Q19861067 Q19861069 Q19861071 Q19861072 Q19861073 Q19861074 Q19861075 Q19861077 Q19861078 Q19861080 Q19861081 Q19861082 Q19861083 Q19861086 Q19861090 Q19861091 Q19861092 Q19861096 Q19861100 Q19861102 Q19861103 Q19861104 Q19861106 Q19861107 Q19861113 Q19861115 Q19861119 Q19861121 Q19861124 Q19861126 Q19861127 Q19861129 Q19861134 Q19861136 Q19861137 Q19861139 Q19861140 Q19861141 Q19861143 Q19861148 Q19861150 
Q19861151 Q19861152 Q19861153 Q19861154 Q19861155 Q19861160 Q19861162 Q19861163 Q19861165 Q19861167 Q19861168 Q19861169 Q19861171 Q19861175 Q19861176 Q19861179 Q19861180 Q19861181 Q19861184 Q19861185 Q19861187 Q19861188 Q19861189 Q19861190 Q19861194 Q19861196 Q19861201 Q19861202 Q19861203 Q19861204 Q19861206 Q19861207 Q19861208 Q19861209 Q19861211 Q19861213 Q19861215 Q19861218 Q19861219 Q19861221 Q19861225 Q19861226 Q19861227 Q19861228 Q19861229 Q19861231 Q19861233 Q19861235 Q19861236 Q19861238 Q19861240 Q19861241 Q19861243 Q19861244 Q19861245 Q19861246 Q19861247 Q19861258 Q19861259 Q19861262 Q19861264 Q19861265 Q19861267 Q19861270 Q19861271 Q19861273 Q19861274 Q19861275 Q19861279 Q19861282 Q19861283 Q19861285 Q19861286 Q19861288 Q19861289 Q19861290 Q19861291 Q19861293 Q19861294 Q19861297 Q19861299 Q19861301 Q19861303 Q19861304 Q19861311 Q19861315 Q19861317 Q19861319 Q19861320 Q19861321 Q19861322 Q19861993 Q19861994 Q19861995 Q19861997 Q19861998 Q19861999 Q19862000 Q19862002 Q19862003 Q19862005 Q19862006 Q19862008 Q19862009 Q19862011 Q19862012 Q19862016 Q19862018 Q19862019 Q19862022 Q19862025 Q19862026 Q19862027 Q19862030 Q19862032 Q19862033 Q19862036 Q19862037 Q19862038 Q19862039 Q19862040 Q19862041 Q19862042 Q19862043 Q19862044 Q19862045 Q19862046 Q19862051 Q19862052 Q19862053 Q19862054 Q19862055 Q19862056 Q19862058 Q19862059 Q19862062 Q19862064 Q19862066 Q19862069 Q19862071 Q19862072 Q19862074 Q19862075 Q19862077 Q19862081 Q19862086 Q19862096 Q19862097 Q19862102 Q19862103 Q19862109 Q19862110 Q19862111 Q19862113 Q19862115 Q19862116 Q19862117 Q19862119 Q19862120 Q19862124 Q19862128 Q19862129 Q19862130 Q19862133 Q19862135 Q19862142 Q19862331 Q19862335 Q19862338 Q19862339 Q19862340 Q19862341 Q19862347 Q19862349 Q19862350 Q19862351 Q19862352 Q19862353 Q19862354 Q19862355 Q19862356 Q19862358 Q19862359 Q19862360 Q19862361 Q19862363 Q19890838 Q20021294 Q20084529 Q20203481 Q20280286 Q20285494 Q20287818 Q20653385 Q20687308 Q20706675 Q20725249 Q20725276 Q20725285 Q20727702 
Q20727842 Q20743475 Q20828301 Q20828345 Q20828356 Q20829934 Q20830025 Q20830637 Q20830643 Q20830647 Q20870039 Q20870045 Q20871125 Q20871378 Q20871625 Q20871735 Q20872481 Q20882353 Q21666185 Q21682829 Q21813710 Q23005375 Q23005381 Q23005421 Q23636392 Q23894856 Q24255342 Q24257080 Q26270026 Q27013457 Q27037317 Q28125135 Q28791561 Q28791640 Q29471791 Q30335993 Q37278840 Q42303471 Q42303486 Q55877096 Q55887854 Q55888035 Q55891046 Q55897144 Q56146017 Q57812430 Q57814414 Q58822285 Q58886566 Q58903568 Q61478951 Q68676254 Q76768854 Q80198033 Q92197774 Q101084527 Q101084530 Q104144390 Q105697596 Q105697698 Q106071106 Q106623123 Q106623144 Q106623252 Q106623288 Q108146021 Q108146553 Q108146572 Q108146585 Q108146606 Q108147167 Q108150894 Q108150946 Q109301372 Q109301502 Q109301525 Q109317440 Q109322220 Q109322225 Q109322311 Q109322317 Q109322387 Q110297228 Q110297427 Q111812081 Q111812327 Q112944291 Q112961310 Q114793518 Q115520196 |
Additionally, I found 13 mineral species items where P274 statements haven't been restored yet: Q3129310 Q973557 Q115520207 Q123167546 Q123169008 Q123155195 Q123163486 Q123168695 Q3782486 Q284146 Q123152967 Q123170689 Q401047. Please process these 1354+13 items too.
Note that I checked only mineral species. It is very likely that similar problems concern other items in your batches too. So it might still be better if you undid everything and then redid it in a less messy way. 2001:7D0:81DB:1480:194B:C6D3:7F6:3878 08:26, 29 October 2023 (UTC)
- Thank you for your permanent concerns and feedback. Seeing someone this motivated to improve the content of Wikidata is cool.
- 1. I did not only check the mineral species, but also other instances, such as mineral varieties, for example, that I already fixed. In case I missed some, happy to hear your feedback.
- 2. No, the editing was not over; you are pushing to have these things fixed faster than I am able to edit. Still, I again prioritized your concerns: https://quickstatements.toolforge.org/#/batch/215583
- 3. There were 1360, not 1354. Additionally, in the meantime, the number of formulas went up by more than 100,000 and the number of incorrectly formatted ones went down by 20,000.
- 4. For our next interaction I would appreciate less judgemental sentences and possibly a bit more positivity.
- Best, AdrianoRutz (talk) 12:53, 29 October 2023 (UTC)
- P.S.:
- I manually checked the 13 additional items you mentioned; thank you for pointing them out. They slipped under my radar because they contained 2 different chemical formulas, which was unexpected. I reverted the edits and hope someone more knowledgeable than me in the field will curate them. AdrianoRutz (talk) 12:59, 29 October 2023 (UTC)
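For anyone attempting a similar cleanup, the approach suggested in this thread (repair a formula in place rather than delete and re-import it) can be sketched as below. This is only an illustration under stated assumptions: `FORMULA_PATTERN` is a simplified, hypothetical stand-in and not the actual format constraint on chemical formula (P274), and `normalize_formula` handles only the two fixes mentioned above (replacing * with · and using subscript digits). The gypsum formula is an illustrative input, not one taken from the batches.

```python
import re

ASCII_DIGITS = "0123456789"
SUB_DIGITS = "₀₁₂₃₄₅₆₇₈₉"
TO_SUB = str.maketrans(ASCII_DIGITS, SUB_DIGITS)

# Hypothetical, simplified pattern (NOT the real P274 constraint regex):
# element symbols or closing brackets with optional subscript counts,
# opening brackets, and "·" followed by an optional plain coefficient.
FORMULA_PATTERN = re.compile(
    r"(?:[A-Z][a-z]?[₀-₉]*|[(\[]|[)\]][₀-₉]*|·[0-9]*)+"
)


def conforms(formula: str) -> bool:
    """Return True if the whole formula matches the simplified pattern."""
    return FORMULA_PATTERN.fullmatch(formula) is not None


def normalize_formula(raw: str) -> str:
    """Replace '*' with '·' and turn atom counts into subscript digits.

    A digit is treated as a count only when it directly follows a letter,
    a closing bracket, or another subscript digit, so coefficients such as
    the 2 in '·2H2O' stay as plain digits.
    """
    out = []
    for ch in raw.replace("*", "·"):
        prev = out[-1] if out else ""
        follows_atom = prev != "" and (prev.isalpha() or prev in ")]" + SUB_DIGITS)
        if ch in ASCII_DIGITS and follows_atom:
            out.append(ch.translate(TO_SUB))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for raw in ["B(OH)3", "CaSO4*2H2O"]:
        fixed = normalize_formula(raw)
        print(raw, "->", fixed, "| conforms:", conforms(fixed))
```

A statement would then be edited only when the original value fails the check and the normalized value passes it, which leaves already valid formulas and their references untouched. Charges, isotopes and fractional occupancies are not covered by this sketch.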
How to handle incorrect PubChem entry
Q123257271 "6-bromo-2-mercaptotryptamine" matches the PubChem record titled with this chemical name. But this structure is mercaptomethyl, not mercapto, and SciFinder has no such structure. Instead, SciFinder has the actual mercapto structure under this name (CASNo 808113-54-4) and there is no PubChem entry matching that CASNo. The refs cited at en:BrMT also actually describe the mercapto structure. How should Wikidata handle this? Should this Wikidata item be a clone of the presumably incorrect PubChem record, and a new Wikidata item be created for the correct structure? It is confusing to have two different items with the same chemical name. Or should this Wikidata item itself be updated (and the PubChem link omitted)? DMacks (talk) 05:56, 7 November 2023 (UTC)
- Hi. Basically, both Wikidata and PubChem are based on the idea that each record has a unique InChI/-Key. When there is a mismatch between name and InChIKey, I normally resolve it like this: if there is a Wikipedia sitelink, follow what that says (because moving sitelinks is harder, and because Wikidata started out as a database linking Wikipedias); if there is not, I tend to follow the InChI/-Key (and the matching SMILES) and update/fix the name. The matching of the PubChem CID follows the InChI/-Key. When I pass the name through OPSIN (Q26481302), it indeed confirms that the name and structure do not match. I suggest updating the name. @Marbletan: maybe you can shed further light on this Wikidata item? --Egon Willighagen (talk) 06:14, 7 November 2023 (UTC)
- I created this Item based on the content in the English Wikipedia article. I did not catch that the article conflated two different chemical compounds. I'm sorry for bringing the confusion from there to here. I think there should be two different Items to describe the two different chemical compounds, but I don't have a personal preference for which way it goes. PubChem's incorrect chemical name shouldn't be used. We can also use the "different from" property (P1889) to mitigate the potential for future confusion. Marbletan (talk) 13:32, 7 November 2023 (UTC)
- I updated the name to match the structure. Is there a tool/bot for creating new Wikidata entries for newly created chemical pages? DMacks (talk) 01:04, 8 November 2023 (UTC)
- I created Q123370393 manually. I'm not aware of a tool to automate it. Marbletan (talk) 13:24, 8 November 2023 (UTC)
- Thanks. DMacks (talk) 14:09, 8 November 2023 (UTC)
Is model for and Modeled by
Please consider supporting Wikidata:Property_proposal/model_for. Thanks. Fgnievinski (talk) 02:57, 12 November 2023 (UTC)