User talk:Photocyte


Welcome to Wikidata, Photocyte!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox to try. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.

Best regards!

Belatedly, Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:27, 8 April 2019 (UTC)


Photocyte (talk) 05:43, 18 April 2020 (UTC)

Saehrimnir
Leyo
Snipre
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Adriano Rutz
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
Peter Murray-Rust
Erfurth
TiagoLubiana

Notified participants of WikiProject Chemistry

Hello gang (not sure if I am doing this right by placing it on my own talk page; let me know if this is unusual). I wanted to ask some questions about the integration of PubChem and Wikidata. As you may know, PubChem is (I'm pretty sure) the most comprehensive database of molecular structures. Many PubChem entries are linked to a Wikidata item (e.g. https://www.wikidata.org/wiki/Q418878 - scroll down to see the PubChem CID property). But novel PubChem entries, as far as I can tell, do not get propagated to Wikidata within a reasonable amount of time (e.g. https://pubchem.ncbi.nlm.nih.gov/compound/139291741 : if I search Wikidata for that PubChem CID, "139291741", nothing comes up).

So, questions:

  • 1) Is there any effort to make it so that *every* PubChem compound gets replicated in Wikidata, even if it doesn't have a Wikipedia page yet? If I made a bot that tried to do that, would that cause any issues?
  • 2) The https://www.wikidata.org/wiki/Q418878 example I gave above links out to PubChem, but if you click through the link, you'll notice that the entry has been marked non-live. It should instead link to the live version of the compound: https://pubchem.ncbi.nlm.nih.gov/compound/135445694 . Is there any effort to update these PubChem<->Wikidata links?
  • 3) If the PubChem<->Wikidata links get updated: the Wikipedia page linked to the Wikidata item also has a PubChem linkout, but it is stored separately in the Chembox template (see here: https://en.wikipedia.org/wiki/Coelenterazine). Does anyone know if there is a bot that updates those Chembox PubChem linkouts from the Wikidata entry, or vice versa?
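
For anyone who wants to repeat the "search Wikidata for that PubChem CID" check programmatically, here is a minimal sketch that only builds a Wikidata Query Service URL for a lookup on the PubChem CID property (P662). The function name `build_cid_lookup_url` is made up for illustration, and the sketch does not send the request or interpret the response.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_cid_lookup_url(cid: str) -> str:
    """Return a query URL that asks which items carry the given PubChem CID (P662).

    Hypothetical helper: it only builds the URL, it does not fetch it.
    """
    # CIDs are plain integers; rejecting anything else keeps the query well-formed.
    if not (cid.isascii() and cid.isdigit()):
        raise ValueError(f"not a PubChem CID: {cid!r}")
    # P662 values are stored as strings, hence the quoted literal.
    query = f'SELECT ?item WHERE {{ ?item wdt:P662 "{cid}" . }}'
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(build_cid_lookup_url("139291741"))
```

An empty result set for the printed URL would correspond to "nothing comes up" in the search above.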

Hi, to answer your first question: no, there is no such effort. There are two reasons why this will not happen soon either. First, PubChem is too large. Second, we do not have enough CC0 data in chemistry to demonstrate which chemicals are notable and which are not. To answer the second question, I don't think there is such an effort now, but this is something we could work on. The normal approach here is to create a report that lists which non-live PubChem CID records are linked to, so that WikiProject Chemistry can look at them. If I am not mistaken, this is how it all started, but this work has not been continued. I would say all your questions are relevant, but keep in mind that there is no general 1-to-1 relation between PubChem and Wikidata entries, and there are so many corner cases that the term "corner case" is not really even appropriate. The WikiProject has limited bandwidth, but if you are interested, you're most welcome to continue talking. --Egon Willighagen (talk) 07:12, 18 April 2020 (UTC)
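
A minimal sketch of the kind of report described above, assuming the (item, CID) links and the set of non-live CIDs have already been collected by some other means. How liveness is determined is not covered here, and the function name, item IDs and CIDs below are all placeholders, not real data.

```python
def non_live_report(links, non_live_cids):
    """Return the (item, CID) pairs whose CID is in the non-live set.

    links: iterable of (qid, cid) string pairs, e.g. ("Q3", "1001").
    non_live_cids: set of CID strings known to be non-live (assumed input).
    """
    flagged = [(qid, cid) for qid, cid in links if cid in non_live_cids]
    # Sort numerically by item number so that Q3 comes before Q20.
    return sorted(flagged, key=lambda pair: int(pair[0][1:]))


# Placeholder data, invented for illustration only.
example_links = [("Q20", "1002"), ("Q3", "1001"), ("Q7", "1003")]
example_non_live = {"1001", "1002"}

for qid, cid in non_live_report(example_links, example_non_live):
    print(f"{qid}\tPubChem CID {cid} is non-live")
```

The output is a flat list that could be pasted onto a project page for manual review, rather than anything that edits items automatically.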

You really do NOT want to simply load all of PubChem into Wikidata: you will inherit a great many errors and, if you are not careful, will spend years cleaning them up. Our CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) has 882,000 chemicals and five full-time curators; we are curating data every day, are 20 years into the effort, and are still finding errors. I can point you to a number of articles highlighting the issues of blindly using public domain chemistry data. --Antony Williams 23:52, 26 July 2020 (UTC)
Hello, a good place to discuss this kind of topic is Wikidata:WikiProject Chemistry. Concerning your questions: no, there is no intention to import all of PubChem into WD. WD is not a mirror of PubChem but the intersection of several databases and of contributors' work. Also, PubChem has some duplicates (just have a look at Wikidata:Database_reports/Constraint_violations/P662#"Single_value"_violations), and the policy concerning what counts as a chemical compound differs between PubChem and WD. So there is a need to filter what should be imported into WD.
Next, WD is lacking maintenance bots that check databases to keep the information in WD up to date. The main reason is that mass imports were done in WD without any curation of the data. So keeping data up to date is worthless until the original set has been completely curated: nothing guarantees that a PubChem CID has been added to the correct Q item. Most data were imported from Wikipedias, with a lot of errors and bad definitions of the chemical structure (typical example: the structure of the salt form instead of the acid form for an acid). So performing the task you mention is good, but more structural work is needed first.
Finally, concerning the data displayed in Wikipedia articles, this is a problem on the Wikipedia side: it is possible to update the data in the Chembox instantaneously, but for that Wikipedia's editors have to accept the use of Lua templates and the display of data from WD. Currently there is strong opposition to that system for different reasons (you can find some discussion about that here). Best regards. Snipre (talk) 13:34, 19 April 2020 (UTC)
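
As an illustration of the "single value" violations mentioned above, here is a minimal sketch that groups (item, CID) links by item and keeps the items carrying more than one distinct PubChem CID. It assumes the links were exported beforehand. The function name and the sample IDs are invented, and this is not how the constraint report itself is generated.

```python
from collections import defaultdict


def single_value_violations(links):
    """Map each item that has several distinct CIDs to its sorted CID list.

    links: iterable of (qid, cid) string pairs (assumed to be exported already).
    """
    cids_by_item = defaultdict(set)
    for qid, cid in links:
        cids_by_item[qid].add(cid)
    # Keep only items with more than one distinct CID; sort CIDs numerically.
    return {
        qid: sorted(cids, key=int)
        for qid, cids in cids_by_item.items()
        if len(cids) > 1
    }


# Placeholder data, invented for illustration only.
sample = [("Q5", "10"), ("Q5", "9"), ("Q6", "11"), ("Q5", "10")]
print(single_value_violations(sample))
```

Such a list only flags candidates: deciding which CID belongs on which item is the curation step that still has to be done by hand.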