Wikidata:Property proposal/Natural Product Atlas ID

Natural Product Atlas ID

Originally proposed at Wikidata:Property proposal/Natural science

Description: A link from Wikidata entries to the external chemical curation database, the Natural Product Atlas
Data type: External identifier
Domain: property
Example 1: sirolimus (Q32089) → NPA000414
Example 2: geldanamycin (Q904475) → NPA019914
Example 3: Difficidin (Q58371294) → NPA019912
Example 4: nystatin A1 (Q27292191) → NPA020315
Example 5: kedarcidin (Q15426249) → NPA020328
Planned use: For all NPAtlas compounds that have existing Wikidata entries (as determined by NPAtlas -> PubChem SID -> PubChem CID -> Wikidata entity links), I will update the entries with an NPAtlas ID. Compounds with this ID are also known to be natural products (natural product (Q901227)) and to have a producing organism (this taxon is source of (P1672)).
Number of IDs in source: 25000
Expected completeness: eventually complete (Q21873974)
Formatter URL: https://www.npatlas.org/joomla/index.php/explore/compounds#npaid=$1
Robot and gadget jobs: can be created
See also: this taxon is source of (P1672), natural product (Q901227)
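
As a rough illustration of how the formatter URL would be used, here is a minimal Python sketch that substitutes an identifier for `$1`. The pattern `NPA` followed by six digits is only an assumption inferred from the five examples above, not a format confirmed by NPAtlas, and the function name `npatlas_url` is hypothetical.

```python
import re

# Formatter URL as given in the proposal; "$1" is replaced by the identifier.
FORMATTER_URL = "https://www.npatlas.org/joomla/index.php/explore/compounds#npaid=$1"

# Assumption: IDs look like "NPA" plus six digits, as in the five examples.
NPA_ID_PATTERN = re.compile(r"NPA\d{6}")


def npatlas_url(npa_id: str) -> str:
    """Expand the formatter URL for one NPAtlas ID (hypothetical helper)."""
    if not NPA_ID_PATTERN.fullmatch(npa_id):
        raise ValueError(f"does not look like an NPAtlas ID: {npa_id!r}")
    return FORMATTER_URL.replace("$1", npa_id)


sirolimus_url = npatlas_url("NPA000414")
```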

Motivation

The Natural Product Atlas (Natural Product Atlas (Q78224032)) is a curatorial effort that has annotated information on ~25000 small molecules produced by organisms in nature. The data has been released as a CC-attribution resource that is downloadable from their website. Many of these compounds exist in Wikidata and can be linked via a unique identifier: a PubChem CID, an InChIKey, or both. Furthermore, each NPAtlas compound is registered in PubChem with a Substance ID. Therefore, I believe this is a useful identifier that will facilitate denser linking of compounds to their producers in nature.

While I believe the addition of this identifier is useful and desirable, there are a few challenges facing the broader issue of incorporating natural product data into Wikidata that should be addressed.

1. Groups of related compounds. It is common to have a series of related compounds that are mostly similar to one another. For example, NPAtlas contains a number of entries that are minor variants of each other, such as Spumigin A through Spumigin F. We may wish to link to only one member of the series.

2. What is the best way to link a compound to an NPAtlas ID? Names can be ambiguous, but even unique identifiers like InChIKeys can get you into trouble. For example, the Wikidata entry for Verruculogen, verruculogen (undef. stereochem.) (Q11954479), links to PubChem CID 104862, while the NPAtlas SID for Verruculogen, 386992827, links to PubChem CID 13887805. These two PubChem-validated compound IDs refer to the same compound, but only one has assigned stereochemistry. In this case, a naive script that checks Wikidata for the existence of the named entity "Verruculogen" would find an entity, but that entity would have a conflicting InChIKey; a search for matching InChIKeys (or the linked PubChem CID) would indicate that the compound is not in Wikidata.
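
One way to flag cases like Verruculogen is to compare only the first block of the InChIKey: the first 14 characters hash the connectivity, while the second block covers stereochemistry and other layers. The sketch below is illustrative only. The function name is hypothetical and the two keys are made-up placeholders, not the real InChIKeys of PubChem CIDs 104862 and 13887805.

```python
def compare_inchikeys(key_a: str, key_b: str) -> str:
    """Classify two InChIKeys as 'identical', 'same_skeleton', or 'different'.

    'same_skeleton' means the first (connectivity) block matches while the
    rest differs, which is what happens when only one record has assigned
    stereochemistry.
    """
    if key_a == key_b:
        return "identical"
    if key_a.split("-")[0] == key_b.split("-")[0]:
        return "same_skeleton"
    return "different"


# Placeholder keys, NOT real InChIKeys: same first block, different second block.
wikidata_key = "AAAAAAAAAAAAAA-BBBBBBBBBB-N"
npatlas_key = "AAAAAAAAAAAAAA-CCCCCCCCCC-N"
verdict = compare_inchikeys(wikidata_key, npatlas_key)
```

A "same_skeleton" verdict would not justify an automatic link, but it could mark the pair for manual review instead of reporting the compound as missing.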

If we take into account issues 1) and 2), the safest way to assign NPAtlas IDs is to apply them only to the subset of current Wikidata entities that have matching names, PubChem CIDs and InChIKeys. This subset will be considerably smaller than the current full set of ~25k compounds.
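
The conservative rule above could be sketched as follows. This is an assumption-laden illustration, not the script that will actually be run: the `Compound` record, the function name, the InChIKey strings and the sirolimus CID are all placeholders; only the two Verruculogen CIDs and the item and NPAtlas IDs shown are taken from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    identifier: str   # NPAtlas ID or Wikidata QID
    name: str
    pubchem_cid: int
    inchikey: str


def conservative_matches(npatlas, wikidata):
    """Return (NPAtlas ID, QID) pairs whose name, CID and InChIKey all agree."""
    by_cid = {w.pubchem_cid: w for w in wikidata}
    pairs = []
    for n in npatlas:
        w = by_cid.get(n.pubchem_cid)
        if (
            w is not None
            and w.inchikey == n.inchikey
            and w.name.casefold() == n.name.casefold()
        ):
            pairs.append((n.identifier, w.identifier))
    return pairs


# Placeholder data: InChIKeys and the CID 1 are invented for illustration.
npatlas_records = [
    Compound("NPA000414", "Sirolimus", 1, "KEY-SIROLIMUS"),
    Compound("NPA-placeholder", "Verruculogen", 13887805, "KEY-STEREO"),
]
wikidata_records = [
    Compound("Q32089", "sirolimus", 1, "KEY-SIROLIMUS"),
    Compound("Q11954479", "verruculogen (undef. stereochem.)", 104862, "KEY-FLAT"),
]
matched = conservative_matches(npatlas_records, wikidata_records)
```

With this toy data only the sirolimus pair survives, and the Verruculogen records are left unlinked because their CIDs differ.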


Discussion

Saehrimnir
Leyo
Snipre
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Adriano Rutz
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
Peter Murray-Rust
Erfurth
TiagoLubiana

Notified participants of WikiProject Chemistry

@ديفيد عادل وهبة خليل 2, Zcp3000, Wostr, YULdigitalpreservation, Egon Willighagen: ✓ Done: Natural Product Atlas ID (P7746). − Pintoch (talk) 17:53, 30 December 2019 (UTC)