User:UreomiczBot/Human metabolites

Introduction

WikiPathways (Q7999828) is a database of biological pathways. It contains many metabolites, a good number of which are not currently found in Wikidata (Q2013). I have previously been adding metabolites using Bioclipse (Q1769726) and QuickStatements (Q20084080), but that approach does not scale. Moreover, of the metabolites that are present in Wikidata, not all have enough information to identify them chemically with accuracy.

The objective of this task is two-fold:

  1. annotate existing compounds in Wikidata as found in humans (current)
  2. add metabolites not yet in Wikidata (implemented, but currently failing with an API error that reports URLs in labels, although the labels contain none)

The first is done by adding a found in taxon (P703) statement with the value Homo sapiens (Q15978631). The reference on that statement records that it is stated in (P248) WikiPathways (Q7999828), when it was retrieved (P813), the WikiPathways ID (P2410) of the pathway it was stated in, and the reference URL (P854).
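As an illustration of the statement and reference described above, here is a minimal sketch that builds the claim as a plain dictionary in the JSON shape Wikibase uses for statements. It is not taken from the bot source code: the helper names, the pathway ID "WP0000" and the example.org URL are hypothetical placeholders.

```python
from datetime import date

HOMO_SAPIENS = 15978631  # numeric part of Q15978631, as given in the text
WIKIPATHWAYS = 7999828   # numeric part of Q7999828


def item_snak(prop, numeric_id):
    """Snak whose value is a Wikidata item."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": numeric_id},
        },
    }


def string_snak(prop, text):
    """Snak for string-valued datatypes (string, external-id, url)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"type": "string", "value": text},
    }


def date_snak(prop, day):
    """Snak for a day-precision Gregorian date."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "time",
            "value": {
                "time": day.strftime("+%Y-%m-%dT00:00:00Z"),
                "timezone": 0,
                "before": 0,
                "after": 0,
                "precision": 11,  # 11 means day precision
                "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
            },
        },
    }


def found_in_human_claim(pathway_id, pathway_url, retrieved):
    """Build a P703 claim with a four-part WikiPathways reference."""
    reference = {
        "snaks": {
            "P248": [item_snak("P248", WIKIPATHWAYS)],     # stated in
            "P813": [date_snak("P813", retrieved)],        # retrieved
            "P2410": [string_snak("P2410", pathway_id)],   # WikiPathways ID
            "P854": [string_snak("P854", pathway_url)],    # reference URL
        }
    }
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": item_snak("P703", HOMO_SAPIENS),
        "references": [reference],
    }


# Hypothetical pathway ID and URL, used only to show the call.
claim = found_in_human_claim("WP0000", "https://example.org/WP0000", date(2016, 10, 1))
```

Keeping all four reference properties in one reference block means the provenance stays attached to the single statement it supports.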

Current Scope

The set of entities maintained by this bot is determined by their PubChem CID (P662) and InChIKey (P235).

At present, the bot is limited to metabolites from pathways of the species Homo sapiens (Q15978631) that have a PubChem CID (P662) (this is currently 151 metabolites).
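One way to match metabolites to existing items by PubChem CID is a SPARQL lookup against Wikidata. The sketch below only builds the query text and indexes a result document in the standard SPARQL JSON results format. It is an assumption about how such matching could work, not the query used by the bot, and the function names are hypothetical. Sending the query to a SPARQL endpoint is left out.

```python
def build_cid_lookup_query(cids):
    """Return a SPARQL query mapping PubChem CIDs (P662) to Wikidata items."""
    if not cids:
        raise ValueError("at least one PubChem CID is required")
    literals = []
    for cid in cids:
        text = str(cid)
        # CIDs are plain integers; reject anything else so that no
        # unexpected text ends up inside the query.
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"not a PubChem CID: {cid!r}")
        literals.append(f'"{text}"')
    values = " ".join(literals)
    return (
        "SELECT ?item ?cid ?inchikey WHERE {\n"
        f"  VALUES ?cid {{ {values} }}\n"
        "  ?item wdt:P662 ?cid .\n"
        "  OPTIONAL { ?item wdt:P235 ?inchikey . }\n"
        "}"
    )


def index_by_cid(sparql_json):
    """Map each CID to the list of item IDs found in a SPARQL JSON result."""
    index = {}
    for row in sparql_json["results"]["bindings"]:
        # Item values are entity URIs; the item ID is the last path segment.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        index.setdefault(row["cid"]["value"], []).append(qid)
    return index
```

A CID that maps to more than one item, or to none, shows up directly in the index, which makes it easy to skip ambiguous matches rather than edit the wrong item.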

Items maintained by this bot

  • entities with a PubChem CID or InChIKey that are metabolites in WikiPathways

Metabolite properties currently maintained by this bot for these items

Property | Datatype | Expected value (if not listed, see property definition)
found in taxon (P703) | Item | Currently should only include Homo sapiens (Q15978631)
InChIKey (P235) | external-id | Added only for new compounds
InChI (P234) | external-id | Added only for new compounds
canonical SMILES (P233) | string | Added only for new compounds
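The rule in the table, that structural identifiers are written only for newly created compounds while the taxon statement applies to all of them, can be sketched as follows. This is a hypothetical helper rather than the bot's code. The dictionary keys and the InChIKey format check (14 letters, 10 letters, 1 letter, separated by hyphens) are assumptions made for illustration.

```python
import re

# Shape of an InChIKey: 14 uppercase letters, 10 uppercase letters,
# then 1 uppercase letter, separated by hyphens.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def planned_statements(compound, item_exists):
    """Return (property, value) pairs to write for one metabolite.

    `compound` is a dict with hypothetical keys "inchikey", "inchi"
    and "smiles"; `item_exists` says whether Wikidata already has an item.
    """
    statements = [("P703", "Q15978631")]  # found in taxon: Homo sapiens
    if item_exists:
        # Existing items keep their current identifiers untouched.
        return statements

    inchikey = compound["inchikey"]
    if not INCHIKEY_PATTERN.fullmatch(inchikey):
        raise ValueError(f"malformed InChIKey: {inchikey!r}")

    statements.append(("P235", inchikey))           # InChIKey
    statements.append(("P234", compound["inchi"]))  # InChI
    statements.append(("P233", compound["smiles"])) # canonical SMILES
    return statements


# Illustrative input only (the identifiers of water).
water = {
    "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
    "inchi": "InChI=1S/H2O/h1H2",
    "smiles": "O",
}
existing = planned_statements(water, item_exists=True)
new = planned_statements(water, item_exists=False)
```

Here `existing` contains only the P703 pair, while `new` also contains the three identifier pairs.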

Metabolite properties PLANNED for this bot

Property | Datatype | Expected value (if not listed, see property definition)

Note

Data sources

The bot retrieves its content from the following sources (with a SPARQL query which can be found in the bot source code):

Prototype items

Bot test edits

An earlier version of the bot, using a different data model, made the following changes: 4'-hydroxydiclofenac (Q26690136) (diff), flutamide (Q418669) (diff), cortisol (Q190875) (diff), Serine (Q7454846) (diff), clarithromycin (Q118551) (diff), silibinin A (Q425702) (diff), and lysophosphatidic acid (Q2823281) (diff).

Bot approval

Implementation

The bot code is open source and available for inspection. It is implemented in Python, is based on the User:ProteinBoxBot source code, and is intended to be deployed as a cron job. For now, however, it is run manually.