Wikidata:WikiFactMine/How the facts flow

WikiFactMine schematic as of June 2017. Adding WikiFactMine content to the primary sources tool is a future aim.

This page covers details of the fact-mining at the heart of WikiFactMine.

Canary-Perch/Karoo is a tool combination that handles the fact extraction stage of the project. It extracts facts from scientific papers, using our selection of key terms, which are organised into "dictionaries". This is where the "fact mining" occurs, in batches.
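As a rough illustration of dictionary-based fact mining, the Python sketch below scans a batch of paper texts for dictionary terms and records each hit with a little surrounding context. It is not the Canary-Perch/Karoo implementation: the function names, the fact fields, the context window and the placeholder item IDs are all assumptions made for this example.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: dictionary name -> {term: Wikidata item ID}.
# The IDs used with this sketch are placeholders, not real Wikidata items.
Dictionaries = Dict[str, Dict[str, str]]


def mine_facts(paper_id: str, text: str, dictionaries: Dictionaries,
               window: int = 30) -> List[dict]:
    """Return one 'fact' per occurrence of a dictionary term in the text."""
    facts = []
    for dict_name, terms in dictionaries.items():
        for term, item_id in terms.items():
            # Whole-word, case-insensitive match of the term.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                start = max(0, match.start() - window)
                end = min(len(text), match.end() + window)
                facts.append({
                    "paper": paper_id,
                    "dictionary": dict_name,
                    "term": term,
                    "item": item_id,
                    "context": text[start:end],
                })
    return facts


def mine_batch(papers: Iterable[Tuple[str, str]],
               dictionaries: Dictionaries) -> List[dict]:
    """Mine a batch of (paper_id, text) pairs and pool the facts."""
    facts = []
    for paper_id, text in papers:
        facts.extend(mine_facts(paper_id, text, dictionaries))
    return facts
```

Calling `mine_batch` with a list of `(paper_id, text)` pairs and a small set of dictionaries yields a flat list of fact records, each tagged with the dictionary it came from.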

ElasticSearch, an instance of the open source software product from elastic.co, receives the facts. Its retrieval infrastructure manages the pile of raw facts that have been extracted. This technology is also in use on Wikipedia.
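To make the retrieval step more concrete, here is a minimal Python sketch of two request bodies in Elasticsearch's query DSL: a `term` query for facts matching one Wikidata item, and a `range` query for facts mined between two dates. The field names `wikidata_id` and `mined_at`, the `MAX_HITS` cap and the helper names are assumptions for illustration; the actual WikiFactMine index layout is not described here.

```python
import json
from datetime import date

# Hypothetical upper bound on hits per request (an assumption of this sketch).
MAX_HITS = 100


def by_wikidata_item(item_id: str, size: int = 10) -> dict:
    """Build a term query on the (assumed) 'wikidata_id' field."""
    return {
        "size": min(size, MAX_HITS),
        "query": {"term": {"wikidata_id": item_id}},
    }


def by_date_mined(start: date, end: date, size: int = 10) -> dict:
    """Build a range query on the (assumed) 'mined_at' date field."""
    return {
        "size": min(size, MAX_HITS),
        "query": {
            "range": {
                "mined_at": {"gte": start.isoformat(), "lte": end.isoformat()}
            }
        },
    }


if __name__ == "__main__":
    # Print one request body as JSON; the item ID is a placeholder.
    print(json.dumps(by_wikidata_item("Q_PLACEHOLDER"), indent=2))
```

The functions only build the JSON bodies; sending them to a search endpoint is left out because the endpoint and index name are not given in this page.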

The WikiFactMine API (Application Programming Interface) is used to run rationed, structured queries on the content held in ElasticSearch. Through it, the mined facts can be passed in several directions:

  • To FactVis, which sets the facts out in an array whose columns correspond to the various dictionaries in use.
  • To the SwaggerUI, which provides human-readable access to the mined facts, either by date mined or by their matching Wikidata codes.
  • Into a dedicated JavaScript tool or gadgets on Wikidata.
  • As statements into the broad-based Primary Sources tool on Wikidata.
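The array layout with one column per dictionary can be sketched in Python as a simple pivot. This is not FactVis code: the choice of one row per paper, the fact fields `paper`, `dictionary` and `term`, and the function name are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def facts_to_array(facts: Sequence[Dict[str, str]],
                   dictionaries: Sequence[str]) -> List[List[str]]:
    """Pivot fact records into rows (papers) by columns (dictionaries)."""
    # cells[paper][dictionary] -> list of matched terms
    cells = defaultdict(lambda: defaultdict(list))
    for fact in facts:
        cells[fact["paper"]][fact["dictionary"]].append(fact["term"])

    # Header row: a paper column followed by one column per dictionary.
    rows = [["paper"] + list(dictionaries)]
    for paper in sorted(cells):
        row = [paper]
        for name in dictionaries:
            # Empty string where a paper has no hit in that dictionary.
            row.append(", ".join(cells[paper].get(name, [])))
        rows.append(row)
    return rows
```

Each returned row is a list of strings, so the result can be written out with `csv.writer` or rendered as a table.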

Further applications are possible.