Wikidata:WikidataCon 2017/Notes/An open source tool for fishing Wikidata entities in text and PDF documents
Title: An open source tool for fishing Wikidata entities in text and PDF documents
Speaker(s)
Name or username: Patrice Lopez
Useful links:
https://github.com/kermitt2/nerd
http://entity-fishing.science-miner.com
Abstract
entity-fishing (repo: https://github.com/kermitt2/nerd, demo: http://entity-fishing.science-miner.com, documentation: http://nerd.readthedocs.io) is an open source tool for the automatic identification and disambiguation of Wikidata entities in multilingual text and PDF documents. The tool relies on machine-learning techniques that use Wikipedia as their training source. entity-fishing offers high performance and scalability and is fully generic with respect to domains and languages, so it can serve a wide variety of uses. Our work focuses in particular on processing scholarly documents, taking advantage of the large amount of scientific knowledge and links present in Wikidata.
![](http://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Screenshot_from_2017-10-30_18-17-12.png/220px-Screenshot_from_2017-10-30_18-17-12.png)
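Disambiguation can be illustrated with a common baseline from Wikipedia-based entity linking. The idea is to estimate a prior P(entity | mention) from how often a surface form links to each entity in Wikipedia anchors. The code is a minimal sketch of that general technique and not entity-fishing's actual implementation. The `ANCHOR_COUNTS` table, its counts and labels, and the functions `candidate_priors` and `most_likely_entity` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical anchor statistics: how often each surface form (mention)
# links to each target entity in a Wikipedia-like corpus. Counts are invented.
ANCHOR_COUNTS: Dict[str, Counter] = {
    "paris": Counter({"Paris (city)": 900, "Paris (mythology)": 60, "Paris (band)": 40}),
    "mercury": Counter({"Mercury (planet)": 300, "Mercury (element)": 500,
                        "Mercury (mythology)": 200}),
}


def candidate_priors(mention: str) -> List[Tuple[str, float]]:
    """Return (entity, P(entity | mention)) pairs, most likely first."""
    counts = ANCHOR_COUNTS.get(mention.lower())
    if not counts:
        return []  # unknown surface form: no candidates
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(entity, n / total) for entity, n in ranked]


def most_likely_entity(mention: str) -> Optional[str]:
    """Baseline disambiguation: pick the candidate with the highest prior."""
    candidates = candidate_priors(mention)
    return candidates[0][0] if candidates else None


if __name__ == "__main__":
    for entity, prior in candidate_priors("Mercury"):
        print(f"{entity}: {prior:.2f}")
```

A prior alone ignores the surrounding text, so it always picks the dominant sense of a mention. Context-based signals are typically combined with it to resolve genuinely ambiguous cases.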
Collaborative notes of the session
Entity recognizer
Grobid-NER - https://github.com/kermitt2/grobid-ner
LMDB
Entity embeddings.
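The note on entity embeddings can be read in a general way. Candidate entities and context words are mapped into a shared vector space, and the candidate closest to the context is preferred. This is a minimal sketch of that idea. It assumes cosine similarity against an averaged context vector. The toy 3-dimensional vectors and the names `WORD_VECTORS`, `ENTITY_VECTORS`, and `rank_by_context` are hypothetical. entity-fishing's actual embeddings and scoring may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

DIM = 3  # toy dimensionality; real embeddings are much larger

# Hypothetical toy embeddings, invented for illustration only.
WORD_VECTORS: Dict[str, List[float]] = {
    "temperature": [0.9, 0.1, 0.0],
    "toxic": [0.8, 0.0, 0.2],
    "orbit": [0.0, 0.9, 0.1],
    "sun": [0.1, 0.8, 0.0],
}
ENTITY_VECTORS: Dict[str, List[float]] = {
    "Mercury (element)": [0.85, 0.05, 0.1],
    "Mercury (planet)": [0.05, 0.9, 0.05],
    "Mercury (mythology)": [0.1, 0.1, 0.9],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_vector(words: Sequence[str]) -> List[float]:
    """Average the vectors of known context words; unknown words are skipped."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(values) / len(known) for values in zip(*known)]


def rank_by_context(words: Sequence[str], candidates: Sequence[str]) -> List[Tuple[str, float]]:
    """Score each candidate entity by similarity to the context, best first."""
    ctx = context_vector(words)
    scored = [(entity, cosine(ctx, ENTITY_VECTORS[entity])) for entity in candidates]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sentence = "mercury takes 88 days to orbit the sun"
    for entity, score in rank_by_context(sentence.split(), list(ENTITY_VECTORS)):
        print(f"{entity}: {score:.3f}")
```

Unlike a pure prior, this score changes with the surrounding words. The same mention can therefore resolve to different entities in different sentences.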