User:Fnielsen/Autolists/Datasets


Datasets used in works.

This list is periodically updated by a bot. Manual changes to the list will be removed on the next update!

Query: select DISTINCT ?item where { ?work wdt:P4510 ?item . ?item wdt:P31/wdt:P279* wd:Q1172284 . }
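
The query selects every item that some work points to via P4510 ("describes a project that uses") and that is an instance of Q1172284 ("data set") or of one of its subclasses. As a minimal sketch of how the same selection could be run outside the bot, the Python below sends the query to the public Wikidata Query Service. The added `?itemLabel` variable with the label service, the `User-Agent` string, and the helper names `build_request`, `parse_items` and `fetch_items` are assumptions made for illustration; they are not part of this page or of the bot that maintains it.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The list's query, extended (an assumption) with the label service so
# that each item comes back with an English label.
QUERY = """
SELECT DISTINCT ?item ?itemLabel WHERE {
  ?work wdt:P4510 ?item .
  ?item wdt:P31/wdt:P279* wd:Q1172284 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def build_request(query: str, endpoint: str = WDQS_ENDPOINT) -> urllib.request.Request:
    """Build a GET request that asks the endpoint for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical identifier; replace with one describing your own script.
        "User-Agent": "dataset-autolist-example/0.1 (example script)",
    }
    return urllib.request.Request(url, headers=headers)


def parse_items(results: dict) -> list[tuple[str, str]]:
    """Turn SPARQL JSON results into sorted, de-duplicated (QID, label) pairs."""
    pairs = set()
    for row in results["results"]["bindings"]:
        # Entity URIs end in the QID, e.g. .../entity/Q1172284.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        # Fall back to the QID when no label was returned.
        label = row.get("itemLabel", {}).get("value", qid)
        pairs.add((qid, label))
    return sorted(pairs)


def fetch_items(query: str = QUERY) -> list[tuple[str, str]]:
    """Run the query against the endpoint (requires network access)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return parse_items(json.load(response))
```

Calling `fetch_items()` would return the current matches as `(QID, label)` pairs. Grouping them by type and adding the columns shown in the sections below would need further `OPTIONAL` clauses, which this sketch leaves out.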


OWL ontology

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Simple Knowledge Organization System https://www.w3.org/TR/skos-reference/
The Data Cube vocabulary https://www.w3.org/TR/vocab-data-cube/ http://purl.org/linked-data/cube


Wiktionary language edition

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
German Wiktionary German https://de.wiktionary.org/
English Wiktionary 2002-12-12 English
multiple languages
Creative Commons Attribution-ShareAlike 3.0 Unported https://en.wiktionary.org/


bibliographic database

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Web of Science en:WoS 2016
1997
English Bibliographic Scan of Digital Scholarly Communication Infrastructure https://clarivate.com/products/web-of-science/
MEDLINE 1966 https://www.nlm.nih.gov/medline/index.html https://www.nlm.nih.gov/bsd/medline.html
https://www.nlm.nih.gov/databases/databases_oldmedline.html
Web of Knowledge http://wokinfo.com/
http://www.isiwebofknowledge.com/
CINAHL 1961 https://www.ebsco.com/products/research-databases/cinahl-database
Crossref 2000 Bibliographic Scan of Digital Scholarly Communication Infrastructure
Open Science Thesaurus
https://www.crossref.org/
Embase
PsycINFO http://www.apa.org/psycinfo/
CNKI 1996 https://www.cnki.net/
OpenAlex 2022-01-03 American English Open Science Thesaurus https://openalex.org/ https://blog.ourresearch.org/openalex-update-june/
https://openalex.org/about


biological database

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Pfam English Pfam: The protein families database in 2021
The Bioregistry
Nucleic Acids Research (NAR) database
GNU Lesser General Public License http://pfam.xfam.org/
Kyoto Encyclopedia of Genes and Genomes English
Japanese
The Bioregistry proprietary license http://www.genome.jp/kegg/
Search Tool for the Retrieval of Interacting Genes/Proteins The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets Creative Commons Attribution 4.0 International http://www.string-db.org/
dbSNP http://www.ncbi.nlm.nih.gov/snp
Electron Microscopy Data Bank 2002 New electron microscopy database and deposition system https://www.ebi.ac.uk/emdb/
https://www.ebi.ac.uk/pdbe/emdb/
GENCODE proprietary license https://www.gencodegenes.org/
GeneDB English Nucleic Acids Research (NAR) database
GeneDB and Wikidata
GeneDB: a resource for prokaryotic and eukaryotic organisms
http://www.genedb.org/
Identifiers.org Nucleic Acids Research (NAR) database proprietary license https://identifiers.org
SILVA ribosomal RNA database SILVA: Comprehensive Databases for Quality Checked and Aligned Ribosomal RNA Sequence Data Compatible with ARB
The SILVA ribosomal RNA gene database project: improved data processing and web-based tools
Nucleic Acids Research (NAR) database
http://www.arb-silva.de/
PanTHERIA https://ecologicaldata.org/wiki/pantheria
VertNet VertNet: a new model for biodiversity data sharing http://vertnet.org/
LOTUS English The Bioregistry
The LOTUS initiative for open knowledge management in natural products research
The LOTUS Initiative for Open Natural Products Research: Knowledge Management through Wikidata
https://search.nprod.net/
EURING Data Bank
Database of Invasive Island Species Eradications http://diise.islandconservation.org/


chemical database

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
PubChem English PubChem in 2021: new data content and improved web interfaces
The Bioregistry
Nucleic Acids Research (NAR) database
free content http://pubchem.ncbi.nlm.nih.gov
ChEMBL The ChEMBL database in 2017
The Bioregistry
Creative Commons Attribution-ShareAlike 3.0 Unported https://www.ebi.ac.uk/chembl/
http://www.ebi.ac.uk/chembl
GNPS English Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking https://gnps.ucsd.edu/


clinical trials registry

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
ClinicalTrials.gov English http://www.clinicaltrials.gov
International Clinical Trials Registry Platform 2005 https://www.who.int/ictrp


data set

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Iris flower data set
MNIST database http://yann.lecun.com/exdb/mnist/ https://paperswithcode.com/dataset/mnist
AIFB DataSet Creative Commons Attribution https://figshare.com/articles/AIFB_DataSet/745364
COCO Microsoft COCO: Common Objects in Context http://mscoco.org/
http://cocodataset.org
Visual Genome http://visualgenome.org
Movie Review Data English Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales http://www.cs.cornell.edu/people/pabo/movie-review-data/
Customer Review Datasets English Mining and summarizing customer reviews https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html http://www.cs.uic.edu/~liub/FBS/CustomerReviewData.zip
Amazon product data Inferring Networks of Substitutable and Complementary Products http://jmcauley.ucsd.edu/data/amazon/
Stanford Sentiment Treebank 2013 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank https://nlp.stanford.edu/sentiment/treebank.html
Large Movie Review Dataset English Learning word vectors for sentiment analysis http://ai.stanford.edu/~amaas/data/sentiment/ http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Rockyou password dataset
FB15K Translating Embeddings for Modeling Multi-relational Data Creative Commons Attribution 2.5 Generic https://everest.hds.utc.fr/doku.php?id=en:transe https://paperswithcode.com/dataset/fb15k
LinkedGeoData http://linkedgeodata.org/
Microsoft Research Paraphrase Corpus en:MRPC English https://www.microsoft.com/en-us/download/details.aspx?id=52398 https://aclweb.org/aclwiki/Paraphrase_Identification_(State_of_the_art)
50 Salads dataset Combining embedded accelerometers with computer vision for recognizing food preparation activities Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International http://cvip.computing.dundee.ac.uk/datasets/foodpreparation/50salads/
YouTube-8M YouTube-8M: A Large-Scale Video Classification Benchmark https://research.google.com/youtube8m/
Citations with identifiers in Wikipedia Creative Commons CC0 License https://analytics.wikimedia.org/datasets/archive/public-datasets/all/mwrefs/
WikiSQL English Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning https://github.com/salesforce/WikiSQL
cc-dbp Apache License, Version 2.0 https://github.com/IBM/cc-dbp
Corpus of Linguistic Acceptability Neural Network Acceptability Judgments https://nyu-mll.github.io/CoLA/
AudioSet Audio Set: An ontology and human-labeled dataset for audio events https://research.google.com/audioset/
FB15K-237 https://www.microsoft.com/en-us/download/details.aspx?id=52312
WikiTableQuestions Compositional Semantic Parsing on Semi-Structured Tables http://nlp.stanford.edu/software/sempre/wikitable/
DocRED English DocRED: A Large-Scale Document-Level Relation Extraction Dataset https://github.com/thunlp/DocRED https://paperswithcode.com/dataset/docred
WebNLG 2020 Dataset
WebNLG 2017 Dataset English The WebNLG Challenge: Generating Text from RDF Data https://webnlg-challenge.loria.fr/download/
PandaSet https://pandaset.org/
Heuristic Analysis for NLI Systems English Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference https://github.com/tommccoy1/hans
CodRED English CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild https://github.com/thunlp/CodRED
Labeled Faces in the Wild en:LFW Labeled Faces in the Wild: Updates and New Reporting Procedures
Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments
http://vis-www.cs.umass.edu/lfw/
LJ Speech English https://keithito.com/LJ-Speech-Dataset/
Conceptual Captions Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
PAN12 Deception Detection: Sexual Predator Identification English https://zenodo.org/record/3713280 https://pan.webis.de/clef12/pan12-web/sexual-predator-identification.html
SuperGLUE SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
ISTEX-1000 OpenTapioca: Lightweight Entity Linking for Wikidata https://github.com/wetneb/opentapioca/tree/master/data
COVID-19 State and County Policy Orders
Reuters-21578 English https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection
http://www.daviddlewis.com/resources/testcollections/reuters21578/
BBCSport English http://mlg.ucd.ie/datasets/bbc.html
Strava Metro data set 2020-09 https://metro.strava.com/
Proprioceptive contribution to oculomotor control in humans (dataset) https://risweb.st-andrews.ac.uk/portal/en/datasets/proprioceptive-contribution-to-oculomotor-control-in-humans-dataset(9e107884-df5a-4de6-8327-ba809d1b2168).html
ImageNet ILSVRC-2012 https://www.image-net.org/
DailyDialog DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset http://yanran.li/dailydialog https://huggingface.co/datasets/daily_dialog
SciDocs SPECTER: Document-level Representation Learning using Citation-informed Transformers https://github.com/allenai/scidocs https://paperswithcode.com/dataset/scidocs
Multilingual Compositional Wikidata Question en:MCWQ
Sentences Involving Compositional Knowledge dataset da:SICK English A SICK cure for the evaluation of compositional distributional semantic models Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported https://marcobaroni.org/composes/sick.html https://paperswithcode.com/dataset/sick
CONLL04 English A Linear Programming Formulation for Global Inference in Natural Language Tasks https://cogcomp.seas.upenn.edu/page/resource_view/43
DWIE en:DWIE English DWIE: An entity-centric dataset for multi-task document-level information extraction https://github.com/klimzaporojets/DWIE
FewRel FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation http://www.zhuhao.me/fewrel/
ACE 2005 Multilingual Training Corpus English
Mandarin
Modern Standard Arabic
https://catalog.ldc.upenn.edu/LDC2006T06
Wind Integration National Dataset Toolkit Overview and Meteorological Validation of the Wind Integration National Dataset toolkit
The Wind Integration National Dataset (WIND) Toolkit
Validation of Power Output for the WIND Toolkit
https://www.nrel.gov/grid/wind-toolkit.html
ACI-BENCH English ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation Creative Commons Attribution 4.0 International https://github.com/wyim/aci-bench
MTS-Dialog English An Empirical Study of Clinical Note Generation from Doctor-Patient Encounters https://github.com/abachaa/MTS-Dialog
CNN/Daily Mail Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond https://paperswithcode.com/dataset/cnn-daily-mail-1
ZESHEL https://github.com/lajanugen/zeshel
HotelRec HotelRec: a Novel Very Large-Scale Hotel Recommendation Dataset https://github.com/Diego999/HotelRec
https://paperswithcode.com/dataset/hotelrec
TweekiGold English Tweeki: Linking Named Entities on Twitter to a Knowledge Graph https://github.com/ucinlp/tweeki/tree/main/data/Tweeki_gold
Google RE English https://code.google.com/archive/p/relation-extraction-corpus/
TruthfulQA English TruthfulQA: Measuring How Models Mimic Human Falsehoods https://github.com/sylinrl/TruthfulQA


database

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Cochrane Library https://www.cochranelibrary.com/
OpenCitations Corpus en:OCC The varying openness of digital open science tools Creative Commons CC0 License http://opencitations.net/corpus
SciGraph 2017 https://scigraph.springernature.com/explorer https://www.springernature.com/gp/researchers/scigraph
AACT Database https://www.ctti-clinicaltrials.org/aact-database
ClinWiki MIT License https://www.clinwiki.org/
GeoDanmark https://www.geodanmark.dk
National Inpatient Sample
Det Centrale Ordregister da:COR Danish https://ordregister.dk/


digital library

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Wikisource 2003-11-24 https://wikisource.org/
Project Gutenberg 1971-07-04 multiple languages Unlicense https://gutenberg.org
PubMed Central en:PMC English Open Science Thesaurus
The varying openness of digital open science tools
http://www.ncbi.nlm.nih.gov/pmc/
https://www.ncbi.nlm.nih.gov/pmc/
HathiTrust 2008 Bibliographic Scan of Digital Scholarly Communication Infrastructure https://www.hathitrust.org/ https://tapor.ca/tools/1461
https://marketplace.sshopencloud.eu/tool-or-service/VUsxa0


free and open-source software

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Open Science Framework en:OSF Bibliographic Scan of Digital Scholarly Communication Infrastructure
Open Science Thesaurus
The varying openness of digital open science tools
Apache License, Version 2.0 https://osf.io https://tapor.ca/tools/742
https://marketplace.sshopencloud.eu/tool-or-service/ROkULj
QLever QLever: A Query Engine for Efficient SPARQL+Text Search Apache License, Version 2.0


free software

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
World Atlas of Language Structures en:WALS 2008 Creative Commons Attribution 4.0 International http://wals.info
Wikibase GNU General Public License, version 2.0 or later https://wikiba.se/


graph database

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Blazegraph awesome RDF github page GNU General Public License, version 2.0
proprietary license
https://www.blazegraph.com/
https://blazegraph.com/
Stardog awesome RDF github page
OntoCommons Report D4.3
proprietary license https://www.stardog.com


image database

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
CBCL Face Database http://cbcl.mit.edu/software-datasets/FaceData2.html http://www.ai.mit.edu/courses/6.899/lectures/faces.tar.gz
imSitu Situation Recognition: Visual Semantic Role Labeling for Image Understanding http://imsitu.org/ https://s3.amazonaws.com/my89-frame-annotation/public/of500_images.tar


image dataset

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
ImageNet ImageNet: A large-scale hierarchical image database http://www.image-net.org/
Fashion-MNIST Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms MIT License https://github.com/zalandoresearch/fashion-mnist
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/
CIFAR-10 https://www.cs.toronto.edu/~kriz/cifar.html https://paperswithcode.com/dataset/cifar-10
CIFAR-100 https://www.cs.toronto.edu/~kriz/cifar.html https://paperswithcode.com/dataset/cifar-100
CelebA Deep Learning Face Attributes in the Wild http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
80 Million Tiny Images 80 million tiny images: a large data set for nonparametric object and scene recognition http://people.csail.mit.edu/billf/papers/80millionImages.pdf
The Street View House Numbers Dataset en:SVHN Reading Digits in Natural Images with Unsupervised Feature Learning http://ufldl.stanford.edu/housenumbers/
Hotels-50K Hotels-50K: A Global Hotel Recognition Dataset
2021 Hotel-ID The 2021 Hotel-ID to Combat Human Trafficking Competition Dataset https://www.kaggle.com/c/hotel-id-2021-fgvc8
https://paperswithcode.com/dataset/2021-hotel-id
ObjectNet ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models https://objectnet.dev/


knowledge base

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
DBpedia 2007-01-10 multiple languages DBpedia: A Nucleus for a Web of Open Data
DBpedia - A crystallization point for the Web of Data
DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia
Creative Commons Attribution-ShareAlike 3.0 Unported
Creative Commons CC0 License
GNU General Public License, version 2.0
https://dbpedia.org/
YAGO en:YAGO 2008 Lecture Notes in Computer Science Creative Commons Attribution-ShareAlike 3.0 Unported
Creative Commons Attribution-ShareAlike 4.0 International
http://www.yago-knowledge.org/
http://yago.r2.enst.fr/
https://yago-knowledge.org/
https://tapor.ca/tools/377
https://marketplace.sshopencloud.eu/tool-or-service/xQ9Fe1
GRID en:GRID
en:grid.ac
2015-10-12 English The varying openness of digital open science tools Creative Commons CC0 License https://www.grid.ac/ https://www.digitalscience.ru/products/grid/
lexicographical data in Wikidata 2018-05-23


knowledge graph

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Artificial Intelligence Knowledge Graph AI-KG: An Automatically Generated Knowledge Graph of Artificial Intelligence
CaLiGraph http://caligraph.org/


knowledge graph of science

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Microsoft Academic Graph 2015-06-05 English https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/
Open Research Knowledge Graph en:ORKG http://orkg.org/


lexical database

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
WordNet 1998 English WordNet: An Electronic Lexical Database
WordNet: a lexical database for English
BSD licenses https://wordnet.princeton.edu/
FrameNet 1997 English FrameNet: Theory and Practice https://framenet.icsi.berkeley.edu/fndrupal/ https://framenet.icsi.berkeley.edu/fndrupal/WhatIsFrameNet
VerbNet English https://verbs.colorado.edu/verbnet/
NorthEuraLex Creative Commons Attribution-ShareAlike 4.0 International http://northeuralex.org/


ontology

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Gene Ontology mul:go Gene ontology: tool for the unification of biology Creative Commons Attribution 4.0 International http://geneontology.org/
FOAF 2000 http://www.foaf-project.org
http://xmlns.com/foaf/0.1/
http://www.foaf-project.org/
http://xmlns.com/foaf/0.1/
Foundational Model of Anatomy http://si.washington.edu/projects/fma http://si.washington.edu/projects/fma
http://sig.biostr.washington.edu/projects/fm/FME/index.html
Human Phenotype Ontology mul:hp Nucleic Acids Research (NAR) database
The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease
DiNGO: standalone application for Gene Ontology and Human Phenotype Ontology term enrichment analysis
hpo license http://www.human-phenotype-ontology.org/
NCI Thesaurus English The Bioregistry https://ncit.nci.nih.gov/ncitbrowser/ https://bioportal.bioontology.org/ontologies/NCIT
BioAssay Ontology The Bioregistry http://www.bioassayontology.org
Citation Typing Ontology http://www.sparontologies.net/ontologies/cito
Pizza Ontology English
Portuguese
Creative Commons Attribution 3.0 Unported https://protege.stanford.edu/ontologies/pizza/pizza.owl
Cell Expression; Localization; Development and Anatomy Ontology http://nar.oxfordjournals.org.gate.lib.buffalo.edu/content/early/2013/12/03/nar.gkt1264.full.pdf
Cell Ontology mul:cl An ontology for cell types
Hematopoietic cell types: prototype for a revised cell ontology
Logical development of the cell ontology
The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability
The Bioregistry
BioPortal
Ontobee
Creative Commons Attribution 4.0 International https://obophenotype.github.io/cell-ontology/
Chemical Entities of Biological Interest The Bioregistry
Ontobee
BioPortal
http://www.ebi.ac.uk/chebi
Chemical Information Ontology mul:cheminf The Bioregistry
The Chemical Information Ontology: provenance and disambiguation for chemical data on the biological semantic web
BioPortal
Ontobee
Creative Commons CC0 License https://github.com/semanticchemistry/semanticchemistry
http://code.google.com/p/semanticchemistry/
https://github.com/semanticchemistry/semanticchemistry
Protein Ontology mul:pr Nucleic Acids Research (NAR) database Creative Commons Attribution 4.0 International http://proconsortium.org
Extensible Observation Ontology en:OBOE Creative Commons Attribution 3.0 Unported https://github.com/NCEAS/oboe/
Computer Science Ontology The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas https://cso.kmi.open.ac.uk
SuperPattern Ontology Creative Commons CC0 License https://w3id.org/linkflows/superpattern/
eNanoMapper ontology eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment Creative Commons Attribution-ShareAlike https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-015-0005-5


online database

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Scopus 2004-03-15 English
Japanese
Simplified Chinese
Traditional Chinese
Russian
Bibliographic Scan of Digital Scholarly Communication Infrastructure https://www.scopus.com https://dbis.uni-regensburg.de/frontdoor.php?titel_id=3636
International Plant Names Index 1999 English https://www.ipni.org/
Directory of Open Access Journals en:DOAJ 2003 Bibliographic Scan of Digital Scholarly Communication Infrastructure
Open Science Thesaurus
The varying openness of digital open science tools
https://doaj.org/ https://tapor.ca/tools/1264
Freebase 2007-03 English Freebase: a collaboratively created graph database for structuring human knowledge Apache License http://www.freebase.com https://tapor.ca/tools/1302
https://marketplace.sshopencloud.eu/tool-or-service/FmChcn
Amphibian Species of the World English https://amphibiansoftheworld.amnh.org/
Global Invasive Species Database English http://www.iucngisd.org/gisd/
Gene Expression Omnibus The varying openness of digital open science tools https://www.ncbi.nlm.nih.gov/geo/
United States Department of Agriculture Plants Database English http://plants.usda.gov/about_plants.html
Neurosynth NeuroSynth: a new platform for large-scale automated synthesis of human functional neuroimaging data
Decoding the large-scale structure of brain function by classifying mental States across individuals
http://neurosynth.org/
PROSPERO https://www.crd.york.ac.uk/PROSPERO/
Global Names Architecture http://gni.globalnames.org/
OpenStreetMap database en:OSM British English Open Database License


open-access repository

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
CiteSeer http://citeseer.ist.psu.edu
Figshare 2011-01-12 Bibliographic Scan of Digital Scholarly Communication Infrastructure
Open Science Thesaurus
The varying openness of digital open science tools
Directory of Open Access Preprint Repositories
https://figshare.com/ https://tapor.ca/tools/1045
https://marketplace.sshopencloud.eu/tool-or-service/mdEbYT


open-source software

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Virtuoso Universal Server awesome RDF github page
OntoCommons Report D4.3
GNU General Public License, version 2.0
proprietary license
https://virtuoso.openlinksw.com/
Apache Jena Fuseki Apache License, Version 2.0 https://jena.apache.org/documentation/fuseki2/index.html


organization

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
World Register of Marine Species en:WoRMS 2008 English Creative Commons Attribution 4.0 International https://www.marinespecies.org
Orphanet 1997 English
French
Spanish
German
Italian
Portuguese
Dutch
Polish
Representation of rare diseases in health information systems: the Orphanet approach to serve a wide range of end users
The Bioregistry
Creative Commons Attribution-NoDerivs 3.0 Unported https://orpha.net


question-answering dataset

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Stanford Question Answering Dataset en:SQuAD 2016 English SQuAD: 100,000+ Questions for Machine Comprehension of Text Creative Commons Attribution-ShareAlike 4.0 International https://rajpurkar.github.io/SQuAD-explorer/
CuratedTREC
WebQuestions
WikiMovies
QALD2017 Task 4: English question answering over Wikidata English
Brazilian Portuguese
German
Spanish
Italian
French
Dutch
Hindi
Romanian
Persian
https://project-hobbit.eu/challenges/qald2017/qald2017-challenge-tasks/
SimpleQuestions en:SQ 2015 Large-scale Simple Question Answering with Memory Networks https://research.fb.com/downloads/babi/ https://www.dropbox.com/s/tohrsllcfy7rch4/SimpleQuestions_v2.tgz
NewsQA 2016 NewsQA: A Machine Comprehension Dataset https://github.com/Maluuba/newsqa
https://datasets.maluuba.com/NewsQA
WikiQA 2015 English WikiQA: A Challenge Dataset for Open-Domain Question Answering https://www.microsoft.com/en-us/download/details.aspx?id=52419
MS MARCO English MS MARCO: A Human Generated MAchine Reading COmprehension Dataset http://www.msmarco.org
Free917 Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
LC-QuAD LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs http://lc-quad.sda.tech/
SQuAD2.0 2018 English Know What You Don't Know: Unanswerable Questions for SQuAD https://rajpurkar.github.io/SQuAD-explorer/
TriviaQA English TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension http://nlp.cs.washington.edu/triviaqa/ https://competitions.codalab.org/competitions/17208
HotpotQA English HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering https://hotpotqa.github.io/
Natural Questions English Natural Questions: A Benchmark for Question Answering Research https://ai.google.com/research/NaturalQuestions
WikiHop English Constructing Datasets for Multi-hop Reading Comprehension Across Documents http://qangaroo.cs.ucl.ac.uk/
QALD-9 2019 Russian
Portuguese
English
Hindi
Persian
Italian
French
Romanian
Spanish
Dutch
http://2018.nliwod.org/challenge
GQA English GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering https://cs.stanford.edu/people/dorarad/gqa/
WebQSP
SimpleQuestions-WD 2017 English Question Answering Benchmarks for Wikidata
LC-QuAD 2.0 English LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia http://lc-quad.sda.tech/
CLC-QuAD Chinese A Chinese Multi-type Complex Questions Answering Dataset over Wikidata
QALD-9-plus English QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers Creative Commons Attribution 4.0 International https://figshare.com/articles/dataset/QALD-9_/16864273
BoolQ English BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions Creative Commons Attribution-ShareAlike 3.0 Unported
PubMedQA English PubMedQA: A Dataset for Biomedical Research Question Answering https://pubmedqa.github.io/
MedMCQA English https://medmcqa.github.io/
MedQA English What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams https://github.com/jind11/MedQA
QALD-5
GSM8K English Training Verifiers to Solve Math Word Problems https://github.com/openai/grade-school-math https://openai.com/research/solving-math-word-problems


semantic network

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
GermaNet German http://www.sfs.uni-tuebingen.de/GermaNet/
ConceptNet https://www.conceptnet.io/


software

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
BabelNet multiple languages https://babelnet.org/
BridgeDb Providing gene-to-variant and variant-to-gene database identifier mappings to use with BridgeDb mapping services
The BridgeDb framework: standardized access to gene, protein and metabolite identifier mapping services
https://www.bridgedb.org/
https://bridgedb.github.io/


text corpus

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
British National Corpus 1994 British English http://www.natcorp.ox.ac.uk/
Brown Corpus English
Europarl corpus version 7 Danish
Dutch
British English
Finnish
French
German
Greek
Italian
Portuguese
Spanish
Swedish
http://www.statmt.org/europarl/
National Corpus of Polish http://www.nkjp.pl/
http://www.nkjp.uni.lodz.pl/
Stanford Natural Language Inference corpus en:SNLI corpus 2015 English A large annotated corpus for learning natural language inference Creative Commons Attribution-ShareAlike 4.0 International https://nlp.stanford.edu/projects/snli/
Leipzig Corpora Collection Corpus Portal for Search in Monolingual Corpora http://corpora.uni-leipzig.de https://tapor.ca/tools/1554
https://marketplace.sshopencloud.eu/tool-or-service/y4fvez
UMBC corpus UMBC_EBIQUITY-CORE: Semantic Textual Similarity Systems
SemCor English https://web.eecs.umich.edu/~mihalcea/downloads.html#semcor
Daily Mail dataset English https://github.com/deepmind/rc-data/
Newsroom dataset English Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies https://summari.es/
TinyStories English TinyStories: How Small Can Language Models Be and Still Speak Coherent English? https://huggingface.co/datasets/roneneldan/TinyStories


trait database

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
AmphiBIO
TRY 2007 TRY - a global database of plant traits http://www.try-db.org/


treebank

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Penn Treebank English https://catalog.ldc.upenn.edu/ldc99t42
Hamburg Dependency Treebank German


video streaming service

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
YouTube en:YT 2005-02-14 multiple languages Lentapedia
Bibliographic Scan of Digital Scholarly Communication Infrastructure
end-user license agreement https://www.youtube.com/
PlayStation Now 2014 https://www.playstation.com/ps-now


voice dataset

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Common Voice 2017-06-19 multiple languages Common Voice: A Massively-Multilingual Speech Corpus Creative Commons CC0 License https://commonvoice.mozilla.org/
LibriSpeech Librispeech: An ASR corpus based on public domain audio books Creative Commons Attribution 4.0 International
VoxPopuli VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation https://github.com/facebookresearch/voxpopuli
VoxLingua107 VoxLingua107: a Dataset for Spoken Language Recognition http://bark.phon.ioc.ee/voxlingua107/


website

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
PubMed en:PM 1997 English Nucleic Acids Research (NAR) database https://pubmed.ncbi.nlm.nih.gov/
https://pmlegacy.ncbi.nlm.nih.gov
LibraryThing 2005-08-29 LibraryThing: A Review https://librarything.com
DNA Data Bank of Japan Creative Commons Attribution 2.1 Japan http://www.ddbj.nig.ac.jp/
ScienceDirect 1997-03 English The Serials Librarian
The varying openness of digital open science tools
https://www.sciencedirect.com/
Media Cloud 2009 https://mediacloud.org
Semantic Scholar Semantic Scholar.
Bibliographic Scan of Digital Scholarly Communication Infrastructure
https://www.semanticscholar.org
Nextstrain https://nextstrain.org/
Dimensions 2018-01-15
2014
English Bibliographic Scan of Digital Scholarly Communication Infrastructure
The varying openness of digital open science tools
https://app.dimensions.ai/discover/publication
https://www.dimensions.ai


word analogy dataset

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
Google analogy test set English Efficient Estimation of Word Representations in Vector Space http://download.tensorflow.org/data/questions-words.txt https://www.aclweb.org/aclwiki/index.php?title=Google_analogy_test_set_(State_of_the_art)
Microsoft Research Syntactic Analogies Dataset English Linguistic Regularities in Continuous Space Word Representations https://aclweb.org/aclwiki/Syntactic_Analogies_(State_of_the_art)
SemEval 2012 Task 2 dataset English SemEval-2012 task 2: measuring degrees of relational similarity https://aclweb.org/aclwiki/SemEval-2012_Task_2_(State_of_the_art)


word net

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
plWordNet 2005 Polish BSD licenses http://plwordnet.pwr.wroc.pl
DanNet 2009 Danish DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary MIT License
Creative Commons Attribution 4.0 International
http://www.wordnet.dk/
https://cst.ku.dk/projekter/dannet/
http://www.wordnet.dk/owl/instance/
Arabic WordNet 2006 Arabic The Use of Arabic WordNet in Arabic Information Retrieval
Chinese WordNet da:CWN Constructing chinese wordnet: Design principles and implementation
MultiWordnet of Portuguese en:MWN.PT Portuguese
KeNet Turkish Constructing a WordNet for Turkish Using Manual and Automatic Annotation http://haydut.isikun.edu.tr/kenet.html
odenet de:odenet German https://ikum.mediencampus.h-da.de/projekt/open-de-wordnet-initiative/


word similarity dataset

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
WordSim-353 en:WS-353 English Placing search in context: the concept revisited http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
http://gabrilovich.com/resources/data/wordsim353/
http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.zip
http://gabrilovich.com/resources/data/wordsim353/wordsim353.zip
https://aclweb.org/aclwiki/WordSimilarity-353_Test_Collection_(State_of_the_art)
SimVerb-3500 English SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity http://people.ds.cam.ac.uk/dsg40/simverb.html http://people.ds.cam.ac.uk/dsg40/paper/simverb/simverb-3500-data.zip
SimLex-999 English SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation https://www.cl.cam.ac.uk/~fh295/simlex.html https://www.cl.cam.ac.uk/~fh295/SimLex-999.zip https://aclweb.org/aclwiki/SimLex-999_(State_of_the_art)
Miller-Charles dataset en:MC English Contextual correlates of semantic similarity https://aclweb.org/aclwiki/MC-28_Test_Collection_(State_of_the_art)
MEN Test Collection 2012 English Multimodal distributional semantics http://clic.cimec.unitn.it/~elia.bruni/MEN.html http://clic.cimec.unitn.it/~elia.bruni/resources/MEN.zip
ConceptSim English Evaluating Semantic Metrics on Tasks of Concept Similarity https://www.seas.upenn.edu/~hansens/conceptSim/ https://www.seas.upenn.edu/~hansens/conceptSim/ConceptSim.tar.gz
Rubenstein-Goodenough dataset en:RG English Contextual correlates of synonymy https://aclweb.org/aclwiki/RG-65_Test_Collection_(State_of_the_art)
Rare Word Dataset en:RW English Better Word Representations with Recursive Neural Networks for Morphology
Stanford Contextual Word Similarity en:SCWS English Improving Word Representations via Global Context and Multiple Word Prototypes
Semantic and Visual Similarity Judgements for Concept Pairs English Learning Grounded Meaning Representations with Autoencoders http://homepages.inf.ed.ac.uk/s1151656/resources.html
YP130 English Measuring semantic similarity in the taxonomy of WordNet
Danish similarity dataset Danish Towards a Gold Standard for Evaluating Danish Word Embeddings https://github.com/kuhumcst/Danish-Similarity-Dataset


Misc

article | short name | inception | language of work or name | described by source | copyright license | official website | URL | described at URL
GitHub 2007-10-19 English The varying openness of digital open science tools https://github.com https://tapor.ca/tools/1410
https://marketplace.sshopencloud.eu/tool-or-service/HuuGPE
Wikidata mul:WD
ru:ВД
be-tarask:ВЗ
2012-10-29 multiple languages Wikidata: A Free Collaborative Knowledgebase
Getting the Most out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph
Wikidata: A New Platform for Collaborative Data Collection
Wikidata: How We Brought Structured Data to Wikipedia
Creative Commons CC0 License
Creative Commons Attribution-ShareAlike 3.0 Unported
Creative Commons Attribution-ShareAlike 4.0 International
https://wikidata.org/ http://viaf.org/viaf/partnerpages/WKP.html
https://dashboard.wikiedu.org/training/wikidata-professional
https://www.theguardian.com/news/datablog/2013/apr/26/wikidata-launch
https://iccl.inf.tu-dresden.de/web/Wikidata
Google Drive 2012-04-24 freeware https://www.drive.google.com/
https://www.google.com/intl/el/drive/
https://drive.google.com/ https://tapor.ca/tools/973
https://marketplace.sshopencloud.eu/tool-or-service/slb8ue
MusicBrainz 2000-07-17 English https://musicbrainz.org/
arXiv 1991-08-14 English Bibliographic Scan of Digital Scholarly Communication Infrastructure
Open Science Thesaurus
The Bioregistry
Directory of Open Access Preprint Repositories
Creative Commons CC0 License https://arxiv.org
http://xxx.lanl.gov
http://export.arxiv.org/oai2
https://arxiv.org/help/rss
OmegaWiki 2005-12-27 multiple languages Creative Commons CC0 License http://omegawiki.org/
http://www.omegawiki.org/Meta:Main_Page/it
http://www.omegawiki.org/Meta:Main_Page/fr
http://www.omegawiki.org/Meta:Main_Page/es
http://www.omegawiki.org/Meta:Main_Page/ja
http://www.omegawiki.org/Meta:Main_Page/zh
http://www.omegawiki.org/Meta:Main_Page/ia
Urban Dictionary 1999 English https://www.urbandictionary.com
Google Scholar 2004-11 English
German
Spanish
French
Catalan
Czech
Danish
Filipino
Croatian
Indonesian
Latvian
Lithuanian
Hungarian
Dutch
Norwegian
Polish
Portuguese
Brazilian Portuguese
Romanian
Slovak
Slovene
Finnish
Swedish
Vietnamese
Turkish
Greek
Bulgarian
Russian
Serbian
Ukrainian
Hebrew
Arabic
Persian
Hindi
Thai
Korean
Simplified Chinese
Standard Taiwanese Mandarin
Japanese
Bibliographic Scan of Digital Scholarly Communication Infrastructure
The varying openness of digital open science tools
Google Scholar
Google Scholar
Google Scholar
https://scholar.google.com/
Protein Data Bank 1971 English The Bioregistry http://www.wwpdb.org/ https://www.rcsb.org/pages/about-us/history
GeoNames 2005 English Creative Commons Attribution 3.0 Unported http://www.geonames.org/ https://tapor.ca/tools/1371
https://marketplace.sshopencloud.eu/tool-or-service/SF6SHP
GenBank Nucleic Acids Research (NAR) database http://www.ncbi.nlm.nih.gov/genbank/
Common Terminology Criteria for Adverse Events
Global Biodiversity Information Facility en:GBIF 2001 multiple languages The varying openness of digital open science tools https://www.gbif.org
Neo4j 2010-02 GNU General Public License, version 3.0
GNU Affero General Public License, version 3.0
https://neo4j.com/
Apache Jena 2012-07-03 Apache License, Version 2.0 https://jena.apache.org/
GISAID 2006 https://gisaid.org/
https://platform.gisaid.org
Schema.org 2011-06-02 Creative Commons Attribution-ShareAlike 3.0 Unported https://schema.org https://www.w3.org/community/schemaorg/
The Cancer Genome Atlas
The Dark Energy Survey en:DES
de:DES
http://www.darkenergysurvey.org
Anaconda The varying openness of digital open science tools freemium https://anaconda.com/
Zenodo en:Zenodo 2013-05 Bibliographic Scan of Digital Scholarly Communication Infrastructure
Open Science Thesaurus
The varying openness of digital open science tools
Directory of Open Access Preprint Repositories
GNU General Public License, version 2.0 https://zenodo.org https://tapor.ca/tools/596
https://marketplace.sshopencloud.eu/tool-or-service/SA186o
Zachary's karate club An Information Flow Model for Conflict and Fission in Small Groups
CEUR Workshop Proceedings en:CEUR-WS 1995 multiple languages The varying openness of digital open science tools http://ceur-ws.org/
Mix'n'match 2013-11 GNU General Public License, version 2.0 https://mix-n-match.toolforge.org http://magnusmanske.de/wordpress/?p=114
Ontotext GraphDB awesome RDF github page
OntoCommons Report D4.3
proprietary license http://graphdb.ontotext.com
DanPASS Danish https://danpass.hum.ku.dk/
Danish Twin Register 1953 https://www.sdu.dk/DTR
Prosopographic and Social Network Database of the Tang and Five Dynasties The Evolution of the Tang Political Elite and its Marriage Network
Archive of Formal Proofs 2004 English GNU Lesser General Public License
BSD licenses
https://www.isa-afp.org/
http://www.afp.sourceforge.net
The Bioregistry Unifying the Identification of Biomedical Entities with the Bioregistry Creative Commons CC0 License
MIT License
https://bioregistry.io
Protomaps 3-clause BSD License https://protomaps.com/
Early Life Exposures in Mexico to ENvironmental Toxicants (ELEMENT) en:ELEMENT https://sph.umich.edu/cehc/element/index.html
End of automatically generated list.