User:PangolinMexico/wasian


Documentation and Archiving for What's in a Name?: Automatically identifying first and last author names for WikiCite and Wikidata for Outreachy Round 24

Project Description

Names are really complex. Which part is the first name? Which is the middle name? How do you define your surname? What happens if you have multiple family names? How do names work across multiple languages and cultures?

Accurately recording this information is important for the scientific references used in Wikipedia articles and Wikidata items. If it is wrong, it is easy to misattribute publications or to miss connections between different works by the same author. It is also very difficult to get right, because naming conventions are complex and differ between languages.

This project will focus on understanding what makes a name, and how it can be recorded in structured data, across many languages and conventions. The project focuses on Wikidata, which is the structured data repository linked to Wikipedia and the other Wikimedia projects. Wikidata holds records of millions of scientific publications as part of WikiCite. However, identifying individual author names and linking between their different publications is still in its early stages.

In this project, you will use currently available BibTeX author information to split author names into 'first' and 'last' names, and you will add this information to thousands of Wikidata items using Pywikibot. You will also explore other approaches, potentially including machine learning, to see how reliably first and last names can be identified.
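As a rough illustration of the BibTeX-based splitting described above, the sketch below handles the three name forms BibTeX recognises: "First Last", "Last, First" and "Last, Jr, First". The function names `split_authors` and `split_name` are hypothetical and are not taken from the project's bots. The sketch assumes whitespace-separated names without braces, treats a lowercase word such as "van" as the start of the family name (following BibTeX's convention), and does not write anything to Wikidata. It will misjudge names that do not follow these conventions, which is exactly the difficulty the project sets out to explore.

```python
from typing import List, Tuple


def split_authors(author_field: str) -> List[str]:
    """Split a BibTeX author field on the ' and ' separator (braces are ignored)."""
    return [a.strip() for a in author_field.split(" and ") if a.strip()]


def split_name(name: str) -> Tuple[str, str]:
    """Return (first, last) for a single BibTeX-style author name."""
    parts = [p.strip() for p in name.split(",")]
    if len(parts) == 2:
        # "Last, First"
        return parts[1], parts[0]
    if len(parts) == 3:
        # "Last, Jr, First": keep the suffix with the family name
        return parts[2], f"{parts[0]} {parts[1]}"
    if len(parts) > 3:
        raise ValueError(f"Too many commas in name: {name!r}")

    # "First [von] Last": the family name starts at the first lowercase
    # word after the first token, or is the final token if there is none.
    tokens = parts[0].split()
    if not tokens:
        return "", ""
    start = len(tokens) - 1
    for i, tok in enumerate(tokens[:-1]):
        if i > 0 and tok[:1].islower():
            start = i
            break
    return " ".join(tokens[:start]), " ".join(tokens[start:])


if __name__ == "__main__":
    field = "Curie, Marie and Ludwig van Beethoven and Donald E. Knuth"
    for author in split_authors(field):
        print(split_name(author))
```

A rule-based splitter like this is cheap to run over many items, but its output would need checking before being added to Wikidata, since the 'first'/'last' distinction does not map cleanly onto every naming tradition.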

Progress

AuthorBot (hopefully run by PangolinBot): Roberto's project

AuthorBot automatically adds missing author information to scholarly articles on Wikidata.

ADSEnglishBot: Feliciss's project

ADSEnglishBot automatically creates Wikidata items for scholarly articles and authors from the ADS database.