User:Fantasticfears/CBDB proposal

I've imported a large number of statements linked to people in CBDB. The earlier discussion can be found at Wikidata:Data Import Hub#CBDB. I've realized that an import like this requires a bot. You're welcome to leave comments.

Criteria and notes

Each item should meet Wikidata's notability criteria. The points below explain how that applies here.

  • Notable historical people
Most people in the database should be imported, except for wives. Most wives are recorded without a personal name and are not notable on their own. For example, 王氏: 氏 only marks a woman by her family name (here, 王), so the entry doesn't really add anything.
  • CBDB ID
CBDB is a highly trustworthy data source. Every item must link to it via Property:P497.
  • Scale
IDs run up to 366589.
  • Language variant
CBDB uses Traditional Chinese, so labels are stored as zh-hant. Labels can also be stored in other variants, but that is not useful. I believe zhwiki_p's convention makes sense.
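
The criteria above could be applied as a simple pre-filter before anything is written to Wikidata. This is only a minimal sketch, not the import bot itself: the record layout (a dict with `cbdb_id` and `name`), the helper names `is_unnamed` and `plan_item`, and the rule "a short name ending in 氏 is an unnamed entry" are all assumptions made for illustration.

```python
from typing import Optional

# Highest CBDB ID mentioned under "Scale".
MAX_CBDB_ID = 366589


def is_unnamed(name: str) -> bool:
    """Assumed heuristic: a short name ending in 氏 (e.g. 王氏) is only a surname."""
    return name.endswith("氏") and len(name) <= 3


def plan_item(record: dict) -> Optional[dict]:
    """Return the label and CBDB ID claim planned for one record, or None to skip it.

    `record` is a hypothetical dict with the keys `cbdb_id` and `name`.
    """
    cbdb_id = record["cbdb_id"]
    name = record["name"]
    if not 0 < cbdb_id <= MAX_CBDB_ID:
        return None  # outside the known ID range
    if is_unnamed(name):
        return None  # fails the notability criterion
    return {
        "labels": {"zh-hant": name},      # CBDB names are Traditional Chinese
        "claims": {"P497": str(cbdb_id)}, # every item links back to CBDB
    }
```

The returned dict is just a plan; turning it into actual edits is left to whatever bot framework is used.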

Phase 1: Status, birth, places

Notes:

  • STATUS_CODES and STATUS_DATA hold the occupation and status of individuals.
  • BIOG_MAIN has birth and death years for many people.
  • BIOG_ADDR_CODES and BIOG_ADDR_DATA hold dated records of people moving between places. The records give many different reasons for a move, and not all of these reasons have existing entities in Wikidata.
  • ENTRY_TYPES and ENTRY_DATA record how people became officials. Not all of these entry routes have existing entities in Wikidata.
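
As an illustration of the BIOG_MAIN note, here is a minimal sketch of reading birth and death years. It assumes CBDB is available locally as a SQLite database, that the columns are named `c_personid`, `c_name_chn`, `c_birthyear` and `c_deathyear`, and that `0` or `NULL` means "unknown". All of these should be checked against the actual schema. The helper names are hypothetical.

```python
import sqlite3
from typing import Iterator, Optional, Tuple

LifeRow = Tuple[int, str, Optional[int], Optional[int]]


def year_or_none(value) -> Optional[int]:
    """Map the assumed 'unknown' markers (NULL or 0) to None."""
    return int(value) if value not in (None, 0) else None


def read_life_years(conn: sqlite3.Connection) -> Iterator[LifeRow]:
    """Yield (person id, name, birth year, death year) from BIOG_MAIN."""
    cursor = conn.execute(
        "SELECT c_personid, c_name_chn, c_birthyear, c_deathyear "
        "FROM BIOG_MAIN ORDER BY c_personid"
    )
    for person_id, name, birth, death in cursor:
        yield person_id, name, year_or_none(birth), year_or_none(death)
```

Keeping unknown years as `None` rather than `0` makes it harder to write a bogus "year 0" statement by accident.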

Challenges

  • Not every status can be linked to an existing entity. For example, Confucian learning is called 儒學. We can't find an existing entity for it, and creating a new one seems to violate Wikidata's notability criteria if no other entity uses it. An expert may help to reconcile such cases, given the cultural difference.
  • If we import only part of the data, how can we prevent duplicates next time? That is, how can we know what we have already imported?
  • pywikibot doesn't check for duplicates :(
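
One possible answer to the duplicate question: since every imported item carries P497, the set of already-imported CBDB IDs can be recovered from Wikidata itself and subtracted from the next batch. The sketch below only builds a SPARQL query string and does the set difference locally; it does not contact any server. The function names are hypothetical, and the query assumes the usual `wdt:` prefix of the Wikidata Query Service.

```python
from typing import Iterable, List, Set


def build_existing_ids_query(cbdb_ids: Iterable[int]) -> str:
    """Build a SPARQL query asking which of these CBDB IDs already have an item."""
    values = " ".join(f'"{cbdb_id}"' for cbdb_id in cbdb_ids)
    return (
        "SELECT ?item ?cbdb WHERE { "
        f"VALUES ?cbdb {{ {values} }} "
        "?item wdt:P497 ?cbdb . }"
    )


def pending_ids(batch: Iterable[int], already_imported: Set[int]) -> List[int]:
    """Return the IDs in `batch` that still need importing, in order, without repeats."""
    seen = set(already_imported)
    result = []
    for cbdb_id in batch:
        if cbdb_id not in seen:
            seen.add(cbdb_id)  # also guards against repeats inside the batch
            result.append(cbdb_id)
    return result
```

Running the query per batch and feeding the IDs it returns into `pending_ids` would make a partial import safe to re-run, at the cost of one extra query per batch.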

Phase 2: Publications

PENDING.

Notes:

  • TEXT_DATA and TEXT_CODES list publications with their titles and authors. It would be awesome to link them to Wikisource articles.