User:Magnus Manske/Mix'n'match date import
- Mix'n'match has lots of "people entries" (biographies) from various third-party catalogs, both matched and unmatched to Wikidata
- Many of these people entries have birth and/or death dates in their description
- Reliably extracting these generically is hard/impossible
- I wrote a script that has specific code (mostly, regular expressions) for each catalog, where possible
- The aim is to reliably extract dates from a specific catalog, to ensure the date is what the catalog states; some dates may be skipped if they are in unusual form, contain "fuzzy" keywords like "died before", etc.
- This has yielded ~2.7 million birth and/or death dates (years, year-month, or year-month-day), so far
- These are stored in a separate table in Mix'n'match
- That data is used when creating a new Wikidata item from Mix'n'match
- That data can also be used by a bot to add dates (where missing) and/or references to birth/death statements. Catalog-independent bot code exists:
- A test edit is here.
- That data can also be used to find new matches. For example, two Mix'n'match entries with identical, day-specific birth and death dates, where one entry is matched to Wikidata but the other is not, are a strong candidate for matching to the same item. Reconciling this could become a separate function in Mix'n'match, or a Game. No code exists yet
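The per-catalog approach described above can be illustrated with a minimal sketch. This is not the actual extraction script: the catalog names (`example_iso`, `example_years`), the description formats, and the list of "fuzzy" keywords are all assumptions made up for illustration. It shows one regular-expression extractor per catalog, skips any description that contains a fuzzy keyword, and drops dates that are not valid calendar dates.

```python
import datetime
import re
from typing import Callable, Dict, Optional, Tuple

Dates = Tuple[Optional[str], Optional[str]]  # (born, died)

# Hypothetical "fuzzy" keywords; a description containing one is skipped.
FUZZY = re.compile(r"\b(?:before|after|circa|ca\.|about)|\?", re.IGNORECASE)

# Year, year-month, or year-month-day.
ISO_DATE = r"(\d{4}(?:-\d{2}){0,2})"


def _iso_catalog(desc: str) -> Dates:
    """Hypothetical catalog with text like 'born 1850-03-12; died 1910-07'."""
    born = re.search(r"\bborn " + ISO_DATE + r"(?![\d-])", desc)
    died = re.search(r"\bdied " + ISO_DATE + r"(?![\d-])", desc)
    return (born.group(1) if born else None, died.group(1) if died else None)


def _years_catalog(desc: str) -> Dates:
    """Hypothetical catalog with text like 'John Doe (1850-1910), botanist'."""
    m = re.search(r"\((\d{4})\s*[-\u2013]\s*(\d{4})\)", desc)
    return (m.group(1), m.group(2)) if m else (None, None)


# One extractor per catalog; catalogs without an extractor yield nothing.
EXTRACTORS: Dict[str, Callable[[str], Dates]] = {
    "example_iso": _iso_catalog,
    "example_years": _years_catalog,
}


def _valid(date: str) -> bool:
    """Check month and day ranges for the precision that is present."""
    parts = [int(p) for p in date.split("-")]
    if len(parts) == 3:
        try:
            datetime.date(*parts)
        except ValueError:
            return False
        return True
    if len(parts) == 2:
        return 1 <= parts[1] <= 12
    return True


def extract_dates(catalog: str, description: str) -> Dates:
    """Return (born, died), each a date string or None if not reliably found."""
    extractor = EXTRACTORS.get(catalog)
    if extractor is None or FUZZY.search(description):
        return (None, None)
    born, died = extractor(description)
    return (
        born if born and _valid(born) else None,
        died if died and _valid(died) else None,
    )
```

For example, `extract_dates("example_iso", "born 1850; died before 1910")` returns `(None, None)` because of the keyword "before". Skipping the whole description is the conservative choice: a missed date costs less than a wrong one.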
Technical
- If you have a Toolforge (formerly "WMF Labs") user account, you can access the dates in the public-readable database "s51434__mixnmatch_p"; the table is "person_dates", and its field "entry_id" links to "entry.id"
- The code for the script that extracts the dates from the catalogs is here
- The code for the (preliminary) bot script that adds dates and/or references to Wikidata is here
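The idea of finding new matches from identical day-specific dates could be sketched as follows. No such code exists in Mix'n'match, so this is only an illustration under stated assumptions: rows are assumed to be "person_dates" already joined to "entry", and the field names `born`, `died`, and `q` (the matched Wikidata item, or `None`) are hypothetical. Only "entry_id" comes from the description above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class PersonDates(NamedTuple):
    entry_id: int
    born: str         # hypothetical field: "YYYY", "YYYY-MM", or "YYYY-MM-DD"
    died: str         # hypothetical field, same formats
    q: Optional[str]  # hypothetical field: matched Wikidata item, or None


def is_day_precise(date: str) -> bool:
    """True for dates of the form YYYY-MM-DD."""
    return len(date) == 10


def match_candidates(rows: Iterable[PersonDates]) -> List[Tuple[int, str]]:
    """Return (unmatched entry_id, suggested item) pairs, sorted."""
    groups: Dict[Tuple[str, str], List[PersonDates]] = defaultdict(list)
    for row in rows:
        # Only day-specific birth AND death dates are strong enough evidence.
        if is_day_precise(row.born) and is_day_precise(row.died):
            groups[(row.born, row.died)].append(row)

    candidates: List[Tuple[int, str]] = []
    for group in groups.values():
        items = {r.q for r in group if r.q is not None}
        if len(items) != 1:
            continue  # no matched entry, or conflicting matches: skip
        (item,) = items
        candidates.extend((r.entry_id, item) for r in group if r.q is None)
    return sorted(candidates)
```

Groups whose matched entries point to different items are skipped rather than guessed at. The resulting pairs are suggestions for human review, not confirmed matches.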