Wikidata:Property proposal/OpenStates Person ID

OpenStates person ID

Originally proposed at Wikidata:Property proposal/Person

Description: identifier for person entries in OpenStates.org
Represents: person
Data type: External identifier
Domain: human (Q5)
Allowed values: ocd-person\b\/[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}
Example 1: Rick Outman (Q7331626) → ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32
Example 2: Adam Hollier (Q76373561) → ocd-person/5417a310-a276-4035-8dc8-f536c61db49d
Example 3: Jeff Irwin (Q16216607) → ocd-person/a1c1163d-af8d-4b88-9365-07717d4a45c3
Example 4: Kevin Daley (Q16730764) → ocd-person/cf7ed8de-2df4-41ff-9dcb-489980b031a0
Source: https://openstates.org/data/
Planned use: Cross-linking of Wikidata and OpenStates.org data. Used for initial entity resolution via Mix'n'match, followed by verification of (party, given name, family name, district) and syncing of external IDs. The resolved Wikidata QID will be added as a new field to OpenStates for cross-linking between Wikidata and OpenStates entities.
Number of IDs in source: 7997 as of 20220128
Expected completeness: always incomplete (Q21873886)
Robot and gadget jobs: Checking properties for consistency and cross-checking changes in data
Applicable "stated in"-value: Open States (Q54449686)
Distinct-values constraint: yes
Wikidata project: The ID intersects with WikiProject Every Politician's goal for persons holding positions at the federal, state, and local levels of the US government.
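
As an illustration of the "Allowed values" constraint above, here is a minimal Python sketch that checks candidate values against the proposed pattern. The pattern is copied from the proposal; the function name `is_valid_ocd_person_id` is hypothetical, and whole-string matching is an assumption about how the constraint would be applied.

```python
import re

# Pattern copied verbatim from the "Allowed values" field of the proposal.
OCD_PERSON_PATTERN = re.compile(
    r"ocd-person\b\/[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}"
    r"\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}"
)


def is_valid_ocd_person_id(value: str) -> bool:
    """Return True if the whole string matches the proposed format.

    Hypothetical helper: the whole value is required to match, which is
    an assumption about how the constraint would be checked.
    """
    return OCD_PERSON_PATTERN.fullmatch(value) is not None


# The four example values listed in the proposal.
EXAMPLES = [
    "ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32",
    "ocd-person/5417a310-a276-4035-8dc8-f536c61db49d",
    "ocd-person/a1c1163d-af8d-4b88-9365-07717d4a45c3",
    "ocd-person/cf7ed8de-2df4-41ff-9dcb-489980b031a0",
]

for example in EXAMPLES:
    print(example, is_valid_ocd_person_id(example))
```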

Motivation

OpenStates already collects and cleans common data on political figures and releases it as CC0. That data does not currently interface with Wikidata, but it directly aligns with the goals of WikiProject Every Politician. Adding a property for an OpenStates person ID enables bidirectional linking, import, and checks against the data work done by both the Wikidata community and OpenStates. This may help with updates after each U.S. election or event. It may also make related Wikidata entries and Wikimedia properties easier to use for the existing notable consumers of OpenStates. Wolfgang8741 (talk) 22:32, 31 January 2022 (UTC)

Discussion

A point of discussion to complement this proposal is how best to capture both the UUID and the computed ID used in the web URL. The UUID is ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32. The ID for the web URL is computed from the full name and the UUID as a base62 slug-hash, so it relies on the full name from an OpenStates entry to be computed reliably. An example instance: https://openstates.org/person/rick-outman-4kyQcmzxj3evAoxO2Tx3OU/. It seems important to capture the UUID and then compute or import the OpenStates computed ID in order to link back to the person's page on the OpenStates website. This could also be done through a second property, such as OpenStates Computed ID.
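
The computation described above can be sketched in Python. This is a rough illustration, not OpenStates's actual code: the base62 alphabet ordering (digits, then uppercase, then lowercase), the slug rules in `slugify_name`, and all function names are assumptions made for the example.

```python
import re
import uuid

# Assumed alphabet ordering: 0-9, A-Z, a-z. The proposal only says "base62".
BASE62_ALPHABET = (
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
)


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using the assumed base62 alphabet."""
    if number == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(digits))


def slugify_name(name: str) -> str:
    """Lowercase the name, drop punctuation, and join words with hyphens.

    Hypothetical slug rules; the real site may treat accents or
    punctuation differently.
    """
    cleaned = re.sub(r"[^a-z0-9\s-]", "", name.lower())
    return re.sub(r"[\s-]+", "-", cleaned).strip("-")


def computed_id(full_name: str, ocd_person_id: str) -> str:
    """Build a slug-hash of the form '<name-slug>-<base62 of the UUID>'."""
    uuid_part = ocd_person_id.split("/", 1)[1]
    return f"{slugify_name(full_name)}-{base62_encode(uuid.UUID(uuid_part).int)}"


print(computed_id("Rick Outman", "ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32"))
```

Because the slug depends on the stored full name, two dumps that spell the name differently would yield different computed IDs for the same UUID, which is why the UUID is the more stable key.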

Consensus on this approach could enable further data linking and integration with OpenStates. OpenStates was contacted on their Slack prior to this proposal to check the feasibility of integrating a Wikidata ID, and they are open to adding the Wikidata identifier to each corresponding entry. Wolfgang8741 (talk) 22:32, 31 January 2022 (UTC)

Second property for the computed ID sounds ok to me. ArthurPSmith (talk) 19:00, 1 February 2022 (UTC)
  • It would be preferable to determine a single identifier with a formatter URL that could be included in Wikidata.
BTW what happens when the spelling of a person's name is slightly changed? Does that recalculate everything? --- Jura 18:27, 3 February 2022 (UTC)
@Jura1 I agree a single identifier with a formatter URL would be ideal. After some messaging with a maintainer of OpenStates, I learned that a slug change would redirect to a canonical slug-hash ID URL. This does mean that, at this time, any dumps used for the import need to compute this ID from the full name key and the ID key, using the slug and base62 conversions of the respective fields. The OpenStates API can look up data with just the base62 hash supplied. Should I refactor the proposal around the slug-hash ID and a formatter URL? Example 5 would become Rick Outman (Q7331626) → rick-outman-4kyQcmzxj3evAoxO2Tx3OU, with the formatted link https://openstates.org/person/rick-outman-4kyQcmzxj3evAoxO2Tx3OU/. And yes, the slug may have more than two dashes depending on the name of the individual, for example: https://github.com/openstates/people/blob/main/data/mi/legislature/Cynthia-A-Johnson-8579a776-ec6d-4239-b1a2-5e89dd3cee49.yml Wolfgang8741 (talk) 22:58, 9 February 2022 (UTC)
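
For illustration, the reverse direction might look like the following Python sketch: it takes a slug-hash, keeps only the text after the last dash (since the name part may itself contain dashes), and decodes it back to an ocd-person ID. The alphabet ordering, the `$1` formatter URL shape, and all function names are assumptions for the example, not confirmed OpenStates behavior.

```python
import uuid

# Assumed alphabet ordering: 0-9, A-Z, a-z. The thread only says "base62".
BASE62_ALPHABET = (
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
)

# Hypothetical formatter URL in the usual Wikidata "$1" placeholder style.
FORMATTER_URL = "https://openstates.org/person/$1/"


def base62_decode(text: str) -> int:
    """Decode a base62 string into an integer using the assumed alphabet."""
    number = 0
    for char in text:
        number = number * 62 + BASE62_ALPHABET.index(char)
    return number


def slug_hash_to_ocd_person_id(slug_hash: str) -> str:
    """Recover the ocd-person ID from the hash after the last dash."""
    hash_part = slug_hash.rsplit("-", 1)[-1]
    return f"ocd-person/{uuid.UUID(int=base62_decode(hash_part))}"


def person_url(slug_hash: str) -> str:
    """Fill the assumed formatter URL with a slug-hash value."""
    return FORMATTER_URL.replace("$1", slug_hash)


slug = "rick-outman-4kyQcmzxj3evAoxO2Tx3OU"
print(slug_hash_to_ocd_person_id(slug))
print(person_url(slug))
```

Splitting on the last dash only means the name portion never affects the decoded UUID, so a respelled name would still resolve to the same underlying ocd-person ID under these assumptions.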
@Jura1 @ArthurPSmith Most of the data is now resolved to Wikidata entities in Mix'n'match catalog 5046, pending final approval of the proposal's ID format. In practice I found that the redirects have a bug, so I still encourage including the ocd-person/ format as a qualifier property: it gives access to the underlying data, whereas the computed URL is not included in the raw data. From a constraint perspective, we can make the computed ID the primary ID and the ocd-person ID the qualifier. Wolfgang8741 (talk) 16:53, 22 February 2022 (UTC)

WikiProject every politician has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.