Wikidata:WikiProject PCC Wikidata Pilot/Smithsonian Libraries/Projects/Smithsonian Research Online
Aim and Scope
This project aims to use our knowledge of Smithsonian researchers to explore how they are represented in Wikidata. The main information sources will be Smithsonian Research Online (SRO), the Smithsonian Libraries' research output tracking program, and the VIVO instance branded SI Profiles. We want to clearly identify those individuals and their related Smithsonian organizations. Ideally, we would also find a way to connect citations of scholarly products to the individuals credited with them.
Background
Since 2008, the Smithsonian Libraries has been tracking the research output of the Smithsonian. More recently, a VIVO instance was launched. See SI Research Online and Smithsonian Profiles.
Timeline
- 22 November 2020: complete initial draft project plan
- Fall/Winter 2020/2021: determine priorities, finalize core fields
- Spring 2021: determine workflow and tools, and test reconciliation and bulk loads
- Summer 2021: complete data investigation for missing values
Contributors
- Suzanne Pilsk (Pilsks) - Smithsonian Libraries and Archives Metadata Department
- Richard Naples (Drastician) - Smithsonian Libraries and Archives Metadata Department
- Amy Watson (WatsonAmy) - Smithsonian Libraries and Archives Resource Description Department
- Deborah Shapiro (BornOnTheCob) - Archives, Smithsonian Libraries and Archives
Workflow
Authors
- Identify authors in SRO who are already in Wikidata
- Identify authors in SRO who merit a Wikidata item but are not yet represented (using the SI Profiles project)
- Determine which of the properties we deem core are missing from Wikidata for authors
Publications
- Relate existing publications in Wikidata to known Smithsonian authors
- Contribute publication data and associate with Smithsonian authors
Organizations
- Determine what Smithsonian organizations are already in Wikidata
- Determine what Smithsonian organizations are missing from Wikidata that will be critical for SRO
- Determine which of the properties we deem core are missing from Wikidata for SI organizations
Tasks
Orgs
- Export Profiles Orgs.
- Compare to what is in Wikidata as "part of" and "parent organization" = Smithsonian.
- Look at and reconcile those that are not a one-to-one match.
- Edit records so that all appropriate orgs get both "part of" and "parent organization"
- Set up Google Sheet with all appropriate orgs and matching data model properties.
- Add missing orgs
- Create a Google/Excel sheet with Q items as rows and P# as headers, and fill in this data. Label=Len, Alias=Aen, Description=Den (verify with documentation)
- Move into OpenRefine (do we reconcile, or do we use Q# in the sheet?)
- Generate schema in OpenRefine to send to Wikidata
- Check, double check, triple check, and do it in small batches
- Fill in gaps
- Fix discrepancies
- Add missing orgs
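The comparison and "both statements" steps above can be sketched roughly as follows. This is a minimal sketch, assuming the Wikidata side has already been exported as two sets of QIDs (items whose "part of" (P361) is the Smithsonian, and items whose "parent organization" (P749) is the Smithsonian). The function name and the sample QIDs are hypothetical.

```python
def missing_statements(part_of_qids, parent_org_qids):
    """Return, per item, which of P361 / P749 still needs to be added.

    Both arguments are sets of QIDs that already point at the Smithsonian
    through that property. Items that have both statements are omitted.
    """
    todo = {}
    for qid in sorted(part_of_qids | parent_org_qids):
        missing = []
        if qid not in part_of_qids:
            missing.append("P361")
        if qid not in parent_org_qids:
            missing.append("P749")
        if missing:
            todo[qid] = missing
    return todo


# Placeholder QIDs for illustration only; they are not real Smithsonian units.
part_of = {"Q_A", "Q_B"}
parent_org = {"Q_B", "Q_C"}
print(missing_statements(part_of, parent_org))
```

The result could feed the Google Sheet as a to-do list before anything is pushed through OpenRefine.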
People
- Work on the Properties below to decide which ones we can provide data for
  - Example: http://www.wikidata.org/entity/Q88485672 (Andrea Quattrini)
  - Example: http://www.wikidata.org/entity/Q19060876 (Victoria Funk)
- Pull from Profiles all the people according to duty station (matching our orgs)
- Take those people and see if they are in Wikidata
- Add missing people
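One way to approach the "see if they are in Wikidata" step is to match on an identifier rather than on names. The sketch below assumes matching on ORCID iD (P496), one of the core properties listed further down. The record layout (`name`, `orcid`), the function name, and the sample values are all hypothetical.

```python
def split_by_wikidata_presence(profiles_people, orcid_to_qid):
    """Split Profiles records into matched, missing, and needs-review lists.

    profiles_people: list of dicts with hypothetical keys "name" and "orcid".
    orcid_to_qid: dict mapping ORCID iD (P496) values to Wikidata QIDs,
        for example built from a query export.
    People without an ORCID cannot be matched this way and go to review.
    """
    matched, missing, review = [], [], []
    for person in profiles_people:
        orcid = (person.get("orcid") or "").strip()
        if not orcid:
            review.append(person["name"])
        elif orcid in orcid_to_qid:
            matched.append((person["name"], orcid_to_qid[orcid]))
        else:
            missing.append(person["name"])
    return matched, missing, review


# Made-up people and identifiers, for illustration only.
people = [
    {"name": "Researcher One", "orcid": "0000-0000-0000-0001"},
    {"name": "Researcher Two", "orcid": "0000-0000-0000-0002"},
    {"name": "Researcher Three", "orcid": ""},
]
known = {"0000-0000-0000-0001": "Q_EXAMPLE"}
print(split_by_wikidata_presence(people, known))
```

Records in the review list would still need name-based reconciliation, for example in OpenRefine.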
Decisions for data modeling
- employer (P108) - Use both the Smithsonian Institution and the museum
- occupation (P106) - Work on the list. Occupation (P106) vs. field of work (P101), and profession (Q28640) vs. position (Q4164871), need clarification.
- instance of (P31) - Work on the list of instance of values
Item Label, Description, and Aliases
Property | Value | Usage note |
---|---|---|
Label | Person's name as given on the Smithsonian website. | Researchers typically have a preferred name that they publish under and are known by professionally. Use that name, in FirstName MiddleName LastName, Suffix format. |
Description | For items without existing descriptions, model on nearest neighbor | Consistency decisions need to be documented |
Alias (For People) | Name string variants commonly found in publications | This will include the various ways researchers' names appear in the literature and in citation indexes. Alias examples: Lastname, Firstname M.; FM Lastname; etc. |
Alias (For SI Organizations) | Abbreviations and other name variants | Alternative forms of unit and department names |
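The alias patterns named in the table (Lastname, Firstname M. and FM Lastname) could be generated rather than typed by hand. The following is a minimal sketch under that assumption. The function name and the two extra initial-based variants are illustrative choices, not a project decision.

```python
def name_aliases(first, last, middle=""):
    """Generate citation-style variants of a personal name.

    Returns variants in a fixed order with duplicates removed.
    """
    f = first[0]
    m = middle[0] if middle else ""
    variants = [
        f"{last}, {first} {m}." if m else f"{last}, {first}",
        f"{f}{m} {last}",
        f"{last}, {f}.{m}." if m else f"{last}, {f}.",
        f"{f}. {m}. {last}" if m else f"{f}. {last}",
    ]
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(variants))


print(name_aliases("Firstname", "Lastname", "Middle"))
```

Generated aliases should still be checked against how the person is actually cited before they are added to an item.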
People Properties
Core for People
Property | Value | Notes |
---|---|---|
instance of (P31) | human (Q5) | individual must have human/person |
Employer (P108) | Smithsonian Institution (Q131626) | |
Field of work (P101) | Must be added at department level for Natural History people (see Stanford's example). For example, Botany Department staff get botany, and NH-Entomology staff get entomology; do not add entomologist as an occupation. In the future, if this works, we can decide whether to apply it to other museums (philatelist vs. historian), based on Active Directory. | |
Position held (P39) | If significant | |
ORCID iD (P496) | ||
ISNI (P213) | ||
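To find which core properties are missing for a given person, the table above can be treated as a checklist. This is a minimal sketch, assuming each item's statements are available as a dictionary from property ID to a list of values. That layout and the function name are hypothetical, and position held (P39) is left out because it applies only "if significant".

```python
# Core properties for people, taken from the table above.
CORE_PEOPLE_PROPERTIES = {
    "P31": "instance of",
    "P108": "employer",
    "P101": "field of work",
    "P496": "ORCID iD",
    "P213": "ISNI",
}


def missing_core(claims):
    """Return the core property IDs that have no value in `claims`.

    claims: dict mapping property IDs to lists of values for one item.
    A property that is absent or maps to an empty list counts as missing.
    """
    return [pid for pid in CORE_PEOPLE_PROPERTIES if not claims.get(pid)]


example_claims = {"P31": ["Q5"], "P108": ["Q131626"], "P496": []}
print(missing_core(example_claims))
```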
Extended
Property | Value | Notes |
---|---|---|
Occupation (P106) | Directors of Museums will use both 1) occupation = museum director and 2) employer - positions held (with dates) | |
date of birth (P569) | Only if publicly available | |
place of birth (P19) | Only if publicly available | |
date of death (P570) | Only if publicly available | |
place of death (P20) | Only if publicly available | |
educated at (P69) | where the person was educated, at any level | see also qualifier usage |
educated at (P69) qualifier: academic degree (P512) | academic degree earned at that institution | e.g., bachelor's degree (Q163727), master's degree (Q183816), doctorate (Q849697) |
educated at (P69) qualifier: start time (P580) | year in which the degree was started | |
educated at (P69) qualifier: end time (P582) | year in which the degree was completed | |
educated at (P69) qualifier: point in time (P585) | year in which the degree was awarded, if start date not known | |
educated at (P69) qualifier: academic major (P812) | academic major or discipline that was the focus of the degree | |
educated at (P69) qualifier: academic thesis (P1026) | item representing the doctoral dissertation (work) | |
educated at (P69) qualifier: doctoral advisor (P184) | primary advisor of the doctoral dissertation | |
award received (P166) qualifier: for work (P1686) | This is for prizes and honors, not grants awarded. The qualifier for work (P1686) specifies the work for which the award was given to its creator. | |
VIAF ID (P214) | ||
notable work (P800) | Only use this for major academic works. This might be a lower priority, actually. |
Graph of Smithsonian People
People whose employer, affiliation, or member of value is the Smithsonian, shown in a graph based on co-authorship.
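A co-authorship graph of this kind boils down to counting how often two people appear on the same publication. The sketch below assumes the publication data is already available as one list of author QIDs per publication. The function name and the placeholder QIDs are hypothetical, and this is not necessarily how the graph on this page was produced.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(publications):
    """Count how often each pair of authors shares a publication.

    publications: iterable of author-QID lists, one list per publication.
    Returns a Counter keyed by alphabetically sorted (author_a, author_b).
    """
    edges = Counter()
    for authors in publications:
        # set() guards against an author listed twice on one publication.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


# Placeholder QIDs, for illustration only.
pubs = [["Q_A", "Q_B", "Q_C"], ["Q_B", "Q_A"], ["Q_C"]]
print(coauthor_edges(pubs))
```

The edge counts can then be used as weights in whatever graph tool is chosen.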
Organization Properties
Core
Property | Value | Notes |
---|---|---|
instance of (P31) | organization (Q43229); art museum (Q3196771); history museum (Q16735822); science museum (Q588140); museum (Q33506); research center (Q7315155); zoo (Q43501). An organization must have an instance of value that is not human. | Museum? Research facility? Natural history museum, art museum? Q3196771 is the art museum as an institution (not the building; the building is Q207694). See below for current terms used by SI organizations. |
inception (P571) | ||
parent organization (P749) | Smithsonian Institution (Q131626) | both parent organization and part of |
part of (P361) | Smithsonian Institution (Q131626) | both part of and parent organization |
official website (P856) | ||
ISNI (P213) | ||
located in the administrative territorial entity (P131) | Washington, D.C. (Q61); Manhattan (Q11299); Prince George's County (Q26807); Cambridge (Q49111); Anne Arundel County (Q488701); Panama City (Q3306) | |
Property | Value | Notes |
---|---|---|
alias | NMNH, Smithsonian's National Museum of Natural History, Natural History Museum | |
instance of (P31) | natural history museum (Q1970365) | repeat instance or use subclass? Pilsks (talk) 20:08, 5 November 2020 (UTC) |
inception (P571) | 17 March 1910 | |
parent organization (P749) | Smithsonian Institution (Q131626) | |
part of (P361) | Smithsonian Institution (Q131626) | |
official website (P856) | https://naturalhistory.si.edu/ | |
ISNI (P213) | 0000 0001 2364 2127 | |
located in the administrative territorial entity (P131) | Washington, D.C. (Q61) |
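The example values above map fairly directly onto the kind of sheet described under Tasks (P# columns plus Len/Aen/Den). The sketch below writes one such row as CSV. It assumes the QuickStatements CSV conventions as we understand them: string and URL values are wrapped in double quotes, and dates are written as `+YYYY-MM-DDT00:00:00Z/11` for day precision. These should be verified against the QuickStatements documentation before any upload. The helper names are hypothetical, and the row targets the Wikidata Sandbox item (Q4115189) rather than the real museum item.

```python
import csv
import io


def quoted(text):
    """Wrap a value in double quotes, the assumed QuickStatements string form."""
    return f'"{text}"'


def build_sheet(rows, columns):
    """Serialize rows (a list of dicts) to CSV text in the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


columns = ["qid", "Len", "Aen", "P31", "P571", "P749", "P361", "P856", "P131"]
row = {
    "qid": "Q4115189",  # Wikidata Sandbox; swap in the real QID only after testing
    "Len": "National Museum of Natural History",
    "Aen": "NMNH",
    "P31": "Q1970365",
    "P571": "+1910-03-17T00:00:00Z/11",  # 17 March 1910, day precision (assumed format)
    "P749": "Q131626",
    "P361": "Q131626",
    "P856": quoted("https://naturalhistory.si.edu/"),
    "P131": "Q61",
}
sheet = build_sheet([row], columns)
print(sheet)
```

The CSV writer doubles the embedded quotes, so the website value appears in the output surrounded by three double quotes on each side.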
Items Part of / Parent Organization is Smithsonian
Graph display of Organizations part of (P361) Smithsonian, 4 levels down
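The "4 levels down" limit can be reproduced offline once the "part of" statements are exported as (child, parent) pairs. This is a minimal sketch using a breadth-first walk with a depth cap. The function name and the placeholder item names are hypothetical, and the graph on this page may have been built differently (for example, directly in the query service).

```python
from collections import deque


def descendants_within(edges, root, max_depth=4):
    """Return {item: depth} for items at most max_depth levels below root.

    edges: iterable of (child, parent) pairs, one per "part of" statement.
    """
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)

    depths = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend past the depth cap
        for child in children.get(node, []):
            if child != root and child not in depths:
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths


# Placeholder chain root <- a <- b <- c <- d <- e, for illustration only.
chain = [("a", "root"), ("b", "a"), ("c", "b"), ("d", "c"), ("e", "d")]
print(descendants_within(chain, "root"))
```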
Extended
Property | Value | Notes |
---|---|---|
director / manager (P1037) | qualifier: start time (P580) | |
country (P17) | United States of America (Q30); Panama (Q804) | |
street address (P6375) | ||
has part(s) (P527) | ||
has subsidiary (P355) | ||
VIAF ID (P214) | ||
Open Funder Registry funder ID (P3153) | ||
GRID ID (P2427) | | This should change to ROR |
Items Part of / Parent is Smithsonian (Extended)
Vocabularies
instance of (P31) for SI organizations already in Wikidata
part of (P361) Smithsonian, aka Smithsonian Organizations
These are organizations that have property "part of (P361)" with Smithsonian Institution (Q131626).
Questions And Notes
Use Cases Smithsonian Research Online
Inquiry | Query | Discussion |
---|---|---|
Find people who have authored something and are associated with the Smithsonian in some way. | https://w.wiki/3Kem https://w.wiki/3Kf5 https://w.wiki/3KfQ | Discussion |
Find scientists at the Smithsonian who work in botany | https://w.wiki/3J4d https://w.wiki/3J4f | Discussion |
Find all people affiliated with the Smithsonian and report their respective organization or research center within the Smithsonian, without returning the building of the same name (e.g., National Museum of Natural History as building vs. concept; the National Museum of Women is only a concept right now) | https://w.wiki/3MtY | Problem statement |
Find all publications with authors associated with an organization, research center, or department of the Smithsonian (e.g., all publications by authors at the National Museum of Natural History) | https://w.wiki/3MsV | Discussion |
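For readers who want to adapt the botany use case, a query along these lines can be built programmatically. This sketch does not reproduce the saved queries behind the short links above. It is an assumed, simplified version that only matches people whose employer (P108) is the Smithsonian itself and whose field of work (P101) is the given subject, so people employed by a specific museum would be missed. The function names are hypothetical. Q441 is used as the item for botany and should be double-checked before reuse.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"  # Wikidata Query Service


def people_by_field_query(employer_qid, field_qid, limit=100):
    """Build a SPARQL query for humans with a given employer and field of work."""
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P108 wd:{employer_qid} ;
          wdt:P101 wd:{field_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def query_url(sparql):
    """Return a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


query = people_by_field_query("Q131626", "Q441")  # Smithsonian, botany
print(query)
print(query_url(query))
```

The query text can also be pasted directly into the query service interface, which is often easier when iterating on the data model.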
Project Year-end report
The purpose of participating in the PCC Wikidata Pilot Project was to model people and organizations associated with Smithsonian Research Online. Initially, the project had the bold goals of creating Wikidata items for all currently published authors and their associated Smithsonian research facilities and museums, and of connecting their publications. The data modeling produced interesting discussion around practical issues, clarity of meaning, and ethical issues. Staff came away from the project with practical and theoretical data modeling skills that can be applied to our local Wikibase installation (once it is established). Staff used a variety of tools and services to query, reconcile, and push data. By the end of the year, we had narrowed our goals for enhancing Wikidata to making core information on Smithsonian museums accurate and reviewing a select few of the researchers who already had a presence in Wikidata.