User:Feliciss/Sandbox
ADSBot English Paper
ADSBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Feliciss (talk • contribs • logs)
Task/s: Importing scholarly articles from the ADS database into Wikidata by creating a Wikidata item for each scholarly article (and optionally items for its authors) and adding statements and statement-related properties to the item. Part of Outreachy Round 24.
Code: import_papers_from_ads.py
Function details: --Feliciss (talk) 08:16, 15 July 2022 (UTC)
- Search Wikidata for all surnames in English. There are about 7,000 surnames in English on Wikidata.
- Use each surname as the key of a first_author query to find papers in the ADS database whose first author has that surname.
- Extract the DOI from each paper and try to find a matching item on Wikidata.
  - If a DOI exists in ADS, and
    - there's no article containing that DOI on Wikidata:
      - Create an item for the scholarly article (and optionally author items) on Wikidata, and add statements and statement-related properties to the item.
    - there's an article containing that DOI on Wikidata:
      - ADSBot English Statement will handle this situation.
  - If a DOI isn't available in ADS
    - Extract the title from ADS and compare it with the itemLabel (title) on Wikidata using the consent match ratio.
      - If the title already exists on Wikidata
        - ADSBot English Statement will handle this situation.
      - If there's no such title on Wikidata
        - Create an item for the scholarly article (and optionally author items) on Wikidata, and add statements and statement-related properties to the item.
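The branching in the function details above can be sketched roughly as follows. This is only an illustration and not the code in import_papers_from_ads.py: the function name route_paper and the two lookup callables are hypothetical stand-ins for the real Wikidata queries.

```python
from typing import Callable, Optional

# Names of the two tasks described on this page.
CREATE_ITEM = "ADSBot English Paper"      # creates a new item
UPDATE_ITEM = "ADSBot English Statement"  # adds missing statements


def route_paper(
    doi: Optional[str],
    title: str,
    doi_on_wikidata: Callable[[str], bool],
    title_on_wikidata: Callable[[str], bool],
) -> str:
    """Return the task that should handle one ADS paper.

    The two callables are hypothetical lookups: they report whether an
    article with this DOI (or a matching title) already exists on Wikidata.
    """
    if doi:
        # A DOI exists in ADS: look the article up by DOI.
        exists = doi_on_wikidata(doi)
    else:
        # No DOI in ADS: fall back to comparing titles.
        exists = title_on_wikidata(title)
    return UPDATE_ITEM if exists else CREATE_ITEM


# Usage with in-memory stand-ins for Wikidata (made-up data).
known_dois = {"10.1000/example"}
known_titles = {"an example title"}
task = route_paper(
    None,
    "A title that is not on Wikidata",
    lambda d: d in known_dois,
    lambda t: t.lower() in known_titles,
)
```

Passing the lookups in as callables keeps the decision logic separate from the network calls, so it can be checked without touching Wikidata.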
Notes:
- For those who are curious about which statements will be added to Wikidata from the ADS database, this item lists them: https://www.wikidata.org/wiki/Q112684896
- About 47 of every 50 articles in the ADS database have a DOI (an estimated DOI-to-article ratio). Every paper in the ADS database has a title.
- Consent match ratio: If the title of a paper contains special characters, use SequenceMatcher from Python's difflib to compare the two titles, and treat them as a match when their similarity ratio reaches a constant threshold, say, >=0.8.
- The original idea comes from Pathway 1 of a diagram drafted on Wikimedia Phabricator, if anyone's interested: https://phab.wmfusercontent.org/file/data/lnlj5477majaglrd4eas/PHID-FILE-gidyiuwdukmtjap42zgi/Approach_to_Surnames_%282%29.png
- This bot runs regularly in case a new surname is added to Wikidata.
- By my estimate, this bot run will add approximately 330,000 articles to Wikidata. That figure comes from 6994 (surnames in English on Wikidata) * 5 (estimated authors sharing each surname) * 10 (estimated average papers per author) * (1 - 749103 / 13300000) (share of ADS articles not yet on Wikidata, where 13300000 is the total number of articles in ADS and 749103 is the number of articles on Wikidata with an ADS bibcode value).
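The consent match ratio from the notes above could look roughly like this. It is a minimal sketch: the function name titles_match is hypothetical, and the case-folding and whitespace normalization are my own assumptions rather than something the bot is stated to do. Only SequenceMatcher and the 0.8 threshold come from the note.

```python
from difflib import SequenceMatcher

# Similarity constant suggested in the note ("say, >=0.8").
SIMILARITY_THRESHOLD = 0.8


def normalize(title: str) -> str:
    """Case-fold and collapse whitespace (an assumed preprocessing step)."""
    return " ".join(title.casefold().split())


def titles_match(ads_title: str, wikidata_label: str,
                 threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Return True when the two titles are similar enough to be one paper."""
    ratio = SequenceMatcher(
        None, normalize(ads_title), normalize(wikidata_label)
    ).ratio()
    return ratio >= threshold


# Titles that differ only in spacing around a special character still match.
same_paper = titles_match("The H α luminosity function",
                          "The Hα luminosity function")
# Unrelated titles fall well below the threshold.
different_paper = titles_match("Dark matter halos",
                               "Quantum chromodynamics on the lattice")
```

A lower threshold catches more spelling variants but raises the risk of merging two different papers, so the constant is worth tuning against a sample of known matches.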
ADSBot English Statement
ADSBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Feliciss (talk • contribs • logs)
Task/s: Adding missing statements and statement-related properties to existing scholarly articles on Wikidata from the ADS database. Part of Outreachy Round 24.
Code: values_from_ads_to_paper_on_wiki.py
Function details: --Feliciss (talk) 08:16, 15 July 2022 (UTC)
- Search Wikidata for all surnames in English. There are about 7,000 surnames in English on Wikidata.
- Use each surname as the key of a first_author query to find papers in the ADS database whose first author has that surname.
- Extract the DOI from each paper and try to find a matching item on Wikidata.
  - If a DOI exists in ADS, and
    - there's no article containing that DOI on Wikidata:
      - ADSBot English Paper will deal with this situation.
    - there's an article containing that DOI on Wikidata:
      - Check whether the ADS database has values (e.g. page(s), volume, issue, etc.) that are missing from the scholarly article on Wikidata.
        - Add these values from the article in ADS to the same article on Wikidata.
  - If a DOI isn't available in ADS
    - Extract the title from ADS and compare it with the itemLabel (title) on Wikidata using the consent match ratio.
      - If the title already exists on Wikidata
        - Check whether the ADS database has values (e.g. page(s), volume, issue, etc.) that are missing from the scholarly article on Wikidata.
          - Add these values from the article in ADS to the same article on Wikidata.
      - If there's no such title on Wikidata
        - ADSBot English Paper will deal with this situation.
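The "check, then add" step above can be sketched as a comparison between one ADS record and the properties already present on the Wikidata item. This is an illustration only, not the code in values_from_ads_to_paper_on_wiki.py: the function name missing_statements and the shape of both inputs are assumptions, and the mapping covers only the three example values named above (page(s) as P304, volume as P478, issue as P433).

```python
from typing import Dict, Optional, Set

# ADS field name -> Wikidata property ID, for the three example values only.
ADS_FIELD_TO_PROPERTY = {
    "page": "P304",    # page(s)
    "volume": "P478",  # volume
    "issue": "P433",   # issue
}


def missing_statements(
    ads_record: Dict[str, Optional[str]],
    existing_properties: Set[str],
) -> Dict[str, str]:
    """Return {property ID: ADS value} for values ADS has but the item lacks."""
    missing = {}
    for field, prop in ADS_FIELD_TO_PROPERTY.items():
        value = ads_record.get(field)
        # Keep only non-empty ADS values whose property is absent on the item.
        if value and prop not in existing_properties:
            missing[prop] = value
    return missing


# Made-up example: ADS knows the pages and volume, the item only has a volume.
to_add = missing_statements(
    {"page": "45-67", "volume": "12", "issue": None},
    existing_properties={"P478"},
)
```

Only properties that are entirely absent are reported here. An existing but different value is left alone, which avoids overwriting edits made by other contributors.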
Notes:
- For those who are curious about which statements will be added to Wikidata from the ADS database, this item lists them: https://www.wikidata.org/wiki/Q112684896
- About 47 of every 50 articles in the ADS database have a DOI (an estimated DOI-to-article ratio). Every paper in the ADS database has a title.
- Consent match ratio: If the title of a paper contains special characters, use SequenceMatcher from Python's difflib to compare the two titles, and treat them as a match when their similarity ratio reaches a constant threshold, say, >=0.8.
- The original idea comes from Pathway 1 of a diagram drafted on Wikimedia Phabricator, if anyone's interested: https://phab.wmfusercontent.org/file/data/lnlj5477majaglrd4eas/PHID-FILE-gidyiuwdukmtjap42zgi/Approach_to_Surnames_%282%29.png
- This bot runs regularly in case a new surname is added to Wikidata.