User:Feliciss/Sandbox



ADSBot English Paper

ADSBot
Operator: Feliciss

Task/s: Import scholarly articles from the ADS database into Wikidata by creating a Wikidata item for each scholarly article (and, optionally, items for its authors) and adding statements and statement-related properties to the item. Part of Outreachy Round 24.

Code: import_papers_from_ads.py

Function details: --Feliciss (talk) 08:16, 15 July 2022 (UTC)

  • Query Wikidata for all English-language surnames. There are about 7,000 of them.
  • Use each surname as the key of a first_author search in the ADS database to find papers whose first author has that surname.
  • Extract the DOI from each paper and look for a matching item on Wikidata.
    • If the paper has a DOI in ADS, and
      • no article on Wikidata contains that DOI:
        • Create an item for the scholarly article, labeled with its title (and, optionally, author items), on Wikidata and add statements and statement-related properties to the item.
      • an article on Wikidata already contains that DOI:
        • ADSBot English Statement handles this case.
    • If the paper has no DOI in ADS
      • Extract the title from ADS and compare it with the itemLabel (title) of items on Wikidata using the consent match ratio.
        • If the title already exists on Wikidata
          • ADSBot English Statement handles this case.
        • If there is no such title on Wikidata
          • Create an item for the scholarly article, labeled with that title (and, optionally, author items), on Wikidata and add statements and statement-related properties to the item.
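
The branching above can be sketched as a small routing function. This is only an illustration, not the code in import_papers_from_ads.py: the names Action and choose_action are hypothetical, and the two boolean inputs stand in for the DOI lookup and the title comparison, which are assumed to have been done elsewhere.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    # Hypothetical labels for the two tasks described on this page.
    CREATE_ITEM = "ADSBot English Paper: create a new item"
    ADD_STATEMENTS = "ADSBot English Statement: add missing statements"


def choose_action(doi: Optional[str],
                  doi_on_wikidata: bool,
                  title_on_wikidata: bool) -> Action:
    """Route one ADS paper to the task that should handle it."""
    if doi:
        # The paper has a DOI: existence is decided by the DOI lookup alone.
        exists = doi_on_wikidata
    else:
        # No DOI: fall back to the title comparison.
        exists = title_on_wikidata
    return Action.ADD_STATEMENTS if exists else Action.CREATE_ITEM
```

For example, a paper with a DOI that is not yet on Wikidata is routed to item creation even if a similar title exists, because the title is only consulted when the DOI is missing.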


Notes:

  1. For those curious about which statements will be added to Wikidata from the ADS database, this item lists them: https://www.wikidata.org/wiki/Q112684896
  2. About 47 of every 50 articles in the ADS database have a DOI; this is an estimated DOI-to-article ratio. Every paper in the ADS database has a title.
  3. Consent match ratio: If the title of a paper contains special characters, use SequenceMatcher from Python's difflib module to compare the two titles, and treat them as a match when their similarity ratio reaches a constant threshold, say, >= 0.8.
  4. The original idea comes from Pathway 1 in a draft diagram on Wikimedia Phabricator, for anyone interested: https://phab.wmfusercontent.org/file/data/lnlj5477majaglrd4eas/PHID-FILE-gidyiuwdukmtjap42zgi/Approach_to_Surnames_%282%29.png
  5. This bot runs regularly in case a new surname is added to Wikidata.
  6. By my estimate, this bot run will add approximately 290,000 articles to Wikidata. The figure comes from 6994 (surnames in English on Wikidata) * 5 (estimated authors sharing the same surname) * 10 (estimated average papers per author) * (1 - (13300000 / 749103 / 100)) (estimated share of ADS articles not yet on Wikidata, where 13300000 is the total number of articles in ADS and 749103 is the number of articles on Wikidata with an ADS bibcode value).
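
The arithmetic behind the estimate in note 6 can be reproduced as follows. The variable names are made up for this sketch; the numbers and the form of the last factor are taken exactly as written in the note.

```python
# Inputs quoted in note 6.
surnames = 6994                    # surnames in English on Wikidata
authors_per_surname = 5            # estimated authors sharing a surname
papers_per_author = 10             # estimated average papers per author
ads_total = 13_300_000             # total articles in ADS
wikidata_with_bibcode = 749_103    # Wikidata articles with an ADS bibcode

# Papers the surname search is expected to reach.
candidates = surnames * authors_per_surname * papers_per_author

# Share of those papers assumed to be missing from Wikidata,
# using the factor in the form given in the note.
missing_share = 1 - (ads_total / wikidata_with_bibcode / 100)

estimate = candidates * missing_share
print(candidates)           # 349700
print(round(estimate, -4))  # rounded to the nearest ten thousand
```

The product comes to roughly 287,600, which is the source of the rounded figure of 290,000.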




ADSBot English Statement

ADSBot
Operator: Feliciss

Task/s: Add missing statements and statement-related properties from the ADS database to existing scholarly articles on Wikidata. Part of Outreachy Round 24.

Code: values_from_ads_to_paper_on_wiki.py

Function details: --Feliciss (talk) 08:16, 15 July 2022 (UTC)

  • Query Wikidata for all English-language surnames. There are about 7,000 of them.
  • Use each surname as the key of a first_author search in the ADS database to find papers whose first author has that surname.
  • Extract the DOI from each paper and look for a matching item on Wikidata.
    • If the paper has a DOI in ADS, and
      • no article on Wikidata contains that DOI:
        • ADSBot English Paper handles this case.
      • an article on Wikidata already contains that DOI:
        • Check whether ADS holds values (e.g. page(s), volume, issue, etc.) that are not present on the corresponding scholarly article on Wikidata.
          • Add these values from the article in ADS to the same article on Wikidata.
    • If the paper has no DOI in ADS
      • Extract the title from ADS and compare it with the itemLabel (title) of items on Wikidata using the consent match ratio.
        • If the title already exists on Wikidata
          • Check whether ADS holds values (e.g. page(s), volume, issue, etc.) that are not present on the corresponding scholarly article on Wikidata.
            • Add these values from the article in ADS to the same article on Wikidata.
        • If there is no such title on Wikidata
          • ADSBot English Paper handles this case.
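
The "check for missing values" step can be sketched as a comparison of two dictionaries. This is a minimal illustration, not the code in values_from_ads_to_paper_on_wiki.py: the function name missing_values and the key names ("page", "volume", "issue") are assumptions, and fetching the ADS record and the existing Wikidata statements is assumed to happen elsewhere.

```python
from typing import Dict, List


def missing_values(ads_record: Dict[str, str],
                   wikidata_values: Dict[str, List[str]]) -> Dict[str, str]:
    """Return the fields that have a value in ADS but none on Wikidata."""
    return {
        field: value
        for field, value in ads_record.items()
        # Skip fields that are empty in ADS, and fields for which the
        # Wikidata item already holds at least one value.
        if value and not wikidata_values.get(field)
    }


# Hypothetical data for one article.
ads_record = {"volume": "12", "issue": "3", "page": ""}
wikidata_values = {"volume": ["12"]}
to_add = missing_values(ads_record, wikidata_values)
print(to_add)  # {'issue': '3'}
```

Only values absent from Wikidata are returned, so existing statements are never overwritten in this sketch.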


Notes:

  1. For those curious about which statements will be added to Wikidata from the ADS database, this item lists them: https://www.wikidata.org/wiki/Q112684896
  2. About 47 of every 50 articles in the ADS database have a DOI; this is an estimated DOI-to-article ratio. Every paper in the ADS database has a title.
  3. Consent match ratio: If the title of a paper contains special characters, use SequenceMatcher from Python's difflib module to compare the two titles, and treat them as a match when their similarity ratio reaches a constant threshold, say, >= 0.8.
  4. The original idea comes from Pathway 1 in a draft diagram on Wikimedia Phabricator, for anyone interested: https://phab.wmfusercontent.org/file/data/lnlj5477majaglrd4eas/PHID-FILE-gidyiuwdukmtjap42zgi/Approach_to_Surnames_%282%29.png
  5. This bot runs regularly in case a new surname is added to Wikidata.
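
The title comparison from note 3 could look roughly like the sketch below. SequenceMatcher and its ratio() method are part of Python's standard difflib module; the function name titles_match, the constant name, and the case-folding step are assumptions made for this example, and 0.8 is the example threshold mentioned in the note.

```python
from difflib import SequenceMatcher

# Example threshold from note 3; the value used by the bot may differ.
SIMILARITY_THRESHOLD = 0.8


def titles_match(ads_title: str,
                 wikidata_label: str,
                 threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Return True when two titles are similar enough to count as the same paper."""
    # Case-folding is an assumption of this sketch, so that differences
    # in capitalization alone do not lower the ratio.
    ratio = SequenceMatcher(None,
                            ads_title.casefold(),
                            wikidata_label.casefold()).ratio()
    return ratio >= threshold
```

A lower threshold finds more existing items but risks merging distinct papers with similar titles; a higher one risks creating duplicates.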