User:Lirazelf/Learning

From Wikidata

This page is for keeping track of the areas I'm working on, and documenting workflows so that I can reproduce them in the future.


Scottish Poetry Library database

Mix'n'match catalogue for the Scottish Poetry Library: checking entries against existing items and creating new ones. Using N/A where there is no description, only a photo. This is to play it safe on notability, although it may be unnecessary. Data originally imported by User:Pigsonthewing.

Next step: a SPARQL query for all items using a Scottish Poetry Library ID.

Then a sheet of potential new statements, and QuickStatements to create them.

OpenPlaques in Glasgow

  • Downloaded the plaque data as CSV from the OpenPlaques data page
  • Imported it into Google Sheets, keeping only the Glasgow plaques
  • Updated the (old) data dump with the extra / new plaques from the website
  • Filled in the spreadsheet in Google Sheets and enabled the Wikipedia / Wikidata add-on
  • Opened a new sheet and used the following block, repeated for each plaque, to create QuickStatements commands that bulk-create the new items:
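CREATE
LAST Len "Commemorative plaque to William Thomson"   (label in English)
LAST P31 Q721747   (instance of: commemorative plaque)
LAST P17 Q145   (country: United Kingdom)
LAST P131 Q55934339   (located in the administrative territorial entity: Glasgow City)
LAST P131 Q22   (located in the administrative territorial entity: Scotland)
LAST P1893 "1893"   (Open Plaques ID value)
(The bracketed notes are not part of the commands. In the sheet, each part of a command is in its own cell, so it pastes into QuickStatements as tab-separated text.)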
  • Copied and pasted into QuickStatements v2 using Import V1 commands, then Run.
  • Copied the Q numbers of the newly created items back into the spreadsheet for easier use in QuickStatements later, woohoo.
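If the sheet formulas get fiddly, the same CREATE block can be generated for every row of a CSV instead. This is a minimal sketch in Python. The column names title and openplaques_id are hypothetical, so rename them to match the real export. The P and Q values are the Glasgow ones from the template above.

```python
# Sketch: build QuickStatements V1 CREATE blocks for new plaque items from a CSV.
# ASSUMPTION: the CSV has hypothetical "title" and "openplaques_id" columns.
import csv
import io

# Statements shared by every Glasgow plaque (values from the template above).
FIXED_STATEMENTS = [
    ("P31", "Q721747"),     # instance of: commemorative plaque
    ("P17", "Q145"),        # country: United Kingdom
    ("P131", "Q55934339"),  # located in the administrative territorial entity: Glasgow City
    ("P131", "Q22"),        # located in the administrative territorial entity: Scotland
]


def plaque_commands(title: str, plaque_id: str) -> list[str]:
    """Return the tab-separated V1 commands that create one plaque item."""
    lines = ["CREATE", f'LAST\tLen\t"{title}"']
    lines += [f"LAST\t{prop}\t{value}" for prop, value in FIXED_STATEMENTS]
    lines.append(f'LAST\tP1893\t"{plaque_id}"')  # Open Plaques ID, as a quoted string
    return lines


def batch_from_csv(csv_text: str) -> str:
    """Join the commands for every CSV row, ready to paste into QuickStatements."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        commands.extend(plaque_commands(row["title"].strip(), row["openplaques_id"].strip()))
    return "\n".join(commands)


if __name__ == "__main__":
    sample = "title,openplaques_id\nCommemorative plaque to William Thomson,1893\n"
    print(batch_from_csv(sample))
```

The printed lines are tab-separated. Pasting them in via Import V1 commands should therefore behave like the sheet version.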

Entering coordinates through QuickStatements

Location coordinates in the form of @LAT/LON, with LAT and LON as decimal numbers.

Example: Q3669835 TAB P625 TAB @043.26193/010.92708
  • Spreadsheet with decimal coordinate values from OpenPlaques
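  • use the formula =CONCATENATE(E42,F42,G42), where E42 holds the @ symbol and F42 and G42 the latitude and longitude
  • enter the / between latitude and longitude manually (or put it in the formula: =CONCATENATE(E42,F42,"/",G42))
  • Q82545367 P625 @55.86758/-4.27542
  • paste into QuickStatements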
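The CONCATENATE step (plus the manual "/") can also be done in code, with a range check that catches cells that aren't numbers. This is a sketch that assumes the latitude and longitude arrive as decimal-degree strings straight from the spreadsheet.

```python
# Sketch: build a QuickStatements coordinate (P625) line from two spreadsheet cells.
# ASSUMPTION: lat and lon are decimal-degree strings, latitude first.


def p625_command(qid: str, lat: str, lon: str) -> str:
    """Return a tab-separated V1 line: Qxxx, P625, @LAT/LON."""
    lat, lon = lat.strip(), lon.strip()
    lat_f, lon_f = float(lat), float(lon)  # raises ValueError for non-numeric cells
    if not (-90.0 <= lat_f <= 90.0 and -180.0 <= lon_f <= 180.0):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    # Reuse the cells' own digits so no precision is added or lost by float formatting.
    return f"{qid}\tP625\t@{lat}/{lon}"


print(p625_command("Q82545367", "55.86758", "-4.27542"))
# prints: Q82545367<TAB>P625<TAB>@55.86758/-4.27542
```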

Entering images through QuickStatements

Q82552101 P18 "Plaque on wall of Marks and Spencer, Virginia Street - geograph.org.uk - 1080941.jpg"
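As in the example, the image value is the Commons file name in double quotes, without a File: prefix. This small sketch tidies a copied Commons page URL or File: title into that form. It relies on MediaWiki treating underscores and spaces as the same in titles.

```python
# Sketch: turn a Commons file page URL, "File:" title or bare file name into a P18 line.
from urllib.parse import unquote


def p18_command(qid: str, commons_ref: str) -> str:
    """Return a tab-separated V1 line: Qxxx, P18, "file name"."""
    name = commons_ref.strip()
    if "/wiki/" in name:  # a full URL: keep the page title and decode %-escapes
        name = unquote(name.rsplit("/wiki/", 1)[1])
    if name.startswith("File:"):
        name = name[len("File:"):]
    name = name.replace("_", " ")  # MediaWiki treats underscores and spaces alike
    return f'{qid}\tP18\t"{name}"'


print(p18_command(
    "Q82552101",
    "https://commons.wikimedia.org/wiki/File:Plaque_on_wall_of_Marks_and_Spencer,"
    "_Virginia_Street_-_geograph.org.uk_-_1080941.jpg",
))
```

This prints the same command as the example above, with tabs between the parts.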

Legacies of British Slave-ownership

SPARQL query for all items using P3023
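A sketch for running that kind of query from Python instead of the WDQS web page. The endpoint and the SPARQL JSON results layout are standard. The query text, the ?lbsId variable name and the User-Agent string are my own placeholders; Wikimedia asks scripts to send a descriptive User-Agent.

```python
# Sketch: run "all items using P3023" against WDQS and flatten the JSON results.
# ASSUMPTIONS: the variable names and the User-Agent text are placeholders.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?itemLabel ?lbsId WHERE {
  ?item wdt:P3023 ?lbsId .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def flatten_bindings(result: dict) -> list[dict]:
    """Turn SPARQL JSON results into plain {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run_query(query: str) -> list[dict]:
    """Send the query to WDQS and return the flattened rows (needs network access)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "LearningNotesSketch/0.1 (replace with your contact details)",
    })
    with urllib.request.urlopen(request) as response:
        return flatten_bindings(json.load(response))
```

Calling run_query(QUERY) should give one dict per row, keyed by item, itemLabel and lbsId.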

The Archers & related

Characters in The Archers

  • link to SPARQL query here
  • Fictional character replaced / deprecated - fictional human

Places located in Borsetshire

  • link to SPARQL query here

Instances of silent characters

  • link to SPARQL query here

Adding Political Ideology / Jacobitism

  • Use PetScan to identify pages in a category
  • Depth is how far down the subcategories you go
  • add Wikidata in the options list
  • paste the results into Notepad as CSV
  • upload into Excel as CSV
  • manually remove any inappropriate items
  • use this list to add the correct P and Q numbers, plus the source: imported from Wikimedia project (P143) English Wikipedia (Q328)
  • paste into QS
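This sketch covers the "add P and Q numbers plus source" step. It turns the cleaned list of Q numbers into QuickStatements V1 lines, with the source written as S143 Q328 (imported from Wikimedia project: English Wikipedia). The property and value are passed in. For this job that would presumably be P1142 (political ideology) and the Jacobitism item, whose Q number is left for the caller to fill in.

```python
# Sketch: one sourced QuickStatements V1 line per item from a cleaned PetScan list.
# ASSUMPTION: the caller supplies the property and the value's Q number.
import re

QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")


def sourced_statements(qids, prop: str, value_qid: str) -> list[str]:
    """Return item, property, value, S143, Q328 lines, tab-separated, one per distinct item."""
    seen = set()
    lines = []
    for raw in qids:
        qid = raw.strip().upper()
        if not qid:
            continue  # blank rows left over from the spreadsheet
        if not QID_PATTERN.match(qid):
            raise ValueError(f"not a Q number: {raw!r}")
        if qid in seen:
            continue  # skip repeats so the statement isn't added twice
        seen.add(qid)
        lines.append(f"{qid}\t{prop}\t{value_qid}\tS143\tQ328")
    return lines
```

Typical use would be sourced_statements(petscan_qids, "P1142", jacobitism_qid), where both arguments are hypothetical names for the cleaned list and the ideology item.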

SPARQL query: https://w.wiki/bM9

Queries I want to keep

Handy things

  • "different from" (P1889): when two items have similar names, add P1889 to each item, pointing to the other's Q number
  • How to find a list of articles in a category that don't have a lead image: https://meta.wikimedia.org/wiki/Wikipedia_Pages_Wanting_Photos/Resources
  • asterisks (*) in property paths make queries slower; try a fixed depth instead. E.g. ?person1 wdt:P19/wdt:P131?/wdt:P131? wd:Q22 . matches people whose place of birth is Scotland itself or up to two "located in the administrative territorial entity" steps inside it (depth of 2)
  • FILTER NOT EXISTS can take ages compared to MINUS, so use MINUS instead where the two give the same result
  • SELECT DISTINCT removes duplicate rows
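Because "different from" should go on both items, each pair of Q numbers becomes two QuickStatements lines. This is a sketch, with the pairs supplied by hand.

```python
# Sketch: symmetric "different from" (P1889) statements as QuickStatements V1 lines.


def different_from_commands(pairs) -> list[str]:
    """For each (first, second) pair, add P1889 on both items, pointing at each other."""
    lines = []
    for first, second in pairs:
        if first == second:
            raise ValueError(f"an item can't be different from itself: {first}")
        lines.append(f"{first}\tP1889\t{second}")  # on the first item, pointing at the second
        lines.append(f"{second}\tP1889\t{first}")  # and the reverse
    return lines
```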

Scheduled monuments in Scotland without an image on Wikipedia

https://w.wiki/3hrJ

Queries it would be useful to know how to do

  • museums in Scotland: where they are, images, and the Wikipedia article if we have one (cobbled together from the MPs without Wikipedia pages and the writers born in Scotland queries). Attempted here, https://w.wiki/3hm3, but it times out. This one works, so perhaps it's the coordinates that slow things up? https://w.wiki/3h6M - there is some weirdness here with ?item and ?article that I messed up.
    • You had a typo - OPTIONAL {?Musem wdt:P625 ?coord } - so WDQS went off to find all the coords for the ?Musem variable: millions of them.
    • More generally, you're relying on the ?Museum having a location (P276). Most geolocatable Scottish items do not have a P276, so it's generally better to identify 'is in Scotland' by using located in the administrative territorial entity (P131) - either ?Museum wdt:P131* wd:Q22 . (it's in Scotland) or ?Museum wdt:P131/wdt:P31 wd:Q15060255 . (it's in a Scottish local authority).
    • I've presumed you want museum-type things and not just museums - everything in the Museum class tree - and so I've added /wdt:P279* to the query, and DISTINCT to the SELECT so that we do not get multiple rows for a ?Museum which has multiple qualifying P31 values.
    • Then, when you're using property paths of the form wdt:P131* or wdt:P131/wdt:P31, it often speeds things up enormously to add the hint hint:Prior hint:gearing "forward". If you consider ?Museum wdt:P131* wd:Q22 ., the query engine can start at ?Museum and follow the P131 path to see if it gets to Q22 (which is 'forward'), or it can start at wd:Q22 and work backwards through the set of ^wdt:P131 values to see if ?Museum is found (which is 'backward'). Generally, WDQS defaults to 'backward' (citation needed), but 'forward' is often the much better choice.
    • Finally, for the sort of query where you want to select a bunch of stuff (museum-type things in Scotland) and then ask a bunch of questions about that set (is there a coord/Wikipedia article &c &c), it's often useful to use a named subquery: you select the bunch of stuff in query A, and then examine it in query B. The advantage is that the B questions only get asked of the items that satisfy A, rather than of all the items the report considers.
    • So all of that, for this report, might lead us to this. Is this any better?
  • museums in Scotland without a Wikipedia page: doesn't show subtypes of museums (e.g. art galleries) and times out - https://w.wiki/3h6V - is this any better?
  • museums and subclasses of museums in Scotland... times out: https://w.wiki/3h6u - is this any better?
  • bubble map of characters/people and family relationships: don't have the data to do The Archers stuff yet. Probably some writers?
  • scheduled monuments in Scotland for which we don't have a Wikipedia page: use the MPs one, with identifiers from HES (Historic Environment Scotland)?
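To keep the shape of that advice somewhere, here is a sketch that puts the pieces together. It uses a named subquery (A: museum-type things in Scotland, with the forward gearing hint on the P131 path) and then asks the coordinate and article questions only of A's results (B). It also builds the WDQS editor link. Q33506 (museum), English Wikipedia as the article language, and the variable and subquery names are my assumptions. I haven't run it against the live service, so it may still need tuning.

```python
# Sketch: named subquery + gearing hint for "museums in Scotland", plus a WDQS editor link.
# ASSUMPTIONS: Q33506 = museum; English Wikipedia articles; names chosen for illustration.
from urllib.parse import quote

MUSEUMS_IN_SCOTLAND = """\
SELECT DISTINCT ?museum ?museumLabel ?coord ?article
WITH {
  SELECT DISTINCT ?museum WHERE {
    ?museum wdt:P31/wdt:P279* wd:Q33506 .   # museum, or any subclass of museum
    ?museum wdt:P131* wd:Q22 .              # in Scotland via the P131 chain
    hint:Prior hint:gearing "forward" .     # walk P131 upwards from each ?museum
  }
} AS %museums
WHERE {
  INCLUDE %museums                          # query B only sees the results of query A
  OPTIONAL { ?museum wdt:P625 ?coord . }
  OPTIONAL { ?article schema:about ?museum ;
                      schema:isPartOf <https://en.wikipedia.org/> . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def wdqs_editor_link(query: str) -> str:
    """Link that opens the query in the Wikidata Query Service editor."""
    return "https://query.wikidata.org/#" + quote(query, safe="")


print(wdqs_editor_link(MUSEUMS_IN_SCOTLAND))
```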

Building on a simple query....

Politicians in a relationship / both politicians

Drawing the Scottish comics industry

  • could be done with titles
  • publishing houses
  • authors
  • artists
  • bubble map....?
  • what database to use?
  • Modelling currently might be a challenge: compare Q5203542 (DC Thomson, Dundee) and Q373933 (Dark Horse Comics). Hard to search right now for comics alone
  • will there be publishers who do comics only as well as publishers who do lots of other types of literature? or just the former?

Scots wiki articles that use fix scots, group into category

  • Got the list of Q numbers from PetScan
  • Put PetScan list into Excel, concatenated Q numbers
  • Put Q numbers into query as values for ?item
  • tried to get it to show me the instance of (P31) values
  • got an unknown error
  • waaaaaaaahhhh
  • >> VALUES used in this way works up to about 1500 Wikidata items in the VALUES list.
  • This is it, for up to about 1500 items.
  • But how to display this?
  • Need to group now.
  • use ORDER BY ?instanceofLabel (thanks Martin)
  • syndicated query?
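The concatenating step can be done in Python rather than Excel. This sketch splits the list into VALUES blocks of at most 1500 items, since that is roughly where VALUES stops working; the size is only that rule of thumb. It also drops blanks and repeats. Each block would replace the VALUES line in a separate run of the query.

```python
# Sketch: split a PetScan list of Q numbers into VALUES clauses for ?item.
# ASSUMPTION: 1500 per clause is the rough limit mentioned above, not a documented hard limit.


def values_clauses(qids, size: int = 1500) -> list[str]:
    """Return VALUES blocks of at most `size` items, dropping blanks and repeats."""
    cleaned = list(dict.fromkeys(q.strip() for q in qids if q.strip()))
    return [
        "VALUES ?item { " + " ".join(f"wd:{q}" for q in cleaned[i:i + size]) + " }"
        for i in range(0, len(cleaned), size)
    ]
```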

Pasted below from Wikidata:Request a query:

SELECT ?item ?itemLabel ?article ?P31 ?P31Label where
{
  hint:Query hint:optimizer "None".   # keep the clauses in the written order, so the MWAPI call runs first
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:endpoint "sco.wikipedia.org";
                     wikibase:api "Generator";
                     mwapi:generator "categorymembers";
                     mwapi:gcmtitle "Category:Pages that wis written by a body that's mither tongue isna Scots" ;         # the category to list
                     mwapi:gcmprop "ids|title|type";
                     mwapi:gcmlimit "max".
     # outputs
     ?article wikibase:apiOutput mwapi:title.        # Scots Wikipedia article / category title
     ?item wikibase:apiOutputItem mwapi:item.        # Wikidata QID for the article
    }
  filter(bound(?item))
  optional {?item wdt:P31 ?P31 . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
One row per article:
SELECT ?item ?itemLabel ?article (group_concat(?P31Label;separator="; ") as ?P31s) where
{
  hint:Query hint:optimizer "None".   # keep the clauses in the written order, so the MWAPI call runs first
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:endpoint "sco.wikipedia.org";
                     wikibase:api "Generator";
                     mwapi:generator "categorymembers";
                     mwapi:gcmtitle "Category:Pages that wis written by a body that's mither tongue isna Scots" ;         # the category to list
                     mwapi:gcmprop "ids|title|type";
                     mwapi:gcmlimit "max".
     # outputs
     ?article wikibase:apiOutput mwapi:title.        # Scots Wikipedia article / category title
     ?item wikibase:apiOutputItem mwapi:item.        # Wikidata QID for the article
    }
  filter(bound(?item))
  optional {?item wdt:P31 ?P31 . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
                         ?item rdfs:label ?itemLabel .   # labels bound explicitly so group by / group_concat can use them
                         ?P31 rdfs:label ?P31Label . }
} group by ?item ?itemLabel ?article
With a link to each Scots Wikipedia article:
SELECT ?item ?itemLabel ?article ?link ?P31 ?P31Label WHERE
{
  hint:Query hint:optimizer "None".   # keep the clauses in the written order, so the MWAPI call runs first
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:endpoint "sco.wikipedia.org";
                     wikibase:api "Generator";
                     mwapi:generator "categorymembers";
                     mwapi:gcmtitle "Category:Pages that wis written by a body that's mither tongue isna Scots" ;         # the category to list
                     mwapi:gcmprop "ids|title|type";
                     mwapi:gcmlimit "max".
     # outputs
     ?article wikibase:apiOutput mwapi:title.        # Scots Wikipedia article / category title
     ?item wikibase:apiOutputItem mwapi:item.        # Wikidata QID for the article
    }
  FILTER BOUND(?item)
  OPTIONAL {?item wdt:P31 ?P31 . }
  BIND (URI(CONCAT("https://sco.wikipedia.org/wiki/", ENCODE_FOR_URI(?article))) AS ?link)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

In theory this could be run through Listeria, but Listeria doesn't like big items such as countries, which this list includes. Hmmmm.

Odds & ends

Getting stuff to group concat

# Irish women scientists with no sitelinks at all (so no Wikipedia article), with their occupations on one line.
SELECT ?item ?itemLabel (group_concat(?occLabel ;separator=", ") as ?occupations)

WHERE {
  ?item wdt:P27 wd:Q27;                   # country of citizenship: Ireland
        wdt:P21 wd:Q6581072 ;             # sex or gender: female
        wdt:P106/wdt:P279* wd:Q901 .      # an occupation that is scientist or a subclass of it
  ?item wdt:P106 ?occ .                   # every occupation, to be concatenated

  MINUS { ?article schema:about ?item }   # drop items that have any sitelink
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
                           ?occ rdfs:label ?occLabel .     # manual mode: bind the labels explicitly,
                           ?item rdfs:label ?itemLabel .   # so group by and group_concat can use them
                         }
}
group by ?item ?itemLabel
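This shows what GROUP_CONCAT is doing in the query above, redone in Python on rows that have already been fetched. Rows with the same item and label collapse into one, with their occupation labels joined. The key names are hypothetical. One difference: SPARQL doesn't promise any order inside a GROUP_CONCAT, whereas this sketch keeps the original row order.

```python
# Sketch: GROUP BY + GROUP_CONCAT done client-side on flat result rows.
# ASSUMPTION: rows are dicts with hypothetical "item", "itemLabel" and "occLabel" keys.


def group_concat(rows, group_by, value, as_name, separator=", "):
    """Collapse rows sharing the group_by values into one row, joining `value`."""
    groups = {}
    for row in rows:
        key = tuple(row[name] for name in group_by)
        groups.setdefault(key, []).append(row[value])
    return [
        {**dict(zip(group_by, key)), as_name: separator.join(values)}
        for key, values in groups.items()
    ]
```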


Pride & Prejudice, looking at statement nodes and qualifiers

Should work but doesn't: returns no data.

# Productions of Pride and Prejudice, who played Elizabeth Bennet, and what kind of thing each production is
SELECT ?production ?productionLabel ?actress ?actressLabel (group_concat(?instanceofLabel ;separator=", ") as ?typeofproduction)

WHERE {

  ?production wdt:P144 wd:Q170583 ; # based on: Pride & Prejudice
              p:P161 [ ps:P161 ?actress; pq:P453 wd:Q2223341 ] . # cast member statement with qualifier character role: Elizabeth Bennet
  ?production wdt:P31 ?instanceof . # instance of, to be concatenated

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
                           ?instanceof rdfs:label ?instanceofLabel . # manual mode: labels bound explicitly for group by / group_concat
                           ?production rdfs:label ?productionLabel .
                           ?actress rdfs:label ?actressLabel .
                         }
}

group by ?production ?productionLabel ?actress ?actressLabel

Nicolas Vigneron's query showing graph of connections between people in Pride & Prejudice: https://w.wiki/4kup

Things with Wikisource links

Trying to make a query that shows me things with Wikisource links in more than one language.

  • Instances of written works with WS links in en & fr: https://w.wiki/4$FV (80 results)
  • Instances of literary works with WS links in en & fr: https://w.wiki/4$Fg (460 results)
  • Instances of legislation with WS links in en & fr: https://w.wiki/4$Fn (4 results)
  • NOTE - "next step" not actually needed, it does that anyway

Tagishsimon's version, https://w.wiki/4$Gg, stops it timing out by using a named subquery (%i). This is interesting. Named subqueries! See Wikidata:SPARQL query service/query optimization#Named subqueries. (I realised I had heard about named subqueries before; I just hadn't taken it in...)
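For a handful of items, "Wikisource links in more than one language" can also be checked without a query. The sitelinks in each item's entity JSON (for example from Special:EntityData/Qxxx.json) show which Wikisources it links to. This sketch assumes the usual language-Wikisource site keys such as enwikisource and frwikisource.

```python
# Sketch: which language Wikisources an item links to, from its entity JSON "sitelinks".
# ASSUMPTION: language Wikisource site keys end in "wikisource" (e.g. "enwikisource").


def wikisource_languages(entity: dict) -> list[str]:
    """Sorted language prefixes of the item's Wikisource sitelinks."""
    sitelinks = entity.get("sitelinks", {})
    return sorted(
        site[: -len("wikisource")]
        for site in sitelinks
        if site.endswith("wikisource")
    )


def in_more_than_one_language(entity: dict) -> bool:
    return len(wikisource_languages(entity)) > 1
```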

Gender gap helpful queries

random stuff and biscuits

Types of biscuits from different countries. I fell down a rabbit hole, what can I say. Involves looking for subclass of (P279), not instance of (P31), as you do.

people with articles on Scots Wikipedia whose birth month is the current month

https://w.wiki/7uVK