Wikidata talk:SPARQL query service/queries/Archive/2015

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Comments ?

@Atlasowa, Jura1, Hsarrazin, TomT0m: The first section of the page ("Understanding SPARQL") is an attempt to introduce SPARQL well enough to get ordinary Wikidata editors to the point where they can easily follow the examples at mw:Wikibase/Indexing/SPARQL Query Examples - since I feel that page may be pitched more at people who know SPARQL but don't necessarily know so much about Wikidata.

I hope I've now achieved a reasonably workable first draft, but would very much welcome comments (and improvements, and rewrites). I hope it works, but I may have written too much -- I don't want to scare people off SPARQL when really, once you've got the basic idea, it can be pretty simple. That simplicity is what I'd like people to come away with, so I hope I haven't hidden it by over-writing. One section that I think at the moment really does need some improving (mostly streamlining and slimming down) is the one about namespaces, so if anyone wants to have a go at that, I would be very grateful.

Otherwise, please do let me know what you think, whether you think it works, and what could or should be improved.

Thanks, Jheald (talk) 19:48, 21 September 2015 (UTC)

Have more pictures, start more concretely, introduce the query editor sooner

User:GerardM offered the following suggestions, by email:

One thing people need to be told is how to invoke SPARQL. It is the first thing to know. There are so many SPARQLs that it is not obvious.

There are other issues. It takes too much for granted and consequently only the people who get past a certain point may understand it and even use it. Many of the assumptions are assumptions for SPARQL and for many people it will only be an engine and not a tool.

A tool is what harnesses an engine to produce a result. Most people do not care for p31:q5 .. They care for "instance of" "human" or "is een" "mens". When you assume prior knowledge, refer to it at the start. You can help yourself with a liberal sprinkling of images, that will aid understanding. It helps people understand that SPARQL is not English.

The Query editor is key to understanding: it is a tool and it demonstrates SPARQL. It is more important at the beginning than many other fine details. Once people are comfortable with existing queries, and have examples of how to modify them, they will be at the stage where making their own queries feels realistic.

New more concrete section (Madonna's spouses) added at the top, to try to start to address Gerard's spot-on comments. Jheald (talk) 17:18, 22 September 2015 (UTC)
Found it under Wikidata:SPARQL_query_service/queries#Understanding_SPARQL. Screenshot is good. Why not give it a more inviting subsection title like =Hey wikidata, give me a list of the spouses of Madonna!= --Atlasowa (talk) 11:09, 23 September 2015 (UTC)
:-) I might leave that to Wiri Jheald (talk) 13:37, 24 September 2015 (UTC)

Question (last items)

With LIMIT 100, I can get the first 100 items. How do I get the last 100? Or at least entries with a QID beyond Q20000000? --- Jura 17:35, 24 September 2015 (UTC)

@Jura1: To get the last 100, first run a query to find out the total number of items, using COUNT; then, to see the last 100 use LIMIT 100 OFFSET n, where n is the total number - 100. For example tinyurl.com/q35bxza. It probably is possible to do the COUNT and the LIMIT in the same query.
Alternatively, to see entries with a QID beyond Q20000000, I think you can put FILTER (xsd:integer(STRAFTER(str(?item), "Q")) > 20000000) at the end of the WHERE block -- eg tinyurl.com/p7kbajd for some paintings. Jheald (talk) 21:13, 24 September 2015 (UTC)
Though this may be the neatest approach to get the most recent 100: tinyurl.com/o9caom4 -- ORDER BY DESC(?qid) LIMIT 100 Jheald (talk) 21:35, 24 September 2015 (UTC)
Thanks. That worked for me (Special:Diff/253569079). I really should get to learn sparql syntax. --- Jura 07:23, 25 September 2015 (UTC)
Nice. It only just ran within the time-limit though. (90 seconds). Can I suggest putting the query on a sub-page, as per Wikidata:SPARQL_query_service/queries#Showing_only_query_links, which will make it a lot easier to edit and discuss, if you should want to change it? Best, Jheald (talk) 10:31, 25 September 2015 (UTC)
I added a filter to have it skip the first items. BTW, I prefer on-page on-site storage. Personally, I'd avoid all those tinyurls. --- Jura 11:35, 29 September 2015 (UTC)
@Jura1: I agree that the tinyurls are not the best. What I meant though was to move the query to a sub-page, invoked like this diff. I suppose, on the downside, it means you have to watchlist the sub-page separately, but I think it makes everything a bit more readable. What do you think? Jheald (talk) 12:07, 29 September 2015 (UTC)
I think nothing is worse than using tinyurl --- Jura 05:25, 28 October 2015 (UTC)
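
For reference, the two suggestions in this thread (filtering on the numeric part of the QID, and ordering by it in descending order) can be combined in one query. This is only a minimal sketch, not the query behind any of the shortened links: the choice of paintings (wd:Q3305213) as the class and the variable name ?qid are assumptions made for illustration, and the prefixes are the ones the present-day query service predefines.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?item ?qid WHERE {
  # Assumed example class: instances of painting.
  ?item wdt:P31 wd:Q3305213 .
  # Take the digits after the "Q" in the entity URI and cast them to an integer.
  BIND(xsd:integer(STRAFTER(STR(?item), "Q")) AS ?qid)
  # Keep only items beyond Q20000000.
  FILTER(?qid > 20000000)
}
ORDER BY DESC(?qid)   # highest (most recently created) QIDs first
LIMIT 100
```

Dropping the FILTER line leaves the plain "most recent 100" form; on a large class the sort may come close to the time limit, as noted above.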

Count + year

Hello,

I know how to extract the year from a date and how to count something, but I don't know how to do both. I would like to count the number of people buried in the Père Lachaise cemetery (P119 Q311) by date of death (P570). Pyb (talk) 14:51, 25 September 2015 (UTC)

@Pyb: Something like this: tinyurl.com/ne39vru ? Jheald (talk) 15:18, 25 September 2015 (UTC)
@Jheald: Oh great. thx. Pyb (talk) 15:30, 25 September 2015 (UTC)
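
Since the linked query is only available through the shortened URL, here is a minimal sketch of how the year extraction and the count can be combined. It is not necessarily the query Jheald posted; the variable names are assumptions, and the prefixes are the ones the present-day query service predefines.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?year (COUNT(DISTINCT ?person) AS ?count) WHERE {
  ?person wdt:P119 wd:Q311 ;   # place of burial: Père Lachaise
          wdt:P570 ?date .     # date of death
  # Reduce each date to its year so that rows can be grouped on it.
  BIND(YEAR(?date) AS ?year)
}
GROUP BY ?year
ORDER BY ?year
```

COUNT(DISTINCT ...) is used so that a person with more than one recorded date of death in the same year is counted once.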

Property name

Is it possible to retrieve name (label) for ?p property retrieved as p:Pnnn in this query tinyurl.com/okfa2ld? The service SERVICE wikibase:label converts it to link to Wikidata only. Paweł Ziemian (talk) 13:12, 3 October 2015 (UTC)

@Paweł Ziemian: Hi Paweł !
Yes it is possible, but there are two things that need to be done.
First, SERVICE wikibase:label only finds labels for things in the wd: namespace, so you need to join into the query the underlying entity for the property in the wd: namespace. This can be done by adding the assertion line ?prop wikibase:claim ?p into the query -- the special predicate wikibase:claim connects the wd: namespace entity for the property to its p: namespace representation.
Unfortunately, the built-in query optimizer finds the assertion ?prop wikibase:claim ?p very seductive as a place to start -- it sees that there are fewer than 2500 solutions for this assertion (there being fewer than 2500 properties), so it thinks that starting the query here will be a good way to focus the query on a really small solution set, right from the start. But the next statement, which connects ?p to the rest of the query, is ?q ?p ?statement, which matches every single property statement in the database, blowing the size of the solution set wide open.
Therefore, at least for the present (because in time Blazegraph may be able to fix this), if a query includes an assertion like ?prop wikibase:claim ?p, the second thing which it is necessary to do is to turn off the built-in automatic query optimizer, and instead try to work out by hand the best sequence for the assertions in the query. The key principle is to try to narrow down the size of the running solution set as early as possible, to as small a set as possible.
Here's a reordering I made of your query, with a line hint:Query hint:optimizer "None" to turn off the built-in optimizer, and including the ?prop wikibase:claim ?p assertion:
tinyurl.com/pp3prma
I'm telling the query engine to start with a day-specific date, then straight away to try to narrow that down even further by filtering on the day and the month and the year, then to see which statements apply to such dates, and then to filter for those that correspond to items with entries on pl-wiki.
There's a bit more about the query optimizer on the page Wikidata:SPARQL_query_service/query_optimization, including how to get a report on the query execution including how big the solution set is at each step, and how long the engine is spending on it; but really the main example there (in the "A query that has difficulties" section) is exactly the same case as you have encountered above -- how to add a property label, and to do it without the query timing out.
Hope this helps. All best, Jheald (talk) 22:21, 3 October 2015 (UTC)
Thanks a lot. Paweł Ziemian (talk) 13:35, 4 October 2015 (UTC)
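
The two steps described above (joining in the wd: entity for the property, and turning the optimizer off so that the patterns run in the written order) can be sketched as follows. This is not the reordered query from the shortened link: starting from a single item (wd:Q1744, Madonna) is an assumption chosen to keep the solution set small from the first line, and the prefixes are those of the present-day query service.

```sparql
PREFIX wd:       <http://www.wikidata.org/entity/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd:       <http://www.bigdata.com/rdf#>
PREFIX hint:     <http://www.bigdata.com/queryHints#>

SELECT ?prop ?propLabel ?statement WHERE {
  # Run the patterns in the order written instead of letting the optimizer reorder them.
  hint:Query hint:optimizer "None" .
  # Start from something narrow: the statements of one item (assumed example).
  wd:Q1744 ?p ?statement .
  # Only now map the p: predicate back to its property entity in wd:.
  ?prop wikibase:claim ?p .
  # The label service fills ?propLabel from ?prop.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```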

Query for non-existent labels

Hi experts! Is it possible to query for items that don't have a label in a certain language? For example, I'm looking for first names without label in English. I tried it by using (with the standard prefixes):

SELECT ?name ?name_label WHERE {
  ?name wdt:P31/wdt:P279* wd:Q202444 .
  optional {
  	?name rdfs:label ?name_label filter (!lang(?name_label) = "en")
  }
} LIMIT 10

However, this leads to a timeout. Other ways don't seem to work for me. Has anyone an idea? Yellowcard (talk) 18:16, 29 October 2015 (UTC)

Hi, there are a couple of idioms in SPARQL to filter for something not being the case. An older way to do it is
SELECT ?name ?name_label WHERE {
  ?name wdt:P31/wdt:P279* wd:Q202444 .
  optional {
  	?name rdfs:label ?name_label filter (lang(?name_label) = "en")
  }
  filter(!bound(?name_label))
} LIMIT 10
However, as you perhaps found, this appears to be timing out.
With SPARQL 1.1 the same query can also be written in an alternative way
SELECT ?name ?name_label WHERE {
  ?name wdt:P31/wdt:P279* wd:Q202444 .
  FILTER NOT EXISTS {?name rdfs:label ?name_label filter (lang(?name_label) = "en")}
} LIMIT 10
This appears to execute successfully; though at 3.6 seconds is still taking rather longer than perhaps one would think it should.
It ought to be possible to make this query run a lot faster (and others involving similar filtering); I think the Blazegraph engine supports various hints to the query optimiser that can help handle queries with negation, that I would very much like to understand better, but for the moment this is the best that I can offer. Jheald (talk) 23:36, 29 October 2015 (UTC)
@Jheald: For now, this helps a lot. Thank you!
@Jura1: Looks interesting in the first view. However, I did not fully understand the functionality of the tool. Is there any documentation or could you explain in short words, maybe? Also, thank you! Yellowcard (talk) 10:13, 30 October 2015 (UTC)
It finds all articles matching the query that don't have labels in one language, but do in another. Once the labels are translated, they can be uploaded with quick_statements.php. If you use it on items for people, generally, the labels won't need translation and the output can be used directly in QuickStatements. --- Jura 10:20, 30 October 2015 (UTC)

Migrate Query Examples from Mediawiki.org

Would it make sense to migrate the query examples from mediawiki.org to wikidata.org?

If you do so I could change the query service from fetching the example queries from there.

Cheers, Jonas

It may make sense, but I'm not sure what the best namespace for it is... There was talk about having query as an official data type, or maybe a namespace, in Wikidata; maybe we should wait for the outcome of that discussion. Or put it in the Help namespace?

--Smalyshev (WMF) (talk) 00:18, 30 December 2015 (UTC)

Requested query

Can somebody write a query to show items without any claim (especially without P31; edited, originally P18), but with a sitelink in a specific Wikipedia? I tried to create this query myself, but without success. --XXN, 01:07, 20 December 2015 (UTC)

Not sure if that works. If the site you are interested in is listed on Wikidata:Database reports/without claims by site, that might help you. Otherwise we can add it.
Specifically for P18, you could try some combination with "link[zzwiki]" on https://tools.wmflabs.org/fist/wdfist/ sample for people and ocwiki: [1] --- Jura 01:37, 20 December 2015 (UTC)
Thanks for the response. Wikidata:Database reports/without claims by site is useful for statistical purposes, but I need to find exactly these empty items to begin work on them.
Uh, I messed up... "instance of" is P31, and it is more important to add P31 to items first; I edited the original message a bit. --XXN, 11:56, 20 December 2015 (UTC)
It's meant to be more than that.
Sample: for iswiki at Wikidata:Database reports/items without claims categories/iswiki, there is a category named "Svarfaðardalur".
If you click on the Autolist link and then "Run" button, you will find those items in that category that don't have any instance of (P31) or subclass of (P279).
The actual number of items listed might be higher because items could have other properties (e.g. coordinates) and/or because the database table at Wikidata that counts claims isn't complete yet. --- Jura 12:07, 20 December 2015 (UTC)
@XXN: in case you are interested, I added rowiki. --- Jura 18:26, 20 December 2015 (UTC)
Nice. Thank you, Jura. --XXN, 18:50, 20 December 2015 (UTC)

Spouses of Madonna

I get only Sean Penn as result. No Guy Ritchie. Why? --Jobu0101 (talk) 23:40, 29 December 2015 (UTC)

User:Laboramus's bot seems to interfere with User:Smalyshev (WMF)'s query service. --- Jura 23:50, 29 December 2015 (UTC)
How do you see that? --Jobu0101 (talk) 00:00, 30 December 2015 (UTC)
Have a look at the Guy Ritchie item. --- Jura 00:06, 30 December 2015 (UTC)
I'm wondering if the result should be the way it is.
For the question "who are the spouses of Madonna?", one could argue that the query should be done the other way round. That does give both. --- Jura 00:19, 30 December 2015 (UTC)
The query result is absolutely correct. The problem is it doesn't answer the question you think it answers - you think it says "who is the current spouse of Madonna?" but it is more like "who is the current spouse of Madonna, or, in case she doesn't have one, who were ones in the past?". Also yes, since "current spouse" and "past spouse" are not the same, it's not symmetrical so you have to watch the direction. --Laboramus (talk) 20:26, 31 December 2015 (UTC)
The query is using wdt:, which is the "preferred" item - for spouse, usually current spouse. Unless the person is a polygamist - which Madonna AFAIK isn't - that would be one person. If you want all spouses, use p:P26/ps:P26 instead of wdt:P26. I'll fix the example. --Smalyshev (WMF) (talk) 00:24, 30 December 2015 (UTC)
In fact, it's trickier since Madonna (Q1744) does not have current spouse, so both historical husbands have best rank, but Guy Ritchie (Q192990) does have current spouse (Jacqui Ainsley (Q1068279)) so the relationship there is not symmetric towards preferred status. Tricky stuff... So if you want all, use p:/ps:, if you want current, use wdt: but be aware that you may also get past ones if there's no current (can be filtered out by checking for end qualifier). --Smalyshev (WMF) (talk) 00:38, 30 December 2015 (UTC)
Maybe Q34851 works better as an initial sample. No risk of remarriage. --- Jura 00:46, 30 December 2015 (UTC)
Same problem there (once Wikidata is complete) Q464810 is still married. --- Jura 00:56, 30 December 2015 (UTC)
It's a general issue - we use preferred status to indicate something is current. However, wdt: is not "preferred", it's "best" - so if you use it to mean "preferred" it may not work as expected. I guess theoretically one could have "no value" claim as preferred and that would solve the problem, but that looks extremely ugly and formalistic. I know it's not the most intuitive situation, but so far I don't have better alternative. So I guess for now you just need to know what's going on or use p:/ps: combo to always retrieve full sets. Suggestions on improvement welcome, of course. --Laboramus (talk) 20:22, 31 December 2015 (UTC)
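
The p:/ps: form mentioned above, together with the check for an end qualifier, can be sketched as follows. This is an illustration only, not the fixed example from the main page: it assumes the present-day prefixes (including pq: for qualifiers) and uses end time (P582) as the end qualifier; the variable names are hypothetical.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p:  <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>

SELECT ?spouse ?end WHERE {
  # Every spouse statement on Madonna, whatever its rank.
  wd:Q1744 p:P26 ?statement .
  ?statement ps:P26 ?spouse .
  # Pick up the end-time qualifier where one is recorded.
  OPTIONAL { ?statement pq:P582 ?end . }
  # To keep only marriages with no recorded end, add:
  #   FILTER(!BOUND(?end))
}
```

Replacing the first two patterns with wd:Q1744 wdt:P26 ?spouse gives the "best rank" behaviour discussed in this thread instead.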

Equality in SPARQL

Is it possible in SPARQL to check for equality like ?instance == wd:Q5? --Jobu0101 (talk) 12:58, 30 December 2015 (UTC)

@Jobu0101: To see whether two expressions evaluate to the same item, use the function sameTerm -- for example sameTerm(?instance, wd:Q5) -- which evaluates to TRUE or FALSE, and can be included for example in a FILTER clause.
You can also use = in a FILTER clause to see whether two expressions evaluate to the same value, as for example in filter (lang(?spouse_label) = "en"); but that is slightly less efficient, and only checks whether the expressions have the same value, not whether they evaluate to exactly the same item. Jheald (talk) 19:48, 30 December 2015 (UTC)
Thank you! I didn't do ?item wdt:P31 wd:Q5 because I wanted ?instance to carry the information. --Jobu0101 (talk) 21:18, 30 December 2015 (UTC)
Then you could do VALUES ?instance {wd:Q5 } . --- Jura 21:42, 30 December 2015 (UTC)
If you want to assign stuff to a variable, you can use BIND. --Laboramus (talk) 20:28, 31 December 2015 (UTC)
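
The options mentioned in this thread can be put side by side in one small query. This is a sketch for comparison; the item pattern (?item wdt:P31 ?instance) is taken from the discussion, while the LIMIT and the layout are assumptions.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item ?instance WHERE {
  # Option 1 (VALUES): restrict ?instance to a fixed list, so it still carries the value.
  VALUES ?instance { wd:Q5 }
  ?item wdt:P31 ?instance .

  # Option 2 (sameTerm): remove the VALUES line and test the variable instead:
  #   FILTER(sameTerm(?instance, wd:Q5))
  # Option 3 (BIND): replace the VALUES line with:
  #   BIND(wd:Q5 AS ?instance)
}
LIMIT 10
```

Options 1 and 3 fix ?instance before the triple pattern is matched, whereas option 2 matches every instance-of statement first and then filters, so it is likely to be the slowest of the three on a large dataset.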