Wikidata:Contact the development team/Archive/2019/06

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

JSON dumps archives

Hello,

Having access to old Wikidata JSON dumps can be very useful, for example to dig into the past of the project and to study its evolution (for instance, the evolution of humans in Wikidata since 2017). Some old dumps are available at archive.org. Unfortunately, it seems that synchronization to archive.org has been broken since February 2019. Some dumps are also missing, for instance from March to October 2018.

  • Is there a plan to improve the synchronization to archive.org and avoid gaps?
  • Is there a way to recover missing dumps (for example, I don't have any of them from May and July 2018, and they are also unavailable on archive.org)?
  • I have some of the missing dumps. How can I help to send them to archive.org?

Regards, Envlh (talk) 06:16, 25 May 2019 (UTC)

Hey Envlh, thanks for letting us know! The sync process is not organized by the development team. As far as we know, a volunteer is doing it. I'm going to investigate and find more information about the process and the people involved. I'll keep you up to date! Lea Lacroix (WMDE) (talk) 11:50, 27 May 2019 (UTC)
Hello @Envlh:, I contacted Hydriz who's taking care of sync with archive.org. They're in the process of re-writing the sync scripts and currently blocked on phab:T159930. They told me that the archiving process would be resumed as soon as possible. Feel free to get in touch if you want to help. Cheers, Lea Lacroix (WMDE) (talk) 08:14, 3 June 2019 (UTC)
Hello @Lea Lacroix (WMDE): Thanks a lot for your answer and your email! I haven't had time to reply to it yet, but it will be useful :) — Envlh (talk) 21:08, 4 June 2019 (UTC)

Problem with a chain of subclasses, sub-subclasses, etc.

Hello, I wanted to add the property category for alumni of educational institution (P3876) (category of alumni) to Category:Alumni of an École Normale Supérieure (Q17313391) (école normale supérieure), but I am told that, among the problems, Category:Alumni of an École Normale Supérieure (Q17313391) does not belong to the class "institut de formation" (which is required by the property) or one of its subclasses. But it does, although through a long chain: it belongs to "grande école", which belongs to "institut d'enseignement supérieur", which belongs, finally, to "institut de formation". Is it a local software bug, or does the chain stop at subclasses and not at sub-subclasses or sub-sub-subclasses (which it obviously should NOT)? Thank you very much, --Cgolds (talk) 18:31, 1 June 2019 (UTC)

@Cgolds: hmm, you have run into a rather vicious trap (it took me a little while to understand the problem properly). category for alumni of educational institution (P3876) is meant to be used on a specific educational institution, but école normale supérieure (Q135436) is not really an educational institution: it is a type (hence a class) of educational institutions, which includes for example École Normale Supérieure de Lyon (Q10159) (which is indeed an institution). So the problem is not that the software does not go far enough along the chain (chains are sometimes much longer, so it could not come from that), but rather that the first link is not the expected one (école normale supérieure (Q135436) has no P31 from which to start the chain, only a P279).
So I am sorry for this non-answer, but the error message is correct, and the only solution seems to me to be not to use this property, since it is not really intended for this kind of case.
Regards, VIGNERON (talk) 12:47, 3 June 2019 (UTC)
PS: for next time, the property's talk page or Wikidata:Bistro seem to me more appropriate places to start with this kind of question.
@VIGNERON: Thank you very much. I did not choose this property myself, but I now understand why it is not a good idea to use it there. Unfortunately, there are typically Wikipedias where categories have been attached to sets rather than to the specific elements of those sets, and so it seemed natural to attach this property there (semantically, let's say, it is correct, even if it does not work with the way things are built in Wikidata). I thought it was probably a bug in the software (in the chain length taken into account), which is why I put the question here, sorry. It would also be nice if development allowed sets to be treated not only as meta-something but also as something, but I understand that this is premature. Thanks again, --Cgolds (talk) 14:05, 3 June 2019 (UTC)

SPARQL Endpoint - Labels are not returned using ORDER BY | ORDER BY ASC() Argument

I am trying to receive the result of the following query via Python or JavaScript code (using the code generator on the Query Service page):

SELECT ?Literaturpreis ?LiteraturpreisLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?Literaturpreis wdt:P31 wd:Q378427.
}
ORDER BY ASC(?LiteraturpreisLabel)
LIMIT 10

In both cases (Python/JavaScript), the value of the result-set field "LiteraturpreisLabel" is the QID instead of the label. If I change ORDER BY to DESC() it works fine, and it also works without any ORDER BY clause. Thanks in advance! --Mfchris84 (talk) 09:35, 5 June 2019 (UTC)
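For anyone reading along, the Python side only needs to unwrap the standard SPARQL 1.1 JSON results format returned by the query service. A minimal sketch with invented sample rows (a real script would fetch the JSON from https://query.wikidata.org/sparql instead; note that when an item has no label in the requested languages, the label service falls back to the QID string, which is what appears in the second row):

```python
import json

# A trimmed, hand-written sample shaped like a WDQS JSON response;
# the rows are illustrative, not real query output.
sample_response = json.loads("""
{
  "head": {"vars": ["Literaturpreis", "LiteraturpreisLabel"]},
  "results": {"bindings": [
    {"Literaturpreis": {"type": "uri", "value": "http://www.wikidata.org/entity/Q37922"},
     "LiteraturpreisLabel": {"type": "literal", "xml:lang": "en",
                             "value": "Nobel Prize in Literature"}},
    {"Literaturpreis": {"type": "uri", "value": "http://www.wikidata.org/entity/Q999999999"},
     "LiteraturpreisLabel": {"type": "literal", "value": "Q999999999"}}
  ]}
}
""")

def labels(response):
    """Extract the ?LiteraturpreisLabel value from every result row."""
    return [row["LiteraturpreisLabel"]["value"]
            for row in response["results"]["bindings"]]

print(labels(sample_response))
# ['Nobel Prize in Literature', 'Q999999999']
```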

I see that those items indeed do not have English labels, so the labels are correct. It is also correct that the missing labels are sorted before the present labels: they are shorter strings, and therefore go before the longer actual labels. Smalyshev (WMF) (talk) 05:56, 6 June 2019 (UTC)
@Smalyshev (WMF): Thank you for the information, and sorry for that - that's totally correct; I don't know what happened to my mind yesterday. Another question about sorting: characters with diacritics (like German umlauts) are sorted after Z. Is anything planned, or is there anything one can do, to sort them more appropriately? --Mfchris84 (talk) 07:09, 6 June 2019 (UTC)
@Mfchris84: Unfortunately, since we have string literals from every language together, we cannot do language-specific collation in this context, which means everything is sorted in basic Unicode order. So for some languages the result will not match the native dictionary order, but for now that is a limitation we cannot avoid. Smalyshev (WMF) (talk) 16:13, 6 June 2019 (UTC)
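To make the behaviour concrete: plain code-point comparison, which is what the query service does here, can be reproduced in a couple of lines of Python, since Python's default string ordering is also by Unicode code point (the German sample words are invented):

```python
# "Ä" is U+00C4 (196) and "Ö" is U+00D6 (214), both greater than
# "Z" (U+005A, 90), so umlaut-initial words sort after "Zebra".
words = ["Zebra", "Apfel", "Ärger", "Ofen", "Öl"]
print(sorted(words))
# ['Apfel', 'Ofen', 'Zebra', 'Ärger', 'Öl']
```

A client-side fix is to sort the downloaded results locally, e.g. with a locale-aware key, rather than relying on ORDER BY in the query.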
@Smalyshev (WMF), Mfchris84: As I understand it, on some SPARQL platforms (eg ARQ), something similar to the following may be possible:
SELECT ?Literaturpreis ?LiteraturpreisLabel ?LpSort2 WHERE {
  {
    SELECT ?Literaturpreis ?LiteraturpreisLabel WHERE {
      ?Literaturpreis wdt:P31 wd:Q378427.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  }
  BIND(fn:normalize-unicode(?LiteraturpreisLabel, "NFKD") AS ?LpSort) .
  BIND(REPLACE(?LpSort, "\\p{NonspacingMark}", "") AS ?LpSort2) .
}
ORDER BY DESC(?LpSort2)
LIMIT 100
This tries to split any character with an accent or diacritic into a pair of characters (base letter plus combining mark), then to throw the diacritic part away, storing the result in a new variable ?LpSort2. In most cases this should give something reasonably close to the sort order one probably wants, though it wouldn't, for example, sort o-umlaut and oe together.
Unfortunately this doesn't work on Blazegraph, as Blazegraph
  • doesn't appear to recognise \p{NonspacingMark} as a valid set of characters -- from the XPATH standard at 5.6.1.5 whether such a block name should be recognised or not should depend on "the version(s) of Unicode supported by the processor".
  • doesn't appear to support fn:normalize-unicode -- this XPATH function is available in ARQ, but is not mandated by the SPARQL 1.1 standard.
I also find the double-escape \\p required by Blazegraph to be a bit odd, and probably a bug, in this context; but that at least is something which can be worked round in this way. Jheald (talk) 17:25, 6 June 2019 (UTC)
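Since Blazegraph lacks fn:normalize-unicode, the same NFKD-then-strip idea can be applied client-side after downloading the results. A standard-library Python sketch of the approach described above:

```python
import unicodedata

def strip_diacritics(s):
    """Decompose to NFKD, then drop all combining characters.

    The Python equivalent of the fn:normalize-unicode +
    REPLACE(..., "\\p{NonspacingMark}", "") approach sketched above;
    like it, this merely folds ö to o and will not sort o-umlaut
    and oe together.
    """
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_diacritics("Öl"))  # Ol
# Using it as a sort key folds the umlaut away before comparison:
print(sorted(["Zebra", "Ärger"], key=strip_diacritics))
# ['Ärger', 'Zebra']
```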

Wikimedia RU articles

Hi! We created a w:en:Wikipedian in residence article on ru.wikimedia.org: wmru:Вики-резидент or chapter:ru:Вики-резидент (why does the chapter:ru prefix not work here?). How can we add this page to Q3809586 (to link it with the other wiki projects)? Please help us! This also applies to our other articles on the popularization of the Wikimedia movement. Best regards — Niklitov (talk) 11:19, 6 June 2019 (UTC) (WMRU)

Hello,
At the moment, the chapter wikis and other wikis that are not part of the sister projects are not connected to Wikidata, therefore one can't add interwiki links.
I'm not sure if it was ever planned or discussed to connect the chapter wikis to Wikidata. Feel free to create a Phabricator task as a subtask of this one for tracking. Lea Lacroix (WMDE) (talk) 09:40, 7 June 2019 (UTC)

Optimize Wikibase handling of country items

As country items get larger, maybe it's worth looking into them from the developer side to ensure that they can be loaded and edited efficiently.

I mention country items, but it should probably apply to any item with lots of statements and incoming links.

From the contributors' side, it's probably worth optimizing their content as well, thus Wikidata:Bot_requests#Optimize_format_of_country_items, but probably both are needed. --- Jura 17:09, 14 June 2019 (UTC)

We acknowledge that large items, and the difficulty of loading and editing them, are a problem; unfortunately, we currently cannot focus on this topic.
As for the lag: FYI we're working on including the Query Service lag in the maxlag parameter for bots and tools, see phab:T221774. Lea Lacroix (WMDE) (talk) 12:24, 20 June 2019 (UTC)
Thanks for your feedback. I'm aware of some of that, but I think that editing them may have a disproportionate impact on lag, and almost none of the recent edits have been done by bots (they are mostly useful edits). At least based on the explanation given about editing large items (Wikidata:Project_chat#MicrobeBot_updates) and my somewhat sketchy understanding of replication to client wikis, maybe updates on these should be batched/bulked differently. --- Jura 13:22, 20 June 2019 (UTC)
Do large items produce deeper problems than user-interface issues like huge lags, simply because the item is big? --813gan (talk) 14:31, 20 June 2019 (UTC)

Q36020#P1472 warns so, but the link exists.--Roy17 (talk) 00:33, 15 June 2019 (UTC)

I too have been seeing this a lot in the last month. It looks like the constraint checker may not be picking up the information from the property page that the link should be looked for in the "Creator:" namespace. Jheald (talk) 17:54, 15 June 2019 (UTC)
Maybe the problem comes from the constraint? Does it need to have a "namespace" parameter somehow? Pinging @Jura1: as you worked on this property. Lea Lacroix (WMDE) (talk) 13:23, 18 June 2019 (UTC)
@Lea Lacroix (WMDE): It's got one (or appears to), and has had since July 2017 diff. The database report from 6 days ago is only reporting 8 violations, which looks correct. But spurious lightning flashes keep appearing on the actual item pages, eg Q36020#P1472 as Roy identified. So it looks like the constraint checker for the page might not be picking up the namespace parameter; or alternatively may have had a bad database access moment, which is not correcting. Jheald (talk) 14:18, 18 June 2019 (UTC)

Uncertain dates, that range across century boundaries

The topic was recently raised at Project Chat of how best to represent uncertain dates that range across century boundaries -- for example, a work with its inception (P571) given in a source as (1250-1325).

At the moment, if one gives the statement a value of 1300 with precision century, it is rendered as "13. century".

It occurred to me to wonder whether, when the central point value is between say 1275 and 1325 with precision century, it would be possible to instead render this as "1200s / 1300s"? (Or, equivalently if easier, 13th/14th century.) This would be a good solution to the issue raised in Chat.

Of course, there are currently a lot of statements that were entered as "13th century", to mean some time between 1200 and 1300, that are represented as '1300 - precision - century'. This is in fact awkward, because as a result they come up a bit later than they probably should in WDQS searches ordered by date. A fix for this would be to change all values of '1300 - precision - century' to '1250 - precision - century' (unless qualifiers indicate something more specific), and to make this the default representation for a value entered as "13th century".

Any such change would need to be discussed in a community RfC. But would this, from a technical perspective, be relatively easy to implement? Jheald (talk) 12:12, 14 June 2019 (UTC)

Agreed that an input of '13th century' should be saved as '1250 - precision - century'! This is what we half-implicitly recommend for manual input, actually, in Help:Dates#Inexact dates. --Marsupium (talk) 16:13, 21 June 2019 (UTC)
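The midpoint convention proposed above is easy to state as code. A toy Python sketch of the suggested convention only (not of how Wikibase actually stores its time values, which use a timestamp plus a precision flag):

```python
def century_midpoint(century):
    """Central year of the Nth century under the proposed convention.

    The Nth century spans the years 100*(N-1)+1 .. 100*N, so its
    midpoint (rounded to a half-century) is 100*(N-1) + 50; e.g. the
    13th century (1201-1300) would be stored as the year 1250 rather
    than 1300, so date-ordered query results place it mid-century.
    """
    return 100 * (century - 1) + 50

print(century_midpoint(13))  # 1250
print(century_midpoint(20))  # 1950
```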

Query server lag increases

Is there something that could be done about it? It was at 0 earlier today, but is now around 2 hours. --- Jura 17:33, 15 June 2019 (UTC)

I guess it's related to this. See also a more general explanation from the team taking care of the Query Service. Lea Lacroix (WMDE) (talk) 15:30, 17 June 2019 (UTC)
Interesting read, thanks. If the thing is somewhat fragile, I suppose we have to count on random lag for some time?
If the bot was blocked on Thursday, I suppose the increase (from zero) on Saturday isn't related. Even if we end up accepting notable increases in lag once in a while, it might be worth identifying the factors behind occasional spikes. --- Jura 16:12, 17 June 2019 (UTC)


Maybe some automated status posted by a bot on Wikidata could be helpful, e.g.

  • "query server lag is now above 12 hours: the WMF/WMDE maintenance team is aware of the issue and is investigating potential causes and means of mitigation. This incident is tracked under phab ticket <>"
  • "query server lag is now above 12 hours: WMF/WMDE doesn't provide support on weekends/public holidays. This incident is tracked under phab ticket <>"
  • "query server lag is now above 2 hours, but below 12 hours: this is considered acceptable lag"
  • "query server has no lag (< 10 min)"
  • "query server has low lag (> 10 min, but < 2 hours): bots are requested to slow down"

Just to manage users' expectations .. --- Jura 14:16, 24 June 2019 (UTC)
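Such a status bot would essentially be a threshold table. A Python sketch of the mapping suggested above (thresholds 10 min / 2 h / 12 h are taken from the proposal; the message wording is abbreviated, and the actual lag would come from the service's monitoring, not from this function):

```python
from datetime import timedelta

def lag_status(lag):
    """Map a query-service lag (a timedelta) to a proposed status message."""
    if lag < timedelta(minutes=10):
        return "no lag (< 10 min)"
    if lag < timedelta(hours=2):
        return "low lag (> 10 min, but < 2 hours): bots are requested to slow down"
    if lag < timedelta(hours=12):
        return "above 2 hours, but below 12 hours: considered acceptable"
    return "above 12 hours: incident tracked under a phab ticket"

print(lag_status(timedelta(hours=3)))
# above 2 hours, but below 12 hours: considered acceptable
```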

Inverse properties and Lua

There are cases in which people want some information in an infobox on Wikipedia, and a property exists. Fine? Nope: it would actually only work with the inverse property. So people ask for a new inverse property, arguing it's needed for their infoboxes, but the Wikidata property reviewers still refuse to create it precisely because it's an inverse one. Result? They are blocked on the social side for a purely technical reason. A dead end for them.

Can't we implement a Lua call to retrieve the set of statements for which the item is the value of a given property? Why don't we already have a solution for this? Is it the blob nature of a Wikibase item, as far as MediaWiki is aware? Is there a ticket for this? author  TomT0m / talk page 13:55, 14 June 2019 (UTC)
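To make the request concrete, here is a toy Python sketch of the reverse lookup being asked for, over an in-memory set of statements (the item and property names are invented placeholders; on-wiki this would need support in Wikibase/Lua or a query, which is exactly what is missing):

```python
# Toy statement store: (subject, property, value) triples.
statements = [
    ("Q_lake",  "P_dammed_by", "Q_dam"),
    ("Q_dam",   "P31",         "Q_dam_class"),
    ("Q_other", "P_dammed_by", "Q_dam"),
]

def incoming_statements(item, prop):
    """Statements in which `item` appears as the *value* of `prop`,
    i.e. the inverse lookup an infobox on Q_dam would need when only
    the lake-to-dam property exists."""
    return [(s, p, v) for (s, p, v) in statements
            if p == prop and v == item]

print(incoming_statements("Q_dam", "P_dammed_by"))
# [('Q_lake', 'P_dammed_by', 'Q_dam'), ('Q_other', 'P_dammed_by', 'Q_dam')]
```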

Hello,
At the moment, the best way to get this kind of result is with queries, but it's not possible to mix queries and Lua.
Can you give us a few examples, so we can understand the need better? Lea Lacroix (WMDE) (talk) 12:22, 20 June 2019 (UTC)
I had in mind this topic on the Bistro (see in particular the replies to my comments), where a user hesitates to request a new property because of a bad experience with Wikidata:Property_proposal/maintain. Despite the argument that the property is needed to get the information into the infobox, it was ultimately not created. author  TomT0m / talk page 11:23, 22 June 2019 (UTC)
Automatic discovery of interwiki links by templates could also benefit from this. For example, taking a pair of inverse properties that actually exist, dam (P4792) and dam (P4792), one can see by comparing the two queries [1] and [2] (which count, per property, the statements going from a lake to its dam and from a dam to its lake, respectively) that the numbers of statements differ, even though the inverse property exists in practice. To refine this a bit, I looked at how many lake/dam pairs have Wikimedia articles for both items but lack the two inverse statements ([3]); there seem to be about forty. This means that within the WD:XLINK project (which is only moderately alive, but there was a message on it today, so not totally dead), it is impossible to capture the interwikis correctly in the case where one would like an article about a dam in one language to be linked automatically to the article about its reservoir lake in another language. It is not the most blocking aspect for moving that project forward, but it contributes. The situation is obviously worse in cases where no inverse property exists at all. author  TomT0m / talk page 12:31, 27 June 2019 (UTC)