Shortcuts: WD:PC, WD:CHAT, WD:?

Wikidata:Project chat

From Wikidata

Wikidata project chat
Place used to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.
Please take a look at the frequently asked questions to see if your question has already been answered.
Please use {{Q}} or {{P}} the first time you mention an item or property, respectively.
Requests for deletions can be made here. Merging instructions can be found here.
IRC channel: #wikidata
Wikidata Telegram group
On this page, old discussions are archived after 7 days. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/06.

Deletion of entries from databases we agree to upload

Recently there has been debate at Wikidata:Requests_for_deletions concerning entries from a database we agree to upload: The Peerage (Q21401824). Once we agree to upload a dataset should we be deleting entries on an ad hoc basis, or are they sacrosanct once we have reached consensus to upload them? This will come up in the future as we upload more large datasets of people, so best to discuss it now. We already correct errors and merge duplicates within the dataset. --RAN (talk) 02:35, 28 August 2020 (UTC)[reply]

In my opinion discussion should be centralized at Wikidata:Administrators'_noticeboard#Please_restore_the_red_link/links_in_this_family_tree.--GZWDer (talk) 03:58, 28 August 2020 (UTC)[reply]
This is a general question open to everyone, and it affects future entries. Administrators noticeboard is for deletion/restoration arguments among administrators for specific cases. --RAN (talk) 04:17, 28 August 2020 (UTC)[reply]
I don't see why we should be bound by past decisions forever. BrokenSegue (talk) 04:20, 28 August 2020 (UTC)[reply]
  • Nothing is "sacrosanct" just because there has been a discussion earlier that led to an "agreement" of whatever form. If issues arise from a certain import, we need to address them, and that can also result in deletion of parts or even the whole of an import. It is pretty difficult to make an educated decision for or against an import, as many users do not really have an overview of the dataset to be imported.
    For The Peerage (Q21401824) in particular, I have serious concerns about the many items about minors that really should not have been imported here at all. It is not so much a matter of notability; I am concerned about BLP. Not sure whether it was discussed earlier or before the import, but I really think that all items about minors imported from ThePeerage should be deleted. —MisterSynergy (talk) 08:22, 28 August 2020 (UTC)[reply]
  • English Wikipedia or ThePeerage should not publish such information either, but it is their problem if they decide to do so. In general, I do not think that another project should force us to host data that we find problematic; if they want to publish such data, they need to host it locally.
    Publishing information about living people is always an ethically problematic act and we are already extremely liberal in this regard. The general notion here is that being described by a Wikidata item is a desirable situation and we only assess whether an item is admissible based on the notability policy. This is, however, not generally valid and we regularly see persons who ask to have their item or specific information from it deleted from Wikidata—decisions are made on a case-by-case basis. For minors in particular, we cannot expect them to make an educated decision about this question, thus we should rather err on the side of caution and not host any (personal) information about them here. You can still assign number of children (P1971) claims to an item with a sitelink. —MisterSynergy (talk) 18:13, 28 August 2020 (UTC)[reply]
  • The names of Cumberbatch's two children have been widely reported in the press, as the citations on the en-wiki article show [1]. So why should en-wiki (or any other wiki, deriving from us) not report that already widely-reported information? In fact, this typically will be the case in general: that the information has either been in the press, or publications like Who's Who, Burke's Peerage, or Debrett's Distinguished People, otherwise neither en-wiki nor ThePeerage would know those names. So why should we suppress it? Jheald (talk) 21:19, 28 August 2020 (UTC)[reply]
  • Gossip magazines (and occasionally also the more "serious" press) also infringe on the privacy of Cumberbatch's children as they earn quite some money that way. This is generally so much the case that it is questionable whether being born as a descendant of a prominent person is a gift—or a liability. Here at Wikimedia we should not be part of this problem and be extremely careful with such information, even if it is well-known on the Internet anyways. —MisterSynergy (talk) 21:33, 28 August 2020 (UTC)[reply]
    • I'm fairly certain that this wasn't discussed before upload (except maybe between RAN and GZWDer privately).
      At some point, all items were created and we had to start repairing and completing them. Some of this still needs to be done.
      The database is fairly important for some aspects of the UK before Tony Blair and can be useful for that. Still it includes a large number of items after Tony Blair or unrelated to the UK. Also, apparently TP includes third party database imports that couldn't be referenced otherwise. --- Jura 09:05, 28 August 2020 (UTC)[reply]
Please stop the libelous speculation by writing "wasn't discussed before upload ... except maybe between RAN and GZWDer privately". I had nothing to do with the upload and had no communication privately or publicly with anyone on the subject. --RAN (talk) 15:54, 28 August 2020 (UTC)[reply]
Can you provide links to what you meant with "entries from databases we agree to upload"? You have left the earlier question about your claim unanswered. I'm really curious about who agreed with GZWDer about "databases we agree to upload". You did write "we". --- Jura 16:17, 28 August 2020 (UTC)[reply]
Ah, I see. I apologize. I meant we, as in Wikidata. Sorry for the confusion. --RAN (talk) 18:33, 28 August 2020 (UTC)[reply]
  • @Richard Arthur Norton (1958- ): We didn't agree to upload The Peerage (Q21401824). The decision to upload was made without seeking an agreement via the bot approval process or an agreement on the project chat. Given the amount of imported data I think an agreement should have been sought via the bot approval process, and had that been the case I think the agreement of that process should be taken into account when discussing whether to delete items—but even then we can change our minds. ChristianKl11:16, 28 August 2020 (UTC)[reply]
If for some reason, we agree to delete entries, the criteria should be objective and done by a bot. When an individual chooses what to keep, and what to delete, from a curated collection like TP, we introduce subjective bias. For instance, someone may think wives are not notable enough to be included, which removes them from history. These are in addition to whatever biases may already be in the TP database. --RAN (talk) 18:40, 28 August 2020 (UTC)[reply]
If Wikidata imports someone else's database, I don't think there should be deletions unless there are substantive legal issues involved in retaining them.
If someone thinks that certain QIDs should not be in the database, there should be procedures for defining a separate property (or properties) to indicate the source and utility for different purposes.
EXAMPLE: I once quoted Jones and Libicki (2008)[1] to someone who had spent years in the US military, including in senior positions in the US Department of Defense. My respondent complained that the DoD did not think highly of the work of the Rand Corporation on something like this. In my judgment the best response to that kind of complaint would be to get the data used by Jones and Libicki (2008) into Wikidata and add a property or properties to allow others to flag individual cases and redo the analysis using different definitions of how individual cases should be coded, and of which cases should be included and which not. That should help elevate the debate from "we don't trust a particular source" to a focus on the sources of distrust.
The "Coalition of the Willing" has cost hundreds of thousands, maybe millions, of lives and over three trillion dollars, if we believe the estimate by Stiglitz and Bilmes (2008).[2] If Jones and Libicki are correct, this entire exercise has made the world poorer and less safe.
I believe this kind of research could be crowdsourced on Wikidata. I have so far not been able to initiate such a project, but I hope to in the future if someone else doesn't do it without me.
Secondarily, what are the notability requirements for Wikidata?
I had understood that there weren't any. I've been routinely creating Wikidata items for authors of publications I cite when they do not already have a Wikidata item. Many if not all of the people for whom I create Wikidata entries are not (yet) the subject of a Wikipedia article. (Of course, having them in Wikidata should make it easier for someone in the future to decide if a given author was sufficiently notable to deserve a new Wikipedia article based on the number of publications they've authored that are in Wikidata. However, I don't see that as super relevant to any notability assessment.)
Similarly, a photographer at Wikimania Montreal uploaded File:Spencer Graves-2.jpg to Wikimedia Commons on 2017-08-29. I created a companion Wikidata entry, Spencer Graves (Q56452480), on 2018-09-03. This person was the author of material that I cited in Wikiversity articles, so I created a Wikidata entry on him. [By the way, "him" is "me", in this case.]
Should I not be using Wikidata in these ways?

References

  1. Seth Jones; Martin C. Libicki (2008). How Terrorist Groups End: Lessons for Countering al Qa'ida. RAND Corporation. ISBN 978-0-8330-4465-5. JSTOR 10.7249/mg741rc. OL 16910145M. Wikidata Q57515305.
  2. Joseph E. Stiglitz (2008). The Three Trillion Dollar War. W. W. Norton & Company. ISBN 978-0-393-06701-9. OCLC 181139407. OL 624824W. Wikidata Q7769107.
DavidMCEddy (talk) 20:20, 28 August 2020 (UTC)[reply]
  • All large databases, from The Peerage, Geni.com, WikiTree, and even the Encyclopedia Britannica have errors, and nothing is sacrosanct. All entries imported from crowd-sourced databases like Find a Grave and Geni.com should be viewed with a degree of caution and skepticism: duplicate persons are common, and hoaxes (entirely fictional people) are certainly possible (and as I've said before, The Peerage is literally the work of just one guy!). For duplicates we can merely merge items, and note that they may have 2 or more external identifiers. Hoaxes and unverifiable items should probably be deleted, regardless of what disreputable websites claim. External databases should not dictate policies or data curation on Wikidata. It is appropriate to delete items on a case-by-case basis, even if it means mildly inconveniencing a data query. We have to have standards and be willing to say no, otherwise it is inevitable that one day, a devoted bot handler will scrape every person ever named in print or online, from birth records to yearbooks to Facebook profiles (all of you will be items, hooray!), to feed the ever hungry beast that is Wikidata. -Animalparty (talk) 21:20, 28 August 2020 (UTC)[reply]
    • Even Kindred Britain, a more formal genealogy project hosted at Stanford University, contains errors. In particular, Kindred Britain has a tendency to record affairs as marriages, which resulted in Oscar Wilde (Q30875) simultaneously having a wife and a husband. I deprecated the "husband" and inserted the name as unmarried partner (P451). Hoax entries should be retained but deprecated in some way. If you identify it as a hoax, the data has been cleaned and can be ignored; if you delete the hoax, the data will be restored by another user when you aren't paying attention and will again show up in reports. From Hill To Shore (talk) 23:41, 28 August 2020 (UTC)[reply]
So you're advocating that if I were to go online and create a profile for the love child of Richard Nixon (Q9588) with Margaret Thatcher (Q7416) (let's call him Baby Adolf Hussein Thatcher-Nixon), and myself (or some dumb robot) dutifully created a corresponding Wikidata item, that that should forever be on Wikidata? That Nixon and Thatcher should have Baby Adolf as a (deprecated) child (P40) because someone on some website said so? Why would we put rubbish on the same level as research? This speaks to a deeper question about the guiding, foundational philosophy of Wikidata (if there is any): should Wikidata have all information ever, or good information? -Animalparty (talk) 02:28, 29 August 2020 (UTC)[reply]
That's a great example: If a QID is created for such with a reference to a weblink that actually makes such a claim, I think Wikidata should have a procedure for marking it as "Misattributed", as is done in Wikiquote, e.g., Wikiquote:Abraham Lincoln#Misattributed.
If a QID contains no claimed source, then it might be sensible to delete it, provided the most recent change was at least, say, 48 hours old, so we don't delete a QID that a volunteer is in the process of creating.
Such a "Misattributed" property may need to distinguish between an unreliable source and a source that seems not to provide the claimed information. DavidMCEddy (talk) 05:34, 29 August 2020 (UTC)[reply]
@DavidMCEddy: Common values for the reason for deprecated rank (P2241) qualifier: https://w.wiki/afu One of these should allow you to make the distinction you wish. Jheald (talk) 08:49, 29 August 2020 (UTC)[reply]
As to Animalparty's question, if a widely-used website (like ThePeerage) or source (like old editions of Burke's Peerage) make a claim that we can establish to be false, it is good and useful for us to record that here, precisely so that downstream readers know that this claim does exist, and may be widely repeated, but we have examined it, and established it to be false. Also, long experience tells us that, at least as far as "high-visibility" sites and sources go, if we don't include the claim and note why it is false, then sooner or later somebody will add it assuming it to be true. That doesn't mean that every nonsense on every no-mark website should be included. But any claim from a website or source that people may be likely to find and take seriously probably should be. Jheald (talk) 08:56, 29 August 2020 (UTC)[reply]
@Animalparty: For an example of a hoax entry that we should retain in deprecated form, see Sigrid of Halland (Q75437282). The entry looks plausible but Scandinavian editors advise that it is probably fictitious and there is no evidence to support it beyond sites that refer to the original fictitious account. If more reliable sources appear later, we can restore the details to normal rank. From Hill To Shore (talk) 11:42, 29 August 2020 (UTC)[reply]
User:From Hill To Shore: Thanks very much. This makes a valuable and perhaps definitive contribution to this discussion (with both the property used for such purposes and an example). DavidMCEddy (talk) 11:57, 29 August 2020 (UTC)[reply]
Sometimes I think we need a reason for deprecation that goes beyond hoax (Q190084): an item for "complete and utter bullshit". - Jmabel (talk) 16:58, 29 August 2020 (UTC)[reply]
From Hill To Shore: So you would have no problem with me adding ? -Animalparty (talk) 23:49, 4 September 2020 (UTC)[reply]
@Animalparty: So long as you make sure to properly deprecate it, as with any other statement that is poorly sourced, then go ahead. By deprecating it, you will stop a bot from adding it later as a valid statement. I am curious why you are trying to prove a point with this though. I gave a valid example earlier in the conversation so your attempt to provoke a reaction with an intentionally incorrect statement is confusing me a little. From Hill To Shore (talk) 00:05, 5 September 2020 (UTC)[reply]
It is not appropriate for Wikidata to decide what is or is not correct, only what has been said. Wikidata is a place for machines to faithfully regurgitate the output of other machines. -Animalparty (talk) 04:24, 5 September 2020 (UTC)[reply]
@Animalparty: If that is your belief then start a separate discussion to remove the functionality that allows deprecation. I'm getting the impression that you are trying to provoke me into an argument with these statements but I am not going to indulge you. If you genuinely believe what you are writing, go get a consensus to support you. From Hill To Shore (talk) 08:46, 5 September 2020 (UTC)[reply]

TP cleanup

As RAN made us realize, there was actually no consensus to upload this into Wikidata.

The question is now how to fix it. It seems that there is no agreement to include items about minors from that database and possibly anyone born since Blair. The cutoff could also be calculated as people that are children of the generation born in the 1950s. Further, people born in the 20th century that are not British should probably not have been included either. Are there other groups of people we should identify? --- Jura 04:30, 5 September 2020 (UTC)[reply]

A key issue, if we are considering the deletion of entries, is that the criteria have to be flexible enough to cover different scenarios. For example, if we have an item for a 10-year-old child where the Peerage is the only source, then I would feel uncomfortable about retaining it. If that 10-year-old child is independently notable (perhaps a movie star) then the Peerage entry could appear on that item. I am not sure why nationality is an issue here; I have come across many notable entries of non-British people that have a Peerage ID. Surely, if an entry shouldn't be here for a valid reason, the same reason would apply if the person was from any country? From Hill To Shore (talk) 08:56, 5 September 2020 (UTC)[reply]
Another issue is that such data may be included in numerous other genealogical databases, plus Burke's Peerage, plus other books.--GZWDer (talk) 09:35, 5 September 2020 (UTC)[reply]
So then, criteria for identifying a set of items to further examine: birthdate 2000-present, birthdate has no references except TP, there are no external identifiers except TP. How hard is it to write a query to find those? — Levana Taylor (talk) 13:19, 5 September 2020 (UTC)[reply]
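Those criteria translate fairly directly into SPARQL. The following is an untested sketch: it uses The Peerage person ID (P4638) and date of birth (P569), and approximates "no references except TP" as "no external identifier except TP", which is a simplification of the criteria above.

```sparql
# Sketch: items born 2000 or later whose only external identifier
# is The Peerage person ID (P4638).
SELECT ?person ?personLabel ?dob WHERE {
  ?person wdt:P4638 ?tpId ;
          wdt:P569 ?dob .
  FILTER(YEAR(?dob) >= 2000)
  # Exclude anyone carrying any other external identifier.
  FILTER NOT EXISTS {
    ?person ?idClaim ?idValue .
    ?prop wikibase:directClaim ?idClaim ;
          wikibase:propertyType wikibase:ExternalId .
    FILTER(?idClaim != wdt:P4638)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```

A query like this would only produce a candidate list for human review, not a deletion list.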
  • I think the main problem with TP is that it's also a genealogical database. If we include them, there is no notability left. Obviously, if items are otherwise notable, they won't be deleted, but I don't think they were created in the TP batch anyway. --- Jura 09:07, 6 September 2020 (UTC)[reply]
Why is Blair a relevant factor here? BrokenSegue (talk) 14:26, 5 September 2020 (UTC)[reply]
Blair or rather House of Lords Act 1999 (Q120826) changed the relevancy of TP. This also explains why non-British TP entries aren't notable per se. P27 is fairly easy to query. --- Jura 09:07, 6 September 2020 (UTC)[reply]
There is still no consensus for special treatment of living minors. If this is a concern, deletion is also probably not the only solution - anonymization is another.--GZWDer (talk) 08:48, 6 September 2020 (UTC)[reply]
Anonymisation is not an option. The objective of Wikidata is to link. Thanks, GerardM (talk) 09:24, 6 September 2020 (UTC)[reply]
So do you agree we cannot have items with labels such as "Eldest daughter of Kobe Bryant" with a link referring to (external) webpages mentioning them?--GZWDer (talk) 13:10, 6 September 2020 (UTC)[reply]
When a name is not mentioned and it is this easy to find one, it is not anonymisation. I do not agree to anything, what I did / do is point out that it is not an option. Thanks, GerardM (talk) 16:49, 6 September 2020 (UTC)[reply]
So this may mean we will store every name widely published in reliable sources, without any need for agreement. In Wikidata the subject of an article does not need to agree when an article is created; but given how Wikidata works, we do not require significant coverage, and having such items for children of notables will be convenient for describing (e.g. newspaper) articles where the children are mentioned.--GZWDer (talk) 20:26, 6 September 2020 (UTC)[reply]
It does not mean that. For me it is not a given that every database that is free is to be included in toto in Wikidata. With biased information like "peerage" and information that hardly serves a purpose like "German companies", we really need to be more exclusive (for the import of entire datasets). Thanks, GerardM (talk) 08:39, 13 September 2020 (UTC)[reply]

Ranks and the UI

Currently, ranks are only displayed through a small icon that's easily ignored by newcomers. What do you think about bolding the item name of every statement with preferred rank and using strikethrough for deprecated statements? ChristianKl21:03, 4 September 2020 (UTC)[reply]

This problem is being discussed at phab:T206392. ---MisterSynergy (talk) 21:16, 4 September 2020 (UTC)[reply]
  • Maybe using a gray background for deprecated statements could do. Somehow I'd avoid making preferred statements too prominent, as it could lead users to conclude that there should always be a preferred statement. --- Jura 08:51, 12 September 2020 (UTC)[reply]
    • I think making the text lighter, rather than the background darker, would be clearer. If we could make the whole statement 70% transparent, that would be very clear, but I don't know if that is possible. Another approach would be to add a bold red "⊗" next to the statement to make it clear that it's not valid. The Erinaceous One 🦔 21:33, 18 September 2020 (UTC)[reply]
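For readers who want to experiment before any default change lands, ideas like these can be tried as personal CSS. This is only a sketch: the wb-deprecated and wb-preferred class names are assumed from the default Wikibase statement-view markup and may differ between deployments.

```css
/* Strike through and dim deprecated statements. */
.wikibase-statementview.wb-deprecated .wikibase-snakview-value {
  text-decoration: line-through;
  opacity: 0.7;
}

/* Bold the value of preferred-rank statements. */
.wikibase-statementview.wb-preferred .wikibase-snakview-value {
  font-weight: bold;
}
```

Any change to the default display would of course still need the design discussion in the Phabricator ticket.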

Citing from Ancestry.com

Now that wiki-projects have been allowed access to Ancestry, we ought to agree on a standard way of citing information found in their scanned historical documents. I don't have an answer yet, just throwing it out there. Also I'm not clear on whether the terms of use for the wiki users allow us to download and share pages (like for example this one which was stored in order to be cited by Wikitree). If so, then there ought to be a specialized property for the link. — Levana Taylor (talk) 15:32, 5 September 2020 (UTC)[reply]

We might want to choose a path that’s also suitable for others like Matricula (for Austria). It won’t be easy and I don’t have any good solutions yet. --Emu (talk) 21:13, 5 September 2020 (UTC)[reply]
Is there a tutorial on how to make the documents shareable, or does the standard URL to the document allow you access without an account? --RAN (talk) 21:59, 11 September 2020 (UTC)[reply]
I've checked a bit more about document sharing on Ancestry, and here's what I see. When you're logged in, you open the "toolbox" on any document image and go to Share > Email, where you can create a public link and e-mail it to yourself or anyone. Anyone can use that link without an account. I think that it will even remain available if your account is closed, but I haven't found anywhere in their help pages that says so explicitly. The url format is like this: https://www.ancestry.com/sharing/21488143?h=85516f (you get sent a link url with a whole lot of other stuff on it but this part is all that's necessary). You can also download the page image as a jpg. — Levana Taylor (talk) 09:49, 14 September 2020 (UTC)[reply]
For what it's worth, I've been using stated in (P248):Ancestry (Q26878196) and section, verse, paragraph, or clause (P958) for the specific collection in Ancestry (e.g. "Oregon, Death Index, 1898-2008"). Ideally a document level instead of a collection level citation should be used, but I couldn't figure out how to do that. Gamaliel (talk) 12:19, 14 September 2020 (UTC)[reply]
We do have items for all of the specific collections on FamilySearch; so then the document citation is the collection and the ARK location of the record. In Ancestry, there are 32,825 collections (list). Potentially same citation format, you just have to create the document link instead of finding it. — Levana Taylor (talk) 14:51, 14 September 2020 (UTC)[reply]
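Whatever format is settled on, existing practice can be surveyed first. A query along these lines (an untested sketch) would list statements whose references cite Ancestry (Q26878196) together with the collection string given in section, verse, paragraph, or clause (P958):

```sparql
# Sketch: statements referenced to Ancestry, with the P958 collection.
SELECT ?item ?collection WHERE {
  ?item ?p ?statement .
  ?statement prov:wasDerivedFrom ?ref .
  ?ref pr:P248 wd:Q26878196 ;
       pr:P958 ?collection .
}
LIMIT 100
```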

Possibly rewriting Wikidata:Notability?

Hi all. It might be worth thinking about rewriting Wikidata:Notability - from my experience here, our treatment of Commons is a bit odd, as is how Wikidata:Requests for deletions works in practice. In general it's weird that the guideline focuses on sitelinks rather than concepts/items. I've started a sandbox at Wikidata:Notability/sandbox - input/edits would be welcome. Thanks. Mike Peel (talk) 22:58, 5 September 2020 (UTC)[reply]

I think we should discuss Wikidata:Verifiability first as notability relies on that.--GZWDer (talk) 01:27, 6 September 2020 (UTC)[reply]
You are mixing several issues here which should not be covered in a single policy:
A policy that tries to conflate all these issues in one place is crap and very difficult to work with. Do not make the mistake and take WD:N as the only important policy to govern admissibility here. There are several policies to consider, in fact.
The problem with Commons is that their content which we treat here (mainly Category pages) is auxiliary content at Commons and usually not subject to any identifiability or verifiability requirements, unlike practically all other Wikimedia projects. There is plenty of dubious content at Commons, and we are consequently a little more cautious with their categories than we are with content from other projects.
MisterSynergy (talk) 08:03, 6 September 2020 (UTC)[reply]
  • Also (probably BTW), even if sensitive data about minors is a concern, deletion is not the only solution. Anonymizing them is another (which means removing sensitive information and, if names should not be included, replacing them with a general label like "Son of John Smith and Jane Doe").--GZWDer (talk) 08:41, 6 September 2020 (UTC)[reply]
The biggest issue that I have with the notability policy is that the order in which the three requirements are listed is wrong. First should be: does it fulfill a structural need? After that, the other two arguments, the hairiest ones, are moot.
In Commons we can search in any language using Special:MediaSearch; it relies on Wikidata, and as a consequence every use of a Wikidata item in Commons fulfills a structural need. When Commons for its own reasons decides to remove media files, it may result in a lack of notability. That is however not for us to consider. Thanks, GerardM (talk) 08:28, 6 September 2020 (UTC)[reply]
Also be careful not to set too much in stone; I'm afraid this will attract people who could be described as temple guardians, who would spend too much time on the community pages and contribute nothing in terms of content to the items. I also think it is not a great idea to create rules when the real point is, for example, to discuss before importing a database. Jérémy-Günther-Heinz Jähnick (talk) 08:32, 6 September 2020 (UTC)[reply]
Hello Jérémy-Günther-Heinz Jähnick. I am curious about your concept of "temple guardians"; do you feel they are necessary? Should they be recognised as such and held accountable? What is the right balance between community interactions and content contributions? So far I have only made "community interactions" because I do not have the "tools" that I need to do my work (properties, structure, language codes...). Does that mean that I am a candidate temple guardian?--MathTexLearner (talk) 13:24, 6 September 2020 (UTC)[reply]
At this date, you are just a new user, and we can't judge on a few contributions. Jérémy-Günther-Heinz Jähnick (talk) 16:22, 6 September 2020 (UTC)[reply]
Who will judge me when I have more contributions? Are there temple guardians over there that oversee membership to the Wikidata Order?--MathTexLearner (talk) 17:34, 6 September 2020 (UTC)[reply]
@Jérémy-Günther-Heinz Jähnick: There's always a balance between letting people do whatever they want and coming to consensus. We are at a point where some people are held back from contributing because Wikidata can't handle as many edits as people want to make. Given that the number of edits Wikidata can handle is a scarce resource, enforcing rules for bigger uploads becomes more important. The bot-approval process was designed from the beginning to handle large database uploads and currently often gets circumvented by people using QuickStatements.
A situation where we don't have an import of German companies because the person who wanted to upload it actually sought consensus, while we do have big uploads from people who didn't seek consensus, is unfair and not desirable. If we revise the policy, it makes sense to revise it in a way where people are not punished for seeking consensus and rewarded for circumventing it.
Over the last year, Wikidata did well at growing overall editorship, but we didn't do well at community consensus building and the quality and consistency that come from good consensus finding. ChristianKl18:16, 6 September 2020 (UTC)[reply]
These arguments are not about notability, they are about fear. Yes, we do not want nor need all German companies or companies of any other country but we do want a mechanism whereby the data can be linked to Wikidata. I am sure that German companies like Dutch companies are known by a number issued by a registry (in the NL the KVK number), for those companies that are notable at a Wikipedia level, we want those numbers. We have had big uploads without consensus? Export them to an instance of Wikibase and have links to Wikidata where on the Wikipedia level we want this link.
We have a really large amount of data about science, scientists and scientific publications. This data is becoming increasingly irrelevant because the tools to maintain it became unavailable. The argument was that because of too many duplicate ORCID IDs we had an unmaintainable situation. Except the absence of the tool did not diminish the problem, and with a tool, these duplicates were merged. There was silence when it was requested to reinstate the tool. So we increasingly have a situation where Wikidata's data is problematic because of a lack of consensus. Yes, we will get more data, but it will also help establish relevance in notability issues at Wikipedia.
As to this genealogy thing (Burkes), at the time it was pointed out that it is an extremely biased publication. It is lily white and irrelevant outside of England. The use of the data makes only sense when you intend to link notable people that are inside Wikipedia. A link to Burkes establishes such credentials and that is all the use I can think of.
So yes, when a large collection is imported that is not notable, bin it. That is not "temple guardianship"; it is weeding out the bad stuff. What would help is being more explicit about what data we do want. Mind you, everything that is equivalent to what we have for the first world is notable when it comes from the second or third world. Thanks, GerardM (talk) 06:03, 7 September 2020 (UTC)[reply]
According to Mike Peel's draft all German companies (which do have official IDs) would be notable. The government register for companies is a reliable source. ChristianKl13:54, 7 September 2020 (UTC)[reply]
This is the same for many other countries; and also the chairmen of those companies (strictly speaking, many items deleted as "spam" or "promotional" would be notable by this criterion). I once proposed two properties - this is a fairly reliable database describing more than 300 million companies and more than 200 million people, but it is behind a paywall.--GZWDer (talk) 16:57, 7 September 2020 (UTC)[reply]
The question should not be "is it notable and can we duplicate it"; the question should be: do we want that, and does it scale? Germany is only one country; do you want all businesses of all countries, all the time? What is the point, what do we achieve, what does it cost, and is it worth it? Who is going to maintain the companies for countries that currently attract no interest, like Senegal, Angola, et al.? Thanks, GerardM (talk) 06:29, 8 September 2020 (UTC)[reply]
@GerardM: Our notability policy is the policy for what items we keep and not delete. If you want to delete certain items you need to word the policy in a way to allow deleting those items. ChristianKl12:13, 9 September 2020 (UTC)[reply]
My point is that something can be notable in its own right but not as part of an import duplicating what we can find elsewhere. What possible added value is given to German companies when we can link to the authoritative source for the companies we include for other reasons? When German companies are notable, so are Chinese companies. Ask yourself, what is the point. Thanks, GerardM (talk) 12:45, 9 September 2020 (UTC)[reply]
@GerardM: if we want to have a policy according to which "something can be notable in its own right but not as part of an import duplicating what we can find elsewhere", we need to decide what such a policy looks like. One way of doing that is to require consensus decisions for larger data uploads. How do you think a policy should look that enforces such a criterion? ChristianKl10:04, 10 September 2020 (UTC)[reply]
  • We are talking about the import of databases. The first requirement is that, for the topic of the data, there is already a subset in Wikidata. There is a property that links to the database whose import is requested. The data serves a purpose; there must be an application for the data. The notion that "it is free, so we can have it" does not cut it. When the data is biased towards one country or area, it needs to be considered what happens when it is scaled up to a global level. How will the database be maintained, at first and at scale? Will we have active cooperation with the organisation that provides us with the data?
  • A positive example
We link to a subset of the content of OpenLibrary. It wants people to read books; we want to share in the sum of all knowledge. Our aim is achieved when we enable more people to read OL books. Thanks to links to VIAF and LoC, we have the means to easily link much more than we currently do. We have cooperated with OL and it reciprocates our identifier in its database. Our data and theirs have been synchronised in the past.
  • Another positive example
We link to a lot of scholarly publications. ORCID is a database where we can discover many publications and their associated authors. ORCID is interested in working with us. What they can offer, and what we can offer, is for a scientist to start a process and update/import his or her publications and associated co-authors. This continuous process will allow us to discover modern sources for the subjects written about in Wikipedia. Thanks, GerardM (talk) 18:42, 10 September 2020 (UTC)[reply]
  • Yes, a list of all German companies could possibly fit in Wikidata under the current rules, but it is not a priority. I call this the telephone book conundrum. We are looking for information-dense data sets. A list of companies and a tax ID number is information-sparse and not very useful, just as a telephone directory is sparse, with just two data points. It also takes us a year of work to merge entries from large databases like The Peerage, so it is always best to go slowly and get them done properly before we move to the next large project. --RAN (talk) 02:51, 11 September 2020 (UTC)[reply]
  • I think a full list of companies, with any significant information like directors and ownership, would be very hard to keep up-to-date. A single large company group can have dozens or hundreds of subsidiary companies, constantly changing. Ghouston (talk) 22:19, 11 September 2020 (UTC)[reply]
  • We are not looking for data-dense sets of data. There is no point in being all-inclusive. What we should be looking for is data that serves a purpose, data that scales to the whole world. When an external database (data rich or not) has identifiers, we should always link. Maybe adopt the data when it is no longer maintained, but only adopt it when it serves a purpose, our purpose. Thanks, GerardM (talk) 05:27, 14 September 2020 (UTC)[reply]
  • Is ORCID really a positive example? Granted, the author items work within the confinement of the “papers portion” of Wikidata – but reusing them in other contexts or simply avoiding duplicates is a nightmare not unlike The Peerage. --Emu (talk) 08:15, 14 September 2020 (UTC)[reply]
I think you do not get that the data from ORCID is and has been the basis of what and how we know about scientific developments in Wikidata. Tooling like Scholia exposes not only an individual's position (papers and co-authors) but does so for awards, organisations, topics. Your problem is not with the data from ORCID, it is with the substandard way we import data into Wikidata. We do not maintain integrity; technically we have a big issue there. This is about the value of the data; check. This is about cooperation with other orgs; check. Name me one other example that has this kind of impact on our mission, sharing the sum of all knowledge. Thanks, GerardM (talk) 05:39, 15 September 2020 (UTC)[reply]

Application of television show media content ratings on WD

Let's say that a television show (whether a single season or the entire series) has been released on DVD, and said DVD release has received a 'Mature' rating from a governmental media content rating organization.

Should the rating of said DVD release apply to:

  • The items of the individual episodes included on the DVD
  • The items of the season(s) included on the DVD release
  • The item of the television show
  • All of the above

When an episode/season has received a rating, it is usually understood as also applying to the television show. If the rating only applies to specific seasons or episodes, this can be specified with applies to part (P518) and excluding (P1011). Hence I prefer the fourth option. --Trade (talk) 08:23, 7 September 2020 (UTC)[reply]

I have several DVD sets in my collection where each disk has its own age rating; the collection then takes its age rating from the strictest rating of those disks. To qualify for a particular age rating, producers may edit the original show to remove certain scenes, or they may include bonus scenes not shown in the original broadcast. Because of this, we probably need to apply age ratings to a particular version, edition or translation (Q3331189) item that represents the DVD set, rather than to an item about the show itself. It is also worth noting that several different editions of a DVD will be released at the same time for different countries, each with their own edits and rating (the DVD set released in Australia may have slightly different content to that released in the UK and USA). I think it will be a pain to map this in Wikidata though. From Hill To Shore (talk) 09:43, 7 September 2020 (UTC)[reply]
What are the chances of two episodes on the same disc having different ratings? @From Hill To Shore: --Trade (talk) 08:38, 14 September 2020 (UTC)[reply]
  • Interesting point. My first pick would probably have been the season, but maybe the show is sufficient (both with appropriate qualifiers). @Máté: probably has a more qualified view on this. --- Jura 08:44, 14 September 2020 (UTC)[reply]

It really depends on the system. Most countries rate either by the episode or by the release, which usually contains one season. These seem quite straightforward: if they rate by the episode, let's add the rating to the item of the episode, but if they rate by the season, let's add it to the item of the season (or to the series if it only consists of one season). In the former case, usually the strictest rating will appear on the DVD, while in the latter it is uniform, so no trouble. The issue arises when they release DVDs that only contain, say, four episodes that do not make up an entire season (i.e. the season is released in several parts). The systems that rate by release will have a rating for, say, season 1 episodes 8-12. We don't have an item for that. I would discourage adding the rating to each episode (conflicting certificate IDs and statements that are not strictly true). I personally don't think those ratings should be included in Wikidata, since the subject of the statement is just not notable. Alternatively, we could add these to the smallest unit that includes all concerned episodes, and use P518. – Máté (talk) 12:18, 14 September 2020 (UTC)[reply]

Q61642210#P8326 lists the name of every episode in the first (and only) season as an 'alternate title'. Would it be reasonable to assume that the rating applies to all episodes on the DVD release? @Máté: --Trade (talk) 07:44, 15 September 2020 (UTC)[reply]
In this case the rating applies to the series. – Máté (talk) 18:13, 15 September 2020 (UTC)[reply]
And to the first (and only) season, I presume?--Trade (talk) 12:56, 16 September 2020 (UTC)[reply]
As there is only one season, they are the same entity, and we won't have separate items for them. Máté (talk) 16:48, 16 September 2020 (UTC)[reply]
I have yet to see any database that rates per episode. Which ones are you referring to?--Trade (talk) 16:54, 16 September 2020 (UTC)[reply]
E.g. NMHH in my native Hungary does that but I've seen it in a few other markets as well. Máté (talk) 17:11, 16 September 2020 (UTC)[reply]

COVID cases

I've noticed that ranking (P1352) is used in pandemic in France (Q83873593) (I don't know if it is used elsewhere) to order the daily updated counts of cases, but IMO series ordinal (P1545) would be more suitable there. — Draceane talkcontrib. 12:55, 10 September 2020 (UTC)[reply]

Also COVID-19 pandemic in Norway (Q86886544) is/was using ranking (P1352). Pmt (talk) 05:29, 11 September 2020 (UTC)[reply]
@Pmt: Do you have any idea how to fix it? — Draceane talkcontrib. 16:24, 13 September 2020 (UTC)[reply]

I'm very tired of this situation now.

More than 250 characters can be written in the English description field, but other languages have limits. I have asked about this before, and again here. Also, Q9175810 has the same label as Q8478926. But when I do this in Turkish, it doesn't accept it. Why does it accept it in English? Why is there discrimination between languages?

In addition, all the institutions of the United States are labelled as if they control the whole world, so generalizations are always made. This happens a lot with Britain and Australia as well. You have to see this kind of injustice. The USA, UK and Australia are not unique in the world. Each country has its own institutions, and labels should start with the country name.

So I'm really tired of this situation. When I see these, I don't want to contribute. --Sezgin İbiş (talk) 21:39, 10 September 2020 (UTC)[reply]

For the second point, do you mean labels like on Department of Statistics (Q17010279), where it's just "Department of Statistics" and doesn't include the country name? I assume that it's done that way because the organization is known as simply "Department of Statistics" within Bermuda. Ghouston (talk) 23:29, 10 September 2020 (UTC)[reply]
For the first point, you may want to read the section further up the page where Wikidata developers are asking for feedback on how easy/difficult you have found it to request improvements. If you have made a formal request and it has been ignored, this is a good opportunity to raise it and find out what happened. If you have just complained on project chat and expected something to happen, this is a good opportunity to make it a formal request.
Another example of your second point is Ministry of Justice of Spain (Q3181035), whose native Spanish label is simply "Ministerio de Justicia." Many items in their native language do not indicate their jurisdiction in the label. A generalisation that this is only applicable to English language labels suggests that you are not considering other languages to the same degree. From Hill To Shore (talk) 00:00, 11 September 2020 (UTC)[reply]
For your point regarding Template:Wikidata (Q8478926) and Template:Wikidata (Q9175810) the duplication error considers the Label and Description together. If both the Label and Description are the same on both items, an error is generated. I had assumed that the same behaviour worked for all languages. Have you tried setting the same label but different descriptions in Turkish? From Hill To Shore (talk) 00:07, 11 September 2020 (UTC)[reply]
Thanks again for your work labelling properties in Turkish. Every property you label will appear on thousands or millions of items, so it is a great service to Turkish language speakers. From a personal point of view, it also means that Wikidata:Entity Explosion will work really well in Turkish now. I wonder if the different character limit is related to different character encoding. Because some languages have more characters, they require more bytes to specify each character. In any case, usually 250 characters would be too long for a good description, so it's better if we all keep them short. --99of9 (talk) 01:54, 11 September 2020 (UTC)[reply]
OK. The Q9175810 and Q8478926 issue was caused by the label and description being the same. I tested 250 characters on test.wikidata.org and here. There is a 250-character limit for new entries in English too. I've overcome these problems. The selfishness issue, which is not a language problem but a problem for the USA, England and Australia, is not the subject here. It's the problem of these countries. (If you look carefully, you can see that other countries do not have such a problem anyway.) Understood. Thank you. --Sezgin İbiş (talk) 07:23, 11 September 2020 (UTC)[reply]

Please assume good faith (not just because it's policy). Of course all ministries of finance should just have the label "Ministry of Finance", because that's their (translated) name. But everyone is probably apt to think of other countries' ministries as ".. of <country>", more so than for their own. I doubt that US/UK editors are any more likely to do so. That it happens more for English labels than others can be explained by there simply being more editors from those countries.

In any case, I just checked a search for Ministry of Finance, and the vast majority of English labels are correct. By contrast, searching for Finanzministerium (the German equivalent) returns country-qualified labels almost exclusively. --Matthias Winkelmann (talk) 21:52, 16 September 2020 (UTC)[reply]

Evolution of the community communications roles for Wikidata

Hello all,

For the past four years, I’ve been working in the software department at Wikimedia Germany, taking care of the communication between the Wikidata development team and the community, announcing new features, collecting bug reports and feature requests from you. On top of that, I’ve been coordinating various projects, bringing the WikidataCon to life, coordinating the Wikidata decentralized birthday, creating a prototype for Wikidata Train the Trainers, and taking care of various onsite or online workshops, meetups and other events.

Over the past years, with the Wikidata community growing, the development team growing as well, more and more events happening, and the ecosystem of Wikibase users forming a distinct group with different needs, it became pretty clear that one person was not enough to keep track of everything and provide the best support for the Wikidata editors. That’s the reason why, earlier this year, we had the pleasure to announce the arrival of a new colleague who you already know from being an active Wikidata editor, Mohammed Sadat (WMDE).

We already started a smooth transition of our roles: while Mohammed will become the main person in charge of community communications for the Wikidata and Wikibase communities, I will focus more on organizing Wikidata-related events and supporting community members with their own events and projects. As you may have noticed, Mohammed already took over editing the weekly newsletter, monitoring the social media, and various announcements for Wikidata and Wikibase. As for myself, I will not disappear completely from the Wikidata channels: I will keep supporting Mohammed on community communication, for example with projects like the Wikidata Bridge, in which I’ve been involved since the start.

During this transition phase, we will review and improve our existing communication processes, and you can for example give feedback on the experience you had while reporting bugs or feature requests. Feel free to reach out to Mohammed if you have any questions regarding Wikidata’s development roadmap.

I’m looking forward to continuing working with you on various projects: feel free to contact me if you want to discuss Wikidata-related events, training, online events, or any other ideas you have in mind to gather the Wikidata community and onboard new editors.

Cheers, Lea Lacroix (WMDE) (talk) 08:08, 11 September 2020 (UTC)[reply]

@Lea Lacroix (WMDE): thanks for the update and great to see you are still onboard! − Pintoch (talk) 08:30, 11 September 2020 (UTC)[reply]
@Lea Lacroix (WMDE): Thanks for all your hard work, I was wondering what you'd be doing now that Mohammed is on board, glad to see what your plans are, keep up the great work! ArthurPSmith (talk) 17:26, 11 September 2020 (UTC)[reply]
@Lea Lacroix (WMDE): (and @Mohammed Sadat (WMDE):) I have a question regarding Wikidata Bridge, are you going to adapt it to be used also in Wikisource? I have seen that users are trying to connect Wikidata with book templates, however it seems that the Bridge could work better in that context. What do you think?--MathTexLearner (talk) 20:43, 11 September 2020 (UTC)[reply]
  • A suggestion for this year's birthday party (as it can't really happen in person): how about handing out 1000 EUR to each of the 100 most active Wikidata contributors and/or asking them to vote on which development to assign it to? --- Jura 11:38, 12 September 2020 (UTC)[reply]
@MathTexLearner: Thanks for your question. Theoretically, the Bridge is a feature that can be used on any wiki that is connected to Wikidata. So in the future, it could very well be used on Wikisource. However, I don't see that happening any time soon: the Bridge is at its very first steps (first version is currently deployed on one Wikipedia and supports one data type), so there will be quite a lot of development, time and community feedback before it reaches a point where it can be deployed on other wikis.
@Jura1: The idea of asking the Wikidata community about what their priorities would be for the roadmap is definitely interesting. However, I don't think that it should come with money allocation. Also, in the development world, 1000€ is basically nothing. As one can see for example in the project grants related to software, the range of costs is much bigger. And why should it be the top 100 editors; aren't newcomers and casual editors interested in selecting projects that could make their lives easier as well? Finally, I think throwing money at people to develop features is not the right way to go. Features should be developed in a consistent way, attached to the existing codebase and making sure that the existing development team has the resources to maintain them in the future. It is less a matter of money than of priorities and sustainability. And BTW, the birthday celebrations are definitely happening - plenty of cool events taking place online and offline. Lea Lacroix (WMDE) (talk) 07:01, 14 September 2020 (UTC)[reply]
Supposedly the 100 people don't all have the same view ..
It seems to me the last birthday celebrations were mainly attended by some of the 100 most active contributors and did seem to cost 1000 per person. That was essentially all money that wasn't spent on development, so we still haven't gotten to the bottom of the Query Server problem .. (after 2 or 3 years?) --- Jura 17:26, 15 September 2020 (UTC)[reply]
  • It's quite easy, being a part of the community, to understand what the developers do. We can look at Phabricator, see what tickets they are working on, and when needed give feedback. On the other hand, the amount of community tasks that are visible to the community doesn't seem big enough to occupy two people in addition to a project manager. Northcote Parkinson studied how the British Civil Service managed to increase its headcount while the amount of work it actually got done shrank. The WMF has a history of engaging in make-work for an increasing number of employees while editor numbers fell in the early part of the last decade, which to me seems to mirror what Parkinson studied.
It seems like Jura1 is suggesting that a lot of money was spent for the birthday party that could have been spent better on development.
When it comes to "community communication", it would be good to have a clearer idea of what tasks are done. A good portion is likely Parkinsonian make-work that wouldn't need to be done. Another portion could be tasks that the community could do itself, and where facilitating the community to do those tasks would be better than doing them with paid labor.
When it comes to missing community communication, we have performance issues with the Query Server. It would be good to have a general guide about how data modeling decisions made on Wikidata affect performance, as that would allow us to get the most out of what the query server can deliver.
Earlier this year Mohammed Sadat (WMDE) posted queries, two weeks in a row, that suggested that certain people who aren't slave holders are slave holders. He promised to do everything he could not to make the error again. The suggestion to go through a standard quality assurance process like the 5 Whys to fix whatever went wrong was ignored. That leaves me with the question of whether I should believe that future promises to do things will also be insincere.
I also do believe that it's worthwhile to be transparent about the community communication processes to allow for improvement. ChristianKl11:14, 16 September 2020 (UTC)[reply]
@Jura, ChristianKl: I don't want to discuss about WMDE's (or WMF) budget and strategy here. It's much more complex than "throwing more money at deployment" (just like the issues with the Query Service are much more complex than "let's buy more servers") and it's a topic where it is necessary to have the full picture in mind to have constructive discussions. It's not my role to discuss these choices with the community, and I'm sure that people who want to discuss about how money is spent have been following the various discussions and projects around the movement strategy and the Wikidata strategy.
About the Query Service, our colleagues from the WMF are making a lot of effort to communicate about the status of the Wikidata and Commons query service (see emails from Guillaume Lederey on the Wikidata mailing list). They also stay very accessible per email or during office hours, and they are not hiding anything about the issues and doubts they encounter: so people who are willing to get more technical details about the Query Service can definitely find information.
We saw your suggestion about the 5 whys and we decided to not answer that. The apologies have been done, the issue has been discussed internally with our manager, and we will not expose the details of these discussions out here.
Providing transparency regarding Wikidata's development does not mean being accountable to the editors for our daily work, nor disclosing precise information about the tasks we do. I don't think knowing exactly what we do with our day is going to support you directly in having a good experience while editing Wikidata. If you disagree with that, and since you seem to think that our work is useless, "Parkinsonian" whatever that means, and easily replaceable, feel free to contact our manager who will be happy to continue this discussion. As for me, you can consider this message as being my last interaction with you two on this topic. Lea Lacroix (WMDE) (talk) 13:57, 16 September 2020 (UTC)[reply]

Items for specific YouTube video

See history of Q96362432. Should we instead create items for each YouTube videos describing specific people?--GZWDer (talk) 20:25, 11 September 2020 (UTC)[reply]

That said, YouTube video ID (P1651) seems close to video (P10), so I don't see why one couldn't use it in this case.
Neither property should be used to add dozens of videos to a single item,
nor to add videos of type "item [] explained by John Doe" to dozens of items. --- Jura 20:34, 11 September 2020 (UTC)[reply]
Corrected my comment above. --- Jura 07:35, 12 September 2020 (UTC)[reply]
  • Why is that clear? It's not at all clear to many editors who use this property in this way. Gamaliel (talk)
That doesn't address the issue of YouTube video ID (P1651) being used incorrectly. Gamaliel (talk) 22:40, 13 September 2020 (UTC)[reply]
If items for specific YouTube videos are allowed, you should create items for them and link the person item via described by source (P1343).--GZWDer (talk) 23:03, 13 September 2020 (UTC)[reply]
Again, that has nothing to do with YouTube video ID (P1651) being used incorrectly. Gamaliel (talk) 23:24, 13 September 2020 (UTC)[reply]

Hello,

I have been thinking about queries in the last few weeks. [2]. From my point of view this is the most efficient way to do that. I found a query for selecting the number of people in Wikidata, and this query [3] times out. Is there a possibility to query Wikidata with CirrusSearch, and what happens to results I don't see on the first page? Are they also saved when I get the result of the first page, or are they only queried when I look at the next results? In the cases where I tried things with CirrusSearch, I got the number of results quickly, and now I want to understand why it is faster than the query service, and whether it would also be faster if there were a direct possibility to export the results.--Hogü-456 (talk) 21:49, 11 September 2020 (UTC)[reply]

Here are some answers:
  1. SELECT (COUNT(?item) AS ?count){?item wdt:P31 wd:Q13442814} times out, but SELECT (COUNT(*) AS ?count){?item wdt:P31 wd:Q13442814} returns the result immediately. The first query materializes all the values of ?item, and if a value is bound (not NULL in terms of SQL), it counts it. The second query just returns the precalculated count of triples. An ideal SPARQL engine could deduce that in the first query all selected variables are bound, but there are no ideal engines in general.
  2. CirrusSearch will be progressively slower for each next page and will eventually time out after 20 seconds. The rate of timeouts depends on query complexity: regex queries might time out on the first page, while some simple queries may never time out.
  3. CirrusSearch coexists with WDQS: mw:Help:CirrusSearch#Deepcategory uses the SPARQL service, and mw:Wikidata Query Service/User Manual/MWAPI allows using CirrusSearch in WDQS. And in cases where you actually need materialization and the query times out, you can use MWAPI, like here: Wikidata:SPARQL query service/query optimization#A query that has difficulties. Note that MWAPI is limited to 10000 rows. --Lockal (talk) 08:09, 14 September 2020 (UTC)[reply]
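As a concrete sketch of point 3, a CirrusSearch query can be run from inside WDQS via the MWAPI service (standard WDQS prefixes assumed; the search string here is just an illustrative example, and results are capped at 10000 rows as noted above):

```sparql
# Find items matching a CirrusSearch query (instances of "scholarly
# article" whose text index matches "insulin"), then use the matched
# items in a normal SPARQL pattern.
SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "Search";
                    mwapi:srsearch "haswbstatement:P31=Q13442814 insulin";
                    mwapi:srlimit "max".
    ?item wikibase:apiOutputItem mwapi:title.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```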

I'm not sure about it. located in or next to body of water (P206), or significant place (P7153) with a qualifier? Which qualifier should be used? Here are a few examples:

--Stevenliuyi (talk) 03:26, 12 September 2020 (UTC)[reply]

located in or next to body of water (P206). It is a catch-all property for stuff under water, surrounded by water at the surface or close to water but actually not even in contact with it. And by 'water' I mean everything from lakes to rivers to seas. Thierry Caro (talk) 13:11, 12 September 2020 (UTC)[reply]
What about cause of destruction (P770)? --Jklamo (talk) 11:50, 13 September 2020 (UTC)[reply]

Hello, I would be interested to see a list of all the articles in the English Wikipedia Category:Japan (or, say, its Portuguese Wikipedia equivalent) and its subcategories (following the English Wikipedia Category:Contents > Articles > Main topic classifications > World > Countries > Countries by continent > Countries in Asia > Japan structure) that lack an interlanguage link, supported via Wikidata, to an article on, for example, the Japanese Wikipedia. This would make the task of identifying (and addressing) missing interlanguage links much easier. Many thanks, Maculosae tegmine lyncis (talk) 08:54, 13 September 2020 (UTC)[reply]

You can query for items with country (P17)Japan (Q17) that have an article on the English Wikipedia but none on the Japanese Wikipedia. Or you can use PetScan to find Japanese people from the English Wikipedia without an article on the Japanese Wikipedia. The category w:Category:Japan contains too many pages, so it is likely easier to query only subcategories and find interlanguage links for those articles. --Pyfisch (talk) 09:54, 13 September 2020 (UTC)[reply]
I believe this type of query is possible with PetScan. Charles Matthews (talk) 09:53, 13 September 2020 (UTC)[reply]
For smaller Wikipedia categories you can write a SPARQL query to find all articles in the category and its subcategories using the MWAPI service and deepcategory search, and then remove articles with language links to a certain language from the results. However, w:Category:Japan has too many subcategories (more than 256) to make a deep category search. --Dipsacus fullonum (talk) 11:00, 16 September 2020 (UTC)[reply]
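The country (P17)-based approach suggested above can be sketched as a WDQS query (standard WDQS prefixes assumed; the LIMIT is just to keep the result set small while testing):

```sparql
# Items located in Japan that have an English Wikipedia article
# but no Japanese Wikipedia article.
SELECT ?item ?enArticle WHERE {
  ?item wdt:P17 wd:Q17.                      # country: Japan
  ?enArticle schema:about ?item;
             schema:isPartOf <https://en.wikipedia.org/>.
  FILTER NOT EXISTS {
    ?jaArticle schema:about ?item;
               schema:isPartOf <https://ja.wikipedia.org/>.
  }
}
LIMIT 100
```

Without the LIMIT, this query may well time out for a topic as large as Japan, which matches the caveats raised in the thread above.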

Family name and disambiguation merges

User:Materialscientist (talkcontribslogs) seems to be systematically merging family name items with disambiguation page items. Is this the way Wikidata is meant to be used? I thought there was a purpose to having separate items. Just see their user contributions. One example I stumbled upon is Kapanen (Q21491226)/Kapanen (Q1728350). --Kissa21782 (talk) 09:12, 13 September 2020 (UTC)[reply]

No, they should definitely be kept separate. All those edits need to be reverted. The user was already informed about this by Charles Matthews (talkcontribslogs) last year but didn't reply. Pyfisch (talk) 09:33, 13 September 2020 (UTC)[reply]
That's right. Some items that are said to be for disambiguation pages are actually for family names, and the instance of (P31) statement can be changed in those cases: but merges are a big negative and cannot necessarily be undone quickly (because of the items linking to the family name). Charles Matthews (talk) 09:51, 13 September 2020 (UTC)[reply]
I've left a message at Materialscientist's enwiki page to inform them of this discussion and a related discussion at Wikidata:Administrators' noticeboard#Vandalism bei User:Materialscientist. A note on their enwiki page says they have turned off ping notifications on their account, so they may not have seen the message in 2019; hopefully the user will now engage in discussion. From Hill To Shore (talk) 10:41, 13 September 2020 (UTC)[reply]
I've started to revert some of their merges but we now have a secondary problem. Because of the redirect, bots have gone through and switched the link to what is now the disambiguation item instead of the family name item. See https://www.wikidata.org/wiki/Special:WhatLinksHere/Q1233159 as an example. From Hill To Shore (talk) 11:04, 13 September 2020 (UTC)[reply]

Please discuss first and stop reverting. Some wikis, e.g. de.wiki, mark pages as disambiguation pages even if the pages clearly say they are lists of family names (and manual checking confirms that). As a result, Wikidata editors automatically separate those pages without checking. Materialscientist (talk) 11:11, 13 September 2020 (UTC)[reply]

Well, your merge of Dobbs (Q1233159) and Dobbs (Q56245396) was definitely wrong on two counts. 1. You merged the family name item into the disambiguation item when you wanted to clear the disambiguation; doing it the other way around would have stopped all the links from having to be redirected by bot (and now reversed). 2. The Italian wiki page w:it:Dobbs is a disambiguation page that mentions a ferry and some other sort of item; this one should not have been merged. From Hill To Shore (talk) 11:22, 13 September 2020 (UTC)[reply]
First, I am not perfect. Second, the ferry was listed as "see also", which does not make it a disambig (the ferry should be moved to the article body) - it is still a family name page. Third, I was advised to use a script for merging, which I do, so either fix or disable the script. Fourth, if you manually check Q1233159, you will see that the interwikis there are not disambigs. Materialscientist (talk) 11:29, 13 September 2020 (UTC)[reply]
I have checked. The Ukrainian wiki is also not a family name page as it includes a link to Richard Dobbs Spaight (Q878563) with Dobbs as a given name and not a surname. The standard script for merging allows you to reverse the merge target with a click of the button. However, rather than using the merge script, I would advise you to use the "Move" gadget (in your preferences) to move individual links to the correct page. If you are able to read the language and confirm that it is linked to the wrong page then move that link. If you are unable to read the language, leave it for another editor to make the move. If all the links have gone, we can then decide how to handle the empty disambiguation page. From Hill To Shore (talk) 11:36, 13 September 2020 (UTC)[reply]

Just to add one point: Yes, de.wiki has a rather strict distinction between family name and disambig pages. That’s not some sort of faulty behavior, as suggested here, but rather a choice that makes a lot of sense. --Emu (talk) 11:43, 13 September 2020 (UTC)[reply]

I guess Dobbs in Richard Dobbs Spaight is a middle name or surname, not a given name (I would prefer a professional opinion though). Materialscientist (talk) 11:48, 13 September 2020 (UTC)[reply]
  • Even if one or the other sitelinks are on the wrong item, this doesn't mean that items should be merged. Sitelinks can be moved between items.
I don't think it's a reason to move sitelinks if one has a different view of what a Wikipedia language version ought to do with their articles or pages.
Merging items for different types breaks internal and external links and generally leads to wrong descriptions in countless languages. This is clearly destructive.
For wikis that want sitelinks to any type of page, Wikidata offers a LUA module. --- Jura 11:56, 13 September 2020 (UTC)[reply]

Category:Ministers of a country

I have added information on what is in specific categories of Ministers of a country, e.g. Health Ministers of Cameroon. This category is included in a category Government ministers of Cameroon. Does it make sense to define it like I did? Is it possible to query it? It could find people who are a minister but not in a subcategory, or it could find specific types of ministers (a subcategory that is not yet defined as that specific type of minister for that country). Any suggestions or comments? Thanks, GerardM (talk) 11:40, 13 September 2020 (UTC)[reply]

YouTube Category ID

I noticed that we do not currently have a way to map Wikidata entities to a YouTube Category ID. YouTube has expanded its Data API and now provides queries by content type, such as Video Category[4].

It would be nice to hold mappings of the YouTube categories to Wikidata entities so that applications can be built more easily without having to reconcile (i.e., we apply human logic and reconcile the categories to Wikidata entities and store the mapping inside Wikidata). This is similar to some of the Schema.org work and other relational mapping projects we have done in the past. Mappings can be N:1 (many to one) from the titles that I see from the API.

Some example mappings that could be done with this new proposed property:

Wikidata entities: music (Q638) --> YouTube Category ID: 10 (title: music)

Wikidata entities: sport (Q349) --> YouTube Category ID: 17 (title: sports)

Wikidata entities: film (Q11424) animation (Q11425) --> YouTube Category ID: 1 (title: Film & Animation)

Wikidata entities: motor car (Q1420) vehicle (Q42889) --> YouTube Category ID: 2 (title: Autos & Vehicles)

Wikidata entities: science and technology (Q34104) --> YouTube Category ID: 28 (title: Science & Technology)

For example, pet (Q39201) and animal (Q729) would both have this proposed new property YouTube Category ID: 15

   {
     "kind": "youtube#videoCategory",
     "etag": "ra8H7xyAfmE2FewsDabE3TUSq10",
     "id": "15",
     "snippet": {
       "title": "Pets & Animals",
       "assignable": true,
       "channelId": "UCBR8-60-B28hp2BmDPdntcQ"
     }
   },


There is support for regional languages as well. This is the same ID 15 but returned in ES - Spanish

   {
     "kind": "youtube#videoCategory",
     "etag": "c2Mmk_FJb3mloyX5XIxQpJ4QFT0",
     "id": "15",
     "snippet": {
       "title": "Mascotas y animales",
       "assignable": true,
       "channelId": "UCBR8-60-B28hp2BmDPdntcQ"
     }
   },

And in DE - German

   {
     "kind": "youtube#videoCategory",
     "etag": "CMisNBqXGAfiHedBBZqtmUssOjc",
     "id": "15",
     "snippet": {
       "title": "Tiere",
       "assignable": true,
       "channelId": "UCBR8-60-B28hp2BmDPdntcQ"
     }
   },
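If the proposed property existed, an application could hold the N:1 mapping roughly like the minimal Python sketch below. The QID-to-category pairs are just the illustrative examples from this proposal, and the property itself ("YouTube Category ID") is still only proposed, so the dictionary stands in for statements that would live on Wikidata.

```python
# N:1 mapping from Wikidata QIDs to YouTube category IDs, as it might be
# stored via the proposed "YouTube Category ID" property (hypothetical).
QID_TO_YT_CATEGORY = {
    "Q638": "10",    # music     -> Music
    "Q349": "17",    # sport     -> Sports
    "Q11424": "1",   # film      -> Film & Animation
    "Q11425": "1",   # animation -> Film & Animation
    "Q39201": "15",  # pet       -> Pets & Animals
    "Q729": "15",    # animal    -> Pets & Animals
}

def yt_category_for(qid):
    """Return the YouTube category ID mapped to a Wikidata entity, if any."""
    return QID_TO_YT_CATEGORY.get(qid)

def qids_for_category(category_id):
    """Reverse lookup: all Wikidata entities mapped to one category (N:1)."""
    return sorted(q for q, c in QID_TO_YT_CATEGORY.items() if c == category_id)

print(yt_category_for("Q638"))   # "10"
print(qids_for_category("15"))   # ["Q39201", "Q729"]
```

The point of keeping the mapping in Wikidata statements is exactly that no application developer would have to maintain a table like this themselves.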

Thoughts? --Thadguidry (talk) 16:16, 13 September 2020 (UTC)[reply]

not entirely sure what you're proposing. but a bot fetching all of this and storing it (as a string or mapped to a property) seems useful to me. are you suggesting we make an item for each of their codes? BrokenSegue (talk) 19:03, 13 September 2020 (UTC)[reply]
No, we have the items already. We need a single new property to map with the identifier code (like the 3 other YouTube properties we already have), I'd propose "YouTube Category ID". --Thadguidry (talk) 22:39, 13 September 2020 (UTC)[reply]
I'm not sure I like that implementation because (if I'm understanding you right) it won't make a link between the youtube channel and the concepts. Instead both will have a property mapping themselves to an integer/string. If instead we made properties for the various youtube categories we could link the channel to the category items and the category to the underlying concepts. We could maybe re-use has characteristic (P1552) for this but optimally we'd make a new property also. Thoughts? Do we do things like this elsewhere? I'm considering importing "music genre" information from a source and there might need to be a similar thing. BrokenSegue (talk) 23:11, 13 September 2020 (UTC)[reply]
You are misunderstanding the need here. It's not to make formatter links, but simply to hold identifiers for programmatic purposes by others in the community. On YouTube's side, the Data API already has those links of channel IDs to their category ID. That is part of YouTube's existing internal process, and channel content creators can create new categories. You might want to look at the links I provided above for more details. On Wikidata's side, our need is simply to map the various YouTube Category IDs to their semantic Wikidata entity equivalents, so that application developers both in the Wikidata community and elsewhere will not have to hold their own reconciled mapping of QIDs <--> YouTube Category IDs. Instead, the Wikidata community maintains the mapping inside Wikidata statements, for the whole community to enjoy and use, through a single new proposed property. Regarding genres: that is entirely different, and we already have entities for the various instances of music genre (Q188451) like jazz (Q8341) and a property to use, genre (P136), ready for your needs. --Thadguidry (talk) 01:08, 14 September 2020 (UTC)[reply]
Hmm, perhaps I just disagree on the scope of the need. Yes, we should map their IDs to QIDs but the mapping should be done in an item. So we will have an item representing, say, id 15 with all the names of the id in different languages and with linked concepts. Then we can scrape the API and tag wikidata yt video items with those entities. This seems like a better way of representing the data more generally. I bring up the genre issue as how do we map spotify's "genre id" cleanly to one of our genre QIDs. It's a similar problem solved in a similar way. Imagine solving that the same way you are proposing solving the yt category id problem. BrokenSegue (talk) 01:56, 14 September 2020 (UTC)[reply]
We don't need or want to replicate other services' entities as Wikidata entities (your suggestion of making a Wikidata item "15" that is constrained to API endpoint URL (P6269) "YouTube"). Instead, we link to them from existing Wikidata items that have the same semantic meaning (this helps in other areas such as latent semantic analysis (Q1806883) and pragmatics (Q181839)). That's how Wikidata and Linked Data best practices work[5]. --Thadguidry (talk) 21:06, 16 September 2020 (UTC)[reply]

One abuse filter

Please move Abuse filter №123 to a special user right for the group Users, which is necessary to edit the User namespace (see also: Special:ListGroupRights#Namespace restrictions). 217.117.125.72 18:08, 13 September 2020 (UTC)[reply]

Wikidata weekly summary #433

Proposed change of procedures for CheckUser requests

Hello all. Come to think of it, I think we need a better way of doing followup requests for users. Right now, reporting further socks after the initial report takes the form of additional comments on the original request. This looks unwieldy, especially for long-term sockpuppeteers. Thus, I propose that all new requests be filed using the (n) nomenclature, e.g. the second request for John Doe would be Wikidata:Requests for checkuser/Case/John Doe (2). This will provide better organization and will also provide a measure of a user's sockpuppetry (one could then say "this user has had seven CheckUser cases opened" or something like that). I'm open to alternatives too.

Separately, we've gotten a lot more requests than I expected. I think we need to make the archiving more aggressive; having new pages for new cases would make it easier for us to tell what cases need to be fulfilled, and fulfilled requests can be archived immediately without fear of having to re-transclude for a new case.--Jasper Deng (talk) 19:22, 14 September 2020 (UTC)[reply]

How do you feel about 'threads gets archived after 7 days without activity'? @Jasper Deng: --Trade (talk) 07:29, 15 September 2020 (UTC)[reply]
So, why use subpages at all?--GZWDer (talk) 10:30, 15 September 2020 (UTC)[reply]
@Trade: That seems like a sensible archiving scheme. @GZWDer: Not using subpages would be even less organized, and importantly, all the edit histories for different cases would be on the same page, making it hard to keep track of any individual case.--Jasper Deng (talk) 16:16, 15 September 2020 (UTC)[reply]

Temporary and acting

I have come across a problem with modelling the military ranks of Bertram Francis Eardley Keeling (Q96084085). He was a British engineer who joined the army at the outbreak of the first world war and was given a temporary rank. During the war he was given promotions to his temporary ranks. Overlapping his temporary ranks, he was also given acting ranks. For example, in April 1917 he was a temporary Captain but was given an acting rank of Major.[6] In February 1918 he was promoted to temporary Major, with the implication that his acting rank ceased.[7] I am not sure how to model this in the system; so far I have used acting (Q4676846) for these transitory ranks on other human items but here we have a case of someone holding two different transitory ranks at the same time. Any suggestions on how to model this? I am guessing that I will need to have separate items for "acting" and "temporary" but I'm not sure how to express them both as distinct concepts that won't cause confusion elsewhere. Perhaps I should start with a focused concept and create a new item with label "Temporary rank" and description "status of a rank in the British armed forces" with a said to be the same as (P460) to acting (Q4676846). Future editors can then expand the concept of that item, if they feel it is appropriate. From Hill To Shore (talk) 21:39, 14 September 2020 (UTC)[reply]

category description merge conflicts

What's the best way to deal with the subj?

I tried to use Special:MergeItems to merge Category:Gases (Q9713877) into Category:Gases (Q7215014) and was not able to do so, because both contain some generic descriptions ("Wikimedia category" etc) that for some reason differ for a couple of languages (e.g. zh: "维基媒体项目分类" in Category:Gases (Q9713877) vs "维基媒体分类" in Category:Gases (Q7215014)). I tried to clear conflicting descriptions in Category:Gases (Q9713877) first, but quickly abandoned that idea for a number of reasons:

  • There are multiple collisions that need to be resolved, and Special:MergeItems is only showing one. Sure, I can write some clever UNIX shell one-liner to find them all in an automated fashion¹, but not every future Wikidata user would be comfortable with solving this problem by copy-pasting some strange text from the Internet into their terminal emulators.
  • I simply have no idea what description should be preferred for each language. What if I should clear some descriptions in Category:Gases (Q7215014) instead? Or discard both and put in something else?
    • And why would I even want to spend my time on cherry-picking automatically inserted descriptions?
  • And even if it's OK to just clear out all descriptions from a single page, it's still a lot of (keyboard, mouse) button presses to do.
    • Also, rate limits. After clearing all zh-* descriptions using the regular interface², I got, after some delay, a message about abuse and waiting and whatnot. Amused, I went to the history and found a lot of individual change entries, one for each language. (Sure, not everyone does these kinds of changes without bots; still surprising.)

In this particular case I'm tempted to follow manual merge instructions and discard descriptions for Category:Gases (Q9713877) altogether. This, however, is a pretty simple case (just 1 label and 1 WP link to move), so I think it's still worth documenting these obstacles here.

¹ in fact, here is one:

$ q() { curl -s "https://www.wikidata.org/wiki/Special:EntityData/$1.json" | jq -S '.entities[].descriptions'; };  diff -u <(q Q7215014) <(q Q9713877)
--- /dev/fd/63	2020-09-15 04:17:40.312486792 +0300
+++ /dev/fd/62	2020-09-15 04:17:40.312486792 +0300
@@ -37,7 +37,7 @@
   },
   "be-tarask": {
     "language": "be-tarask",
-    "value": "катэгорыя ў праекце Вікімэдыя"
+    "value": "катэгорыя Вікімэдыя"
   },
   "bg": {
     "language": "bg",
@@ -61,7 +61,7 @@
   },
   "bs": {
     "language": "bs",
-    "value": "Kategorija Wikipedije"
+    "value": "kategorija na Wikimediji"
   },
   "bug": {
     "language": "bug",
@@ -161,7 +161,7 @@
   },
   "gl": {
     "language": "gl",
-    "value": "categoría de Wikipedia"
+    "value": "categoría de Wikimedia"
   },
   "gn": {
     "language": "gn",
(…etc etc)

² That change has since been reverted
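As a variant of the one-liner above, the same check can be done in a short Python sketch that reports only the real collisions, i.e. languages where both items carry a description and the texts differ (the cases Special:MergeItems complains about). It uses the same Special:EntityData endpoint as the curl command; the conflict logic itself is a pure function.

```python
import json
from urllib.request import urlopen

def descriptions(qid):
    """Fetch {language: description} for an item via Special:EntityData
    (the same endpoint as the curl one-liner above)."""
    url = "https://www.wikidata.org/wiki/Special:EntityData/%s.json" % qid
    data = json.load(urlopen(url))
    entity = next(iter(data["entities"].values()))
    return {lang: d["value"] for lang, d in entity.get("descriptions", {}).items()}

def conflicts(a, b):
    """Languages where both items have a description and the texts differ."""
    return {lang: (a[lang], b[lang])
            for lang in a.keys() & b.keys()
            if a[lang] != b[lang]}

# Pure-logic example (no network needed), using two of the values above:
a = {"en": "Wikimedia category", "gl": "categoría de Wikipedia"}
b = {"en": "Wikimedia category", "gl": "categoría de Wikimedia"}
print(conflicts(a, b))

# Against live data:
# print(conflicts(descriptions("Q7215014"), descriptions("Q9713877")))
```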

--46.151.157.21 02:06, 15 September 2020 (UTC)[reply]

  • Merging should be done with the merge gadget (see Help:Merge). --- Jura 08:49, 15 September 2020 (UTC)[reply]
    • Well, it did the job, although I'm not quite happy about it. There's no (obvious) way to activate it without logging in (I don't visit Wikidata frequently enough to not be bothered about doing so), and its interface is confusing (which way would it merge if the "always merge into the older entity" checkbox is deselected? Unlike the gadget, Special:MergeItems is clear about the direction). Also, did it just disregard all description conflicts? If so, is it some special case for instance of (P31) Wikimedia category (Q4167836) or something like that? If not, that's quite a dangerous instrument, and I'd rather avoid it in the future. --MetaWat (talk) 16:35, 15 September 2020 (UTC)[reply]
      • There is some discussion with developers about what should be done automatically and what shouldn't. The result is that the gadget now does what I personally would have expected from the special pages.
In any case, I think at some point a group of users want to block people who aren't logged in and/or autoconfirmed from merging items. --- Jura 16:41, 15 September 2020 (UTC)[reply]
It seems to me that logging in is less bother than performing a merge manually, writing shell scripts, etc. Ghouston (talk) 02:08, 16 September 2020 (UTC)[reply]

Merge request

Marge doesn't work for me, so could somebody merge Allium siculum (Q2291021) and Allium siculum (Q12292911)? Thanks, Abductive (talk) 03:05, 15 September 2020 (UTC)[reply]

✓ Done @Abductive: merging done without problems--Estopedist1 (talk) 05:16, 15 September 2020 (UTC)[reply]
Thanks. Did I type "Marge"? Yeah, she doesn't work for me... Abductive (talk) 05:19, 15 September 2020 (UTC)[reply]
@Estopedist1, Abductive: This merge was apparently in error. Allium siculum (Q12292911) was originally Nectaroscordum siculum bulgaricum before an IP changed the English label to Allium siculum. On Wikidata, we generally have separate items for every synonym of a taxon, as those multiple names generally have unique external identifiers, as well as distinct histories, and may be considered valid by different authorities. See GBIF values for Nectaroscordum siculum subsp. bulgaricum vs. Allium siculum subsp. dioscoridis vs. Allium dioscoridis. Properties like taxon synonym (P1420) and basionym (P566) can connect various names. In truth it's quite a mess, with some thinking the items for names should only refer to the name alone, while biological traits like distribution, mass, litter size, etc. should be split into new "organismal" items, regardless of how many names it has, for ontological purity (but utter chaos for mere mortals). See a previous discussion, What heart rate does your name have?. -Animalparty (talk) 18:31, 15 September 2020 (UTC)[reply]
@Abductive, Animalparty: strange that your pinging doesn't work. To the topic: yes, maybe it should be reverted. The principle that "every synonym means a distinct Wikidata entry" is clear, but unfortunately I guess chaos would be the result. I can't even imagine how many synonyms and combinations there could be for taxa. Maybe over 1,000,000,000. Some taxa have over 30 synonyms. I guess the rational solution is one item per taxon with its numerous synonyms and combinations together (like Wikispecies already does). But I am not sure ... --Estopedist1 (talk) 05:33, 16 September 2020 (UTC)[reply]
What matters is that all the wikis are on one or the other. Abductive (talk) 05:35, 16 September 2020 (UTC)[reply]

Wikipedia in described at URL (P973) ?

Somehow I thought statements like [8] should be avoided. There are now 2000+ pointing to dewiki. @Anvilaquarius:. I think @GZWDer: once made a property proposal for this type of section link. --- Jura 11:14, 15 September 2020 (UTC)[reply]

This is useful because it opens up Interwiki links in cases where one Wiki has a page for a topic and another Wiki handles the topic as a subpage. Policy-wise, I think we should remove the existing links to dewiki and add a note on described at URL (P973) that it's not to be used for linking to Wikimedia projects. ChristianKl 12:06, 15 September 2020 (UTC)[reply]
Yes, but only if there's some sort of sitelink to replace the P973 link. Otherwise, removing these links would be mere vandalism. --Anvilaquarius (talk) 15:26, 15 September 2020 (UTC)[reply]
Cleaning up after incorrect uses is never vandalism. Please refrain from such qualifications. How did you get to use this property? --- Jura 15:34, 15 September 2020 (UTC)[reply]
Doing it automatically would likely need a dewiki bot approval. Those seem to be generally hard to get, but there is no principled reason why it can't be done. Given that it's not straightforward, I do favor removing the incorrect uses. ChristianKl 10:20, 16 September 2020 (UTC)[reply]
Even if one could get bot approval, it might be tricky to get the redirects properly set up (depending on how rigorous the wiki is about which title to use).
Personally, I'd favor an approach like Wikidata:Property proposal/described in Wikimedia article. We could easily migrate everything there. --- Jura 10:57, 16 September 2020 (UTC)[reply]
Many users on the German Wikipedia are unfortunately heavily opposed to redirects with disambiguation parentheses, arguing that they clutter up the search results and cause other problems. So they invariably will get listed on maintenance lists and then deleted sooner or later. --Kam Solusar (talk) 00:50, 17 September 2020 (UTC)[reply]
« Somehow I thought statements like should be avoided. » →‎ I clearly think that statements like this are useful. Visite fortuitement prolongée (talk) 18:48, 15 September 2020 (UTC)[reply]
If statements are noted that way instead of with sitelinks, then interwiki links don't work. If OpenStreetMap, for example, wants to display a link to the German Wikipedia for Galeriegebäude Herrenhausen (Q98809154), it can when the proper sitelinks are used, but it can't when described at URL (P973) is used. ChristianKl 10:00, 16 September 2020 (UTC)[reply]
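Any cleanup of such uses would first need to identify P973 values that point at Wikimedia projects. A minimal Python sketch of that check follows; the domain suffix list is an assumption covering common project hosts, not an exhaustive list, and the example URLs are illustrative only.

```python
from urllib.parse import urlparse

# Suffixes assumed to identify Wikimedia project hosts (not exhaustive).
WIKIMEDIA_SUFFIXES = (
    ".wikipedia.org", ".wikisource.org", ".wiktionary.org",
    ".wikivoyage.org", ".wikibooks.org", "commons.wikimedia.org",
)

def is_wikimedia_url(url):
    """True if a described at URL (P973) value points at a Wikimedia project,
    i.e. a candidate for conversion to a sitelink or a dedicated property."""
    host = urlparse(url).netloc.lower()
    return any(host.endswith(suffix) for suffix in WIKIMEDIA_SUFFIXES)

print(is_wikimedia_url("https://de.wikipedia.org/wiki/Example#Section"))  # True
print(is_wikimedia_url("https://example.org/some-article"))               # False
```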

Can P31 properties have a preferred rank?

I noticed several cities, notably Cairo (Q85), Montevideo (Q1335), Copenhagen (Q1748) and Lima (Q2868), don't show up as an instance of (P31) of city (Q515). It turns out they are listed as cities, but the capital city (Q5119) value has the preferred rank set, so it supersedes all other values. Is this correct? I changed it for other cities, but these are semi-protected.

Svízel přítula (talk) 21:01, 15 September 2020 (UTC)[reply]

I suppose preferred ranks would be needed if there were instance of (P31) statements that were outdated and no longer valid, but then all the currently valid statements should be preferred. I don't see any reason to give preferred rank in a case like capital vs city. Ghouston (talk) 01:52, 16 September 2020 (UTC)[reply]
An example of outdated-vs.-permanent P31 values is Dadra and Nagar Haveli district (Q46107). But the historical value is not marked as deprecated (nor the current value preferred). —Scs (talk) 11:53, 16 September 2020 (UTC)[reply]
When we use rank, we don't deprecate historical values but generally qualify them with end time (P582) and then set the current value as preferred. There are plenty of cases like Dadra and Nagar Haveli district (Q46107) where it would make sense to use ranks but we currently don't, as nobody has done the task of setting the rank. ChristianKl 12:01, 16 September 2020 (UTC)[reply]
With Dadra and Nagar Haveli district (Q46107), it may be better split into separate items, given significant changes in status. It also has a dissolved, abolished or demolished date (P576) which means that "current value" has no meaning. Ghouston (talk) 12:29, 16 September 2020 (UTC)[reply]
I'd also prefer using the dedicated property capital of (P1376) to indicate that relationship instead of instance of (P31). Ghouston (talk) 01:54, 16 September 2020 (UTC)[reply]
I agree. The property is better for communicating the information. ChristianKl 12:01, 16 September 2020 (UTC)[reply]
As an aside, remember that straight P31 relationships are not generally a reliable way to query for is-a relationships, anyway. For example, Boston (Q100) isn't (directly) a city, either, and the reason has nothing to do with ranks. wdt:P31/wdt:P279* is your friend! —Scs (talk) 12:09, 16 September 2020 (UTC)[reply]
True, but that also won't work here, since capital city (Q5119) is not a subclass of city (Q515). I also query for "is a capital" using capital of (P1376) and "is a location" using coordinate location (P625), as that's faster. Svízel přítula (talk) 15:39, 16 September 2020 (UTC)[reply]
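The wdt:P31/wdt:P279* point above can be illustrated with a toy sketch. The edge set below is a simplified assumption, not the real Wikidata hierarchy; it only captures the two facts discussed in this thread: Boston's direct P31 value is not city itself, and capital city is not a subclass of city.

```python
# Toy illustration of why wdt:P31/wdt:P279* is needed: a direct P31 check
# misses items whose class is only an indirect subclass of the target.
# Edges are a simplified assumption, not the full Wikidata graph.

P31 = {"Boston": "city of the United States", "Cairo": "capital city"}
P279 = {  # subclass-of edges
    "city of the United States": "city",
    # "capital city" has no subclass edge to "city" on Wikidata, which is
    # why Cairo stays invisible even to the transitive query.
}

def is_a(item, target):
    """Follow P31 once, then P279 transitively (the wdt:P31/wdt:P279* path)."""
    cls = P31.get(item)
    while cls is not None:
        if cls == target:
            return True
        cls = P279.get(cls)
    return False

print(is_a("Boston", "city"))  # True, though Boston's direct P31 isn't "city"
print(is_a("Cairo", "city"))   # False under this toy hierarchy
```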

"Meaning overlaps" relation

Question for the ontologists: Getty AAT has a relation property "meaning/usage overlaps with". (See example at liqueur glasses.) Is there a mapping relation equivalent we can use in Wikidata? I have been tempted to use both "said to be the same as" and "different from" but I am not sure that really conveys the proper message. - PKM (talk) 22:17, 15 September 2020 (UTC)[reply]

Yes, it's partially coincident with (P1382). Ghouston (talk) 01:48, 16 September 2020 (UTC)[reply]
partially coincident with (P1382)'s equivalent property (P1628), http://purl.obolibrary.org/obo/RO_0002008, appears to be about spatial extent, not conceptual. Pelagic (talk) 00:43, 19 September 2020 (UTC)[reply]
For that we have territory overlaps (P3179). If identical, there is coextensive with (P3403). --- Jura 05:15, 19 September 2020 (UTC)[reply]

I think they are duplicates of each other and should be combined. Smiley.toerist (talk) 08:37, 16 September 2020 (UTC)[reply]

  • Given that enwiki has separate pages for each of them, they aren't duplicates for Wikidata. It might be necessary to clarify either item to make it clearer how they differ, but they aren't duplicates. ChristianKl 09:25, 16 September 2020 (UTC)[reply]
The main difference seems to be that a motor coach can pull other trailers and can function as a locomotive. Railcars are lighter. The definitions are not airtight, as some railcars can exceptionally pull a trailer. The distinction between an articulated vehicle and a multiple unit is also not clear. Smiley.toerist (talk) 11:55, 16 September 2020 (UTC)[reply]
I think a lot of 'railcars' are wrongly classified as 'motor coaches'. Motor coaches are used in combination with several other rail vehicles, not singly. Railcars are easily identifiable by having a driving post at both ends of the vehicle, as they have to be driven in both directions as a single car. They can often be coupled to other 'railcars' to form a train or pull a trailer. Mostly one, as railcars are motorised for only a single vehicle. Only vehicles powered by internal combustion engines can be considered a railcar. A counterexample is File:BT BDe 3-4 43.jpg, where a motor coach is used as a locomotive. Diesel-powered multiple units quite often have a 'motor coach' for the engine power (File:01.08.92 Liefkenshoek 4006 (5804300936).jpg). These motor coaches never have a separate article, but are part of a multiple unit train type. Smiley.toerist (talk) 09:49, 17 September 2020 (UTC)[reply]
PS: The example used in eo:Relaŭto is articulated and should be considered a multiple unit, not a railcar. As I don't know the language, I cannot check whether the text is correct. Smiley.toerist (talk) 09:58, 17 September 2020 (UTC)[reply]
I have now transferred the sitelinks from 'motor coaches' to 'railcars' for the following languages: cs, da and pt. The definitions vary widely. 'Motor coaches' have mostly electric examples; in many languages there is no distinction between an electric or combustion engine energy source. Smiley.toerist (talk) 11:32, 17 September 2020 (UTC)[reply]
  • It would be helpful if non-specialists could understand from both statements on the items and (English) descriptions the difference between the two items. --- Jura 09:34, 20 September 2020 (UTC)[reply]

Significant Reasonator bug

Where is the best place to report problems with Reasonator? I ask because a number of other tools, e.g. Mix'n'match, use the Reasonator-generated text summaries, so problems with the summaries are significant. I just noticed that Reasonator does not disregard deprecated statements, and if for example you have several birth date statements for a person, it will put the first one into the summary even if it is deprecated (example). — Levana Taylor (talk) 14:40, 16 September 2020 (UTC)[reply]

Map publisher Wagner & Debes

Is there no data-item for the map publisher? see Commons:Category:Wagner & Debes.Smiley.toerist (talk) 14:56, 16 September 2020 (UTC)[reply]

There is one now. :-) Wagner & Debes (Q99398846). - PKM (talk) 22:00, 16 September 2020 (UTC)[reply]
Thanks, I now use the item in structured data. Smiley.toerist (talk) 09:29, 17 September 2020 (UTC)[reply]

Constraint violation

I have just added Gasser (Q99371476) → family name (P734) → Gasser (Q21506814), but there is a constraint violation since the Q-item Gasser (Q99371476) doesn't have a first name. From family name (P734), this seems to be the right property to link lineage / noble families to the "common name" (see here), but obviously Gasser (Q99371476) doesn't have a first name. Can we somehow change the constraint so that it does not apply to instances of family (Q8436)? Best --Hannes Röst (talk) 15:50, 16 September 2020 (UTC)[reply]

Country property for boxers

Hi. Is there any property to specify that a boxer is fighting under the flag of X country? --ԱշոտՏՆՂ (talk) 16:06, 16 September 2020 (UTC)[reply]

Qualifier for type of postal address

Hello all. I am wondering if there is currently a way to qualify a street address to indicate the type of address it is? For entities at the University of Washington that we are creating items for, there are frequently multiple addresses, one for the physical address and one for the mailing address. Today I had a department that gives a mailing address and an address for deliveries on its website. I would like to indicate that one address is for mail and the other is for deliveries: see https://familymedicine.uw.edu/about/contact/ Here's another example where there is a physical address and a mailing address: https://globalhealth.washington.edu/contact

Thanks for any advice you could give. --Adam Schiff (talk) 19:09, 16 September 2020 (UTC)[reply]

I can't see any sign that anyone has tried to do that before. I think the qualifier object has role (P3831) would be suitable, if you created new Q items for the address types. address (Q319608), street address (Q24574749), and post office box (Q1162282) do exist. Ghouston (talk) 23:51, 16 September 2020 (UTC)[reply]

Does anyone know where I can edit the translation of the sidebar? I have been looking in Special:Translate and Translatewiki but couldn't find it. Right now the translation for Indonesian is inconsistent with the word "item" being translated both into butir and item at the same time. RXerself (talk) 00:04, 17 September 2020 (UTC)[reply]

@RXerself: Adding ?uselang=qqx to the URL will show where the messages are.--GZWDer (talk) 00:14, 17 September 2020 (UTC)[reply]
Yes, I have been looking in "Translate to Bahasa Indonesia" but couldn't find the terms from the sidebar. RXerself (talk) 00:43, 17 September 2020 (UTC)[reply]
@RXerself: Maybe https://www.wikidata.org/wiki/Special:AllMessages?prefix=wikibase&filter=all&lang=id&limit=50 is a good start? It has links to translatewiki where this type of update should be done. --- Jura 10:52, 17 September 2020 (UTC)[reply]
I searched there for the term "sembarang" and there was no match. :( I'm gonna also ask in the Indonesian Wikipedia community whether anyone there knows the place, since I'm sure one of the users there translated it first years ago. RXerself (talk) 11:46, 18 September 2020 (UTC)[reply]
Hi, RXerself! Are you talking about “Item sembarang” from d:mediawiki:Randompage/id versus “Buat butir baru” from d:mediawiki:Special-newitem/id? (I’m not an interface guru, I just remember the information about where these strings are stored did arise during a discussion on w:en.) Pelagic (talk) 02:54, 19 September 2020 (UTC)[reply]
Yes! There are also some other links that still use "item" instead of "butir". RXerself (talk) 08:07, 19 September 2020 (UTC)[reply]

Tools for faster wikidata entry?

I'm fairly new to Wikidata and SPARQL. Just wondering if there are any tools that speed up the process of adding statements to wikidata. Ideally, I don't want to be doing data entry one by one.

For example: I want to add a statement of 'part of' with value of 'Sengoku period' to a list of 1000 items. Is this possible?

I've heard of Quick Statements but wondering if there are other tools.

Cheers  – The preceding unsigned comment was added by 2600:1700:3520:1550:481a:6319:9fda:aa25 (talk • contribs) at 08:32, 17 September 2020‎ (UTC).[reply]

The way I do things like your example is to use a SPARQL query to create the list of 1000 QIDs, use a text-editor to mass-transform them into the desired Quickstatements command, and run that through Quickstatements. — Levana Taylor (talk) 17:21, 17 September 2020 (UTC)[reply]
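The transform step described above can be sketched in a few lines of Python: take the QIDs from a SPARQL result and emit QuickStatements V1 commands in the pipe-separated "item|property|value" form. P361 is part of; the target QID is left as a hypothetical placeholder to be replaced with the real item for Sengoku period before running the batch.

```python
# Sketch of the SPARQL -> QuickStatements workflow described above.
# TARGET is a placeholder, not a real QID -- substitute the Sengoku
# period item before submitting the batch.
TARGET = "Q_SENGOKU_PERIOD"

def to_quickstatements(qids, prop="P361", value=TARGET):
    """Emit one QuickStatements V1 command per item: '<item>|<prop>|<value>'."""
    return "\n".join("%s|%s|%s" % (qid, prop, value) for qid in qids)

qids = ["Q1001", "Q1002", "Q1003"]  # e.g. pasted from a SPARQL query result
print(to_quickstatements(qids))
# Q1001|P361|Q_SENGOKU_PERIOD
# Q1002|P361|Q_SENGOKU_PERIOD
# Q1003|P361|Q_SENGOKU_PERIOD
```

The same text-editor mass-transform works, of course; the script just makes the step repeatable for the next batch.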
Yes, thank you. That's what I ended up doing. Not ideal, but it sort of works. The trick is trying to get lucky in finding a common statement for all the IDs I want to add new statements to. I guess this can be done with some general query and then filtering the query down.  – The preceding unsigned comment was added by Nonoumasy (talk • contribs).
Check out Help:Navigating Wikidata/en#Searching with statements. The "haswbstatement" search in some cases is more appropriate than SPARQL, and certainly easier to use! — Levana Taylor (talk) 23:23, 17 September 2020 (UTC)[reply]

Hello @Nonoumasy, Levana Taylor: another tool besides QuickStatements to add statements is PetScan. (PetScan#Add/remove_statements_for_Wikidata_items) This tool can filter, for example, based on categories of the article, age of the article, a manual list of articles, a SPARQL query, and so on. In the small box in the lower right corner you can enter a list of statements (format Pxxx:Qyyy) to execute on the selected entries. --M2k~dewiki (talk) 23:32, 17 September 2020 (UTC)[reply]

Thanks @Levana Taylor, Nonoumasy:
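To make the SPARQL-then-QuickStatements workflow described above concrete, here is a minimal Python sketch of the "text-editor mass-transform" step: turning a list of QIDs into QuickStatements v1 commands. The QIDs and the target value Q99999 below are hypothetical placeholders, not real items.

```python
# A sketch of the mass-transform step: turn QIDs from a SPARQL result
# into QuickStatements v1 commands (tab-separated: item, property, value).
# The QIDs and the target value Q99999 are hypothetical placeholders.

def to_quickstatements(qids, prop, value):
    """Build one QuickStatements v1 command line per QID."""
    return "\n".join(f"{qid}\t{prop}\t{value}" for qid in qids)

# Example: add "part of" (P361) pointing at a placeholder target item.
commands = to_quickstatements(["Q1001", "Q1002", "Q1003"], "P361", "Q99999")
print(commands)
```

The resulting lines can then be pasted into the QuickStatements batch input.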

Model item for criminals

What do you others think about this modeling? Too many qualifiers in 'significant events'? Paul Moore (Q99343501)--Trade (talk) 07:37, 17 September 2020 (UTC)[reply]

  • I see no reason to have a significant event for the conviction as that's already covered by convicted of (P1399).
place of detention (P2632) is currently not used and if it would be used with start time (P580) you wouldn't need imprisonment as significant event.
charge (P1595) is currently not used.
Using occupation (P106) for something that a person did once in their life feels wrong. The person likely has an occupation in which they spend much more time than in any of the three currently listed. ChristianKl 08:26, 17 September 2020 (UTC)[reply]
So, when am I allowed to use murderer as an occupation?--Trade (talk) 14:02, 17 September 2020 (UTC)[reply]

Section for lexicographical data in the sidebar

It's very common for new editors to mistakenly create Lexemes instead of Items. We need an introductory explanation on Special:NewLexeme (MediaWiki:Wikibase-newlexeme-summary) similar to MediaWiki:Wikibase-newitem-summary but, apart from this, right now both options ("Create a new Item" and "Create a new Lexeme") appear one below the other in the same section of the left navigation bar. Would you like to have a specific section "Lexicographical data" with the option "Create a new Lexeme" and possibly a link to recent changes focused on the Lexeme namespace? --abián 10:10, 17 September 2020 (UTC)[reply]

✓ Done. --abián 17:45, 18 September 2020 (UTC)[reply]

There is a bit of a mess here. Most of the wikis currently located at the child (anecdote joke (Q2374151)) seem to belong to the parent (anecdote (Q193206)), property topic's main category (P910) clashes, etc. Please help to assign inter-wikis correctly. Cheers, Henry Merrivale (talk) 10:46, 17 September 2020 (UTC).[reply]

  • It seems these wiki articles actually form quite a spectrum of descriptions: short accounts that are not at all necessarily humorous (en:Anecdote, simple:Anecdote, it:Aneddoto); short, usually humorous story (fi:Anekdootti, de:Anekdote); short, (necessarily) humorous story, but without any national specifics (ru:Анекдот, be_x_old:Анэкдот, be:Анекдот, sk:Anekdota_(zábavný_príbeh), bg:Анекдот, mk:Анегдота); short, humorous story, just like the previous one, but in the context of Russian and Soviet culture (en:Russian_jokes, tl:Mga_birong_Ruso). This probably needs more wikidata items, but I'm not sure how much. --MetaWat (talk) 19:31, 17 September 2020 (UTC)[reply]

Odd infobox result

Any idea why the Wikidata infobox at C:Category:Fires in the 1900s gets the particular image it does (an early airplane hanging from the ceiling of a museum?) - Jmabel (talk) 15:59, 17 September 2020 (UTC)[reply]

It's because that was the image for 1900s (Q36574). Anyone know why the category infobox uses the decade image? — Levana Taylor (talk) 17:40, 17 September 2020 (UTC)[reply]
It's because the Wikidata Infobox tries to shove as much info as possible in front of people's faces with little regard to relevance. Any intersecting categories have especially redundant or irrelevant data barf. -Animalparty (talk) 17:50, 17 September 2020 (UTC)[reply]
…well, that sounds like a brilliant plan… - Jmabel (talk) 23:21, 17 September 2020 (UTC)[reply]

Modeling place of first and last match for a sportsperson

Hello! I'm trying to model the date and place for the professional debut of a sportsperson and also the date and place of the last match. I find it especially difficult: Alexis Apraiz (Q12253165) is one way to do it, but I'm not sure if it is the best one. Another option is at Aimar Olaizola (Q3752396), but I can't find a way to show the last location. What do you think? -Theklan (talk) 16:52, 17 September 2020 (UTC)[reply]

Merge two objects

Q2386476 and Q86966830 are the same object. I have copied the appropriate information so that Q2386476 now is the correct one. I would like someone to merge these so that Q2386476 is the only one remaining. Thanks in advance! Pjotr'k (talk) 18:07, 17 September 2020 (UTC)[reply]

One item is for a mountain and the other is for a protected area. Do the boundaries of the mountain perfectly align with the boundaries of the protected area? If not, they should be kept as separate items. There may also be other reasons that I haven't thought of for why they should be separate. From Hill To Shore (talk) 18:19, 17 September 2020 (UTC)[reply]
The protected area is for the mountain. Pjotr'k (talk) 19:35, 17 September 2020 (UTC)[reply]
Usually a protected area has specific borders, and a mountain does not. - Jmabel (talk) 23:23, 17 September 2020 (UTC)[reply]
An inception (P571) statement would make it obvious whether an item is for the mountain or the protected area. Protected areas and mountains are quite different things. Ghouston (talk) 06:22, 18 September 2020 (UTC)[reply]
Thank you all for the input. I withdraw my request and have also withdrawn my edits in the objects in question. Pjotr'k (talk) 13:20, 18 September 2020 (UTC)[reply]

Discography of American Historical Recordings database at the Library of Congress

The Discography of American Historical Recordings (Q42800691) is a database curated by the University of California, Santa Barbara Library. It contains historical recording artists. We already have Property:DAHR artist ID as an identifier for humans in Wikidata. I have been discussing loading the entire DAHR artist database into Wikidata with the person who is currently maintaining it. How do we get started, where do I propose it, and how do we go about it? I am assuming that, like SCOPUS and The Peerage, the description will just be "DAHR artist ID=XXXX". You can go to Property:DAHR artist ID and click on an entry to see the contents. Mix'n'match will be easy since the DAHR database contains LCCN numbers. --RAN (talk) 00:14, 18 September 2020 (UTC)[reply]

Great project! I'm curious about how you plan to deal with the peculiarity that DAHR assigns multiple artist IDs based on roles? ("as composer", "as arranger", "as conductor", etc.) Moebeus (talk) 00:54, 19 September 2020 (UTC)[reply]
They realized that was not the way to go, and assigned new IDs. The new IDs have combined the disparate roles. I was curious why the old IDs were being deleted here at Wikidata, and am in contact with the point person for the new project. They like our linkage because they can import the images we house at Commons into their database. We imported the entire Library of Congress image database and I have been working on linking the images to Wikidata. The project to identify the people named in the photos is happening at Flickr Commons with 50 new images each week. The last batch added is here. The Bain collection is heavy with opera singers. --RAN (talk) 21:19, 19 September 2020 (UTC)[reply]

Possible duplicated information

See Q3132861#P1343 and Q3132861#P4823. Now we have three ways to link an item to an ANB article: using described by source (P1343)=American National Biography (Q465854); using American National Biography ID (P4823); or creating an item for the specific ANB article. In my opinion the second is useful (convenient for queries); the third can provide some meta-information about ANB articles, but what is the proper way to link such items from the subject item? @Gamaliel:.--GZWDer (talk) 04:04, 18 September 2020 (UTC)[reply]

Yes, I think when we have an identifier statement like American National Biography ID (P4823), then described by source (P1343) is redundant and should be omitted -- keep described by source (P1343) for sources without an identifier property. The statement is subject of (P805) qualifier on the American National Biography ID (P4823) statement for the item specifically representing this article is a nice touch, and IMO a good way to record this linkage. Jheald (talk) 20:28, 18 September 2020 (UTC)[reply]
I agree, generally when we add a new identifier we delete the now redundant way of linking. We remove "described at url" and "described by source". --RAN (talk) 21:35, 19 September 2020 (UTC)[reply]

I'm trying to understand the difference between the above three ways of classifying collections of things. My current understanding is as follows:

However, sets are classes (Q5127848), so should part of (P361) be a subproperty of instance of (P31)? Or should we say elements are instances of (P31) sets? Should groups be instances of (P31) class (Q5127848)? At the moment metaclass (Q19478619) is subclass of (P279) class (Q5127848) of class (Q16889133). Should it instead be metaclass (Q19478619) subclass of (P279) class (Q16889133) of class (Q5127848), so that set (Q36161) can be instance of (P31) metaclass (Q19478619)? (edited to include class (Q16889133)) --Cdo256 (talk) 06:26, 18 September 2020 (UTC)[reply]


Maybe looking at the (English language) descriptions (in addition to the statements on the these items) can help:
  • set (Q36161): well-defined mathematical collection of distinct objects
  • group (Q16887380): well-defined, enumerable collection of discrete entities that form a collective whole
  • class (Q5127848): group of things derived from extensional or intensional definition (philosophy)
--- Jura 06:33, 18 September 2020 (UTC)[reply]
I got confused between class (Q16889133), and class (Q5127848), but I meant class (Q16889133).
The en description of class (Q16889133), "collection of items defined by common characteristics", is very terse and doesn't distinguish it well from group (Q16887380): "well-defined, enumerable collection of discrete entities that form a collective whole".
I guess what I really want is examples of:
--Cdo256 (talk) 06:57, 18 September 2020 (UTC)[reply]


Cdo256, these are really good questions; I've been considering some of the same items and how to clarify the distinctions between them. So, what are the differences between set (Q36161), group (Q16887380), class (Q16889133), and class (Q5127848), and when should or shouldn't each be used? (I will also include class (Q217594) in my response.)
  • Based on the linked Wikipedia page, class (Q16889133) is specifically a class in a "knowledge representation" (i.e. an ontology). Thus Q16889133 is a good item to use when talking about Wikidata classes. You can be more specific, however, by using metaclass (Q19361238) for classes of classes and first-order class (Q21522908) for classes of instances.
  • The item class (Q5127848) refers to the philosophical concept of a "class," and is broader than class (Q16889133). A class has to actually be included in an ontology to be an instance of class (Q16889133) (which presumably means its instances have some shared characteristic) but every collection of things is a class (Q5127848), so the set "{1, apple, every left shoe}" is a class (Q5127848) but not a class (Q16889133).
  • The Wikidata items class (Q217594) and set (Q36161) are types of mathematical objects that should be used only within the scope of mathematics. There are several formal (not-quite-equivalent) definitions of "class" and "set," but for the sake of understanding the difference between a set and a class, a "set" is defined such that it generally matches your intuition for a collection of items, except that certain collections are prohibited. You cannot, for example, have a set that contains all sets that don't contain themselves (otherwise you break mathematics!). A "class" is then either a set or a collection of sets that is not itself a set. (An example of a class that is not a set is the class that contains all sets.) See Wikipedia:Class (set theory) for details.
  • To reverse engineer a definition for group (Q16887380), I translated all the descriptions to English. A summary of common descriptions are "entities with similar characteristics," "set of things or people," "group of living things," "two or more objects," or "entities with similar characteristics and coexistence." (My favorite Google translation, however, is: "what is and what is what is" 😂) My assessment, then, is that a "group" is an exhaustive collection of two or more concurrent physical things (real or fictional) that have a mutual association or defining characteristic. So examples of groups are: The Beatles (Q1299), the stars in our galaxy, and Bonnie and Clyde (Q219937). Examples that are not groups are John Lennon (only one item), the set {John Lennon and Paul McCartney} (not exhaustive), the set of real numbers (not physical), presidents of the United States (not concurrent), and the set { New York City (Q60) and Julius Caesar (Q1048) } (no association). So every group is a class (Q16889133), but not every class (Q16889133) is a group.
I'll leave the question of the difference between part of (P361) and instance of (P31) for another time and/or person :)
The-erinaceous-one (talk) 10:53, 18 September 2020 (UTC) (please ping me in responses)[reply]
The-erinaceous-one, thank you for the very detailed response. That's cleared up my main confusion. I'll have to have a think about this for a few days for it to sink in properly. --Cdo256 (talk) 02:52, 19 September 2020 (UTC)[reply]
  • More generally, Wikidata doesn't work that well for general abstract concepts with several definitions for the same/similar names. This can be because the Wikipedia articles tend to combine them, because various Wikipedia versions combine them differently or because different Wikidata contributors try to insert these elements in various ways into the P279-tree. It can be solved, but generally requires creating new well-defined and described items and having some sitelinks on items for Wikimedia page relating two or more distinct concepts (Q37152856). Individual instances are generally better maintained (and easier to maintain). --- Jura 08:45, 19 September 2020 (UTC)[reply]


Cdo256, no problem! I appreciated the nudge to start sorting out our various types of collections on Wikidata.
My assessment of group (Q16887380) was a bit off, however: it matched the descriptions, but looking at all the direct subgroups [10], we find that groups can also contain events and abstract objects, and there are subclasses set of 0 (Q39604693) and monad (Q39604065) which have fewer than 2 items by definition. This seems to be the result of inconsistent modeling, however, and I am trying to fix it.
In addition to the items mentioned above, I've discovered that there's also class (Q28813620): collection of items defined by common characteristics, which is not clearly defined at all. So this all goes to say that there's a lot of muddled modeling when it comes to collections on Wikidata. In order to organize all the information about these various types of collections, I've made a page where I'll be trying to sort this all out: User:The-erinaceous-one/types of collections. I would welcome any contributions!
The Erinaceous One 🦔 09:50, 19 September 2020 (UTC)[reply]

Notification of global ban proposal for a user who was active on this wiki

This is a notification of global ban discussion per the global ban policy.

This is a notification of a global ban discussion per the global ban policy.

Regards, smb99thx email 09:27, 18 September 2020 (UTC)[reply]

This is copied from WD:AN. smb99thx email 09:27, 18 September 2020 (UTC)[reply]

Q47012826 and Q5471247

Are Fort Green (Q47012826) and Fort Green (Q5471247) the same? --RAN (talk) 13:04, 18 September 2020 (UTC)[reply]

They have separate enwiki pages, so a merge of those articles would need to be considered before any merge here. From Hill To Shore (talk) 14:34, 18 September 2020 (UTC)[reply]
It's possible that one can refer to the historic fort itself, while the other refers to the modern census-designated place. Although these may not need to be treated separately, as in the case of Fort Liberty (Q991369) (edit: oh, and see Fort Bragg (Q8962062)). -Animalparty (talk) 16:44, 18 September 2020 (UTC)[reply]
I will make one the fort and the other the modern populated area, currently the Wikipedia links are a mix of the two and the descriptions are the same. --RAN (talk) 20:56, 19 September 2020 (UTC)[reply]

Duplicate items for Japanese elections

I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Eien20 (talk) 15:49, 19 September 2020 (UTC)[reply]

I want to merge those items. --Eien20 (talk) 00:29, 19 September 2020 (UTC)[reply]

There is information about how to merge items, and where to get more help if needed, at Help:Merge. ~ The Erinaceous One 🦔 05:45, 19 September 2020 (UTC)[reply]
@Eien20: These are not the same things, and should not be merged. In each case only one is the election; the other is the legislative term of the House of Representatives that then follows that election. --Oravrattas (talk) 09:22, 19 September 2020 (UTC)[reply]
Thank you for telling me. I almost used it by mistake. --Eien20 (talk) 15:49, 19 September 2020 (UTC)[reply]

Creating a database for scientific purpose.

How do I create an accurate database for biological research purposes, and what are the main steps to follow? I am basically a biologist (PharmD student).  – The preceding unsigned comment was added by Kone Boï (talk • contribs) at 1:54, September 19, 2020 (UTC).

If you are trying to create a database outside of Wikidata, this isn't the place to get help. Maybe try stackoverflow.com? If you want to use Wikidata to achieve your goals, then you'll need to provide more details about what you are trying to do, what you have tried, and where you got stuck. Nobody will do your work for you, but there are many people who will help you once they see you've put in the necessary effort.
P.S. Please sign your comments by typing "~~~~" at the end. The Erinaceous One 🦔 05:52, 19 September 2020 (UTC)[reply]
If you want to use any of the data on Wikidata for research, you might as well just quit grad school. Massively incomplete, haphazard, and next to zero quality control here. Wikidata is simply a way to link Wikipedia articles, and a platform for nerds to amuse themselves with trivia like how many roads in Sweden start with the letter T and are less than 4 km long. It serves no greater function. -Animalparty (talk) 07:10, 19 September 2020 (UTC)[reply]
Animalparty, I wouldn't be too pessimistic about the utility of Wikidata. I have a friend who develops natural language processing AIs and uses Wikidata as a source. But if you are trying to use it for a case where you need 99.99% reliable data, then yeah, this isn't the place! — The Erinaceous One 🦔 10:15, 19 September 2020 (UTC)[reply]
Animalparty Wikidata is already in use by industry and researchers. Turns out you don't need a perfect or complete dataset to be useful for various things. BrokenSegue (talk) 14:57, 19 September 2020 (UTC)[reply]
I would like to see a reference for this statement. --SCIdude (talk) 16:28, 19 September 2020 (UTC)[reply]
@SCIdude: Which? The use in industry or research? Research has evidence everywhere. For example: Facebook research published work on wikidata. For industrial use it's harder to demonstrate because I have insider knowledge I can't share. But you probably interact with services using wikidata more often than you think. BrokenSegue (talk) 17:01, 19 September 2020 (UTC)[reply]

Information retrieval (SPARQL query) on multiple datasets?

When you want to do a SPARQL query on Wikidata, you go to https://query.wikidata.org/ . When you want to do a SPARQL query on OSM, you go to the Sophox endpoint (https://sophox.org/sparql). When you want to do a SPARQL query on DBpedia you go to https://dbpedia.org/sparql .

Is there a way to do a SPARQL query on multiple datasets or can you do it from any one of these locations?

What I'm hoping to do is to be able to query any dataset from one place to leverage the benefits of this LOD using RDF.

--Nonoumasy (talk) 02:54, 19 September 2020 (UTC)[reply]

The concept you are looking for is federated queries. You can query one endpoint and, within the query, reach out to other endpoints. There is a description with examples for WDQS, too. Jneubert (talk) 14:48, 19 September 2020 (UTC)[reply]
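As a rough sketch of what such a federated query looks like when run from WDQS: the SERVICE clause reaches out to a second endpoint mid-query. The DBpedia vocabulary (dbo:abstract) and the owl:sameAs links assumed below are illustrative and should be verified against the live endpoints.

```
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>

# Run at https://query.wikidata.org/ : German cities come from Wikidata,
# English abstracts are fetched from DBpedia via SERVICE.
SELECT ?item ?abstract WHERE {
  ?item wdt:P31 wd:Q515 ;        # instance of: city
        wdt:P17 wd:Q183 .        # country: Germany
  SERVICE <https://dbpedia.org/sparql> {
    ?dbp owl:sameAs ?item ;
         dbo:abstract ?abstract .
    FILTER(LANG(?abstract) = "en")
  }
}
LIMIT 5
```

Note that not every endpoint is on WDQS's federation allow-list, and federated queries can be slow; keep the remote pattern as selective as possible.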

Thanks.--Nonoumasy (talk) 20:59, 19 September 2020 (UTC)[reply]

"Last, First" for references rather than "First Last"

I'm trying to build out a complete reference so that I'll be able to use Wikidata for a Wikipedia application. However, for the author (P50) field, it displays on Wikipedia as "First Last", rather than "Last, First" as Wikipedia style normally dictates. I've filled out the first and last name fields on the author's item, so Wikidata should be able to handle this, but it doesn't seem able to yet. Could this be addressed? {{u|Sdkb}}talk 08:01, 19 September 2020 (UTC)[reply]

Using en:Template:Cite Q? It's using the author (P50) field on the publication (and you'd have to check the template source code to find what it takes from the author item), but it's an issue for that template specifically. There must be lots of such examples throughout Wikipedia. It says on the documentation:

Order of precedence for rendering author names:

   stated as (P1932) qualifier on author (P50)
   author name string (P2093)
   author (P50) label in English
   author (P50) label in any other language

Ghouston (talk) 08:35, 19 September 2020 (UTC)[reply]

@Ghouston: Thanks for the link. I'm not sure how I'd use that template within w:Template:Wikidata, and in any case, it doesn't appear to handle the author name properly itself.
If you all want some incentive to fix this, the page where I'm trying to do this is on the path to soon becoming a featured list. The Wikidata links are much less likely to survive the upcoming FLC review if en-WP editors can point to a way in which the Wikidata-derived citations are inferior than if they display identically. So this issue will potentially determine whether or not we're able to get some featured-level Wikidata-integrated content on en-WP. {{u|Sdkb}}talk 03:24, 20 September 2020 (UTC)[reply]
I don't think that using en:Template:Wikidata to generate text, as on that page, is permitted in the English Wikipedia in any case (en:Wikipedia:Wikidata#Appropriate_usage_in_articles). At best, Wikidata can supply data for infoboxes, and references e.g., via en:Template:Cite Q. I guess the names in a citation should be taken as written in the work itself, instead of generated from everything that we may know about the author. But trying to convert a value from object named as (P1932) automatically may be complex and error-prone. The text I quoted from Cite Q above is actually from the "issues" section on the template, so may not be the way it's currently done. Another listed issue is "Author name should display as "Last, First Middle" to match Wikipedia house style". The template authors have probably realised that it's difficult. Ghouston (talk) 04:17, 20 September 2020 (UTC)[reply]
@Ghouston: I just added that last listed issue earlier today haha; there wasn't anything previously. I'm not sure who to ping who might be interested in working on this, but there are probably quite a few instances of w:Template:Wikidata that include a reference with an author name (and if there aren't currently, we hope there ultimately will be), so fixing this will have a widespread impact. {{u|Sdkb}}talk 06:53, 20 September 2020 (UTC)[reply]
Well, I think the only way it could be done reliably is by specifying the names manually in the right order, either in a new qualifier for the publication items in Wikidata, or in Wikipedia as a parameter to the template. Ghouston (talk) 08:03, 20 September 2020 (UTC)[reply]
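To illustrate why the automatic "First Last" → "Last, First" conversion discussed above is error-prone, here is a naive Python sketch (it assumes the final whitespace-separated token is the surname; this is not how any template actually does it). It already fails for multi-part surnames:

```python
# Naive sketch: treat the final whitespace-separated token as the surname.
# This is exactly the kind of automatic conversion that breaks.

def naive_last_first(name):
    """Split 'First Last' into 'Last, First' at the final space."""
    first, _, last = name.rpartition(" ")
    return f"{last}, {first}" if first else last

print(naive_last_first("John Smith"))              # Smith, John -- fine
print(naive_last_first("Gabriel García Márquez"))  # Márquez, Gabriel García -- wrong:
                                                   # the surname is "García Márquez"
```

Mononyms, particles like "van" or "de la", and non-Western name orders all break it further, which supports the point that the ordering needs to be recorded explicitly rather than derived.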

Wikipedia Infoboxes to Wikidata?

Is it possible to extract Wikipedia Infoboxes to Wikidata?

I'm interested in extracting the property & value of Belligerents, Commanders, and Strength from, e.g., https://en.wikipedia.org/wiki/Battle_of_Un_no_Kuchi

I'm hoping to add this data to the respective Wikidata item: https://www.wikidata.org/wiki/Q2890433

Or would this be best done using the Wikipedia REST API?

--Nonoumasy (talk) 10:11, 19 September 2020 (UTC)[reply]

Are there any links on how to use Harvest Templates? The help button is disabled.  – The preceding unsigned comment was added by [[User:|?]] ([[User talk:|talk]] • contribs).

Main points are: log in, click "load", wait; once it's loaded, click "start", and eventually try "stop" or "publicly save".
Give it a try with some edits. If something goes terribly wrong, one can easily revert them. BTW please sign your posts with ~~~~ --- Jura 15:52, 19 September 2020 (UTC)[reply]

Learning SPARQL

Hi,

I'm just wondering if there are some resources to learn SPARQL. I'm happy at my progress but would like to accelerate the process. I'm sure just 'doing it' has its virtue but if there are video tutorials on intermediate/advance queries, pls let me know. Also, would learning more SQL help? --Nonoumasy (talk) 20:55, 19 September 2020 (UTC)[reply]

Maybe Wikidata:SPARQL_query_service/Query_Helper works for you. --- Jura 15:55, 19 September 2020 (UTC)[reply]
I have found this wikibook answering all my questions: https://en.wikibooks.org/wiki/SPARQL --SCIdude (talk) 16:25, 19 September 2020 (UTC)[reply]

This is cool! Thanks. I am going through this documentation https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial and it's helping a lot --Nonoumasy (talk) 04:55, 20 September 2020 (UTC)[reply]
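For anyone else starting with that tutorial, its opening example is about as small as a useful query gets: all items that are an instance of (P31) house cat (Q146), with English labels (paste into query.wikidata.org):

```
# Every item that is an instance of (P31) house cat (Q146).
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```

Almost every beginner query is a variation on this pattern: a triple pattern with wdt: properties plus the label service.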

Images from the State Library and Archives of Florida

I am making entries for images from the State Library and Archives of Florida that correspond to an identifiable person. I make an entry in Wikidata and import the image. They may be sparse in information like Elizabeth Harris (Q99481187) when I can't find more information, or more information dense like Oscar Fitzalan Peek (Q99479376) where I can create an entry in Familysearch and link them to primary documents. Does anyone have any objections? --RAN (talk) 21:31, 19 September 2020 (UTC)[reply]

How many? It would be good if you could at least put a century on the person to avoid future conflations. I think we have a property that just indicates they existed at a certain point in time (if birth/death dates are totally unknown). Ah, it's floruit (P1317) BrokenSegue (talk) 01:29, 20 September 2020 (UTC)[reply]
Sure! I am doing them one at a time, I don't see a way to automate them, so no count yet. I think I will be able to get a birth year for everyone from the census, Florida was sparsely populated in the early 1900s. --RAN (talk) 02:26, 20 September 2020 (UTC)[reply]
Thank you for doing this!!! I hope you find some existing duplicates to merge along the way. — Levana Taylor (talk) 03:07, 20 September 2020 (UTC)[reply]

Importing a template

I'd like to import the w:Template:Please see template to Wikidata so that we can use it here. I haven't done this before, but I can't find any help pages at all about templates on Wikidata. Is there anything I should know before I go ahead and do it? (And I assume there's no alternative to just copying and pasting, as much as forking pains me?) {{u|Sdkb}}talk 03:36, 20 September 2020 (UTC)[reply]

Items concerning autism

Hi,

I noticed last night that things are misconnected with our autism-related articles: the interwikis are mixed up.

We've got classic autism (Q38404) and the autism spectrum (Q1436063) mixed up with each other; at least the English, German and Dutch articles don't match. Also Q1104126 (Kanner's syndrome) seems to be in the mix. The diagnostic manual DSM was changed in 2013, and translations of this change came out only in recent years; that may be why. Can someone untangle? Ciell (talk) 07:30, 20 September 2020 (UTC)[reply]