Wikidata:Project chat/Archive/2024/09


Request to merge Q31395213 and Q4089188

They describe the same content. Kone718 (talk) 09:17, 1 September 2024 (UTC)

Wikidata weekly summary #641

Thanks for providing this :) So9q (talk) 05:27, 1 September 2024 (UTC)

Property:P1813 short name with countries

The usage of this property differs from language to language. Looking at the USA and the UK, some languages use their form of "USA", while others write "United States" (likewise "UK" versus "United Kingdom"). I'm looking for a more or less reliable field for retrieving a short name (not an abbreviation!) and I'm asking myself if this would be the one I could use for that. "UK" I would rather expect at international license plate code or something. I changed it for English and German on the UK and the US items, but now I'm starting to worry that this might cause problems elsewhere. I would also like to change the value at Holy Roman Empire to "Holy Roman Empire" instead of "HRE". Any advice on the topic? Flominator (talk) 05:10, 29 August 2024 (UTC)

States like Kentucky also use it as "KY", where I would have expected just "Kentucky" as opposed to the official name "State of Kentucky". Of course, I could also use the label, but that would be another API call; the claims I already have at hand at that point. --Flominator (talk) 05:35, 29 August 2024 (UTC)

Looking at the examples of this property, both abbreviations and "short text but not abbreviated" seem to be accepted: https://www.wikidata.org/wiki/Property:P1813#P1855 Bouzinac💬✒️💛 08:05, 29 August 2024 (UTC)
Its aliases "acronym" and "initialism" do make it ambiguous. Splitting off an "acronym" property might be best. Mind you, that wouldn't help the OP who naturally refers to USA and US in their post, as we all do, UK/GB are a muddle, and you'd code UK as both a short form and an acronym, and no-one has attempted to unravel Foreign, Commonwealth and Development Office (Q358834) !! Vicarage (talk) 08:22, 29 August 2024 (UTC)
Actually, it would help him a lot, because he could then go and set "United States" and "United Kingdom" as values for this property without getting his butt kicked (or at least he could defend himself with a line of argumentation, in case it happens anyway). Flominator (talk) 09:01, 29 August 2024 (UTC)
Unfortunately, it looks like your proposal has been tried already in January of this year: Wikidata:Property proposal/abbreviation --Flominator (talk) 09:13, 29 August 2024 (UTC)
That proposal was very confused. What we'd want is 'initialism' ('acronym' is a pronounceable word), but as Michael Caine would say, not a lot of people know that. But it's not something that impacts me. Vicarage (talk) 16:52, 29 August 2024 (UTC)
Thanks. Let's hope Wikidata:Property proposal/initialism is less confused. --Flominator (talk) 10:03, 30 August 2024 (UTC)
That sounds like what the label is for. The label is supposed to be the name the item is most commonly known by (see Help:Label). We normally use official name (P1448) for the long/formal name. I don't know why United States of America (Q30) has the formal name as the label, even the English Wikipedia article is "United States". - Nikki (talk) 05:28, 1 September 2024 (UTC)

Persia

The Persian language is missing; I can't add it. Baratiiman (talk) 05:35, 2 September 2024 (UTC)

Already exists: Q9168 Carl Ha (talk) 06:58, 2 September 2024 (UTC)
Or do you mean in the box at the top of each item? There you have to type "fa" as the language code. Carl Ha (talk) 06:59, 2 September 2024 (UTC)
@Baratiiman: what are you talking about? On items like femininity (Q866081) you did edit in Persian. But on Hawk tuah (Q127159727) you wrongly edited in English. Is your interface in Persian? If so, you should see Persian. Cheers, VIGNERON (talk) 09:23, 2 September 2024 (UTC)
@Baratiiman: You may want to add a Babel template like {{#babel:fa}} to your user page. Alternatively, enable LabelLister.--GZWDer (talk) 11:25, 2 September 2024 (UTC)

Why does this list not properly sort?

Wikidata:WikiProject sum of all paintings/Exhibitions/0,10 The first column should be sorted by number, but the rows are in the wrong order. Carl Ha (talk) 06:57, 2 September 2024 (UTC)

Announcing the Universal Code of Conduct Coordinating Committee

Original message at wikimedia-l. You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Hello all,

The scrutineers have finished reviewing the vote and the Elections Committee have certified the results for the Universal Code of Conduct Coordinating Committee (U4C) special election.

I am pleased to announce the following individual as a regional member of the U4C, who will serve a term until 15 June 2026:

  • North America (USA and Canada)
    • Ajraddatz

The following seats were not filled during this special election:

  • Latin America and Caribbean
  • Central and East Europe (CEE)
  • Sub-Saharan Africa
  • South Asia
  • The four remaining Community-At-Large seats

Thank you again to everyone who participated in this process and much appreciation to the candidates for your leadership and dedication to the Wikimedia movement and community.

Over the next few weeks, the U4C will begin meeting and planning the 2024-25 year in supporting the implementation and review of the UCoC and Enforcement Guidelines. You can follow their work on Meta-Wiki.

On behalf of the U4C and the Elections Committee,

RamzyM (WMF) 14:05, 2 September 2024 (UTC)

Wikidata weekly summary #643

This seems like a bit of a stretch or is that just me? Trade (talk) 16:42, 27 August 2024 (UTC)

Doesn't seem far-fetched to me. Do terrorists not seek political or social change? One man's freedom fighter is another man's terrorist. -Animalparty (talk) 01:41, 28 August 2024 (UTC)
Google's Dictionary suggests that acting in the pursuit of political aims is part of the definition of what makes someone a terrorist. ChristianKl 15:29, 2 September 2024 (UTC)

Help the Wikimedia Foundation learn more about on-wiki collaborations

The Campaigns team at the Wikimedia Foundation is exploring how to expand its work on campaigns to support other kinds of collaboration. We are interested in learning from diverse editors who have experience joining and working on WikiProjects, Campaigns, and other kinds of on-wiki collaboration. We need your help:

Whatever input you bring to the two spaces will help us make better decisions about next steps beyond the current tools we support. Astinson (WMF) (talk) 18:54, 2 September 2024 (UTC)

Label of P813

Hi! I think the label of P813 was changed by mistake. It has Arabic in an English field. Thanks WhisperToMe (talk) 22:09, 2 September 2024 (UTC)

It got fixed. Thank you WhisperToMe (talk) 22:23, 2 September 2024 (UTC)

Please help me

Hi. I want to link this article with its Persian translation (this article). But its Wikidata page is locked. Can somebody help me link these two together to solve my problem? Hulu2024 (talk) 10:41, 3 September 2024 (UTC)

✓ Done — Martin (MSGJ · talk) 10:59, 3 September 2024 (UTC)

Have your say: Vote for the 2024 Board of Trustees!

Hello all,

The voting period for the 2024 Board of Trustees election is now open. There are twelve (12) candidates running for four (4) seats on the Board.

Learn more about the candidates by reading their statements and their answers to community questions.

When you are ready, go to the SecurePoll voting page to vote. The vote is open from September 3rd at 00:00 UTC to September 17th at 23:59 UTC.

To check your voter eligibility, please visit the voter eligibility page.

Best regards,

The Elections Committee and Board Selection Working Group

MediaWiki message delivery (talk) 12:13, 3 September 2024 (UTC)

Proposal to remove all case data from all "COVID-19 in <Place>" items

Special:Contributions/CovidDatahubBot added a number of statements about COVID-19 cases to items such as Q83873387. Such data are now largely out of date and push the items to the limit of what Wikidata can handle (and so they have long gone without updates). It would be better to express such data in Commons tabular datasets instead. Also, many items cannot be edited further since they are reaching the size limit of Wikidata items, which causes issues like phab:T373554. GZWDer (talk) 13:28, 2 September 2024 (UTC)

I agree that Tabular Data is a better way to store this data. While this is really part of a bigger problem (see Special:LongPages), it's good to explore simple solutions first. The removals should be performed in batches to reduce the number of edits made (if this proposal gets accepted). Dexxor (talk) 09:33, 3 September 2024 (UTC)
Agree Vojtěch Dostál (talk) 09:56, 3 September 2024 (UTC)
@GZWDer and @Dexxor , after deleting all the outdated data on Q83873387, my bot was able to link the article to the item using Pywikibot. I'm the reporter of the mentioned Phabricator ticket. I just wanted to mention that there is en:w:Template:COVID-19 data/data (see here for a clearer view of the data) updated daily by a bot, with the last update on 17 August 2024. I believe there will be no more updates from the endpoint. Thanks. Aram (talk) 19:27, 3 September 2024 (UTC)
Such a big decision, so fast? @GZWDer, @Dexxor, @Vojtěch Dostál:. So now queries like the one in Talk:Q84098939 are all broken. Who will fix them? --Infovarius (talk) 21:37, 12 September 2024 (UTC)

Conflation

These need help: Charles-Louis-Achille Lucas (Q19695615) Wolf Laufer (Q107059238) Fakhr al-Dīn Ṭurayḥī (Q5942448) RAN (talk) 08:54, 4 September 2024 (UTC)

The date of death of Q19695615 has now been changed to 1905[1]; the source says 20th September - is there a reason it should be 19th? I reverted the recent additions to Q107059238 - I would have moved them to a new item, but one of the references for a 1601 date of death is claimed to have been published in 1600, and the links don't work for me ("The handle you requested -- 21.12147/id/48db8bef-31cf-4017-9290-305f56c518e9 -- cannot be found"). Q5942448 just had an incorrect date (1474 should have been 1674 - I removed it and merged with an item that already had 1674). Peter James (talk) 13:07, 4 September 2024 (UTC)
Regarding Charles-Louis-Achille Lucas (Q19695615), the death certificate has been established on September 20, but the death happened the day before, on September 19. Ayack (talk) 14:46, 4 September 2024 (UTC)
We have that happen with obituaries all the time, people add the date of the obituary rather than the date of death. --RAN (talk) 19:17, 4 September 2024 (UTC)

Property for paused, interrupted etc.

Trying to model "Between 1938 and 1941 it was reunited with Lower Silesia as the Province of Silesia" in Upper Silesia Province (Q704495). Is there any qualifier that says "not from ... to ..." or something? --Flominator (talk) 11:50, 4 September 2024 (UTC)

Brazilian Superior Electoral Court database

Hello everyone!

At Wikimedia Commons, I made a proposal for batch uploading all the candidate portraits from Brazilian elections (2004-2024). The user @Pfcab: has uploaded a big chunk, and while talking with @DaxServer:, he noticed that since divulgacandcontas.tse.jus.br has so much biographical data (example), it could be a relevant crosswiki project.

Would this be possible? Is there any bot that could - firstly - look for missing names in Wikidata (while completing all the rest), export the missing Superior Electoral Court biographical data, and add the respective images found in Category:Files from Portal de Dados Abertos do TSE?

Thanks, Erick Soares3 (talk) 16:37, 4 September 2024 (UTC)

Dating for painting at Q124551600

I have a painting that is dated by the museum as "1793 (1794?)", so it seems it was likely made in 1793 but there is a small chance that it was only made in 1794. When I enter both dates I get an error report. How do I fix that? How do I give one the preferred rank? I don't find a fitting field. Carl Ha (talk) 06:42, 1 September 2024 (UTC)

Mark the one with the highest chance as "preferred"? And add a 'reason' qualifier to indicate its preference is based on the higher chance? The "deprecated" qualifier (reason for deprecated rank (P2241)) for statements has hundreds of reasons (there is a list of Wikidata reasons for deprecation (Q52105174) but I am not sure it is complete; I think my SPARQL query earlier this week showed many more). Similarly, there is reason for preferred rank (P7452), and maybe most probable value (Q98344233) is appropriate here. Egon Willighagen (talk) 08:09, 1 September 2024 (UTC)
How do I mark it as preferred? Which qualifier do I use? Carl Ha (talk) 08:11, 1 September 2024 (UTC)
Ranking is explained here: https://www.wikidata.org/wiki/Help:Ranking
I would suggest the qualifier property reason for preferred rank (P7452) with the value most probable value (Q98344233). Egon Willighagen (talk) 10:42, 1 September 2024 (UTC)
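For completeness, a minimal Pywikibot sketch of those two steps (it assumes the 1793 value is the first inception (P571) claim on the item; adjust the index and the property to the actual statement):

  import pywikibot

  site = pywikibot.Site("wikidata", "wikidata")
  repo = site.data_repository()
  item = pywikibot.ItemPage(repo, "Q124551600")
  item.get()

  claim = item.claims["P571"][0]               # inception; assumed to be the 1793 statement
  claim.changeRank("preferred")                # mark the more likely date as preferred

  qualifier = pywikibot.Claim(repo, "P7452")   # reason for preferred rank
  qualifier.setTarget(pywikibot.ItemPage(repo, "Q98344233"))  # most probable value
  claim.addQualifier(qualifier, summary="add reason for preferred rank")

The same can of course be done by hand in the web interface, using the rank selector shown while editing the statement.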
Thank you! Carl Ha (talk) 11:12, 1 September 2024 (UTC)
What should we do if we have a work where there is no consensus among art historians on what the "preferred" dating is? The dating of Q570188 is disputed, but Wikidata wants me to prefer one statement. Carl Ha (talk) 08:38, 2 September 2024 (UTC)
@Carl Ha I don't think Wikidata constraints are written in stone and the real world sometimes brings challenges that no constraint can predict. In this case, in my view, you can disregard the exclamation marks, just leave it as it is for now. Wikidata constraints are here to serve us, not the other way round. Vojtěch Dostál (talk) 06:38, 5 September 2024 (UTC)

Could we include things that have P31 "icon" (Q132137), as this is a type of painting? I don't know how to include that technically in the wiki code. Carl Ha (talk) 08:36, 1 September 2024 (UTC)

It now works, I had a typo. Carl Ha (talk) 18:35, 5 September 2024 (UTC)
I tried it, and now it seems that it just includes all elements, including sculptures etc.

Wikidata Query Service graph split to enter its transition period

Hi all!

As part of the WDQS Graph Split project, we have new SPARQL endpoints available for serving the “main” (https://query-main.wikidata.org/) and “scholarly” (https://query-scholarly.wikidata.org/) subgraphs of Wikidata.

As you might be aware we are addressing the Wikidata Query Service stability and scaling issues. We have been working on several projects to address these issues. This announcement is about one of them, the WDQS Graph Split. This change will have an impact on certain uses of the Wikidata Query Service.

We are now entering a transition period until the end of March 2025. The three SPARQL endpoints will remain in place until the end of the transition. At the end of the transition, https://query.wikidata.org/ will only serve the main Wikidata subgraph (without scholarly articles). The query-main and query-scholarly endpoints will continue to be available after the transition.

If you want to know more about this change, please refer to the talk page on Wikidata.

Thanks for your attention! Sannita (WMF) (talk) 13:41, 4 September 2024 (UTC)

I would very much like to avoid a graph split. I have not seen a vote or anything community related in response to the WMF idea of splitting the graph. This is not a good sign.
It seems the WMF has run out of patience waiting for this community to try to mitigate the problem (e.g. by deleting the part of the scholarly graph not used by any other Wikimedia project) and thus free up resources for items that the community really cares about and that are used by other Wikimedia projects.
This is interesting. I view this as the WMF technical management team having decided, in the absence of a timely response and reaction from the Wikidata community itself, how to handle the issues that our lack of e.g. a mass-import policy has created.
This sets a dangerous precedent for more WMF governance in the future, which might impact the project severely negatively.
I urge therefore the community to:
  • address the issue with the enormous revision table (e.g. by suggesting to WMF to merge or purge the revision log for entries related to bots so that e.g. 20 edits in a row from a bot on the same date get squashed into 1 edit in the log)
  • immediately stop all bots currently importing items no matter the frequency until a mass-import policy is in place.
  • immediately stop all bots making repetitious edits to millions of items which inflate the revision table (e.g. User:LiMrBot)
  • immediately limit all users to importing x items a week/month until a mass-import policy is in place no matter what tool they use.
  • put up a banner advising users of the changes and encourage them to help finding solutions and discuss appropriate policies and changes to the project.
  • take relevant community steps to ensure that the project can keep growing in a healthy and reliable way both technically and socially.
  • assign a community liaison that can help communicate with WMF and try to avoid the graph split becoming a reality.
WDYT? So9q (talk) 09:36, 5 September 2024 (UTC)
Also see
M2k~dewiki (talk) 09:42, 5 September 2024 (UTC)
Also see
M2k~dewiki (talk) 09:57, 5 September 2024 (UTC)
Regarding the distribution / portion of content by type (e.g. scholarly articles vs. streets / architectural structures), see this image:
Content on Wikidata by type
M2k~dewiki (talk) 11:46, 5 September 2024 (UTC)

Just to say that the problems with the Wikidata Query Service backend have been discussed since July 2021, that the graph split was introduced as a possibility in October 2023, and that we communicated about it periodically (maybe not to the best of our possibilities, for which I am willing to take the blame, but we've kept our communication open with the most affected users during the whole time).

This is not a precedent for WMF telling the community what to do; the community is very much within its rights to make all the decisions it wants, but we need to find a solution anyway to a potential failure of the Wikidata Query Service that we started analysing in mid-2021. I want to stress that the graph split is a technical patch to a very specific problem, and that in no way are WMF or WMDE interested in governing the community. Sannita (WMF) (talk) 13:02, 5 September 2024 (UTC)

I understand, thanks for the links. I have no problem with WMF or any of the employees. I see a bunch of people really trying hard to keep this project from failing catastrophically and I'm really thankful that we still have freedom as a community to decide what is best for the community even when we seem to be on a reckless path right now.
What I'm trying to highlight is the lack of discussion about the growth issue and about how to steer the community to grow by quality instead of quantity overall. I'm also missing a discussion and information, e.g. for bot operators, about the technical limitations we have because of hardware and software, and a governance that ensures that our bots do not break the system.
A perhaps horrifying example is the bot I linked above, which makes 25+ edits in a row to the same item, potentially for millions of items.
In that specific case we failed:
  • to inspect and discuss the operations of the bot before approval.
  • failed as a community to clearly define the limits for Wikidata so we can make good decisions about whether a certain implementation of a bot is desired (in this case make all the changes locally to the item, then upload = 1 revision).
Failings related to responsible/"healthy" growth:
  • we have failed as a community to ask WMF for input on strategies when it comes to limiting growth.
  • we have failed as a community to have discussions with votes on what to prioritize when WMF is telling us we cannot "import everything" without breaking the infrastructure.
  • we have failed as a community to implement new or update existing policies to govern the growth and quality of the project in a way that the community can collectively agree effectively manages the issues the WMF has been trying to tell us about for years.
We really have a lot of community work to do to keep Wikidata sound and healthy! So9q (talk) 17:21, 5 September 2024 (UTC)
@So9q I see your points, and I agree with them. So much so, that I'm writing this message with my volunteer account on purpose and not my work one, to further stress that we need as a community to address these points. For what it's worth, I'm available (again, as a volunteer) to discuss these points further. I know for a fact that we'll have people who can provide us with significant knowledge in both WMF and WMDE, to take an informed decision. Sannita - not just another it.wiki sysop 17:53, 5 September 2024 (UTC)
See this. There is yet another reason/thing to take into consideration: most Wikipedia language versions have refrained from using Wikidata in infoboxes or articles. Doubts about Wikidata were one reason. Now, as WP communities see that WD works and the data can be used, they will use WD more and more. One example: we started to use Wikidata within the infobox for American settlements for several items, e.g. time zone, FIPS and GNIS, inhabitants, area and several more. We might add telephone area code and ZIP code in the near future. Some of these still only cross-check whether specific data in Wikidata and Wikipedia are the same, but might switch to Wikidata only at any time. All the language versions will make greater use of Wikidata in the future. If WMF tells us we're breaking the infrastructure, they didn't do their job or did it wrong. Matthiasb (talk) 01:14, 6 September 2024 (UTC)

Question about P2484

Dear all,

Please see this question about property P2484: Property_talk:P2484#Multiple_NCES_IDs_are_possible

Thank you WhisperToMe (talk) 21:09, 5 September 2024 (UTC)

symmetric for "subclass of (P279)" ?

Hi, why is there no symmetric "has subclass" property for "subclass of (P279)"? I contribute to social science concepts, and it is honestly complicated to build concept hierarchies when you are not able to see the subclass items from the superclass item's page. Making a property proposal is a bit beyond my skills and interests; is anyone interested in looking into the question?

Thanks Jeanne Noiraud (talk) 17:37, 4 September 2024 (UTC)

We're unlikely to ever make an inverse property for subclass of because there are items with hundreds of subclasses.
If you're using the basic web search you can search with haswbstatement. To see all subclasses of activity (Q1914636) you would search haswbstatement:p279=Q1914636. Or you could use the Wikidata class browser (Q29982490) tool at https://bambots.brucemyers.com/WikidataClasses.php . And finally you could use Wikidata Query Service at https://query.wikidata.org/. William Graham (talk) 18:26, 4 September 2024 (UTC)
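For example, the Query Service route could look roughly like this (a sketch in Python; note that wdt:P279 only finds direct subclasses, use the path wdt:P279* if you want the whole tree, which can be very large):

  import requests

  query = """
  SELECT ?subclass ?subclassLabel WHERE {
    ?subclass wdt:P279 wd:Q1914636 .   # direct subclasses of activity (Q1914636)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  """
  r = requests.get("https://query.wikidata.org/sparql",
                   params={"query": query, "format": "json"},
                   headers={"User-Agent": "subclass-example/0.1"})
  for row in r.json()["results"]["bindings"]:
      print(row["subclass"]["value"], row.get("subclassLabel", {}).get("value", ""))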
Thanks, the class browser is helpful ! The others are a bit too technical for me. Jeanne Noiraud (talk) 13:13, 6 September 2024 (UTC)
The Relateditems gadget is useful for a quick look at subclasses, though sometimes it gets overcrowded if there are too many statements about an item. You can enable it here. Samoasambia 19:01, 4 September 2024 (UTC)
Thanks, useful tool indeed ! Jeanne Noiraud (talk) 13:13, 6 September 2024 (UTC)

Merge?

These are the same supercomputer with performance measured at different times, perhaps with slight mods at each performance measurement, should they be merged? Ranger (Q72229332) Ranger (Q73278041) Ranger (Q72095008) Ranger (Q2130906) RAN (talk) 23:45, 5 September 2024 (UTC)

Yes. I used it Vicarage (talk) 04:35, 6 September 2024 (UTC)
Not entirely sure, but probably yes, at least for some of them (maybe not the last one?); and if not merged, these items should be more clearly differentiable. It needs someone who understands exactly what this is about. The first three were created by the bot TOP500_importer; maybe Amitie 10g can tell us more. Cheers, VIGNERON (talk) 12:16, 6 September 2024 (UTC)

Islamic dates versus Christian dates

See: Ibrahim Abu-Dayyeh (Q63122057) where both dates are included. Do we include both or just delete the Islamic one? It triggers an error message. RAN (talk) 19:12, 5 September 2024 (UTC)

@Richard Arthur Norton (1958- ): it's a hard question. We don't have a clear and easy way to indicate Islamic dates (which is a big problem in itself); right now the Islamic dates are stored as Julian or Gregorian dates, which is wrong, and so they should probably (and sadly) be removed. Cheers, VIGNERON (talk) 12:30, 6 September 2024 (UTC)
I deleted the Islamic date; it was interpreted as a standard AD date and was triggering an error message. --RAN (talk) 00:25, 7 September 2024 (UTC)

How best to model a long development project for a property

For Harbor Steps (Q130246591), [2] page 7 gives a good table summarizing the process of assembling the land, planning, and construction, especially of the dates associated with three different phases of work. I imagine this can be appropriately modeled using existing properties, but I do not know how, not a sort of thing I've ever seen modeled here. - Jmabel (talk) 20:33, 6 September 2024 (UTC)

Merging items (duplicates)

Can someone please merge these two pages, since they are about the same hospital. The current name of the hospital is AdventHealth Daytona Beach.

Florida Hospital Memorial Medical Center (Q30269896) to Florida Hospital Memorial Medical Center (Q130213551)

Catfurball (talk) 20:25, 6 September 2024 (UTC)

✓ Done Ymblanter (talk) 19:01, 7 September 2024 (UTC)

KUOW

KUOW (Q6339681) is just a separately licensed transmitter for KUOW-FM (Q6339679). No programming content of its own, really just a repeater. I suppose it still merits a separate item, but I suspect the two items should somehow be related to one another, which they seem not to be currently. - Jmabel (talk) 05:42, 3 August 2024 (UTC)

(Reviving the above from archive, because no one made any suggestions on how to do this. - Jmabel (talk) 20:36, 6 September 2024 (UTC))

Would Property:P527 work for this case? Ymblanter (talk) 19:03, 7 September 2024 (UTC)

Creating new property for Panoramio user ID

Hi. I wanted to add the panoramio ID for a Commons photographer, but found that there is no property for it. I looked at Flickr user ID (P3267) and started creating Q130293265 before realizing that I needed to create a property and not an item. Could somebody with sufficient permission delete Q130293265 and possibly also create the new property? Thanks in advance. Cryptic-waveform (talk) 14:07, 13 September 2024 (UTC)

You can propose a new property if you want. See the relevant help pages. So9q (talk) 14:15, 14 September 2024 (UTC)
@Cryptic-waveform Wikidata:Property proposal RVA2869 (talk) 14:32, 14 September 2024 (UTC)
✓ Deleted

How to find specific properties?

Hello, I am looking for some specific properties or a way to enter certain technical data in items. This includes, for example:

  • resolution (e.g. 6,000 x 4,000 px or 24 megapixels) or burst mode speed (e.g. 10 shots per second) for digital cameras
  • angle of view, maximum magnification, number of aperture blades and number of lens groups and elements for camera lenses
  • nominal impedance and signal-to-noise ratio for microphones

Unfortunately, I have not found any suitable properties or do not know whether there is a way to search for these other than by name (as there are no categories for items that could be searched - are there perhaps other ways?).

Thanks --Аныл Озташ (talk) 15:02, 8 September 2024 (UTC)

Besides looking for the properties directly the other way is to look at existing items to see what properties those items use. Ideally, there would be a model item (P5869) for digital cameras/camera lenses/microphones where you can see what properties get used to model those. ChristianKl15:27, 8 September 2024 (UTC)

Q107019458 and Q924673 merge?

Should The New York Herald (Q107019458) and New York Herald (Q924673) be merged, or are they separate incarnations? RAN (talk) 00:24, 7 September 2024 (UTC)

The different identifiers are because the New York Herald was combined with the New York Sun to form The Sun and the New York herald (Q107019629) from February to September 1920 (https://www.loc.gov/item/sn83030273). The New York Herald (Q107019458) is from October 1920, when it became a separate newspaper again, to 1924. I don't know if the P1144/P4898 identifiers and 8-month gap are enough for separate items. Peter James (talk) 11:03, 7 September 2024 (UTC)
I added the dates to help distinguish the two. --RAN (talk) 04:30, 9 September 2024 (UTC)

Statement for Property "Forced Labor"

On the new WD item Q130221049 (Dressmakers of Auschwitz), the top statement instance of (P31) could do with the qualifier Q705818 -- Deborahjay (talk) 07:55, 9 September 2024 (UTC)

Adding Para-swimming classification to a para-swimmer's WD item

Detailed this difficulty on the Talk:Q129813554 page. -- Deborahjay (talk) 08:00, 9 September 2024 (UTC)

Eldest child

How do we represent the eldest child and/or a certain order of the children in the lineage? DaxServer (talk) 12:59, 9 September 2024 (UTC)

@DaxServer: just add the date of birth (P569) on each child item, then you can easily retrieve the eldest one. Cheers, VIGNERON (talk) 13:48, 9 September 2024 (UTC)
@VIGNERON The date of birth (P569) is not known, only that he's the eldest one among the six sons DaxServer (talk) 14:29, 9 September 2024 (UTC)
@DaxServer: ah I see, then you could use series ordinal (P1545) = 1, 2, 3 etc. in qualifier, like on Q7810#q7810$2B9F1E2C-7583-4DD8-8CEB-F367FC0641E1 for instance. Cdlt, VIGNERON (talk) 15:01, 9 September 2024 (UTC)

Merging items (duplicates)

Hello, could someone please merge the following items or explain to me how I can do this? I have tried it via Special:MergeItems, but it won't load for me (the gadget is already enabled in the preferences). The entries are duplicates.

Thanks Аныл Озташ (talk) 22:58, 5 September 2024 (UTC)

It looks like User:RVA2869 has taken care of most of these. On long-focus lens (Q11022034) vs telephoto lens (Q516461) - they each have many separate sitelinks, so a variety of languages seem to think they are distinct. For example en:Long-focus lens vs en:Telephoto lens which seems to clarify the distinction. ArthurPSmith (talk) 21:27, 9 September 2024 (UTC)

How to add units to a thermal power plant?

I think I asked this question before but if so it was so long ago I have forgotten the answer.

In a thermal power plant, such as a coal-fired power plant, the number of units and their capacity in megawatts are very basic pieces of information. For example, https://globalenergymonitor.org/projects/global-coal-plant-tracker/methodology/ : the "database tracks individual coal plant units".

I am writing on Wikipedia about coal-fired power plants in Turkey and I pick up the infobox data automatically from Wikidata. At the moment I am editing https://en.wikipedia.org/wiki/Af%C5%9Fin-Elbistan_power_stations. Ideally I would like the infoboxes to show that the A plant has 3 operational units of 340 MW each and one mothballed unit of 335 MW all of which are subcritical and 2 proposed units each of 344 MW, and that the B plant has 4 units each of 360 MW all operational.

If that is too ambitious just the number of units would be a step forward, as shown in infobox params ps_units_operational, ps_units_planned etc. Is that possible? Chidgk1 (talk) 09:09, 6 September 2024 (UTC)

https://www.wikidata.org/wiki/Q85967587#P2109 ? Bouzinac💬✒️💛 15:29, 6 September 2024 (UTC)
And you might use this query to list power wikidata items by MW power : https://w.wiki/B7U9 Bouzinac💬✒️💛 19:27, 6 September 2024 (UTC)
@Bouzinac Thanks for quick reply but I don’t quite understand - maybe my question was unclear? Chidgk1 (talk) 08:18, 7 September 2024 (UTC)
@Chidgk1: I think Bouzinac thought you were asking about "units" in the sense of the "MW" part of 340 MW, not "units" in the sense of components. It probably doesn't make sense to create separate wikidata items for each "unit" of a power plant; rather I think the property to use is has part(s) of the class (P2670) with a suitable item for a power plant unit (not sure there is one right now so maybe that does need to be added) with qualifier quantity (P1114) and any other qualifiers you think appropriate. ArthurPSmith (talk) 21:35, 9 September 2024 (UTC)
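A rough Pywikibot sketch of that modelling (the QID for the "power plant unit" class is a placeholder, since such an item may still need to be created, and the quantity of 3 operational units is just an example value):

  import pywikibot

  site = pywikibot.Site("wikidata", "wikidata")
  repo = site.data_repository()
  plant = pywikibot.ItemPage(repo, "Q85967587")              # example power station item

  claim = pywikibot.Claim(repo, "P2670")                      # has part(s) of the class
  claim.setTarget(pywikibot.ItemPage(repo, "Q_UNIT_CLASS"))   # placeholder: item for "power plant unit"

  qty = pywikibot.Claim(repo, "P1114")                        # quantity qualifier
  qty.setTarget(pywikibot.WbQuantity(3, site=repo))           # e.g. 3 such units
  claim.addQualifier(qty)

  plant.addClaim(claim, summary="model power plant units")

Per-unit capacity could then go on a further qualifier (e.g. installed capacity), and units that differ (operational vs. mothballed vs. proposed) could be split over several such statements.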

Empty page

Why is Special:MostInterwikis empty? It would be very useful to me to know which pages have the most interwikis... 151.95.216.228 07:55, 9 September 2024 (UTC)

Presumably because there are no other language versions of Wikidata; as the message at en:Special:MostInterwikis points out, the special page really counts interlanguage links, not (other) interwiki links, so (IMHO) it’s arguably misnamed. Lucas Werkmeister (WMDE) (talk) 10:44, 9 September 2024 (UTC)
If you want to get the items with the most sitelinks, you can do that with a query. Lucas Werkmeister (WMDE) (talk) 13:58, 9 September 2024 (UTC)
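A sketch of such a query, run from Python (the sitelink threshold is only there to keep the result set small enough to avoid a timeout):

  import requests

  query = """
  SELECT ?item ?sitelinks WHERE {
    ?item wikibase:sitelinks ?sitelinks .
    FILTER(?sitelinks > 250)
  }
  ORDER BY DESC(?sitelinks)
  LIMIT 50
  """
  r = requests.get("https://query.wikidata.org/sparql",
                   params={"query": query, "format": "json"},
                   headers={"User-Agent": "sitelinks-example/0.1"})
  for row in r.json()["results"]["bindings"]:
      print(row["sitelinks"]["value"], row["item"]["value"])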
Most special pages are not designed for Wikidata (phab:T245818).
As for a replacement, Special:PagesWithProp works, too. --Matěj Suchánek (talk) 15:52, 9 September 2024 (UTC)

Wikidata weekly summary #644

Deletion request for Q95440281

The painting is not at all in the style of the credited author. It even looks as if the person who uploaded it to Commons might have painted it themselves. Carl Ha (talk) 20:34, 8 September 2024 (UTC)

@Carl Ha: an author can have several very different styles (compare Picasso's first and last paintings for an obvious case) and here the style is not that different (and the theme is the same). But indeed Three master on vivid Sea (Q95440281) has no references, which is bad. @Bukk, Bybbisch94: who could probably provide reliable sources. Cheers, VIGNERON (talk) 14:09, 9 September 2024 (UTC)
Yes, but that painting is of very low quality. There is no way an academically trained artist of the 19th century painted something like that. There are just too many basic flaws that somebody with professional training wouldn't make. 2A02:8109:B68C:B400:AC37:147E:C9F3:A2E5 20:35, 9 September 2024 (UTC)
Again, there could be many explanations: it could be a preparatory work, a study, etc. Sadly, without references there is no way to know... Cheers, VIGNERON (talk) 09:43, 10 September 2024 (UTC)

Creating double entries

@Beleg Tâl: We have Death of an Old Pilot (Q130262449) and Joseph Henderson (1826-1890) obituary (Q114632701). I think the duplication is not needed. It seems that the rules at Wikisource are that if you add a hyperlink to the text, you have created a new "annotated" version that now requires a separate entry at both Wikidata and Wikisource. Wikisource has its own rules, but perhaps we can stop the unnecessary duplication on this end and merge the two. A quick ruling will decide whether more are going to be made. I do not see any utility, at all, in duplicating every obituary because Wikisource insists on keeping two copies. RAN (talk) 04:28, 9 September 2024 (UTC)

As a cs.wikisource user I would say that's one news article. Is another edition known (in another newspaper)? If not, this should be merged. JAn Dudík (talk) 12:27, 9 September 2024 (UTC)
Have a look at s:en:Wikisource:Wikidata#Works.
  • "Works" and "Editions" are not the same thing, and are modelled separately on WD. The obituary itself is a "work"; the copy of it in the October 8, 1890 edition of the New York Evening Post is an "edition".
  • If you only want to create one item on Wikidata, I recommend that you create the "edition" item (instance of (P31) of version, edition or translation (Q3331189)). If you only create the "work" item, you should not link it to enWS unless it is to a Versions page or a Translations page.
  • The version of the obituary that you annotated with wikilinks, can probably be considered the same "edition" as the clean version. This is kind of a grey area and might need to be discussed at s:en:WS:S. You will note that the annotated version is NOT currently linked to Wikidata, and I personally would leave it that way.
Beleg Tâl (talk) 14:00, 9 September 2024 (UTC)
I agree with Beleg Tâl. It's maybe a bit ridiculous for a mono-edition work, but that's the way we have done it on Wikidata for ages; see Wikidata:WikiProject_Books. Also, some data are duplicated on both items and should be only in one. Cheers, VIGNERON (talk) 14:12, 9 September 2024 (UTC)
  • The only annotation is linking some of the people to their Wikidata entries. Imagine if I clipped the obit from all three editions of that day's newspaper and created a Wikidata entry for each. The text has not changed. We have multiple copies of engravings from different museums because they have different wear and tear. --RAN (talk) 18:30, 10 September 2024 (UTC)

Issues with rules for WDQS graph split

In case no-one is watching that page, please note that I raised some potential issues at Wikidata talk:SPARQL query service/WDQS graph split/Rules. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:57, 10 September 2024 (UTC)

Would it be possible for the column "inventory number" to show just the inventory numbers connected with the Art Culture Museum Petrograd and not the ones connected with other institutions (in this case always the Russian Museum) that later owned the paintings? Because right now the table cannot be sorted by the inventory number of the Art Culture Museum. Thanks! Carl Ha (talk) 09:40, 1 September 2024 (UTC)

I don't know if I am understanding your question, but for example St. George (II) (Q3947216) has both museums, Russian Museum (Q211043) and Art Culture Museum Petrograd (Q4796761), with the values ЖБ-1698 and 353 respectively as inventory number (P217), with the museums as qualifiers. And we have collection (P195) with both museums, starting years for both, and their respective P217. Matthiasb (talk) 18:17, 7 September 2024 (UTC)
Yes I know, but how can I change the table on this page, so that it only shows the inventory numbers connected with Art Culture Museum? Is that possible? Carl Ha (talk) 18:20, 7 September 2024 (UTC)
Why would you want to do so? As I understood it, the painting was first in Q4796761 and, after the museum was dissolved, was brought to Q211043, where it got a new inventory number. So before 1926 it was 353, later ЖБ-1698. Assuming this information is correct, all appears good to me. Well, I don't know how to write the query, but I guess you must restrict it not to include inventory numbers valid before 1926, or only those after 1926 if you need it the other way around. Matthiasb (talk) 21:52, 7 September 2024 (UTC)
I want to sort the table by the inventory numbers of the Art Culture Museum, which is currently not possible. My question is exactly how to change the query to do that. Carl Ha (talk) 21:57, 7 September 2024 (UTC)
Hello, help regarding queries could be found for example at
M2k~dewiki (talk) 10:14, 11 September 2024 (UTC)
M2k~dewiki (talk) 10:16, 11 September 2024 (UTC)
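If the table is built from a SPARQL query, the usual trick is to go through the full statement node and keep only the inventory number (P217) whose qualifier points at the Art Culture Museum Petrograd. A sketch in Python, assuming the museum is recorded as a collection (P195) qualifier on the P217 statement (the usual convention); adapt the pattern to the existing Listeria query:

  import requests

  query = """
  SELECT ?painting ?paintingLabel ?invNumber WHERE {
    ?painting p:P217 ?invStatement .        # full inventory number statement
    ?invStatement ps:P217 ?invNumber ;
                  pq:P195 wd:Q4796761 .     # qualifier: collection = Art Culture Museum Petrograd
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  ORDER BY xsd:integer(?invNumber)
  """
  r = requests.get("https://query.wikidata.org/sparql",
                   params={"query": query, "format": "json"},
                   headers={"User-Agent": "inventory-example/0.1"})
  for row in r.json()["results"]["bindings"]:
      print(row["invNumber"]["value"], row["paintingLabel"]["value"])

The xsd:integer cast sorts the values numerically; drop it if some inventory numbers are not plain numbers.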

How to reference an old book

I want to use c:Category:國立北京大學畢業同學錄 as a reference for a statement. How do I do that? RZuo (talk) 06:16, 11 September 2024 (UTC)

You should create an item for that book. GZWDer (talk) 11:24, 11 September 2024 (UTC)
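Once the book has its own item, a statement can then cite it in a reference with stated in (P248), optionally together with page(s) (P304). A rough Pywikibot sketch (the item IDs and the property being sourced are placeholders for illustration):

  import pywikibot

  site = pywikibot.Site("wikidata", "wikidata")
  repo = site.data_repository()

  person = pywikibot.ItemPage(repo, "Q_PERSON")            # placeholder: item the statement is on
  person.get()
  claim = person.claims["P69"][0]                          # placeholder: e.g. an "educated at" statement

  stated_in = pywikibot.Claim(repo, "P248")                # stated in
  stated_in.setTarget(pywikibot.ItemPage(repo, "Q_BOOK"))  # placeholder: the new item for the register

  pages = pywikibot.Claim(repo, "P304")                    # page(s)
  pages.setTarget("123")                                   # placeholder page number

  claim.addSources([stated_in, pages], summary="add reference to the alumni register")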

WDQS infrastructure

I am curious about the infrastructure currently used for running WDQS and its costs. Where could I find this information? Zblace (talk) 18:05, 1 September 2024 (UTC)

Try here, this page has a summary: https://diff.wikimedia.org/2023/02/07/wikimedia-enterprise-financial-report-product-update/ Baratiiman (talk) 05:37, 2 September 2024 (UTC)
That has nothing to do with the Query Service Zblace is asking for. LydiaPintscher (talk) 07:40, 2 September 2024 (UTC)
I'm also interested in that information. It would be interesting to see expenditures over time and number of machines/VMs. So9q (talk) 08:13, 7 September 2024 (UTC)
@So9q exactly. I am much involved in a few projects (some super fragile, like Wikispore) and I am curious how much is invested in different components and why we do not have a more (as in options) and better (as in modern) infrastructure situation now (as in recent years). I understand that running old software until a full substitute is in place is a must, but I do not understand the accumulated technical debt, while having such a huge financial reserve. Zblace (talk) 06:38, 8 September 2024 (UTC)
The Wikidata Query Service (WDQS; the part that lets you do queries) should be load-balanced across 18 bare-metal servers, if Grafana is any indication.
The problems Wikidata has can't be fixed by throwing more servers at the problem. The graph is already unwieldy in size, and that's the core issue. Graphs don't scale all that well when you get to Wikidata's size. There are other ways to store and process data, however; maybe in 20 years Wikidata will run on a document-oriented database. Maxlath should have scripts to filter and import Wikidata into CouchDB, IIRC, for those who would like to experiment. Infrastruktur (talk) 20:35, 8 September 2024 (UTC)
also wikitech:Wikidata Query Service#Hardware is related —MisterSynergy (talk) 20:51, 8 September 2024 (UTC)
Here is the repo with the WD->CouchDB code by @Maxlath. It seems to have been archived years ago though; can anyone confirm that it is working and document the benefits, if any, of storing the information in CouchDB? So9q (talk) 13:05, 9 September 2024 (UTC)
I asked GPT-3.5 and it output the following:
Storing Wikidata's JSON dump in CouchDB can offer several benefits, especially when dealing with the challenges posed by large graph databases like Blazegraph. Here are some potential advantages:
  1. Scalability: CouchDB is designed to handle large amounts of data and can scale horizontally. This means you can add more nodes to your CouchDB cluster to accommodate growing data sizes without significant performance degradation.
  2. Document-Oriented Storage: CouchDB is a NoSQL database that stores data in a document-oriented format (JSON). This aligns well with the JSON dump of Wikidata, allowing for easy ingestion and retrieval of data without the need for complex transformations.
  3. Flexible Schema: CouchDB's schema-less design allows for flexibility in data structure. As Wikidata evolves, you can easily accommodate changes in the data model without needing to perform extensive migrations.
  4. Replication and Availability: CouchDB offers built-in replication features, which can enhance data availability and fault tolerance. This is particularly useful for distributed applications or when you need to maintain multiple copies of the data across different locations.
  5. RESTful API: CouchDB provides a RESTful HTTP API, making it easy to interact with the database using standard web protocols. This can simplify integration with web applications and services.
  6. MapReduce for Queries: CouchDB supports MapReduce for querying data, which can be beneficial for aggregating and processing large datasets. You can create views to efficiently retrieve specific subsets of data.
  7. Versioning and Conflict Resolution: CouchDB has built-in support for versioning and conflict resolution, which can be advantageous when dealing with collaborative data updates or when multiple sources are contributing to the dataset.
  8. Ease of Use: For developers familiar with JSON and RESTful APIs, CouchDB can be easier to work with compared to more complex graph databases. This can speed up development and reduce the learning curve.
  9. Offline Capabilities: CouchDB can be used in offline applications, allowing for local data storage and synchronization when connectivity is restored. This can be useful for mobile or distributed applications.
  10. Community and Ecosystem: CouchDB has a strong community and a variety of tools and libraries that can facilitate development and integration with other systems.
I then asked for an example of how to query the Wikidata CouchDB for all documents with label = Sweden. Here is the response.
As is evident in the response, having an updated CouchDB instance with all the data from Wikidata would be very valuable, as one could query and manage it more reliably compared to only having the dump and Blazegraph.
I suggest we ask WMF to provide a hosted, regularly updated CouchDB copy of Wikidata. I would be happy to have it updated weekly like QLever and the dumps.
So9q (talk) 13:16, 9 September 2024 (UTC)
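For illustration, a query along those lines against CouchDB's /_find endpoint could look roughly like this (a sketch only: it assumes the entities are stored in their plain Wikidata JSON shape, that an index exists on the label field, and the database URL and credentials are made up):

  import requests

  body = {
      "selector": {"labels.en.value": "Sweden"},   # Mango query: English label equals "Sweden"
      "fields": ["_id", "labels.en.value"],
      "limit": 10,
  }
  r = requests.post("http://localhost:5984/wikidata/_find",   # placeholder CouchDB URL and database
                    json=body,
                    auth=("admin", "password"))               # placeholder credentials
  for doc in r.json()["docs"]:
      print(doc["_id"], doc["labels"]["en"]["value"])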
I was surprised GPT was able to answer that as well as it did. The upside is the horizontal scalability. One major drawback of hosting on a document-oriented database is that you lose the advanced query possibilities you have with SPARQL, something that is so nice that I'm sure people will be unwilling to let it go until they have to. The data will also lose its graph nature; I have no idea what the implications of that will be, but presumably it will impact the areas that graphs are good for. When you query with MapReduce you only get one aggregation step, so it is very limited compared to what we are used to. Much more labor involved in processing the data into a usable format, in other words. Things like following property chains will probably be impossible. No graph traversals. I don't think you can even do joins. To lose all that will be painful.
Modest federation and a new query engine should buy us plenty of time; it will probably be a very long time until Wikidata is forced to switch to a different model. A document-oriented database instance of Wikidata could be interesting as a research project however, whether it runs CouchDB, Hadoop or something else. Infrastruktur (talk) 16:56, 9 September 2024 (UTC)
In inventaire.io (Q32193244), we use CouchDB as a primary database, where Wikidata uses MariaDB. Like Wikidata, we then use Elasticsearch as a secondary database to search entities (such as the "label=Sweden" query above). CouchDB doesn't solve the problem of the graph queries bottleneck, and I'm very much looking forward to what Wikimedia will come up with to address the issue. The one thing I see where CouchDB could be useful to Wikidata is indeed to provide a way to mirror the entities database, the same way the npm (Q7067518) registry can be publicly replicated.
As for https://github.com/maxlath/import-wikidata-dump-to-couchdb, it was a naive implementation; if I wanted to do that today, I would do it differently, and make sure to use CouchDB bulk mode:
- Get a Wikidata JSON dump
- Optionally, filter it to get the desired subset. In any case, turn the dump into valid NDJSON (drop the first and last lines and the comma at the end of each line).
- Pass each entity through a function to move the "id" attribute to "_id", using https://github.com/maxlath/ndjson-apply, to match CouchDB requirements.
- Bulk upload the result to CouchDB using https://github.com/maxlath/couchdb-bulk2 Maxlath (talk) 17:58, 9 September 2024 (UTC)
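A compact Python sketch of that pipeline (it assumes a one-entity-per-line JSON dump on disk and an already created target database; the raw dump's array brackets and trailing commas are tolerated, and the URL, credentials, file name and batch size are placeholders):

  import json
  import requests

  COUCHDB = "http://localhost:5984/wikidata"       # placeholder CouchDB database URL
  BATCH_SIZE = 1000

  def batches(path):
      """Yield CouchDB-ready entity docs from the dump in batches."""
      batch = []
      with open(path, encoding="utf-8") as dump:
          for line in dump:
              line = line.strip().rstrip(",")      # tolerate raw dump lines ending with a comma
              if line in ("", "[", "]"):           # skip the array brackets of the raw dump
                  continue
              entity = json.loads(line)
              entity["_id"] = entity.pop("id")     # CouchDB expects the identifier in "_id"
              batch.append(entity)
              if len(batch) >= BATCH_SIZE:
                  yield batch
                  batch = []
      if batch:
          yield batch

  for docs in batches("wikidata-entities.ndjson"): # placeholder file name
      r = requests.post(COUCHDB + "/_bulk_docs",
                        json={"docs": docs},
                        auth=("admin", "password"))  # placeholder credentials
      r.raise_for_status()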
Speaking of mirroring the entities database, I see CouchDB as a far superior alternative to the Linked Data Fragments [3] instance of Wikidata. What LDF promises and what it delivers are two different things. Downloading a set of triples based only on a single triple pattern is close to useless, and it is also a bandwidth hog. If we just want to download a subset of Wikidata, then CouchDB would allow us to specify that subset more precisely, and might be a welcome addition to doing the same with SPARQL CONSTRUCT or SELECT queries. Syncing a whole copy of Wikidata would also save tons of bandwidth compared to downloading the compressed dump file every week. Infrastruktur (talk) 16:54, 11 September 2024 (UTC)

Hello, I created an item for Die Ikonographie Palästinas/Israels und der Alte Orient (Q130261727), which is a series of books, or more accurately 4 volumes of a book, each containing a catalogue of ancient Near Eastern art from a different period. I wonder what the right instance of (P31) is for this item. From comparison with other similar series, I think the possibilities are: publication (Q732577), series (Q3511132), editorial collection (Q20655472), collection of sources (Q2122677), catalogue (Q2352616), collection (Q2668072), book series (Q277759) and research project (Q1298668). They all fit, but it seems too much to put all of them. How should I know which are the right items to choose here? פעמי-עליון (talk) 17:41, 8 September 2024 (UTC)

I guess it depends. If you create items for each of the 4 volumes, then I would suggest book series (Q277759). That only makes sense if they are needed as separate items. If there will be no items for each volume, I would suggest publication (Q732577). catalogue (Q2352616) could be added additionally if there are numbered items in the book.
collection of sources (Q2122677), editorial collection (Q20655472) and research project (Q1298668) don't seem to be ontologically correct. Carl Ha (talk) 20:43, 8 September 2024 (UTC)
Thank you! פעמי-עליון (talk) 16:26, 11 September 2024 (UTC)

New WikiProject CouchDb

I launched a new project Wikidata:WikiProject_Couchdb. Feel free to join and help explore CouchDb as a new type of scalable backend for Wikidata. So9q (talk) 17:10, 11 September 2024 (UTC)

More general item needed

Q122921190 is only for Zweisimmen, but there are a lot of other railroad gauge system changers. In Spain there are a lot of them. Smiley.toerist (talk) 09:04, 12 September 2024 (UTC)

Mass-import policy

Hi, I suggested a new policy similar to OpenStreetMap's for imports in the Telegram chat yesterday and found support there.

Next steps could be:

  • Writing a draft policy
  • Deciding how many new items users are allowed to make without seeking community approval first.

The main idea is to raise the quality of existing items rather than import new ones.

I suggested a limit of 100 items or more to fall within this new policy. @nikki: @mahir256: @ainali: @kim: @VIGNERON: WDYT? So9q (talk) 12:11, 31 August 2024 (UTC)

@So9q: 100 items over what time span? But I agree there should be some numeric cutoff to item creation (or edits) over a short time period (day, week?) that triggers requiring a bot approval at least. ArthurPSmith (talk) 00:32, 1 September 2024 (UTC)
QuickStatements sometimes returns the error message Cannot automatically assign ID: As part of an anti-abuse measure, this action can only be carried out a limited number of times within a short period of time. You have exceeded this limit. Please try again in a few minutes.
M2k~dewiki (talk) 02:07, 1 September 2024 (UTC)
Time span does not really matter, intention does.
Let me give you an example: I recently imported fewer than 100 banks when I added Net-Zero Banking Alliance (Q129633684). I added them in one day using OpenRefine.
That's OK. It's a very limited scope, and we already had most of the banks. It can be discussed whether the new banks, which are perhaps not notable, should have been created or just left out, but I did not do that, as we have no policy, culture, or space to talk about imports before they are done. We need to change that.
Other examples:
  • Importing all papers in a university database or similar totaling 1 million items over half a year using automated tools is not ok without prior discussion no matter if QS or a bot account was used.
  • Importing thousands of books/monuments/whatever as part of a GLAM project over half a year is not ok without prior discussion.
  • Importing all the bridges in the Czech Republic e.g. Q130213201 during whatever time span would not be ok without prior discussion. @ŠJů:
  • Importing all hiking paths of Sweden e.g. Bohusleden (Q890989) over several years would not be ok.
etc.
The intention to import many objects without prior community approval is what matters. The community is your boss; be bold when editing, but check with your boss before mass imports. I'm pretty sure most users would quickly get the gist of this policy. A good principle could be: If in doubt, ask first. So9q (talk) 05:16, 1 September 2024 (UTC)
@So9q: I'm not sure that the mention of individually created items for bridges belongs in this mass import discussion. There is a Wikidata:Notability policy, and so far no one has questioned the creation of entries for physically existing objects that are registered, charted, and have or may/should have photos or/and categories in Wikimedia Commons. If a community has been working on covering and processing a topic for twenty years, it is probably not appropriate to suddenly start questioning it. I understand that what is on the agenda in one country may appear unnecessarily detailed in another one. However, numbered roads, registered bridges or officially marked hiking paths are not a suitable example to question; their relevance is quite unquestionable.
The question of actual mass importation would be moot if the road administration (or another authority) published the database in an importable form. Such a discussion is usually led by the local community - for example, Czech named streets were all imported, but registered addresses and buildings were not imported generally (but individual items can be created as needed). Similarly, the import of authority records of the National Library, registers of legal entities, etc. is assessed by the local community; usually the import is limited by some criteria.. It is advisable to coordinate and inspire such imports internationally, however, the decision is usually based on practical reasons, i.e. the needs of those who use the database. It is true that such discussions could be more transparent, not just separate discussions of some working group, and it would be appropriate to create some formal framework for presenting and documenting individual import projects. For example, creating a project page that will contain discussion, principles of the given import, contact for the given working group, etc., and the project page should be linked from edit summaries. --ŠJů (talk) 06:03, 1 September 2024 (UTC)
Thanks for chipping in. I do not question the notability of the items themselves. The community in Telegram has voiced the opinion that this whole project has to consider what we want to include and what not, when, and what to give priority to.
Millions of our current items are in quite a sad state as it is. We might not have the manpower to keep quality at an acceptable level as it is.
To give you one example, Wikidata currently does not know which Swedish banks are still in operation. Nobody has worked on the items in question (see Wikidata:WikiProject_Sweden/Banks), despite them being imported many years ago (some are from 2014) from svwp.
There are many examples to draw from where we only have scratched the surface. @Nikki mentioned in Telegram that there are a ton of items with information in descriptions not being reflected by statements.
A focus on improving what we have, rather than inflating the total number of items, is a desire by the telegram community.
To do that we need to discuss imports whether already ongoing or not, whether very notable or not that notable.
Indeed, increased steering and formality would be needed if we were to adopt an import policy on Wikidata. So9q (talk) 06:19, 1 September 2024 (UTC)
Just as a side note, with no implications for the discussion here, but "The community in telegram has voiced" is irrelevant, I understood. Policies are decided here on the wiki, not on Telegram. Or? --Egon Willighagen (talk) 07:58, 1 September 2024 (UTC)
It is correct that policies are created on-wiki. However, it may also be fair to use that as a prompt to start a discussion here and transparent to explain if that is the case. It won't really carry weight unless the same people also voice their opinions here, but there is also no reason to belittle it just because people talked somewhere else. Ainali (talk) 08:13, 1 September 2024 (UTC)
+1. I'll add that the Wikidata channel is still rather small compared to the total number of active Wikidata editors (1% or less is my guess). Also the frequency of editors to chat is very uneven. A few very active editors/chatmembers contribute most of the messages (I'm probably one of them BTW). So9q (talk) 08:40, 1 September 2024 (UTC)
Sorry, I did not want to imply that discussion cannot happen elsewhere. But we should not assume that people here know what was discussed on Telegram. Egon Willighagen (talk) 10:36, 1 September 2024 (UTC)
Terminology matters, and the original bot policies are probably not clear anymore to the current generation of Wikidata editors. With tools like OpenRefine and QuickStatements, I have the impression it is no longer clear what is a "bot" and what is not. You can now easily create hundreds of items with either of these tools (and possibly others) in an editor-driven manner. I agree it is time to update the Wikidata policies around imports. One thing to make clear is the distinction between mass creation of items and mass import (the latter can also be mass importing annotations and external identifiers, or links between items, without creating items). -- Egon Willighagen (talk) 08:04, 1 September 2024 (UTC)
I totally agree. Since I joined around 2019 I have really struggled to understand what is okay and not when it comes to mass-edits and mass-imports. I have had a few bot requests declined. Interestingly very few of my edits have ever been questioned. We should make it simple and straightforward for users to learn what is okay and what is not. So9q (talk) 08:43, 1 September 2024 (UTC)
I agree that we need an updated policy that is simple to understand. I also really like the idea of raising the quality of existing items. Therefore, I would like the policy to recommend that, or to even make an exception for preapproval if, there is a documented plan to weave the imported data into the existing data in a meaningful way. I don't know exactly how it could be formulated, but creating inbound links and improving the data beyond the source should be behavior we want to see, whereas just duplicated data on orphaned items is what we don't want to see. And obviously, these plans need to be completed before new imports can be made, gaming the system will, as usual, not be allowed. Ainali (talk) 08:49, 1 September 2024 (UTC)
@ainali: I really like your idea of "a documented plan to weave the imported data into the existing data in a meaningful way". This is very similar to the OSM policy.
They phrase it like so:
"Imports are planned and executed with more care and sensitivity than other edits, because poor imports have significant impacts on both existing data and local mapping communities." source
A similar phrasing for Wikidata might be:
"Imports are planned and executed with more care and sensitivity than other edits, because poor imports have significant impacts on existing data and could rapidly inflate the number of items beyond what the community is able or willing to maintain."
WDYT? So9q (talk) 08:38, 5 September 2024 (UTC)
In general, any proposal that adds bureaucracy that makes it harder for people to contribute should start with explaining what problem it wants to solve. This proposal contains no such analysis, and I do consider that problematic. If there is to be a rule, 10,000 items per year seems more reasonable to me than 100. ChristianKl 15:37, 2 September 2024 (UTC)
Thanks for pointing that out. I agree. That is the reason for me to raise the discussion here first instead of diving right into writing an RfC.
The community in Telegram seems to agree that a change is needed and has pointed to some problems. One of them, mentioned by @Nikki:, was that most of the manpower and time at WMDE over the last couple of years seems to have been spent on trying to avoid a catastrophic failure of the infrastructure rather than on improving the UI, etc. At the same time, a handful of users have imported mountains of half-ass quality data (poor-quality imports) and show little or no sign of willingness to fix the issues pointed out by others in the community. So9q (talk) 08:49, 5 September 2024 (UTC)
Would someone like to sum up this discussion and point to ways forward?
Do we want/need a new policy? No?
Do we want/need to adapt decisions about imports or edits to the state of the infrastructure? No?
Do we want to change any existing policies? No?
I have yet to see any clear proposals from the fearsome crowd in Telegram, or anyone else for that matter. Might I have stirred this pot just to find that business as usual is what this community wants as a whole?
If that is the case I'm very happy. It means that the community is somewhat split, but that I can possibly continue working on things I find valuable instead of holding back out of fear of a few.
I dislike decisions based on fear. I have wanted to improve citations in Wikipedia for years. I have built a citation extraction tool that could easily be used to import all papers with an identifier that are cited in English Wikipedia (= millions of new items). In addition, I'm going to request running a bot on Wikipedia to swap out idiosyncratic local citation templates with citeq.
I look forward to seeing Wikidata grow in quantity and quality. So9q (talk) 08:20, 13 September 2024 (UTC)

There are many different discussions going on here on Wikidata. Anyone can open a discussion about anything if they feel the need. External discussions outside of Wikidata can evaluate or reflect on the Wikidata project, but should not be used to make decisions about Wikidata.

This discussion's scope is a bit confusing. By mass import I mean a one-time machine conversion of an existing database into Wikidata. However, the examples given relate to items created manually and occasionally over a long period of time. In relation to this activity, they do not make sense. If 99 items of a certain type are made over a few years, everything is fine, and as soon as the hundredth item has to be made, we suddenly start treating the topic as "mass import" and demanding a prior discussion? That makes absolutely no sense. For this, we have the rules of notability, and they already apply to the first such item; they have no connection with "mass imports".

As I mentioned above, I would like each (really) mass import to have its own documentation project page, from which it would be clear who did the import, according to what policies, and whether someone is taking care of continuously updating the imported data. It is possible to appeal to mass importers to start applying such standards in their activities. It is also possible to mark existing items with some flag that indicates which specific workgroup (subproject) takes care of maintaining and updating the item. --ŠJů (talk) 18:06, 1 September 2024 (UTC)

Maybe using the existing "bot requests" process is overkill for this (applying for a bot flag shouldn't be necessary if you are just doing QS or OpenRefine work), but it does seem like there should be either some sort of "mass import requests" community approval process, or, as ŠJů suggests, a structural prerequisite (documentation on a WikiProject or something of that sort). And I do agree that if we are not talking about a time-limited threshold, then 100 is far too small. Maybe 10,000? ArthurPSmith (talk) 22:55, 1 September 2024 (UTC)
There are imports based on existing identifiers - these should be documented on property talk pages (e.g. a new mass import of newly created identifiers every month, usually using QS). The next big group is imports of existing geographic features (which can be photographed) - these have coordinates, so they are visible on maps. Some of them are the focus of only a few people. Maybe document them in the relevant country WikiProject? JAn Dudík (talk) 15:49, 2 September 2024 (UTC)


My thoughts on this matter:
  • we indeed need a page (maybe a policy, maybe just a help page, a recommendation, a guideline, etc.) documenting how to do a good mass import
  • mass import should be defined in more precise terms: is it only creation, or any edits? (they are different, but both could be problematic and both should be documented)
  • 100 items is very low
    • it is only the 2nd of September and 3 people have already created more than 100 items! In August, 215 people created 100 or more items; the community can't process that much
    • I suggest at least 1,000, maybe 10,000 items (depending on whether we focus only on creations or on any edits)
  • having no time span is strange: is it even a mass import if someone creates one item every month for 100 months? Since most mass imports are done by tools, most are done in a short period; a time span of a week is probably best
  • the quantity is a problem, but the quality should also be considered, as should the novelty (creating/editing items following a well-known model is not the same thing as creating a new model from scratch; the second needs more review)
  • could we act on Wikidata:Notability, so that mass imports have to be "more" notable? or at least, so that notability is more thoroughly checked?
    • the two previous points rest on references, which are often suboptimal right now (most imports are from one source only, whereas cross-checking multiple references should be encouraged where possible)
  • the bot policies (especially Wikidata:Requests for permissions/Bot) probably need an update/reform too
  • finally, there is a general problem concerning a lot of people, but there are mainly a few power users who are borderline abusing the resources of Wikidata; we should focus on the latter before burdening the former, as it would be easier and more effective (i.e. dealing with one 100,000-item import rather than with 1,000 imports of 100 items).
Cheers, VIGNERON (talk) 09:20, 2 September 2024 (UTC)
  • I have two points to make at this stage. Firstly, I think the limit of 100 items is way too low. To give an example from my own experience, when I created items for streets of some Warsaw boroughs using OpenRefine, there were usually around 200-300 items created per borough (and Warsaw has 18 of those). I find it more harmful than beneficial to force users to go through a lengthy process for that sort of scale of work. Secondly, I would like to see a written policy draft before giving my final opinion. From what I read here, I'm still not sure if it would be ok to create, let's say, 150 items per day using, for instance, Mix'n'Match or Cradle, which are hardly automated tools. Powerek38 (talk) 15:54, 2 September 2024 (UTC)
    I think when planning an import the key question to ask is "how many items would it be if replicated worldwide?" For streets, that would be a very large number, so I think there would be valid concerns about that. While I'd not argue for a moratorium on new ideas while we back-fill, we don't want hugely inconsistent data coverage, so a plan for how the idea would be replicated system-wide is key. Vicarage (talk) 16:02, 2 September 2024 (UTC)
    @Vicarage: it seems too restrictive to only think in terms of instance of (P31). Not all streets are the same; in some places there are easily 2-3 good references even for small streets, in other places it's hard to find even one reference (the same goes for other aspects, like whether we already have items that need these streets, such as buildings located or people born there). The first should be imported, not the second. It's inconsistent if you look at the P31, but it's consistent if you look at data quality. Cheers, VIGNERON (talk) 16:42, 2 September 2024 (UTC)
    Perhaps the 4,000-odd Warsaw streets were notable, but that is probably an order of magnitude more than the notable inhabitants of such a historic city. It's one reason why a mass import needs to be documented, to convince others of its merit. There is a danger that people sneak in trivial things under the 'structural need' argument as part of a pet project, and we get the inconsistent data quality neither of us wants. It's a problem that bots can pour data in, but humans have to decide on quality. Vicarage (talk) 17:01, 2 September 2024 (UTC)
    I totally agree. I have never used a street item. Is there a WikiProject where streets are discussed as noteworthy of import? Are the current street items used by any other Wikimedia project?
    The same goes for books and papers. Do we really want to import all books in existence? All papers ever published in scientific journals? I would say no. So9q (talk) 08:56, 5 September 2024 (UTC)
    A lot of streets have a Commons category, since pictures have been taken and uploaded to Commons; some also have articles. According to Wikidata:Notability, street objects with a Commons category and/or article(s) should be notable. Also see
    @Nortix08, Z thomas, Stefan Kühn, Triplec85, PantheraLeo1359531: for information. M2k~dewiki (talk) 09:52, 5 September 2024 (UTC)
In my opinion, items for streets are very useful, because there are a lot of pictures with categories showing streets. There are street directories. Streets often have their own names, are historically significant, buildings/cultural heritage monuments can be found in the respective street via street categories and they are helpful for cross-referencing. So please keep items for streets. Triplec85 (talk) 10:26, 5 September 2024 (UTC)
As streets are very important for infrastructure and for the structuring of villages, towns and cities, yes, they are notable. Especially if we consider the historical value of older streets or how often they are used (in the real world). Data objects for streets can be combined with many identifiers from organizations like OpenStreetView and others. And they are a good element for categorizing. It's better to have images categorized in Hauptstraße (Dortmund) (with a data object that offers quick facts) than only in Dortmund. Also, streets are essential for local administrations, which emphasizes their notability. And you can structure where the street names come from (and how often they are used) etc. etc., and which streets they are connected with. For me, I see many good reasons for having lists of streets in Commons, Wikidata, whatever; it gives a better overview, also for future developments, when streets are populated later or get cultural heritage monuments, or to track the renaming of streets... --PantheraLeo1359531 (talk) 10:37, 5 September 2024 (UTC)
Regarding the distribution / portion of content by type (e.g. scholarly articles vs. streets / architectural structures), see this image:
Content on Wikidata by type
M2k~dewiki (talk) 10:41, 5 September 2024 (UTC)
Better to consider the number of potential items. ScienceOpen has most studies indexed and stands at 95 M items (getting to the same degree of completeness would unlock several use cases for Wikidata, like Scholia charts, albeit not substituting all the ones the mentioned site can be used for, as abstract texts as well as altmetrics scores etc. are not included here). 100 M isn't that large, and I think there are more streets than there are scholarly articles – having a number there would be nice.
-
I think of Wikidata's usefulness beyond merely linking Wikipedia articles in this way: what other widely-used online databases exist, and can we do the same but better and more? Currently, Wikidata can't be used to fetch book metadata into your ebook reader or song metadata into your music player/library, can't be used for getting food metadata into food tracking apps, can't tell you the often problematic ingredients of cosmetics/hygiene products, can't be used to routinely monitor new studies of a field or search them, or do pretty much anything else that is actually useful to real people. So I'd start working on the coverage of such data first, before importing lots of data with unknown, questionable potential future use, or before manual item creation/editing. If we got areas covered that people actually use and need, then we could still improve areas where no use cases yet exist, or which at best only slightly improve on the proprietary, untransparent-algorithm Google Web search results (which don't even index files & categories on Commons). I'd be interested in how other people think about WD's current and future uses, but discussing that may be somewhat outside the scope of this discussion. Prototyperspective (talk) 13:35, 7 September 2024 (UTC)
I use streets a lot to qualify locations, especially in London - see Ambrose Godfrey (Q130211790) where the birth and death locations are from the ODNB. - PKM (talk) 23:16, 5 September 2024 (UTC)
Disagree on books and papers then – they need to be imported to enable all sorts of useful things which are not possible, or misleading, otherwise, such as the statistics of Scholia (e.g. research field charts, author timelines/publications, etc.).
I think papers are mostly a (nearly) all-or-nothing thing – they aren't that useful before that point, where I don't see much of a use case. Besides charts, one could query them in all sorts of interesting ways once they are fairly complete, and embed results of queries (e.g. studies by an author, sortable by citations & other metrics, on the WP article about the person).
When fairly complete and unvandalized, they could also be analyzed, e.g. for AI-supported scientific discovery (there are studies on this), and be semantically linked & queried and so on.
It's similar for books. I don't know how Wikidata could be useful in that space if it doesn't contain at least as many items with metadata as other websites. For example, one could then fetch metadata from Wikidata instead of from these sites. In contrast to studies, I currently see no actual use case for WD items for streets – they may be useful at some point, but I don't see why now or in the near future, or how. Prototyperspective (talk) 00:31, 6 September 2024 (UTC)

Hello, from my point of view, there would be some questions regarding such a policy, like:

  • What is the actual goal of the policy? What is it trying to achieve? How will this goal be achieved?
  • Is it only a recommendation for orientation (which can easily be ignored)?
  • Is this policy realized in a technical manner, so users are blocked automatically? Currently, for example, QuickStatements already implements some policy and disables the creation of new objects with the error message: Cannot automatically assign ID: As part of an anti-abuse measure, this action can only be carried out a limited number of times within a short period of time. You have exceeded this limit. Please try again in a few minutes. How will the new policy be different from the current anti-abuse policy?
  • Who will check and decide whether the policy is followed or ignored, and how? What are the consequences if the policy is ignored?
  • Does this policy only include objects without sitelinks, or also objects with sitelinks to any language version or project like Wikisource, Commons, Wikivoyage, Wikibooks, ...?
  • Does this policy only concern the creation of new objects or also the modification of existing objects?
  • How is quality defined with regard to this policy? How and by whom will it be decided whether a user and/or user task is accepted for a higher limit?
  • There are always thousands of unconnected articles and categories in any language version and project (including commons), for example
  • https://wikidata-todo.toolforge.org/duplicity/#/
AutoSuggestSitelink-Gadget

Who will connect them to existing objects (if they exist) or create new objects where they do not yet exist, and when (especially if there is a new artificial limit on creating such objects)? Will someone implement and operate a bot for all 300 Wikipedia language versions and all articles, all categories (including Commons categories), all templates, all navigation items, ... to connect sitelinks to existing objects or create new objects where they do not yet exist?

From my point of view, time and resources should be spent on improving processes and tools and on helping, supporting and educating people in order to improve data quality and completeness. For example, in my opinion the meta:AutosuggestSitelink-Gadget should be activated by default for all users on all language versions in the future.

Some questions and answers which came up over the last years (in order to help, educate and support users) can be found at

M2k~dewiki (talk) 19:24, 2 September 2024 (UTC)
For example, the functionality of
could also be implemented as a bot in the future by someone. M2k~dewiki (talk) 21:13, 2 September 2024 (UTC)
This wasn't written when the discussion started here, but here is a summary of the growth of the databases that this policy partly addresses: User:ASarabadani (WMF)/Growth of databases of Wikidata. There are also some relevant links on Wikidata:WikiProject Limits of Wikidata. As an extremely high-level summary: Wikidata is growing so quickly that we will hit various technical problems, and slowing down the growth (perhaps by prioritizing quality over quantity) is a way to find time to address some of them. So the problem is wider than just new item creations, but slowing those would certainly help. Ainali (talk) 08:09, 3 September 2024 (UTC)
This has also recently been discussed at
Possible solutions could be:
M2k~dewiki (talk) 08:18, 3 September 2024 (UTC)
Just a note that the split seems to have happened now, so some more time is bought.
Ainali (talk) 21:14, 3 September 2024 (UTC)
Please note that "Cannot automatically assign ID: As part of an anti-abuse measure, this action can only be carried out a limited number of times within a short period of time. You have exceeded this limit. Please try again in a few minutes." is not a error message or limit imposed from QuickStatement; it is a rate limit set by Wikibase, see phab:T272032. QuickStatement should be able to run a large batch (~10k commands) in a reasonable (not causing infra issue) speed. If QuickStatements does not retry when rate limit is hit, I consider it a bug; batches should be able to be run unattended with error recovery mechanism GZWDer (talk) 14:50, 4 September 2024 (UTC)
Thanks for linking to this report. I see that the revision table is very large. @Nikki linked to this bot, which makes 25+ single edits to the same astronomical item before progressing to the next. This seems very problematic, and given the information on that page, this bot should be stopped immediately. So9q (talk) 09:12, 5 September 2024 (UTC)
For what reason? That he is doing his job? Matthiasb (talk) 00:53, 6 September 2024 (UTC)
No, because it is wasting resources when doing the job. The bot could have added them in one edit, and instead it added unnecessary rows to the revisions table, which is a problem. Ainali (talk) 06:00, 6 September 2024 (UTC)
Well, non-bot users, aka humans, cannot add those statements in one single edit, right? So human editors are by definition wasting resources and probably should not edit at all? Nope. But that's only one side of the coin. It might be a surprise: the limit of the number of rows in version histories is ∞. For items like San Francisco (Q62) that might grow faster, and for items like asteroids slower, because no other editors are editing them. If we talk about the source of a river, it cannot be described in one single edit: we would have two edits for latitude and longitude, one for the altitude, and one each for a nearby town, the county, the state and the country, at least. Several more if editors forgot to include proper sourcing and the language of the source. On top of that, hundreds of millions of statements still have "0 sources". So where is the problem? The edit behaviour of the single editor, or did we forget to regulate or implement something back in 2015 or so that affects the edits of every single user? Matthiasb (talk) 22:31, 7 September 2024 (UTC)
No, to me, a waste is when resources could have been saved but were not. So we're not talking about manual editing here (which also accounts for a lot fewer edits in total). And while I agree that an item in theory may need almost limitless revisions, practically, we lack the engineering for that right now. So let's do something smarter than Move Fast and Break Things (Q18615468). Ainali (talk) 07:30, 8 September 2024 (UTC)

I would echo the points raised above by M2k~dewiki. My feeling is that when we actually think about the problem we are trying to solve in detail, there will be better solutions than placing arbitrary restrictions on upload counts. There are many very active editors who are responsible and actually spend much of their time cleaning up existing issues, or enriching and diversifying the data. Many GLAMs also share lots of data openly on Wikidata (some exclusively so), helping to grow the open knowledge ecosystem. To throttle this work goes against everything that makes Wikidata great! It also risks fossilising imbalances and biases we know exist in the data. Of course there are some folks who just dump masses of data into Wikidata without a thought for duplication or value to the wider dataset, and we do need a better way to deal with this. But I think that some automated checks of mass-upload data (1000s not 100s), looking for potential duplication, interconnectivity and other key indicators of quality, might be more effective at flagging problem edits and educating users, whilst preserving the fundamental principles of Wikidata as an open, collaborative data space. Jason.nlw (talk) 08:34, 3 September 2024 (UTC)

We definitely need more clarity on the guidelines, thanks for putting that up! Maybe we can start with some very large upper boundary so we can at least agree on the principle and the enforcement tooling? I suggest we start with a hard limit of 10k new items per month for non-bot accounts, plus wording saying that creation of over 1,000 items per month should be preceded by a bot request if the items are created based on the same source, and that any large-scale creation of items (e.g. 100+ items in a batch) should at least be discussed on-wiki, e.g. on the related WikiProject. Also, I think edits are more complicated than new item creations; author-disambiguator.toolforge.org/, for example, allows users to make 100k edits in a year semi-manually. For simplicity, it may be a good idea to focus only on item creation at this point. TiagoLubiana (talk) 21:00, 3 September 2024 (UTC)

User statistics can be found for example at:
Recent batches, editgroups, changes and creations can be found for example at:
Also see
M2k~dewiki (talk) 21:41, 3 September 2024 (UTC)
Including bots:
M2k~dewiki (talk) 21:45, 3 September 2024 (UTC)
@DaxServer, Sldst-bot, Danil Satria, LymaBot, Laboratoire LAMOP: for information. M2k~dewiki (talk) 13:23, 4 September 2024 (UTC)
@Kiwigirl3850, Romano1920, LucaDrBiondi, Fnielsen, Arpyia: for information. M2k~dewiki (talk) 13:24, 4 September 2024 (UTC)
@1033Forest, Brookschofield, Frettie, AdrianoRutz, Luca.favorido: for information. M2k~dewiki (talk) 13:24, 4 September 2024 (UTC)
@Andres Ollino, Stevenliuyi, Quesotiotyo, Vojtěch Dostál, Alicia Fagerving (WMSE): for information. M2k~dewiki (talk) 13:24, 4 September 2024 (UTC)
@Priiomega, Hkbulibdmss, Chabe01, Rdmpage, Aishik Rehman: for information. M2k~dewiki (talk) 13:24, 4 September 2024 (UTC)
@Cavernia, GZWDer, Germartin1, Denelson83, Epìdosis: for information. M2k~dewiki (talk) 13:24, 4 September 2024 (UTC)
@DrThneed, Daniel Mietchen, Matlin: for information. M2k~dewiki (talk) 13:24, 4 September 2024 (UTC)
Just a first quick thought (about method and not content): apart from pings (which are very useful indeed), I think that the best place to make decisions on such an important matter would be an RfC; IMHO an RfC should be opened as soon as possible and this discussion should be moved to its talk page, in order to elaborate there a full set of questions to then be clearly asked to the community. Epìdosis 13:49, 4 September 2024 (UTC)
Also see
M2k~dewiki (talk) 16:55, 4 September 2024 (UTC)
I don't really understand what the proposal here is or what problem it is trying to solve, there seem to be quite a number of things discussed.
That said, I am absolutely opposed to any proposal that places arbitrary limits on individual editors or projects for new items or edits simply based on number rather than quality of edits. I think this will be detrimental because it incentivises working only with your own data rather than contributing to other people's projects (why would I help another editor clean up their data if it might mean I go over my quota so can't do the work that is more important to me?).
And in terms of my own WikiProjects, I could distribute the new item creation to other editors in those projects, sure, but to what end? The items still get created, but with less oversight from the most involved and knowledgeable editor and so likely with greater variability in data quality. How is that a good thing? DrThneed (talk) 23:50, 4 September 2024 (UTC)
+1 Andres Ollino (talk) 01:22, 13 September 2024 (UTC)
Thanks for chipping in. Your arguments make a lot of sense to me. The community could embrace your view and e.g. tell the WMF to give us a plan for phasing out Blazegraph and replacing it with a different backend now that Orb and QLever exist. See also Wikidata:WikiProject Couchdb
Also, the revision issue exists because MediaWiki is set up like any other project wiki. But it really doesn't have to keep all the revisions readily available. A performance degradation for revisions older than, say, 100 days might be okay to the community.
We as a community can either walk around in fear and take no decisions, or take decisions based on the current best available information.
My suggestion to the community is to start discussing what to do. The answer might be: nothing. Maybe we don't care about Blazegraph now that we have good alternatives? Maybe it's not for us to worry about, because WMF should just adapt to whatever the community is doing?
I would like to build a bot that imports all papers currently cited in Wikipedia. That would amount to at least a couple of million items. I have wanted this since 2021, but because of fear in the community I have waited patiently for some time and helped with trying to find substitutes for Blazegraph.
Are we okay with that proposal now that we have already split the graph? It would mean a great deal to Abstract Wikipedia to be able to use sources already in Wikidata.
It would also improve Scholia, etc. All in all a very good idea, if you ask me. Should I ask for permission anywhere before going ahead and writing a bot request? So9q (talk) 08:05, 13 September 2024 (UTC)
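For illustration, a rough sketch of the kind of extraction step such a bot could start from (this is not So9q's actual tool; the article title, the cite-journal/DOI-only focus and the function name are assumptions):

import requests
import mwparserfromhell

API = "https://en.wikipedia.org/w/api.php"

def cited_dois(title: str) -> set:
    # Fetch the wikitext of one English Wikipedia article and collect the DOIs
    # used in {{cite journal}} templates.
    reply = requests.get(API, params={
        "action": "parse", "page": title, "prop": "wikitext", "format": "json",
    }).json()
    wikitext = reply["parse"]["wikitext"]["*"]
    dois = set()
    for template in mwparserfromhell.parse(wikitext).filter_templates():
        if template.name.matches("cite journal") and template.has("doi"):
            dois.add(str(template.get("doi").value).strip())
    return dois

print(cited_dois("CRISPR"))  # each DOI found could then be checked against Wikidata before creating an item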
  • For some reason I think that some/more/many of the participants in this discussion have a wrong understanding of how Wikidata works – not technically, but in its correspondence with other WM projects such as Wikipedia. In fact we have the well-established rule that each Wikipedia article deserves an item. I don't know how many populated-place articles LSJ bot created or how many items we have which are populated places, but citing earlier discussions in the German Wikipedia years ago about how far Wikipedia can expand, I calculated that the number of populated places on Earth might exceed 10 million. So would creating 10 million WP articles on populated places cause a mass upload on Wikidata, since every WP article in any language is to be linked to other language versions via Wikidata? (I also roughly calculated the possible number of geographic features on Earth, with nearly two million of them in the U.S. alone. I think there are up to 100 million on Earth in total. We consider all of them notable, so at some point we will have items on 100 million geographic features caused by interwiki. Is this mass upload? Shall we look at cultural heritage? In the U.S., the United Kingdom, France and Germany together there are about one million buildings which are culturally protected in one way or another. At this time some 136,080 of them, there or elsewhere in the world, have articles in the German WP, and yes, they are linked in Wikidata. Counting all other countries together, some more millions will add to this.)
  • When I started in the German WP, some 18 years ago, it had some 800,000 articles or so. At that time we had users who tried hard to bring the size of the German WP back down to 500,000 articles. They failed. At some point before the end of this year the number of articles in the German WP will exceed 3,000,000, with the French WP reaching the same mark some four or five months later. Though many of those items might already exist in one language version or more, many of the articles up to the three million mark might not have a Wikidata item yet. Are these, say, 100,000 items mass upload?
  • And, considering a project I am involved with, GLAM activities on Caspar David Friedrich (Q104884), that is: the Hamburg Caspar David Friedrich. Art for a New Age Hamburger Kunsthalle 2023/24 (Q124569443) featured some 170 or so works of the artist, all of them considered notable in the WP sense of notability, but we need all of them in Wikidata anyway for reasons of provenance research, for which WD is essential. So if along the way I am creating 50, 70 or 130 items and linking them up with their image files on Commons and the individual WP language versions which can be found, am I committing the crime of mass upload, even if the process takes weeks and weeks because catalogues of different museums use different namings and identification can only be done visually?

Nope. When talking about the size of Wikidata, we must take it as a given that in 2035 Wikidata will be at least ten times bigger than today. If we are talking about better data, I agree with that as an important goal, which means adding data based on open sources that do not rely on wikis but on original data from the source, e.g. statistical offices, and which is sourced in a proper way. (But when I called for sources some months ago, someone laughed and said Wikidata is not about sourcing data but about collecting it. Actually that user should have been banned for eternity and yet another three days.) Restricting uploads won't work and might even prevent adding high-quality data. --Matthiasb (talk) 00:48, 6 September 2024 (UTC)

@Matthiasb In an ideal world, your points make a lot of sense. However, the technical infrastructure is struggling (plenty of links to those discussions above), and if we put ideals over practicalities, then we will bring Wikidata down before the developers manage to solve them. And then we will definitely not be ten times bigger in 2035. Hence, the thoughts about slowing the growth (hopefully only temporarily). If we need to prioritize, I would say that maintaining sitelinks goes above any other type of content creation and would be the last one to slow down. Ainali (talk) 06:08, 6 September 2024 (UTC)
No objections to the latter part. It was in the second half of the 2000s when MediaWiki was struggling with its own success, users getting timeouts all the time. Yet Tim Starling told us something like "don't care about resources". Well, he said something slightly different, but I don't remember the actual wording. The core of his remarks was that we as a community should not get headaches over it. When it comes to lacking resources, it would be his work and that of the server admins and technical admins to fix it. (But if it got necessary to act, then we should do what they say.) Matthiasb (talk) 11:31, 6 September 2024 (UTC)
You probably refer to these 2000s quotes: w:Wikipedia:Don't worry about performance. --Matěj Suchánek (talk) 17:29, 6 September 2024 (UTC)
Maybe we need the opposite for Wikidata?
If that mindset carries over to the use of automated tools (e.g. creating an item for every tree in OpenStreetMap – there are 26,100,742 as of 2024-09-07 – and linking them to other features in OSM/Wikidata), that would very quickly become totally technically unsustainable. Imagine every tree in OSM having a ton of statements like the scientific articles. That quickly becomes a mountain of triples.
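As a rough back-of-the-envelope illustration (the per-item figure is only an assumption): if each tree item ended up with around 20 triples (labels, instance of (P31), coordinates, an OSM identifier, references), those 26,100,742 items alone would add on the order of 500 million triples to the query service.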
I'm not saying we could not do it, perhaps even start today, but what is the combined human and technical cost of doing it?
What other items do we have to avoid importing to avoid crashing the system?
What implications would this and other mass-imports have on the current backend?
Would WMF split the graph again? -> Tree subgraph?
How many subgraphs do we want to have? 0?
Can they easily (say in QLever in less than an hour) be combined again or is that non-trivial after a split?
Is a split a problem in itself or perhaps just impetus for anyone to build a better graph backend like QLever or fork Blazegraph and fix it?
We need to discuss how to proceed and perhaps vote on new policies to avoid conflict and fear and eventually perhaps a total failure of the community with most of the current active users leaving.
Community health is a thing, how are we doing right now? So9q (talk) 08:30, 7 September 2024 (UTC)
Yes, thank you @Matěj Suchánek! --Matthiasb (talk) 11:22, 7 September 2024 (UTC)
Do we have a community? Or better asked, how many communities do we have? IMHO there are at least three different communities:
  • people formerly active on Wikipedia who came over when their activity as interwiki bot owners was no longer needed on Wikipedia; most of them will likely have tens of thousands of edits each year.
  • Wikipedia users occasionally active on Wikidata; some might only fix issues, others might prepare further usage of Wikidata in their own Wikipedia. Most of them have from several hundred to a few thousand edits.
  • users who don't fit into the former two groups but use Wikidata for some external project, far away from WMF. I mentioned above groups of museums worldwide for whom WD is an easily accessible database providing the infrastructure needed in provenance research. I don't think that this group is big, and they might edit selected items. Probably hundreds to a few thousand edits. This group might or might not collaborate with the former.
Some months back I saw a visualization of how WikiProjects in the English Wikipedia create sub-communities, some of them overlapping, others not overlapping at all; about 50 of them are of notable size. Maybe within these three classes of communities, as I broke them down above, there are also sets of sub-communities, with more or less interaction. I can't comment much on your other questions. I don't know about OSM beyond looking up the map. I have no clue how Wikibase works.
Peeking over the horizon, we have seen some performance issues at Commons for some time now. People seem to be turning away from Commons because they have, for example, taken hundreds of photographs at an event but batch upload doesn't work. Wikinews editors are waiting for the uploads, which do not come. Well, they won't wait, of course. They won't write articles on those events. So the Commons issue also affects Wikinews. By the way, restrictions drive away users as well. That's the reason why, or how, German Wikiquote killed itself several months ago.
As I said, I don't know what effects the measures you mentioned would have on Wikidata users, on users of other WMF projects, and on this "external" user community I mentioned above. Matthiasb (talk) 11:58, 7 September 2024 (UTC)
And just another thought: if we want to have better data, we should prohibit uploading data based on some Wikipedia only, and maybe even remove statements sourced with WP only, after some transition period, say the end of 2026. We don't need to import population data for U.S. settlements, for example, from any Wikipedia, if the U.S. Census Bureau is offering ZIP files for every state containing all that data. (However, a statement that consists of a URL does not need a source, as it is its own source.) We should also enforce that users add the language of the sources; many users are neglecting it. (And I see a misdirected mentality that "English" as a source language is not needed at all, but Wikidata isn't an English-language database, is it?) Matthiasb (talk) 14:05, 7 September 2024 (UTC)
I totally agree with @Ainali. We need a healthy way for this community to operate and make decisions that take both human and technical limits into account. So9q (talk) 08:16, 7 September 2024 (UTC)
Rather than attempting to put various sorts of caps on data imports, I would put the priority on setting up a process to review whether existing parts of Wikidata really ought to be there, or whether they should rather be hosted on separate Wikibases. If the community decides that they should rather not be in Wikidata, they would be deleted after a suitable transition period. For instance, say I manage to create 1 million items about individual electrical poles (spreading the import over a long period of time, done by many accounts from my fellow electrical pole enthusiasts). At some point, the community needs to wake up and make a decision about whether this really ought to be in Wikidata (probably not). It's not something you can easily deal with in WD:RFD, because it would be about many items, created by many different people over a long time, so I would say there should be a different process for such deletions. At the end of such a process, Wikidata's official inclusion guidelines would be updated to mention that the community has decided that electrical poles don't belong in Wikidata (except if they have sitelinks or match other criteria making them exceptional).
The reason why I'd put focus on such a process is that whatever caps you put on imports, people will find ways to work around them and import big datasets. If we work with the assumption that once it's in Wikidata, it deserves to stay there for eternity, it's a really big commitment to make as a community.
To me, the first candidate for such a process would be scholarly articles, because I'm of the opinion that they should rather be in a separate Wikibase. This would let us avoid the query service split. But I acknowledge that I'm not active on Wikidata anymore and may be out of touch on such topics (I stopped being active precisely because of this disagreement over scholarly articles) − Pintoch (talk) 11:44, 8 September 2024 (UTC)
@Pintoch Does a regular deletion of items really reduce the size of the database? Admins can still view all revisions, suggesting that this procedure would not mitigate the problem. Ainali (talk) 12:22, 8 September 2024 (UTC)
According to
currently we have
  • 2.2 million items marked as deleted (but which can be "restored", i.e. made visible again for everyone, not only for admins)
  • 4.4 million items which currently are redirects
  • 11 million omitted Q-IDs, i.e. which have not been assigned (therefore we have QIDs > 130 million, but only 112.3 million objects)
M2k~dewiki (talk) 12:31, 8 September 2024 (UTC)
From my point of view, deletion of items increases the size, since the history of deletions will also be stored (who marked the object as deleted and when, additional comments, ...). M2k~dewiki (talk) 12:44, 8 September 2024 (UTC)
@Ainali: deleting entities removes their contents from the query service. If the community decided to delete the scholarly articles, the WDQS split wouldn't be needed anymore. It would also remove it from the future dumps, for instance. The size of the SQL database which underpins MediaWiki is much less of a concern. − Pintoch (talk) 14:09, 8 September 2024 (UTC)
@Pintoch: Less, but not by much if I read this problem statement correctly. Ainali (talk) 18:40, 8 September 2024 (UTC)
@Ainali: my bad, I hadn't read this yet. Indeed, this is also concerning. I would say it should be possible to remove all revisions from a big subset of items (such as scholarly articles) if they were to be deleted in a coordinated way. That should surely help quite a bit. − Pintoch (talk) 18:43, 8 September 2024 (UTC)
I agree, that would help. A separate Wikibase with scholarly papers would probably not take a lot of resources or time to set up.
If anyone wants to go that route, I recommend against choosing federated properties.
Also, authors and similar information needed besides the papers themselves should be kept in Wikidata.
Such a Wikibase could be set up in less than a month. The import would take longer, though, because Wikibase is rather slow. (Think 1 item per second, so 46 million seconds ≈ 532 days.) So9q (talk) 09:45, 11 September 2024 (UTC)
OpenAlex has about 250 million papers indexed; you do the math how long an import of them all would take 😉 So9q (talk) 09:47, 11 September 2024 (UTC)
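(At the same assumed rate of 1 item per second, that would be roughly 250,000,000 s ÷ 86,400 s/day ≈ 2,894 days, i.e. close to 8 years for a plain Wikibase import.)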
If the import is done based on a SQL dump from Wikidata the loading would be much faster. Perhaps less than a month.
But AFAIK access to the underlying MariaDB is not normally available to users of wikibase.cloud.
Also, I'm not sure we have an SQL dump of the Wikidata tables available anywhere currently. So9q (talk) 09:53, 11 September 2024 (UTC)
I don't think it would be good to transfer them over the Internet instead of copying them physically. That could be done by cloning the hard drives, for example, and what I proposed here for Wikimedia Commons media could also be made possible for Wikidata. So it would take a few days if done right. One can e.g. use hash sums to verify things are indeed identical, but if it's done via HDD mirroring then they should be anyway... A problem there would be that it would at first contain lots of other Wikidata items, which would have to be deleted afterwards if they can't be cloned separately (which they probably could be if those scholarly items are on separate storage disks). Prototyperspective (talk) 10:43, 11 September 2024 (UTC)

IMHO, of course we can stop mass imports or even mass editing if there is an imminent risk of failure of the database/systems. Anyway, I am sure that the tech team can temporarily lock the database or increase maxlag, etc., in such an urgent situation, if the community is slow taking decisions. But I don't think that the things I am reading in this thread are long-term solutions (removing items, removing revisions, increasing notability standards, etc). I think Wikidata should include all scholarly articles, all books, all biographies and even all stars in the galaxy, everything described in reputable sources. I think that Wikidata (and Wikipedia), as a high-quality dataset curated by humans and bots, is used more and more in AI products. Wikidata is a wonderful source for world modelling. Why don't we think about some kind of partnership/collaboration/honest_synergy with those AI companies/research_groups to develop a long-term solution and keep and increase the growth of Wikidata instead of reducing it? Thanks. Emijrp (talk) 14:54, 9 September 2024 (UTC)

Yes, agreed. Everyone is rushing to create insanely big AI models. Why can't Wikimedia projects have a knowledge base about everything? Don't be conservative, or it will be like Abstract Wikipedia, which is being made obsolete by language models even before the product launch! And we must be ready for the next emergency, like the COVID-19 pandemic, in which lots of items must be added and accessed by everyone! And splitting WDQS into two means our data capacity is potentially doubled, nice! Midleading (talk) 17:30, 9 September 2024 (UTC)
Interesting view that AW is made obsolete by AI models. The creators seem to think otherwise, i.e. that it's a tool for speakers of marginalized languages, which are not described by mountains of text anywhere, to gain access to knowledge currently in Wikipedia.
It's also a way to write Wikipedia articles in one place, in abstract form, and then generate the corresponding language representations. As such it is very different from, and might be complementary to, GenAI. Google describes reaching support for 1,000 languages as a challenge because of the lack of readily available digital resources in small languages or dialects (even before reaching support for 300).
I'm predicting it will get increasingly harder as you progress toward supporting the long tail of smaller/minority languages. So9q (talk) 08:48, 10 September 2024 (UTC)
I agree with Midleading, except that I don't think Wikidata is currently useful in emergencies. Do you have anything backing up "it's a tool for people of marginalized languages that are not described by mountains of text anywhere"? That would be new to me. "It's also a way to write Wikipedia articles in one place in abstract forms" doesn't seem to be what AW is about, and it is not scalable and not needed: machine translation of English (and a few other languages of the largest WPs) is good enough at this point that this isn't needed, and if you're interested in making what you describe real, please see my proposal here (AW is also mentioned there). Also, as far as I can see, nothing supports the terminology of "marginalized languages" rather than simply various languages having little training data available. I think the importance of a language roughly matches how many people speak it, and Wikimedia is not yet doing well at supporting even the largest languages, so I don't see why a focus on small languages in particular would be due... and these could also be machine-translated into once things progress further and models and the proposed post-machine-translation system improve. I think AW is like a query tool but, unlike contemporary AI models, reliable, deterministic, transparent, and open/collaborative, so it's useful, but I think not nearly as useful as having English Wikipedia available in the top 50-500 languages (next to their language Wikipedias, not as a replacement of these). Prototyperspective (talk) 16:46, 10 September 2024 (UTC)
Who said it will be a replacement? I have followed AW quite closely and have never heard anyone suggesting it. Please cite your sources. Ainali (talk) 06:58, 11 September 2024 (UTC)
There I was talking about the project I'm proposing (hopefully just co-proposing); I see how especially the italics make it easy to misunderstand. AW is not about making English Wikipedia/the highest-quality article available in the top 50-500 languages, so it's not what I was referring to. Prototyperspective (talk) 10:47, 11 September 2024 (UTC)
Ah, I see. I am also curious what makes you think that AW is not about writing Wikipedia articles in an abstract form? As far as I have understood it, that is the only thing it is about (and why we first need to build WikiFunctions). Ainali (talk) 14:15, 11 September 2024 (UTC)
Why would it be – please look into Wikifunctions; these are functions like "exponent of" or "population of" etc. You could watch the video on the right for an intro. The closest thing it would get to is having short pages that say things like "London (/ˈlʌndən/ LUN-dən) is the capital and largest city of both England and the United Kingdom, with a population of 8,866,180 in 2022" (and maybe a few other parts of the WP lead + the infobox), which is quite a lot less than the >6,500 words of en:London. I had this confusion about Abstract Wikipedia as well and thought that's what it would be about, and it could be that many supporters of it thought so too... it's not a feasible approach to achieving that, and I think it's also not its project goal. If the project has a language-independent way of writing simple Wikipedia-like short lead sections of articles, that doesn't mean there will be >6 million short articles, since all of them would need to be written anew, and it can't come close to the 6.5k words of the ENWP article, which is the most up-to-date and highest-quality for reasons that include that London is a city in an English-speaking country. Wikifunctions is about enabling queries that use functions like "what is the distance between cities x and y". The existence of this project should not stopgap / delay / outsource large innovation and knowledge proliferation that is possible right now. Prototyperspective (talk) 15:27, 11 September 2024 (UTC)
As far as I have understood it, WikiFunctions is not the same as Abstract Wikipedia. It is a tool to build it, but it might then be built somewhere else. Perhaps on Wikidata in connection with the items, where the sitelinks are, or somewhere else completely. So the mere fact that you cannot do it on WikiFunctions now does not mean it will not be done at all in the future. Also, we don't need to write each abstract article individually. We can start with larger classes and then refine as necessary. For example, first we write a generic one for all humans. It will get some details, but lack most. Then that can be refined for politicians or actors. And so on, depending on where people have interest, down to individual detail. I might have gotten it all wrong, but that's my picture after interviewing Denny (audio in the file). Ainali (talk) 15:38, 11 September 2024 (UTC)
It seems many people are still interested in Abstract Wikipedia. It's still not launched yet; I wish it the best. What Abstract Wikipedia is able to show people is just the content in Wikidata, plus some grammar rules at WikiFunctions and maybe additional structured data in Abstract Wikipedia itself (why not store them in Wikidata?). At Wikidata we have so many statements, but it's still not comparable to the number of sentences in a Wikipedia article. And the competitors, LLMs, have already imported very detailed information from the web, and they can tell you facts so detailed that only a few sentences in one section of a Wikipedia article mention them. If Abstract Wikipedia can only generate stub-class articles, I would consider it a complete failure. If some articles on Abstract Wikipedia become featured articles, will they automatically become featured articles in every language? I highly doubt it – because there aren't enough labels set in Wikidata. An item has a much lower probability of having a label if it doesn't have a Wikipedia article. Ukraine (Q212) has 332 labels, but history of Ukraine (Q210701) has only 73 labels, and human rights in Ukraine (Q4375762) only 18 labels. So the "featured article" of Abstract Wikipedia in a small language will most likely be a large amount of content falling back to English, but processed according to a non-English grammar, resulting in unreadable English. Anyway, the conclusion is that the quantity of data in Wikidata is still not comparable to an LLM. So why are we saying that people must stop contributing data to Wikidata because it's "full"? Also, it's somewhat discriminating that English content is imported into Wikidata, and then it's "full", so content in other languages can't be imported. Midleading (talk) 15:23, 12 September 2024 (UTC)
AW will be using the labels on the items, yes, but most of the text generation will be using lexemes. Yet, it will be able to show more than content in Wikidata, it is also planned to do inferences from that, using SPARQL or other functions. And I think you may have different expectations of the length of the articles that will be generated. I think it is highly unlikely that it will be anything compared to a featured article. That was never the goal. (And for the question if it would become featured in all languages, obviously no, such decisions cannot be enforced on local projects, they have the freedom to set such requirements themselves.) Ainali (talk) 07:18, 13 September 2024 (UTC)
I find claiming an open wiki project has become obsolete because of (proprietary) LLMs very dangerous. And promoting wiki projects as the place-to-go during an emergency, too. --Matěj Suchánek (talk) 16:23, 11 September 2024 (UTC)
I agree. In an emergency I go to trusted government sources. I expect them to be domain experts and to guide the public in a way that contributes to the best possible health for everyone; e.g. during COVID-19 I read 1177.se, which has expert-reviewed content from the Swedish state agencies.
I also expect the government agencies to uphold the integrity and quality in their systems and punish anyone endangering the resource or spreading e.g. misinformation.
I don't expect that of Wikipedia in the same way. It's not supposed to be trusted per se. It's based on sources, but I don't know how many readers/users verified that the content corresponds to the statements in the source. 🤷‍♂️ So9q (talk) 07:52, 13 September 2024 (UTC)

Dump mirrors wanted

WMF is looking for more mirror hosts of the dumps. They also rate limit downloads because the demand is high and few mirrors exist. See https://dumps.wikimedia.org/ So9q (talk) 10:13, 11 September 2024 (UTC)

"This includes, in particular, the Sept. 11 wiki." Do you have any context for this? Trade (talk) 02:56, 14 September 2024 (UTC)

Community Wishlist: Let’s discuss how to improve template discovery and reuse

Hello everyone,

The new Community Wishlist now has a focus area named Template recall and discovery. This focus area contains popular wishes gathered from previous Wishlist editions:

We have shared on the focus area page how we are seeing this problem and approaching it. We also have some design mockups to show you.

We are inviting you all to discuss, and hopefully support, the focus area (or let us know what to improve). You can leave your feedback on the talk page of the focus area.

On behalf of Community Tech, –– STei (WMF) (talk) 16:16, 13 September 2024 (UTC)

Petscan not returning full results list?

Hi everyone! Over the weekend I used Petscan to add a bunch of P27 statements to items based on "Japanese people by occupation" categories. My hope was to reduce the number of results in this query that contains items with a link to jawiki that only have P31:Q5, since I usually fill in each item by hand. However, I was really disappointed to find that the number of results in the query didn't change at all. There's simply no way that there isn't a single actor, musician, or politician in all 4,400+ results! Most of the results in that query are newer, and were imported from jawiki sometime in 2023 or 2024.

I usually use the SPARQL query below in Petscan to filter out items that already have P27, but whether the SPARQL query is present or not I don't get results with Q numbers that are more than 8 digits long (not including Q). A good category to test with is "日本の銀行家", which returned 42 results for me, with the most recent being Q65291794.

SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .
  OPTIONAL { ?item wdt:P27 ?dummy0 }
  FILTER(!bound(?dummy0))
}

I've read the manual over and over and I can't figure out why Petscan won't return items with newer, 9-digit long QIDs. What am I doing wrong? Thanks in advance! Mcampany (talk) 00:13, 17 September 2024 (UTC)

Replying to say that I'd ask on the Petscan discussion page, but it seems really quiet there. I don't think this is worth opening an issue in Github because I'm pretty sure this is user error. If someone knows of a better place to ask than here, I'd appreciate if you'd let me know! Mcampany (talk) 00:23, 17 September 2024 (UTC)
Hello Mcampany, in PetScan there is a Wikidata tab, where you can put the value P27 (citizenship) or P106 (occupation) or any other property ID in the field Uses items/props, and tick the checkbox None, to find all items in the selected subcategory without the P27 or P106 property.
M2k~dewiki (talk) 16:23, 17 September 2024 (UTC)
Hi @M2k~dewiki! Thanks, using the "Uses items/props" field brought back 500+ results, which is what I expected to see. Unfortunately, if I don't have a SPARQL query in Petscan, or am not using create mode, all the checkboxes and the option to use Quickstatements disappear. A list is really nice, but I'd rather not have to export and open up Quickstatements in another tab to make the edits. I really like how seamless having the checkboxes and Quickstatements box there is. Do you have any idea why the Quickstatements box in Petscan keeps disappearing? Mcampany (talk) 17:32, 17 September 2024 (UTC)
Sometimes it happens with PetScan that checkboxes disappear; I haven't found out in which cases yet. Could you post your PetScan short URL? M2k~dewiki (talk) 17:35, 17 September 2024 (UTC)
For example, something like https://petscan.wmcloud.org/?psid=29304610&al_commands=P31%3AQ847017 M2k~dewiki (talk) 17:36, 17 September 2024 (UTC)
Sure! It's https://petscan.wmcloud.org/?psid=29304637 Mcampany (talk) 17:38, 17 September 2024 (UTC)
You have to switch Other sources from Automatic to Wikidata:
https://petscan.wmcloud.org/?psid=29304729 M2k~dewiki (talk) 17:48, 17 September 2024 (UTC)
Amazing! Thank you so much, @M2k~dewiki. You're the best.
 Resolved Mcampany (talk) 17:54, 17 September 2024 (UTC)

Updates in Wikidata item not reflecting in Commons..

After connecting a Commons category to Wikidata (not by adding the QID in Commons at first, but by adding the category name in Wikidata), the data is visible in Commons. But after making further updates to the Wikidata item, no matter what is done, the updates are not visible on the Commons page unless the QID is added to the Commons category. Similarly, any further updates to the Wikidata item are not visible in the Commons category unless some change is made to the connection to Wikidata in Commons, such as removing the QID.
The updates are not visible at all (seemingly forever), even if the browser cache is cleared. Why does this happen, and does anybody know a solution? Gpkp (talk) 09:13, 17 September 2024 (UTC)

Hello Gpkp, please see
regarding (purging) server-side caches. M2k~dewiki (talk) 15:48, 17 September 2024 (UTC)
User:M2k~dewiki, Thank you very much. --Gpkp (talk) 14:59, 18 September 2024 (UTC)

Daniel Levy: Two items for one person

Q76245151 and Q3014345 are for the same person, each item lists biographies in two different Wikipedia language versions. Could someone who knows what they are doing, like User:M2k~dewiki, please merge them? Thank you! --Andreas JN466 08:31, 18 September 2024 (UTC)

Hello Jayen466, the two objects have been merged: Help:Merge M2k~dewiki (talk) 11:45, 18 September 2024 (UTC)

Uploading Logo on NCWiki

I could use some help adding the logo from Wikimedia Commons to Wikidata; the logo is uploaded as Logo_NCWiki.png. Kroeppi (talk) 09:19, 19 September 2024 (UTC)

fusion

Georges Meyer (Q19629082)

Georges Meyer (Q107279424)

seem to be the same one, not sure how to do it Io Herodotus (talk) 07:52, 20 September 2024 (UTC)

@Io Herodotus Well, they're not the same... so they shouldn't be merged. RVA2869 (talk) 08:27, 20 September 2024 (UTC)
Sorry wrong item.
Georges Meyer (Q107279424)
Georges Meyer (Q125422040)
--Io Herodotus (talk) 08:36, 20 September 2024 (UTC)
For more info: Help:Merge RVA2869 (talk) 09:05, 20 September 2024 (UTC)

Class instances of physical object (Q223557)

There are several classes in Wikidata where it seems clear from their description that their instances cannot be classes. The most obvious of these is physical object (Q223557), but there is also concrete object (Q4406616). These classes are also instances of first-order class (Q104086571), the class of all classes whose instances are not classes, or subclasses of its instances, giving further evidence that their instances cannot be classes. Of course, if the instances of these classes cannot be classes then so also for their subclasses, including, for example, physical tool (Q39546).

However, in Wikidata there are currently many instances of these classes for which there is evidence in Wikidata that they are themselves classes, either having instances themselves or having superclasses or subclasses. For example, there were recently 1,821,357 instances of physical object (Q223557) that fit one or more of these criteria, out of 21,154,884 instances of physical object (Q223557), an error rate of 8.6 per cent. (These numbers come from the QLever Wikidata query service, as the WDQS times out on the queries.) This is very likely an undercount of class instances of physical object (Q223557), as there are items in Wikidata that are classes but that have no instances, superclasses, or subclasses in Wikidata, such as R3 device (Q7274706).

I propose trying to fix these problems. It is not necessary to make nearly two million corrections to Wikidata, as in many cases a single change to Wikidata can fix the problem for many of these class instances. For example, there are at least 764,424 items related to protein (Q8054) involved, so determining what to do with protein (Q8054) may fix a large part of the problem.

In many cases fixing these problems requires making a decision between several different ways forward. For example, airbag (Q99905) could be determined to be a class of physical objects, and thus stay a subclass of physical object (Q223557), but its class instances would have to be changed to subclasses. Alternatively, airbag (Q99905) could be determined to be a metaclass, and thus would need to be removed from the subclasses of physical object (Q223557) and probably be stated to be is metaclass for (P8225) physical object (Q223557). It would be useful to have a consistent treatment of involved classes like this one.

So who is interested in trying to address this large number of errors in Wikidata? This effort is likely to take some time and could involve a larger discussion than is common here, so I have created a page in the Wikidata Ontology Project for the effort. If you are interested please sign up on that page. Peter F. Patel-Schneider (talk) 14:03, 13 September 2024 (UTC)
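For anyone who wants to reproduce this kind of count, a minimal sketch of such a query follows (it should run on QLever; WDQS will probably time out, as noted above). This simplified version only checks direct instances of physical object (Q223557), whereas the figures above presumably also traverse the subclass hierarchy:

# count direct instances of physical object (Q223557) that show evidence of being classes themselves
SELECT (COUNT(DISTINCT ?item) AS ?classLikeInstances) WHERE {
  ?item wdt:P31 wd:Q223557 .
  { ?other wdt:P31 ?item . }          # the item has instances of its own
  UNION { ?item wdt:P279 ?super . }   # ... or a superclass
  UNION { ?sub wdt:P279 ?item . }     # ... or a subclass
}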

I think as a very general rule instance of (P31) is overused and subclass of (P279) underused in Wikidata, and where things are ambiguous the right choice is to replace an instance of (P31) statement with subclass of (P279). The vast majority of items in Wikidata are concepts, designs, etc., not physical objects. Yes there are lots of individual people or geographical or astronomical objects, but it is rare to have notable actual instances of most concepts. ArthurPSmith (talk) 19:36, 13 September 2024 (UTC)
Agreed. But it does not appear to be reasonable to just replace all instance of (P31) under physical object (Q223557) with subclass of (P279), as there are actual physical objects there, for example items in museum collections. And it is not sufficient to do this only for those items for which there is evidence in Wikidata that they are classes, because there are items in Wikidata that are classes in the real world but there is no information in Wikidata signalling this. Peter F. Patel-Schneider (talk) 13:38, 16 September 2024 (UTC)

Basic instructions are lacking

Editors with experience editing English Wikipedia who find ourselves here in Wikidata need a lot more handholding than is currently provided. I was editing the Wikipedia article en:User:Stitchbird2/sandbox, which includes a reference in this Wikidata space, Sinopsis de las especies austroamericanas del género Ourisia (Scrophulariaceae) ... which, when I came, did not have the genus name Ourisia in italics. I came to this page and tried to italicize the genus name, and I assumed that, like Wikipedia, surrounding something with two apostrophes, before and after, causes italics. Did I do it right? If so, why does the User sandbox article show the genus name with two apostrophes before and after, instead of treating the two apostrophes before and after as italic markup? Please, Wikidata needs instructions for this sort of thing. If they exist, tell me where and make them easier to find. If they don't exist, they need to be created. Signing off with four tildes, does that work? I guess I'll find out. Anomalocaris (talk) 03:46, 15 September 2024 (UTC)

For more information on item labels you can read Help:Label. The section "Fonts and characters" has a good explanation. William Graham (talk) 05:21, 15 September 2024 (UTC)
All right, I reverted my changes to the item mentioned above. Anomalocaris (talk) 07:33, 15 September 2024 (UTC)
@Anomalocaris: You may also find title in HTML (P6833) and/or title in LaTeX (P6835) useful, though I have no idea if either of them are used by the English Wikipedia templates used in your article. (You can try it out on Wikidata Sandbox 1 (Q4115189) if you like.) Lucas Werkmeister (talk) 10:13, 16 September 2024 (UTC)

Wikidata weekly summary #645

How to replace a statement to another

Hi, editing Amagasaki Domain (dissolved in 1871), I found that its statement "located in the administrative territorial entity" Property:P131 should rather be "located in the present-day administrative territorial entity" Property:P3842. OTOH I have already added some elements to this statement and don't want to waste them. So I tried to edit the source of that page but found no way.

In addition, I'm afraid other * Domain entities might share the same problem, but I have no time to dig deeper now, as it is 1:49 at my local time. There were around 300 domains in Japan in that period (from 1600 to 1871).

Can anyone kindly suggest the best way to manage this?

Cheers. --Aphaia (talk) 16:52, 13 September 2024 (UTC)

The Move Claim gadget will do this for you. - PKM (talk) 01:59, 15 September 2024 (UTC)
@PKM How does one install this gadget? Peter F. Patel-Schneider (talk) 20:49, 17 September 2024 (UTC)
@Peter F. Patel-Schneider: Go to Special:Preferences#mw-prefsection-gadgets and tick the box next to moveClaim. Dexxor (talk) 08:44, 18 September 2024 (UTC)

Map of Holocaust memorials

I was just looking for images of Holocaust memorials, and made the below query. I never expect wiki info to be complete, but I was surprised how incomplete this map was compared to, say, English Wikipedia. Even Commons has photos of memorials which don't seem to be on this map. I don't have time to work on this much myself, but I hope it's okay to flag it up here as something that people may want to improve, mainly by adding commemorates (P547) The Holocaust (Q2763) to the relevant items. Holocaust education is a hot topic here in the UK as the new government is suggesting making it compulsory in all schools.


#defaultView:Map{"hide":"?coords"}
SELECT ?item ?itemLabel ?coords ?image WHERE {
  ?item wdt:P547 wd:Q2763; wdt:P625 ?coords
      OPTIONAL {?item wdt:P18 ?image}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". }
}

MartinPoulter (talk) 12:43, 18 September 2024 (UTC)

Here's a Petscan query for articles in en:Category:Holocaust commemoration without property commemorates (P547): [4]. If you or someone else wants to add statements to them, go through the list, select suitable items, add "P547:Q2763" to the command list and press Start QS. Samoasambia 13:09, 18 September 2024 (UTC)
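(For anyone unfamiliar with the workflow: PetScan turns each selected item into a QuickStatements V1 command roughly of the form below, with the columns tab-separated; the QID here is a placeholder, not an actual memorial item.)

Q123456789	P547	Q2763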

"position held" vs. "member of"

Hello, I am looking for advice on when to use "position held" or "member of" on Amy Brown Lyman. She was president of the Relief Society (the women's organization in the Church of Jesus Christ of Latter-day Saints). That seems clearly like a "position held." She was also part of the Relief Society General Board. I'm not sure which way to model that (I did both ways I could think of). Rachel Helps (BYU) (talk) 21:06, 17 September 2024 (UTC)

I think "member of" is better for the board membership, but "position held" is right for being a president of something. ArthurPSmith (talk) 18:00, 19 September 2024 (UTC)

Uploading photos on WikiLovesMonuments.org having problem with no category for .....

In County Sligo, Ireland, there's a historic site which I believe is incorrectly named "Carrowmore Passage Tomb". There is no Wikimedia Commons category. It has a Wikidata entry, Q33093088. I've uploaded a set of images to the site but they have just disappeared.

I understand the correct name to be Carrowmore Megalithic Cemetery, Carrowmore, Co. Sligo, Ireland.


Any help appreciated.

Gillfoto (talk) 23:10, 18 September 2024 (UTC)

I moved the Wiki Loves Monuments ID to Carrowmore (Q260398) (Carrowmore Megalithic Cemetery). It's possible that Carrowmore Megalithic Cemetery W-M 9a (Q33093088) was intended to be "Carrowmore Passage Tomb Cemetery", but the "Cemetery" was missing, and it was unclear whether it referred to the cemetery or one of the tombs. Peter James (talk) 20:44, 19 September 2024 (UTC)

Automotive

Imperial cars are called Chrysler Imperials. Imperial has not been a model of Chrysler since 1954. In 1955 Imperial became a separate car brand like Lincoln is from Ford or Cadillac is from Buick. Since 1954 there has not been a Chrysler Imperial. 2600:1700:E770:7680:54D0:3A6B:2947:91BA 20:32, 20 September 2024 (UTC)

There was a Chrysler Imperial between 1989 and 1993 (https://www.nytimes.com/1989/11/08/business/chrysler-imperial-joins-cutthroat-car-market.html https://www.macsmotorcitygarage.com/the-royal-prerogative-chryslers-very-last-imperial-1990-93/). Chrysler Imperial (Q1088705) is for the Chrysler model series, Chrysler Imperial (Q97373892) is the 1989-1993 Chrysler model, and Imperial (Q668822) is for the brand. The models have "brand: Chrysler", which links to Chrysler (Q29610); the brand only has "founded by" and "owned by" linking to the Chrysler company Stellantis North America (Q181114). This seems to be correct, at least for the time the cars were produced (although this may need changing if the Stellantis North America (Q181114) item is split). The items could probably be improved with the addition of start and end of production or sale - Imperial (Q668822) has inception (P571) and dissolved, abolished or demolished date (P576), but I'm not sure they are suitable properties for a product. Peter James (talk) 00:20, 21 September 2024 (UTC)
I found the properties date of commercialization (P5204) and discontinued date (P2669) and added them. There was also the Chrysler Imperial Concept (Q50396854) from 2006 but that is probably not relevant as it was only a concept car. Peter James (talk) 00:50, 21 September 2024 (UTC)

Captions and colour of images of personalities scanned from books

There are many drawings of Czech personalities used here in Wikidata which look like e. g. File:Adolf Heyduk – Jan Vilímek – České album.jpg or File:Karel Jaromír Erben – Jan Vilímek – České album.jpg. Sometimes there may be another version of the same image of similar quality in Commons (often uploaded by me) such as File:Karel Jaromír Erben (cut).jpg, which

  1. is without the caption present directly in the image,
  2. is cropped more tightly, with less space around the portrait, and
  3. has the colour of the book paper removed.

Until now I did not have any problems with such replacements in Wikidata although I've been doing this occasionally for years, but now I have met with disagreement by User:Skot. Because my arguments did not convince him, and because I would like to prevent any edit-warring, I would like to ask other people for their opinions.

While I admit that the pictures with captions can be useful for somebody and so it is good to have them in Commons, I also believe that pictures without them, like File:Karel Jaromír Erben (cut).jpg, are better for Wikidata, because WD has a different tool to include captions if they are needed: the qualifier "media legend". Images from Wikidata are also reused by other projects, which also use different tools for captions, and the result is that the caption gets doubled, as has happened e.g. in s:Author:Adolf Heyduk.

Cutting off the extra space around the portrait leads to better display in infoboxes, where the portrait looks larger, while taking the same infobox space.

I also think that the yellow-brown colour of the book paper is noise contaminating the picture and worsening its contrast, and that it is not the colour of the picture but the colour of the medium from which the picture was taken. While it is absolutely possible to use such coloured pictures in Wikidata when there is no choice, if there is a possibility to choose between two alternatives, users should not be prevented from replacing the coloured version with the de-coloured one in these cases.

Any opinions whether such replacements in Wikidata are possible are really appreciated. -- Jan Kameníček (talk) 20:58, 17 September 2024 (UTC)

@Jan.Kamenicek: I totally agree with you, a cropped and clean version is obviously better (especially as the coloration is a degradation of the paper that was neither originally present nor intended; the white space and the caption inside the image also don't bring anything really useful for Wikidata). AFAIK, most people do the same (and it was done for years on other projects before Wikidata). I'm curious to hear Skot's arguments and reasoning. Cheers, VIGNERON (talk) 11:21, 21 September 2024 (UTC)
@VIGNERON The question is not whether to adjust the colour of the images, as we both adjust them; the question is the degree of adjustment. You can compare the original file: https://ndk.cz/uuid/uuid:8ea19fd0-cb83-11e6-ac1c-001018b5eb5c
For me personally, the degree of colour editing by my colleague (it is simply automatic contrast) is beyond the point of losing image information, so I will not do it this way. At the same time, I have no intention of replacing his images as long as he uploads them in full resolution from the original source. The core of the problem is determining whether one of the colour versions is significantly better than the other, justifying systematic replacement of each other's images within Wikimedia projects.
Regarding uploading images without captions, I feel there is broad consensus within the community, so there is no problem uploading both the version with a caption and the one without directly to Commons. Skot (talk) 12:05, 21 September 2024 (UTC)

Deletion of Userpage

Hello,

I unknowingly created my user page in this wiki. Later I realized that global user pages on Meta-Wiki exist. Now I am confused about where to list my user page for speedy deletion. Any help would be greatly appreciated regarding this issue. Thanks in advance! Bunnypranav (talk) 12:36, 21 September 2024 (UTC)

Hi Bunnypranav, just add {{Delete}} template to your user page. Samoasambia 13:13, 21 September 2024 (UTC)
Thanks for the clarification, I thought this template cannot be used for user nominated speedy deletions. Bunnypranav (talk) 13:15, 21 September 2024 (UTC)
This section was archived on a request by: --Wüstenspringmaus talk 13:12, 27 September 2024 (UTC)

Q3219113 : error - PS : fixed

Gender is male, as can be checked via the given URL. But I don't know how to correct this error. Can anybody help? Thank you. Punctilla (talk) 00:35, 27 September 2024 (UTC)


Edit: Oops! I just found I could revert the vandalism: error fixed. Sorry for the disturbance. Punctilla (talk) 00:44, 27 September 2024 (UTC)

Thank you for keeping eyes open! --Matěj Suchánek (talk) 06:32, 27 September 2024 (UTC)

Description

Hi, could you explain the easiest way to add the same description to all articles within a specific category on Wikipedia? I'm looking to do this for settlements in the municipalities of Serbia, based on the categories in the Serbian-language Wikipedia. I'm aware of QuickStatements but haven't used it before. Thank you! — Sadko (words are wind) 01:48, 21 September 2024 (UTC)

Hello Sadko,
with Help:QuickStatements you need to prepare a list of QIDs with the description, for example using PetScan or SPARQL.
The format is described at:
The statements for QuickStatements can also be prepared and modified with the help of LibreOffice, OpenOffice, Excel, etc. (three columns, which can be copied and pasted into QuickStatements).
Also see:
M2k~dewiki (talk) 20:00, 22 September 2024 (UTC)
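A minimal sketch of what those prepared V1 commands could look like (columns tab-separated, rendered with spaces here; the QID and wording are placeholders only). Den sets the English description; using Dsr instead would set the Serbian one:

Q123456	Den	"settlement in the municipality of Example, Serbia"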

Expelled because endorsement.

How do I say that a person was excluded from a political party because they endorsed another political party? How can I represent that in Wikidata using the property end cause (P1534)?

For instance, most political parties have in their charter (charter URL (P6378)) a rule that if you endorse another party or are elected on another party's ticket, then you get expelled from your political party. Johshh (talk) 23:01, 21 September 2024 (UTC)

I'd likely just use the P1534 but then add additional qualifier property statements to say when/where/how/why ? There are a few "expulsion" concepts you could link the qualifiers to, there's also this one removal from office (Q106677579) I found, but maybe that one is too narrow for your use case and you might have to search for others, or create one called "removal by policy" or something suiting the exact case like "removal from political party". Thadguidry (talk) 05:55, 22 September 2024 (UTC)
exclusion from a political party (Q50394689) exists for this purpose M2Ys4U (talk) 20:05, 22 September 2024 (UTC)
I'm not sure a database like Wikidata is really suited to model such fine-grained details. I'd almost argue that certain minute intricacies are much better handled in text form on platforms like Wikipedia. Wikidata might arguably be better off not trying to recreate every little detail with our rather blunt, nuance-less, context-less database properties. --2A02:810B:5C0:1F84:CC37:BB5C:F551:763A 17:41, 22 September 2024 (UTC)

How do I replace a citation? How do I get coords displayed in decimal degrees?

First question. There are many Queensland place names which have been imported from something called the Geographic Names Server. I don't know what that is, but it is not an authoritative source for Queensland place names and the information is often wrong; it then creeps into Wikipedia and Commons, infecting them with misinformation. My immediate problem is Alligator Creek (Q21922742). Although I managed to change the coords and eventually managed to delete the Geographic Names Server reference, I cannot see how to add a citation to a reliable source, the Queensland place names database, entry 391. So how do I do this? There is a cite template on Wikipedia for the purpose (cite QPN). How do I use it here, or get something equivalent set up on Wikidata?

To illustrate the problem, see this Wikidata-generated infobox on Commons

https://commons.wikimedia.org/wiki/Category:Alligator_Creek_(creek,_City_of_Townsville)

See how the point is in the middle of dry land because it is incorrect. The creek mouth is further west.

Second question. Most of the Queensland resources use decimal degree format. Although I entered my coords in decimal degree format, they are displaying as DMS. How do I get them to display (in general or just for me) as decimal degrees, so I can check what's in Wikidata without having to manually convert all the DMS values each and every time?

Thanks, Kerry Kerry Raymond (talk) 02:56, 22 September 2024 (UTC)

They are stored in decimal format; it can be seen in the Wikidata Query Service, and in diffs when coordinates are changed, added or removed (example: Special:Diff/2251695016), but I am not aware of a way to display the decimal values in a page. The coordinates form a link using decimal format, so Q8678#P625 displays as 22°54'40"S, 43°12'20"W but the link goes to https://geohack.toolforge.org/geohack.php?params=-22.91111111111111_N_-43.205555555555556_E_globe:earth&language=en (south and west are represented by negative values of N and E) - hover over the coordinates or copy the link for the decimal value. After clicking "edit" the link disappears; it returns after refreshing or reopening the page. The DMS coordinates displayed can also be inaccurate because of precision, but the correct value is still in the map and link, for example in Q9397598#P625 the value entered is 53.5592°N, 18.3625°E, which with default precision would convert to 53°33'33.1"N, 18°21'45.0"E, but precision - seen by clicking on "edit" or viewing a diff - is 0.15123359250243 (added by a bot; that value is not an option in normal editing), so the DMS displays as 53°32'N, 18°18'E. Peter James (talk) 19:37, 22 September 2024 (UTC)
There is a long-standing Phabricator ticket for displaying coordinates as decimals. M2Ys4U (talk) 20:12, 22 September 2024 (UTC)
It seems amazing that a simple thing like this can't be done. And I guess I wonder why, if the internal representation is decimal degrees, the default rendering is DMS? It is the 21st century and all geospatial tools and datasets I use are based on decimal degrees. Kerry Raymond (talk) 01:25, 23 September 2024 (UTC)
@Kerry Raymond You can't easily do that on the front-end, but you can download any coordinate statement in the original (i.e. decimal) format using a SPARQL query. People are confusing the front end of Wikidata with Wikipedia, but Wikidata is not really meant for browsing. Most of the stuff here is meant to be done on the underlying data, and that's where the development of Wikidata should concentrate its focus. In other words, what you call a 'simple thing' is actually simple, but probably unimportant and a useless waste of developer's time. Vojtěch Dostál (talk) 07:11, 23 September 2024 (UTC)
"Unimportant" and "useless waste of time". We are discussing data quality. How unimportant is that? Kerry Raymond (talk) 08:45, 23 September 2024 (UTC)
@Kerry Raymond In my view, the way data are displayed has nothing to do with data quality. Perhaps you mean 'ability to assess data quality quickly', because you want to compare the statements in Wikidata to a source which uses a decimal format. I'd retort that such comparisons are best done by downloading coordinates from both databases, merging them in a spreadsheet and doing the comparison in a much more high-throughput manner. For that, the way data are displayed is not an issue at all. Is it maybe possible that you are trying to use the Wikipedian way of verifying stuff for a completely different Wikimedia project, where we tend to have different approaches to this? Vojtěch Dostál (talk) 09:08, 23 September 2024 (UTC)
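As a concrete illustration of the download approach, a minimal query of this kind returns the stored coordinate for an item (here the Alligator Creek item mentioned above) as a WKT literal of the form "Point(longitude latitude)", i.e. in decimal degrees:

SELECT ?item ?coord WHERE {
  VALUES ?item { wd:Q21922742 }
  ?item wdt:P625 ?coord .
}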
@Kerry Raymond: Regarding adding references, there is Queensland place ID (P3257) you can use. I've added it as an identifier to Alligator Creek (Q21922742), plus you can use it as a reference. Check the item to see how I did it. Unfortunately, there doesn't seem to be any way to reference the specific "Queensland place names search", as individual entries do not have their own webpage. But, it appears P3257 uses the exact same database. Huntster (t @ c) 23:29, 22 September 2024 (UTC)
Thanks for that. I did find that and tried to use it, but Wikidata refused to let me publish it, so evidently whatever I did with it was wrong in some way. I guess I'll try again and see if I can make it work. Kerry Raymond (talk) 01:20, 23 September 2024 (UTC)

Extra badge for Wiktionary redirects

Some wiki pages like https://en.wikipedia.org/wiki/Brachyology aren't normal redirects but soft redirects to Wiktionary. Currently, it seems that those sitelinks are often listed without a badge, similar to sitelinks that aren't redirects. Should we have a new badge for those pages? ChristianKl17:41, 22 September 2024 (UTC)

Yes. Ymblanter (talk) 18:48, 22 September 2024 (UTC)
@ChristianKl: I think Wiktionary redirects are usually stored in separate items with statement instance of (P31)Wiktionary redirect (Q21278897). But sometimes they have gotten mixed up with regular sitelinks. Samoasambia 07:46, 23 September 2024 (UTC)
Those items are not notable according to our policies, so the solution for them is to be deleted. Sometimes however they are intermixed with other sitelinks. ChristianKl08:06, 23 September 2024 (UTC)

Hi!

I'm new here on Wikidata, so excuse me if I'm making a faux pas...

Can someone address the question I asked here?

Thanks.

ValJor (talk) 20:21, 23 September 2024 (UTC)

Wikidata weekly summary

There is no status update this week? Ayack (talk) 09:43, 24 September 2024 (UTC)

The link was posted on X/Twitter by the Wikidata account last night https://www.wikidata.org/wiki/Wikidata:Status_updates/2024_09_23 Piecesofuk (talk) 09:54, 24 September 2024 (UTC)
Thank you, but usually I "received" it on my user talk page and it's also posted here, on the project chat. Ayack (talk) 10:25, 24 September 2024 (UTC)
@Mohammed Abdulai (WMDE) FYI. Ayack (talk) 17:47, 24 September 2024 (UTC)
Very weird. I'm certain that I did push the send button. Thanks for notifying me Ayack, I'll look into it. -Mohammed Abdulai (WMDE) (talk) 19:17, 24 September 2024 (UTC)

Problem bot edit on Sick Boi (Q7507561)

Sick Boi (Q7507561) was created to link to the Wikipedia article https://web.archive.org/web/20121001051141/https://en.wikipedia.org/wiki/Sick_Boi which was an album/mixtape by the artist Donald Glover aka Childish Gambino (note that most of the statements refer to this original item)

However, at some point this Wikipedia link was turned into a redirect, see https://web.archive.org/web/20210716053015/https://en.wikipedia.org/wiki/Sick_Boi which redirected to https://web.archive.org/web/20210716053015/https://en.wikipedia.org/wiki/Childish_Gambino_discography#Mixtapes

Then at some point this redirect seems to have been redirected to https://en.wikipedia.org/wiki/Ren_Gill#Discography which eventually turned into the article https://en.wikipedia.org/wiki/Sick_Boi (there's a new redirect for the Childish Gambino album at https://en.wikipedia.org/wiki/Sick_Boi_(mixtape) )

I guess it's this edit https://www.wikidata.org/w/index.php?title=Q7507561&oldid=1780864042 by @MsynBot which updated the redirect to Ren's album.

Is there an automatic fix to get this back to how it was? Or could I just restore it to just before this edit and add a separate item for Ren's album? (I noticed that there are dozens of international labels that would be lost if I did this.)

Can this sort of bot edit be prevented from happening in future? Piecesofuk (talk) 09:49, 24 September 2024 (UTC)

The bot just adds the badges to make it easier on Wikidata to see which sitelinks point to redirects and which don't. If people over at Wikipedia change what a page is about, there's little the bot can do about that.
Hey man im josh (talkcontribslogs) was the first person to change the label away from the identity of the item.
Restoring it to the state before Josh's edit and then creating a new item for Ren's album would be the way to go. ChristianKl10:13, 24 September 2024 (UTC)
Thanks for the reply. I've restored the item and created a new one for Ren's album. Piecesofuk (talk) 15:00, 24 September 2024 (UTC)

Rewriting Blazegraph syntax to pure SPARQL 1.1

Hi All,

@Andrawaag and I have been working on extracting all the SPARQL templates in Wikidata at the latest Biohackathon Japan DBCLS BioHackathon (Q109379755), so that we can automatically rewrite them to be SPARQL 1.1. The tool we use was developed for the SIB Swiss Institute of Bioinformatics (Q3152521) sparql examples project [5], and it now has support for crawling Wikibase. We still have issues open [6] and we hope that the community can find some more. We have aimed to be gentle with crawling and encourage testing the Java code with the "--use-cached" flag so that you don't use more server resources than needed.

For example, the nice static web pages are still missing due to an issue with Jekyll. Also, the extraction of "comments" is still being worked on by @Andrawaag.

Please let us know if you like/dislike this and what you would like to see added.

Regards, Andra and Jerven (talk) 16:20, 20 September 2024 (UTC)

An example of a rewritten query is http://jerven.eu/wikibase-sparql-examples/examples/wikidata/0a6179e690a052035e4812db5a566b87/ which was rewritten from Blazegraph syntax, from a query found on [a Request a query archive page]. Jerven (talk) 13:19, 23 September 2024 (UTC)
We now have a status page [7] that keeps track of how many queries we can parse with the different parsers. This shows we fail to parse 3.6% with the Blazegraph parser, while 5.6% fail to parse with RDF4J and 6.9% with Jena. The Blazegraph named subqueries are starting to get fixed; currently we still fail to fix examples with an INCLUDE inside a WITH clause. @AndreaWest is this useful for you? Jerven (talk) 13:16, 24 September 2024 (UTC)
@Andrea Westerinen is this useful for your work? Jerven (talk) 13:17, 24 September 2024 (UTC)
@Jerven: IMO it would be useful to also keep and make accessible the original versions. If the queries use eg named subqueries, they may be significantly easier for humans to read in their original form than in the modified form -- and the named subquery form may be more efficient too, if the results of the subquery are re-used more than once.
Keeping both forms may therefore be useful, for both output and performance comparisons, as well as for archive/corpus purposes.
With luck, any replacement chosen for Blazegraph will include a native implementation of the named subquery extension to the standard; or, failing that, your translator could be used as a preprocessor (albeit that might not be able to capture the efficiency gain of named subqueries when data is used more than once). Jheald (talk) 20:39, 24 September 2024 (UTC)
It's maybe also worth remembering other extensions in Blazegraph, such as bd:sample to generate a given number of truly random sample triples, that also have the potential to be very valuable, but cannot efficiently be translated into standard SPARQL 1.1; it would be good if potential vendors could find ways to make that accessible too. -- Jheald (talk) 21:00, 24 September 2024 (UTC)
Hi @Jheald, this is not shown in the rendering to Markdown, but the Blazegraph queries are stored, even if they are fixed a little bit by adding the prefixes in use. See this example [8] for how it is done for now. Do you think it is useful to have this in the Markdown?
I hope that the vendors implement something more general than the named subqueries as implemented in Anzo and Blazegraph. Jerven (talk) 07:25, 25 September 2024 (UTC)
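To make the named-subquery point concrete, here is a hand-written sketch (not taken from the extracted corpus) of the Blazegraph extension and one possible standard SPARQL 1.1 rewrite; the variable names are arbitrary. Blazegraph form:

SELECT ?item ?count
WITH {
  SELECT ?item (COUNT(?sitelink) AS ?count) WHERE {
    ?sitelink schema:about ?item .
  } GROUP BY ?item
} AS %links
WHERE {
  INCLUDE %links .
  ?item wdt:P31 wd:Q5 .
}

The SPARQL 1.1 rewrite simply inlines the named subquery where it is included, which is why a subquery reused in several places may lose the efficiency gain Jheald mentions:

SELECT ?item ?count WHERE {
  {
    SELECT ?item (COUNT(?sitelink) AS ?count) WHERE {
      ?sitelink schema:about ?item .
    } GROUP BY ?item
  }
  ?item wdt:P31 wd:Q5 .
}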

Merging needed of two items

[image caption: situation in 1981]

I accidentally created Q130340447 while Q5649310 already existed. Some things are not correct. It is not out of use but was rebuilt for another purpose (no trains, but used as a greenhouse). Smiley.toerist (talk) 21:24, 21 September 2024 (UTC)

I tried to merge these but they have conflicting commonswiki sitelinks (one to the Category, the other to the name). I'm not familiar with how commonswiki links should work so if somebody else can take a look I'd appreciate it, thanks. ArthurPSmith (talk) 15:51, 23 September 2024 (UTC)
From my point of view there is no need to merge these two items, since one covers the (former) railway station and the other the botanical garden at the same place. The two objects link to each other.
Also see
M2k~dewiki (talk) 15:53, 23 September 2024 (UTC)
Ok, but they have the same commons category etc. statements, while the commonswiki sitelinks differ - surely only one of those should be in effect? ArthurPSmith (talk) 19:23, 24 September 2024 (UTC)
Regarding Commons, there is a category on Commons as well as a gallery. Commons categories should be connected to category items on Wikidata, gallery sitelinks should be connected to article items, and the article item and the category item can link to each other. If there is no gallery on Commons, the Commons category can also be connected to the article item. M2k~dewiki (talk) 19:31, 24 September 2024 (UTC)
The train images of the old station shed, File:Renfe 597-1981.jpg and File:Madrid Atocha 1981.jpg, are placed in the general item Q2842655. I see no reason to use a separate item for the old train shed in these cases. This causes only confusion. Smiley.toerist (talk) 10:23, 25 September 2024 (UTC)
I would tend to agree. It is the same building. I have removed the statements which were confusingly linking them, and also sorted the category item. So they should be ready for merging now — Martin (MSGJ · talk) 11:57, 25 September 2024 (UTC)

Misspelled wikidata item

So this was my first time creating a Wikidata item, but I mistakenly put _ instead of spaces in the title. See Mukhtar_Ali_(footballer,_born_1962). Could anyone please let me know how to sort this out. Regards JayFT047 (talk) 23:42, 24 September 2024 (UTC)

@JayFT047: ✓ Done, there's an edit button in the upper right corner which enables editing labels, descriptions and aliases. Samoasambia 06:25, 25 September 2024 (UTC)

(Cross-posted from Wikidata:Administrators' noticeboard#Q1477321_and_Q28496595)

The strange edit histories of these two items came to my attention through a reported issue at the Name Suggestion Index GitHub. At one point, Q1477321 referenced what is now the subject of Q28496595, and vice versa. Their original subjects (the entity now represented by Q1477321) appear to be identical (compare the first revisions of Q1477321 and Q28496595), meaning Q28496595 now refers to a different entity than it originally did. Since a lot of the messiness happened years ago, should these pages be left as is, or should Q28496595 still be merged into Q1477321, with the subject of Q28496595 getting a new QID? BrownCat1023 (talk) 12:55, 25 September 2024 (UTC)

Dict: protocol

Do we have a server for the dict: protocol, as described in this blog post and at DICT?

Curiously, if I type dict:cheese in the search bar here, I am taken to https://www.wikidata.org/wiki/Special:GoToInterwiki/dict:cheese (and similar if I do so on en.Wikipedia, etc.*), which displays:

Leaving Wikidata

You are about to leave Wikidata to visit dict:cheese, which is a separate website.

Continue to https://www.dict.org/bin/Dict?Database=*&Form=Dict1&Strategy=*&Query=cheese

and not to a Wikidata entry (nor a Wiktionary page**). Can we get that changed?

[* doing so on fr.Wikipedia still takes me to an English definition; does it do so for people whose browsers use other languages?]

[** Also raised at wikt:Wiktionary:Grease pit/2024/September#Dict: protocol]. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:15, 18 September 2024 (UTC)

See phab:T31229.--GZWDer (talk) 16:02, 18 September 2024 (UTC)
It's kind of retro-sexy but also incredibly niche. Since the WMF offers servers to useful projects, you could set up a server that offers Wikidata or Wikipedia over gopher, telnet (BBS-style or non-interactive) or dict. I didn't intend to register a developer account, but for something fun like this I might change my mind. I can hardly think of a better way of procrastinating. Be warned though, I might be compelled to do the implementation in D (a language designed by Walter Bright and Andrei Alexandrescu, both C++ heavyweights) just because I want more experience with it. I reckon a single Docker instance will do, which doesn't require much formality, so this could be up and running without too much delay. The main thing would be agreeing on how the protocols should be queried. Infrastruktur (talk) 17:39, 19 September 2024 (UTC)
Had a quick look at it. I guess the dict protocol makes more sense for Wikipedia and Wiktionary than it does for Wikidata, as Wikidata doesn't have short definitions. It seems most dict servers serve a Unicode text file that consists of key-value pairs of dictionary entries and their definitions. If we don't expect much traffic, I think an approach where we skip compiling a dictionary and merely act as a gateway, transforming the first paragraph of Wikipedia articles into plain text and stripping out any templates, should be sufficient. I might also scrap the plan to use D for this and just use good old Python. It looks like lookups will also be exact, so no search suggestions or anything like that. No user authentication is required either, but people might like support for encryption.
On a related note, I use bang codes in my browser bar to look things up. If DuckDuckGo is configured you can just type "!wd Q12345", "!wen Marco Polo" or "!mw gourd" to look things up quickly. Lots of dictionaries are supported [9]. Infrastruktur (talk) 17:45, 25 September 2024 (UTC)
@Infrastruktur: "Wikidata doesn't have short definitions" On cheese (L4517), for example, at L:L4517#S1, I can see "milk-based food product". But then we also have cheese (L331133), so I guess we would need to query all lexemes with a label matching the desired string. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:04, 25 September 2024 (UTC)
Forgot about lexemes. It might be easier to handle lexemes with the same name than it is to handle Wikipedia disambiguation pages, where it's not clear which of the links is a definition of the word. This protocol is all new to me so lots of things to figure out still. Infrastruktur (talk) 23:43, 25 September 2024 (UTC)
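A minimal sketch of the lexeme lookup described above, assuming the standard WDQS prefixes (ontolex, wikibase, skos); it returns every lexeme whose lemma is "cheese" together with its sense glosses:

SELECT ?lexeme ?lemma ?gloss WHERE {
  ?lexeme a ontolex:LexicalEntry ;
          wikibase:lemma ?lemma ;
          ontolex:sense ?sense .
  ?sense skos:definition ?gloss .
  FILTER(STR(?lemma) = "cheese")
}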

datatype of P5143

Hi all. I have proposed changing the datatype of amateur radio callsign (P5143) from external ID to string, and I'd like to have a consensus about it before making any changes. If you are interested, go ahead and join the discussion on the property talk page. Thanks. Samoasambia 08:26, 25 September 2024 (UTC)

A good approach is to ping all the people who commented on the creation of the property when proposing to change it. ChristianKl12:45, 25 September 2024 (UTC)
Thanks for the suggestion. I did that now. Samoasambia 15:23, 25 September 2024 (UTC)

aircraft engine (Q743004) and models (also many classes similar to aircraft engine (Q743004))

The class aircraft engine (Q743004) has problems in Wikidata. aircraft engine (Q743004) is a subclass of physical object (Q223557), which means that its instances are physical objects. But almost all the instances of aircraft engine (Q743004) are not physical objects, instead mostly being aircraft engine models like Poinsard (Q7207885).

What should be done to fix this problem? The simplest fix would be to just make aircraft engine (Q743004) no longer be a subclass of physical object (Q223557) and other classes that cause similar problems, perhaps by replacing aircraft engine (Q743004) subclass of (P279) aircraft component (Q16693356) with aircraft engine (Q743004) is metaclass for (P8225) aircraft component (Q16693356). But that would leave all the labels and descriptions for aircraft engine (Q743004) as is, not corresponding to the actual intent of the class. Changing just the English label and description would be possible but would cause a difference in the meaning of the labels in different languages. Adding an English value for Wikidata usage instructions (P2559) would help a bit but would not solve the mismatch. Changing all the descriptions doesn't seem immediately possible.

A variation of the first option would be to add a new class for aircraft engines, transfer all the labels and descriptions and aliases to the new class, correctly place the new class in the Wikidata ontology, give the new class appropriate label and description and aliases, and make the model instances also be subclasses of the new class.

Another option would be to make all the aircraft engine models that are currently instances of aircraft engine (Q743004) subclasses of it instead, perhaps in conjunction with making the models instances of some suitable metaclass like engine model (Q15057021). This option probably requires many more statement changes than the previous approaches. If this is done, adding an appropriate English value for Wikidata usage instructions (P2559) to aircraft engine (Q743004) seems indicated.

Given what I have seen in related classes, I expect that there are many classes that have the same problem so perhaps the best way forward is to consider the problem in general and come up with a general solution.

Does anyone have preferences between these approaches? Does anyone have a different approach to fix this problem? Does anyone know what is the best way to gather a community that could come up with a consensus decision? Peter F. Patel-Schneider (talk) 15:32, 18 September 2024 (UTC)

the standard solution would be to create a new item "aircraft engine model" as a metaclass for aircraft engine. Fgnievinski (talk) 23:50, 19 September 2024 (UTC)
We'd save a lot of work generating a parallel *_model hierarchy if something could be an instance of (P31) of product model (Q10929058) and a subclass of (P279) (or some other property) of its functional class. Vicarage (talk) 00:17, 20 September 2024 (UTC)
Care to exemplify? Fgnievinski (talk) 02:03, 20 September 2024 (UTC)
Peter and I are working on a RfC. Vicarage (talk) 20:51, 21 September 2024 (UTC)
Wikidata:Requests for comment/object vs design class vs functional class for manufactured objects now ready for comment. Vicarage (talk) 14:15, 26 September 2024 (UTC)
Sadly I think the first thing you need to do is make sure aircraft engine (Q743004) is used as a subclass of (P279), not the 237 times it's currently used as an instance of (P31). Once that's done, I don't see why you can't leave the subclass of (P279) aircraft component (Q16693356) alone, and work up the chain to find a point where a design idea erroneously becomes a physical thing. Naively it seems to me you can have design all the way down to a final instance of (P31) for a single object. But you know a lot more about ontology than me. Really it should be called "aircraft_engine_model", like we have for weapon model (Q15142894), but I don't like the idea of doubling up, adding _model to every concept to have 2 parallel trees. I see we have engine model (Q15057021), so perhaps, as we do with weapons, we make the 237 aircraft engines use that, and keep looking for generic terms at the level above a specific design.

Vicarage (talk) 16:31, 19 September 2024 (UTC)

Yes, I'm coming around to this approach of moving lots of items from instance to subclass of aircraft engine (Q743004), except that there might be a few instances that are actual physical objects (like aircraft engines in museum collections). That's probably the largest number of actual changes, but probably the fewest changes to the Wikidata ontology. Peter F. Patel-Schneider (talk) 16:52, 19 September 2024 (UTC)
Those actual objects should either at a pinch have P31 of physical_object directly, or for museum items, item of collection or exhibition (Q18593264). I expect we can come up with useful reasons why we care about a physical instance of a manufactured design. Vicarage (talk) 16:58, 19 September 2024 (UTC)
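If the bulk move from instance to subclass is agreed, the individual edits could be scripted; a minimal QuickStatements V1 sketch for one engine follows (columns tab-separated, rendered with spaces here; the leading minus removes a statement; Poinsard (Q7207885) is used only as an illustration):

-Q7207885	P31	Q743004
Q7207885	P279	Q743004
Q7207885	P31	Q15057021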

Dereferencing misattributed Israel CBS IDs

I can't figure out how to edit the pages, and the bot which originally made the errors seems to have been dead for a couple of years now. I added a comment on the talk pages, but would appreciate it if someone who knows how to do this would remove the properties.

https://www.wikidata.org/w/index.php?title=Talk:Q48195&oldid=2253153681 https://www.wikidata.org/w/index.php?title=Talk:Q121157&oldid=2253151994 Wissotsky (talk) 11:31, 26 September 2024 (UTC)

You wouldn't have been able to edit those two items because of the protection ([10][11][12][13]), so it's possible the "edit" links don't appear. I removed them; the correct values were already on items for places in Israel. Everything with CBS IDs now seems to be somewhere in Israel or Israeli-occupied territories. Peter James (talk) 13:52, 26 September 2024 (UTC)

RfC on object vs design class vs functional class for manufactured objects

@Peter F. Patel-Schneider and I have been in discussion over how we distinguish for manufactured items a physical object, its design, and the function it performs. We propose a series of constraints on their instance and subclass properties, and a simplification of the parochial set of something_type, something_model and something_family classes. We have used military items as exemplars, but the approach would have much wider application. We would appreciate your views at Wikidata:Requests for comment/object vs design class vs functional class for manufactured objects. (talk) 14:26, 26 September 2024 (UTC)

Wikidata MOOC For Beginners (in English) - Starting October 1, 2024!

Hi everyone,

A rerun of the Wikidata Open Online Course will kick off on October 1, 2024, and will be available for the following 5 weeks. The previous iteration of the course saw a great turnout, with positive feedback from learners, including GLAM professionals and students.

Here’s what you can expect:

Course Structure

  • Chapter 1: The Wikimedia Movement and the Creation of Wikidata
  • Chapter 2: Understanding Knowledge Graphs and Queries
  • Chapter 3: Discovering Wikidata, Open Data, and the Semantic Web
  • Chapter 4: Contributing to Wikidata, the Community, and Data Quality
  • Chapter 5: Bonus Resources on Scientific Bibliography from Wikidata

Head over to Wikidata 101: An Introduction to enroll, and don’t hesitate to share it with your friends and colleagues. The course is hosted on learn.wiki, and you can sign up using the same credentials you use for Wikimedia projects.

If you have any questions, feel free to reach out to me directly.

Cheers, Mohammed Abdulai (WMDE) (talk) 19:44, 26 September 2024 (UTC)

ID property for the actual WPBSA site (snooker association)

It seems we have the WST.tv property, World Snooker Tour player ID (P4498), and the SnookerScores.net property, WPBSA SnookerScores player ID (P10857), but we do not have an ID property for wpbsa.com. It appears that wpbsa.com actually contains a significant amount of data, for example Mark Allen on WPBSA, which is more than on the same player's WST page. Nux (talk) 19:07, 26 September 2024 (UTC)

@Nux You can always propose a new property: Wikidata:Property proposal RVA2869 (talk) 12:55, 27 September 2024 (UTC)
Thanks for the tip :).
Vote or discuss here: Wikidata:Property proposal/WPBSA com player ID :) --Nux (talk) 21:21, 27 September 2024 (UTC)

Surname is a common christian name

The entry here for Sydney Walker Barnaby is wrong. His surname is Barnaby, but on Commons it shows up as a given name. Meanwhile I added it on Commons as a surname, yet the name derived from Wikidata still shows as a given name. Why? Broichmore (talk) 17:10, 29 September 2024 (UTC)

Fixed RVA2869 (talk) 17:47, 29 September 2024 (UTC)

FBI file numbers

I'd like to add an FBI file number to a Wikidata item (e.g. 100-HQ-34789 or 92-NY-1456). However, many FBI files were destroyed or are still classified, so I can't link the file number to an external copy of the file in every case. I can provide a reference for each file number though.

  1. Is there an existing property, such as “Described by Source” or “Inventory Number”, that could be used for these numbers? If so, would it be best to create a new Item for each FBI file?
  2. If not, would this be appropriate for a new property (something like “Federal Bureau of Investigation File Number”), even if the file numbers won’t link to an external database or site?

Thanks! Nvss132 (talk) 10:40, 26 September 2024 (UTC)

I think (2) is preferred, but you should probably start a property proposal to have a more in-depth discussion about this. I'm not entirely clear what these file numbers identify - they are for individual people? Can one person have more than one file number? Anyway a property proposal discussion would be a good place to clarify current options or whether we really should create a new property for this. ArthurPSmith (talk) 13:17, 27 September 2024 (UTC)
Thanks for responding. After researching this weekend, I don't think creating a new property will work anymore. Not every FBI file maps to a specific Wikidata item. (For example, FBI file 100-HQ-4869 is on the funding of the Communist Party while file 100-HQ-365088 covers the sale of foreign publications in America.) Since these subjects won't correspond to one Wikidata item, I think the best solution is to create an item for each file, treating them like individual works. In addition, this also lets people use described by source (P1343) to link individual people described in the file who are not the main subject of the file, such as a spouse being described in someone's FBI file or when a file covers multiple members of an organization. Nvss132 (talk) 00:04, 30 September 2024 (UTC)

Deprecation tag for database entries that were wrongly created due to scraping?

Take a look at Q23649754. There are currently four statements for identifiers that are meant exclusively for video games, not software (Can You Run it ID, HowLongToBeat ID, Lutris game ID and Rock Paper Shotgun game ID). However, because these sites scrape everything from Steam, the identifiers were created anyway. Trade (talk) 02:57, 29 September 2024 (UTC)

I think these should not be deprecated, unless the website deprecates, redirects or deletes these identifiers themselves. Midleading (talk) 10:44, 29 September 2024 (UTC)
It does create an annoying amount of constraint errors Trade (talk) 18:08, 29 September 2024 (UTC)

Wikidata Weekly Summary #647

allow source(s) to be added to support claim of an "alias"

Currently, an alias is added with no ability to add a "reference" to support that claimed alias. How and where do I make this proposal? Thank you, -- Ooligan (talk) 17:53, 30 September 2024 (UTC)

Labels and aliases are different from other properties in that they are mainly for the use of human editors, and are somewhat outside the graph database logic. Generally if you need to be more specific about the name of an item, the period it applies for, or its variants, and provide references, you should be using one of the "name" properties like name (P2561) or official name (P1448), and adding references to those, and repeating the names as aliases so human searches can see them. Vicarage (talk) 17:59, 30 September 2024 (UTC)
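A minimal illustration of that pattern in QuickStatements V1 format (columns tab-separated, rendered with spaces here), using the sandbox item Q4115189, a placeholder name and a placeholder reference URL; S854 attaches reference URL (P854) as a reference, and Aen adds an English alias:

Q4115189	P1448	en:"Example Official Name"	S854	"https://example.org/source"
Q4115189	Aen	"Example Official Name"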

Duplicate entries due to ceb wiki?

Landau an der Isar (Q509536) and Landau an der Isar (Q32084506) seem to be the same but ceb.wiki has two articles. Magnus Manske (talk) 09:24, 27 September 2024 (UTC)

I have merged the Cebuano pages into one because they are both about the same subject. But the WD items are about different concepts: the commune and the centre of the commune. Landau an der Isar is divided into seven settlements (quarters?), and the main one shares its name with the commune. See w:de:Landau an der Isar#Gemeindegliederung. --Wolverène (talk) 10:04, 27 September 2024 (UTC)
@Magnus Manske There are a lot of bot-created pages in ceb.wiki because of GeoNames; see https://www.wikidata.org/wiki/Wikidata:WikiProject_Territorial_Entities/Geonames_and_CebWiki for more background. ChristianKl13:13, 27 September 2024 (UTC)
If you come across more of these, please feel free to add them to the WikiProject Conflation's items too. - Yupik (talk) 23:57, 4 October 2024 (UTC)