User talk:Daniel Mietchen/Archive/2022

From Wikidata

Wikidata weekly summary #501

Wikidata weekly summary #502

missing authors in paper Q27349821

Hello Daniel, I am writing to you since you are the creator of https://www.wikidata.org/wiki/Q27349821, and probably of many others. If you look at the "author" property of the entry, you can see that all the authors' surnames start with A, B or C. In fact, many are missing (e.g. myself). Do you know why? I guess entries of this kind are imported automatically, so there may be a bug somewhere. --Wiso (talk) 12:52, 5 January 2022 (UTC)

@Wiso: Thanks for checking. I tried to fix this but at that scale, our usual author curation workflows tend to break down. So for now, we have the author (P50) information in Measurement of the Higgs boson mass from the H → γ γ and H → Z Z * → 4 ℓ channels in p p collisions at center-of-mass energies of 7 and 8 TeV with the ATLAS detector (Q27349821) and the author name string (P2093) information in Q110449462, and merging them did not work (more details here and here). While we are in touch, I would like to have a chat with you about what the best way would be to handle such cases with thousands of authors and how to leverage information from places like INSPIRE-HEP (Q5972440) for author disambiguation. Would you have time for that over the weekend? --Daniel Mietchen (talk) 23:45, 6 January 2022 (UTC)
Hello @Daniel Mietchen, thanks for the reply. If I have understood correctly, Q27349821 and Q110449462 are duplicates, created when trying to move from author name string to author. I guess author is more correct. I am not an expert on Wikidata; for example, I have no idea how to merge items. I have worked with the INSPIRE-HEP API in the past, with some Python scripts, and it is not complicated. I would say that the main way to resolve author disambiguation is the ORCID iD, which is available from INSPIRE-HEP. A chat is very hard for me to arrange, and impossible over the weekend. Is there any way to create entries in a sandbox in Wikidata? I can try to write a script to import papers from my experiment (ATLAS). Wiso (talk) 14:22, 7 January 2022 (UTC)
@Wiso Just curious: did you write any parts of the paper? (I don't ask which ones) How do the deceased authors contribute? There was some discussion with Daniel at MediaWiki_talk:Gadget-Merge.js#What_to_do_in_case_of_EntityContentTooBigException?. --- Jura 15:08, 7 January 2022 (UTC)
Yes, I am an author. The author list for each paper is frozen at a certain point of the analysis, very close to its release. All qualified ATLAS authors are included, plus special requests (e.g. new students who are not yet authors but who worked on the analysis, ...). The document is then released on arXiv and, at the same time, submitted to the journal. The iteration with the journal takes some time (a few months). After that, if the paper is accepted, it is published by the journal. "Deceased authors" are those who died between the creation of the author list (so they contributed to the paper) and the publication in the journal. Wiso (talk) 08:59, 11 January 2022 (UTC)

Wikidata weekly summary #503

Wikidata weekly summary #504

Wikidata weekly summary #505

Wikidata weekly summary #506

Wikidata weekly summary #507

Wikidata weekly summary #508

Wikidata weekly summary #509

Hi,

You and your bot created and extensively edited International Society for Therapeutic Ultrasound Conference 2016: Tel Aviv, Israel. 14-18 March, 2016. (Q50025596). In author (P50), several authors seem to be listed several times (Kullervo Hynynen ×3, Elisa E. Konofagou ×6, Nathan McDannold ×3, Natalia Vykhodtseva ×2, Hao-Li Liu ×2, Cyril Lafon ×6, Vera Khokhlova ×6, etc.). Could you check this issue? Thanks, DGtal (talk) 07:35, 27 February 2022 (UTC)

@DGtal: Thanks for checking. This actually appears to be correct in the sense that every author of a contribution (poster, oral etc.) to this conference is listed as an author of this publication, and if people like the ones you listed appear on multiple such contributions, they are listed multiple times at the source. I do not like this practice very much myself, but there are quite a few conference-related publications that use this style of authorship that can result in multiple occurrences of the same author on the same paper. --Daniel Mietchen (talk) 19:16, 6 March 2022 (UTC)
I think I understand. So the item isn't really a scholarly article (Q13442814) but rather a conference summary that should ideally be split up into its components? DGtal (talk) 07:48, 7 March 2022 (UTC)

Wikidata weekly summary #510

Wikidata weekly summary #511

Wikidata weekly summary #512

Wikidata weekly summary #513

Scholia use profiles

Hi Daniel! I hope you are well. Thanks for your recent comment about Jupyter notebook. Could you please tell me more about your work "improving the Scholia /use profiles"? The test case you mentioned looks amazing. -- Oa01 (talk) 08:06, 25 March 2022 (UTC)

@Oa01: In essence, the underlying info stems from mining PubMed Central for mentions of Jupyter notebooks AND GitHub and then checking whether said notebooks can actually be found on GitHub. If that was the case, I tagged the paper as describing a project that uses Jupyter notebooks. For some other test cases, see the bottom panel at toolforge:scholia/P4510. --Daniel Mietchen (talk) 17:34, 1 April 2022 (UTC)
Thanks, Daniel. Very cool. Oa01 (talk) 19:33, 1 April 2022 (UTC)
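The text-mining step described in this thread can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the actual workflow: it assumes the article full text is already available as a plain string, and the pattern and function name are made up for the example. The second step, checking that the notebooks can actually be found on GitHub, needs a network lookup and is not shown.

```python
import re

# Hypothetical pattern for links to GitHub repositories (owner/repo).
GITHUB_REPO = re.compile(r"https?://github\.com/([\w.-]+)/([\w.-]+)", re.IGNORECASE)


def candidate_notebook_repos(fulltext: str) -> list[str]:
    """Return GitHub repository URLs from a text that also mentions Jupyter.

    Only the text-mining step is sketched; whether the repositories actually
    contain notebooks would have to be checked on GitHub separately.
    """
    if "jupyter" not in fulltext.lower():
        return []
    repos: list[str] = []
    for owner, repo in GITHUB_REPO.findall(fulltext):
        repo = repo.rstrip(".")  # drop sentence-final periods
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        url = f"https://github.com/{owner}/{repo}"
        if url not in repos:  # keep the first occurrence, preserve order
            repos.append(url)
    return repos
```

Requiring both a Jupyter mention and a GitHub link mirrors the "Jupyter notebooks AND GitHub" filter described above; the result is only a candidate list for the subsequent check.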

Open Definition conformant license

Hi Daniel, this edit is incorrect. reference URL (P854) should never be used as a qualifier. Probably best to revert this batch. I wonder if instance of (P31) is really the right property to put open license (Q30939938) in. Is this really the identity of the item? I would move it to complies with (P5009) which is a dedicated property to store this kind of data. Multichill (talk) 15:24, 2 April 2022 (UTC)

@Multichill: Thanks for the check, the fix and the suggestion. I am aware that P854 is not to be used as a qualifier, but I was distracted when I set up the batch, so mistyped an S as a P. When I noticed that the batch had mixed up the reference and qualifier tagging, I triggered a batch revert immediately, but this has not finished, and I do not know how I would get it to finish. So I will check the remnants and fix as much as possible. I share your concern about the use of P31 for tagging "Open Definition conformant license" and agree that P5009 (of which I hadn't been aware) could play a role here, but the combination "complies with" "Open Definition conformant license" still seems odd. I would say that this particular license is compliant with the Open Definition, and the nature of this compliance has something to do with "Open Definition conformant license" but I am not sure yet what the best way to model this would be. Also, my reasoning here is for now just based on the English labels. Maybe it makes some more sense in some of the other languages, which I have not checked yet. --Daniel Mietchen (talk) 13:43, 3 April 2022 (UTC)
@Multichill: Turns out that DeltaBot has fixed the remaining cases. --Daniel Mietchen (talk) 15:26, 3 April 2022 (UTC)
@Multichill: In terms of how to model the compliance, what about this way? --Daniel Mietchen (talk) 15:31, 3 April 2022 (UTC)
That looks like a nice solution. Multichill (talk) 16:28, 3 April 2022 (UTC)

Wikidata weekly summary #514

Wikidata weekly summary #515

Oberholser

@Daniel Mietchen: there is Harry C. Oberholser (Q67349028) and Harry C. Oberholser (Q39626). I wonder if you could take a look at this. Thank you for your time. Lotje (talk) 11:16, 13 April 2022 (UTC)

@Lotje: Thanks for the ping. I merged these two items. --Daniel Mietchen (talk) 01:21, 16 April 2022 (UTC)

Wikidata weekly summary #516

Wikidata weekly summary #517

Wikidata weekly summary #518

Wikidata weekly summary #519

quickstatements; temporary_batch_1633294882827

Hi Daniel! Can you please check what went wrong with this batch? For example: [2] -- Meisam (talk) 07:41, 3 May 2022 (UTC)

Thanks for checking, Meisam, and for the ping. That batch made only a single edit, which was to create that page. This seems to be correct to me. However, I agree that the item would benefit from further annotation, so I added a bit more. Note that the item was originally created for author (P50) statements, the result of which you can explore, for instance, via their Scholia profile. --Daniel Mietchen (talk) 19:56, 5 May 2022 (UTC)
Thank you for taking care of it. I think for this type of item, having a claim like occupation: researcher or something similar would have helped a lot. Cheers! -- Meisam (talk) 09:33, 7 May 2022 (UTC)
@Meisam: I opted for occupation: author and did a few batches (example). --Daniel Mietchen (talk) 00:23, 11 May 2022 (UTC)

Wikidata weekly summary #520

author vs researcher

Hi Daniel,

I wonder whether it is correct to add author (Q482980) as occupation (P106), for example to Katarzyna Danis-Wlodarczyk (Q71100537). author (Q482980) is described as "author or intellectual author of a linguistic work". The reference for this claim is stated in (P248) a scholarly article. Kpjas (talk) 08:20, 14 May 2022 (UTC)

@Kpjas: Thanks for sharing your thoughts about this. I am not sure either whether this is the way to go, but these edits were triggered by the discussion with @Meisam: above, whose opinion would be welcome here too. --Daniel Mietchen (talk) 20:05, 20 May 2022 (UTC)
@Sic19: used to add researcher (Q1650915) to this kind of item, which I find quite reasonable. -- Meisam (talk) 08:04, 22 May 2022 (UTC)

Wikidata weekly summary #521

Wikidata weekly summary #522

Wikidata weekly summary #523

A new Abstract Wikipedia and Wikifunctions update is out

The new Abstract Wikipedia and Wikifunctions Update is out! Please, come and read it! In this issue: an essay from Denny, development updates.

Subscribe · Translate this message


--MediaWiki message delivery (talk) 13:35, 8 June 2022 (UTC)

Wikidata weekly summary #524

Lack of information

You've created Jennifer Couzin-Frankel (Q64692845) which is totally lacking information.--Lovehansa (talk) 09:39, 14 June 2022 (UTC)

@Lovehansa: Thanks for reviewing that item. I agree it could be expanded a lot (and I added a bit), but there is quite a bit of information in items linked to it, as can be explored via WhatLinksHere or via tools like Reasonator or Scholia. --Daniel Mietchen (talk) 01:21, 19 June 2022 (UTC)

Wikidata weekly summary #525

Wikidata weekly summary #526

New Abstract Wikipedia and Wikifunctions updates are out

There are new updates for Abstract Wikipedia and Wikifunctions! Please, come and read them!

On June 10, an update regarding Wikifunctions usability tests has been published (the full report is also available).

On June 21, an update regarding manually-written articles in Abstract Wikipedia has been published (for how AW will deal with model articles, read the earlier update).

Subscribe · Translate this message

-- MediaWiki message delivery (talk) 16:21, 30 June 2022 (UTC)

A new Abstract Wikipedia and Wikifunctions update is out

The new Abstract Wikipedia and Wikifunctions Update is out! Please, come and read it! In this issue: development updates, team updates.

Subscribe · Translate this message


--MediaWiki message delivery (talk) 16:38, 1 July 2022 (UTC)

Wikidata weekly summary #527

Wikidata weekly summary #528

New Abstract Wikipedia and Wikifunctions updates are out

There are new updates for Abstract Wikipedia and Wikifunctions! Please, come and read them!

On July 12, an update regarding the latest team and development news has been published.

On July 15, an essay regarding the potential of Abstract Wikipedia, written by Denny, has been published.

Subscribe · Translate this message

-- MediaWiki message delivery (talk) 10:40, 17 July 2022 (UTC)

Wikidata weekly summary #529

Wikidata weekly summary #530

Research bot

Hi Daniel, in the last 48 hours, your research bot has marked several Australian native plant taxa on Wikidata as invasive to Australia. Why is it doing this, and can you please halt it? Junglenut (talk) 11:17, 21 July 2022 (UTC)

@Junglenut: Thanks for your note. The bot has added invasive to (P5588) statements (example) based on entries in the SInAS database of alien species occurrences (Q112339024), in particular the file "SInAS_AlienSpeciesDB_2.4.1.csv". Can you give some examples where you think these statements are wrong? As with any database, some entries are likely to be problematic, and I hope we can help address at least some of these problems. --Daniel Mietchen (talk) 11:47, 26 July 2022 (UTC)
The ones that I am aware of are Ptychosperma elegans, Ptychosperma macarthurii, Eupomatia laurina, Pseuderanthemum variabile, Angiopteris evecta, Ficus drupacea, Ficus platypoda, Toona ciliata, and Pipturus argenteus. All of these are valid Australian natives, and two of them are endemic.
There may be others as well, given that I'm not subscribed to every Australian taxon. But whatever the source your bot is using it seems fairly obvious that either the database itself or the data extraction is inaccurate, and the bot should be halted while the issue is sorted out. — Junglenut (talk) 11:58, 28 July 2022 (UTC)
Apologies: the list I gave you just now was a full list of the taxa for which I received notifications about edits by the bot, not necessarily of those marked as invasive to Australia. Only the following were incorrectly edited by the bot: Eupomatia laurina, Pseuderanthemum variabile, Toona ciliata and Pipturus argenteus. I have removed those statements on the relevant pages; however, other taxa may still be incorrectly labelled. Junglenut (talk) 12:59, 28 July 2022 (UTC)
@Junglenut: Thanks for checking and for the fix. As just stated in the other thread above, I'm reviewing the workflows, but I think marking the problematic statements as deprecated is the best we can do for now. Thanks again for curating in this area. --Daniel Mietchen (talk) 12:11, 29 July 2022 (UTC)

Wikidata weekly summary #531

Invasive to its native place?

Hi Daniel, your bot just added a bunch of invasive to (P5588) values to Dracaena draco (Q157952), including Canary Islands (Q5813). That can't be right - it's a native (and endangered) plant here, not an invasive one! Can you double-check please? Thanks. Mike Peel (talk) 06:08, 18 July 2022 (UTC)

Just pinging you again about this. Thanks. Mike Peel (talk) 14:56, 26 July 2022 (UTC)
Hi Mike, thanks for the note and the additional ping. I'll look into this and fix as needed. For background, see here. --Daniel Mietchen (talk) 01:24, 28 July 2022 (UTC)
@Mike Peel: I am still checking the workflows, but in the meantime, I have marked this one as deprecated. Thanks again for the quality check. --Daniel Mietchen (talk) 12:08, 29 July 2022 (UTC)
Thanks! I see the bot has now removed the values - except for the deprecated one! So it now looks odder. :-) I've just removed the deprecated value, just mentioning this in case it would have happened elsewhere too. Thanks. Mike Peel (talk) 18:43, 5 August 2022 (UTC)

New Abstract Wikipedia and Wikifunctions updates are out

There are new updates for Abstract Wikipedia and Wikifunctions! Please, come and read them!

On July 20, an update regarding the latest additions to the team has been published.

On July 29, an essay regarding long typed lists and how to deal with them, written by Denny, has been published.

On August 5, an essay regarding a proposed launch plan for Wikifunctions, written by Denny, has been published.

Enjoy the reading!

Subscribe · Translate this message

-- MediaWiki message delivery (talk) 09:16, 5 August 2022 (UTC)

Wikidata weekly summary #532

Wikidata weekly summary #533

Wikidata weekly summary #534

Call for participation in the interview study with Wikidata editors

Dear Daniel Mietchen,

I hope you are doing well,

I am Kholoud, a researcher at King’s College London, and I work on a project as part of my PhD research that develops a personalized recommendation system to suggest Wikidata items for the editors based on their interests and preferences. I am collaborating on this project with Elena Simperl and Miaojing Shi.

I would love to talk with you to know about your current ways to choose the items you work on in Wikidata and understand the factors that might influence such a decision. Your cooperation will give us valuable insights into building a recommender system that can help improve your editing experience.

Participation is completely voluntary. You have the option to withdraw at any time. Your data will be processed under the terms of UK data protection law (including the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018). The information and data that you provide will remain confidential; it will only be stored on the researchers' password-protected computer. We will use the anonymized results to provide insights into editors' practices when selecting items to edit, and we will publish the results of the study at a research venue. If you decide to take part, we will ask you to sign a consent form, and you will be given a copy of this consent form to keep.

If you’re interested in participating and have 15-20 minutes to chat (I promise to keep the time!), please either contact me at kholoudsaa@gmail.com or use this form https://docs.google.com/forms/d/e/1FAIpQLSdmmFHaiB20nK14wrQJgfrA18PtmdagyeRib3xGtvzkdn3Lgw/viewform?usp=sf_link with your choice of the times that work for you.

I’ll follow up with you to figure out what method is the best way for us to connect.

Please contact me using the email mentioned above if you have any questions or require more information about this project.

Thank you for considering taking part in this research.

Regards

Kholoud

Not sure what came out of that interview, but I guess this can be archived by now. --Daniel Mietchen (talk) 15:56, 25 August 2022 (UTC)

Hi Daniel, is he distinct from Joseph Lau (Q85265213)?-- Culinary Specialist (talk) 16:23, 11 August 2022 (UTC)

@Culinary Specialist: Not sure yet, but I've been using the Author Disambiguator to try and sort this out a bit better. --Daniel Mietchen (talk) 15:55, 25 August 2022 (UTC)

ambiguous items

Hello. Just a friendly reminder to add a couple of identifying properties, labels, and/or external identifiers to human items such as Brittany Rohl (Q112118368), so that they can be identified and distinguished more easily and don't get mistakenly merged with other items (especially important in biomedical articles, where there may be hundreds of authors in the same field with similar names). I added a Google Scholar ID to Kathleen M. Collins (Q112118365) to aid in disambiguation. Cheers. -Animalparty (talk) 23:41, 15 August 2022 (UTC)

Thanks for the reminder. I am aware of this and usually add multiple links to such items when I create them, and for a while, those bare items have also been enriched with a third statement (details). In any case, I made some edits to Brittany Rohl (Q112118368). --Daniel Mietchen (talk) 15:26, 25 August 2022 (UTC)

New Abstract Wikipedia and Wikifunctions updates are out

There are new updates for Abstract Wikipedia and Wikifunctions! Please, come and read them!

On August 9, an update regarding the launch of Wikifunctions Beta has been published.

On August 19, an update regarding the proposed template language syntax for Wikifunctions and the Wikimania 2022 sessions dedicated to Wikifunctions and Ninai-Udiron has been published.

Enjoy the reading!

Subscribe · Translate this message

-- MediaWiki message delivery (talk) 18:06, 29 August 2022 (UTC)

Wikidata weekly summary #535

Wrong date imported by Research bot

Hi! I have noticed a few times already that Research Bot imports a publication date that isn't the correct one. See e.g. Q113203530, which I'll leave uncorrected as a case study. I'm not sure what is causing it, but I would appreciate it if you could have a look. Thanks! --Jahl de Vautban (talk) 10:14, 31 August 2022 (UTC)

Thanks for the quality check, Jahl de Vautban! The problem has been identified and likely fixed in the code (we're still doing some tests), but it will take some time to fix the corresponding statements (we're looking into this too). In any case, from now on, future imports should not have this problem any more. --Daniel Mietchen (talk) 22:07, 1 September 2022 (UTC)

Wikidata weekly summary #536

ResearchBot adding references to articles

Thanks for your work on this. We know this is a gigantic task, as I'm doing the same semi-automatically for specific topics. In doing so, I noticed that it is very useful to have all references from review articles (P31 Q7318358).

My suggestion would be that you prioritize your work to add refs to review articles first. SCIdude (talk) 07:55, 8 September 2022 (UTC)

Hi SCIdude — thanks for thinking and editing along. Can you provide some more details and/ or examples? "add refs to review articles first" can be interpreted in multiple ways, and I'm not sure I can be of much help with any of them at this point. --Daniel Mietchen (talk) 20:52, 10 September 2022 (UTC)
I'm assuming your bot adds P2860 statements to article items (P31 Q13442814). Review article items (P31 Q7318358) are a subset. My suggestion is to work on this subset first. SCIdude (talk) 06:50, 11 September 2022 (UTC)
SCIdude: The bot rarely makes cites work (P2860) edits, but when it does, I will try to give special attention to review articles. --Daniel Mietchen (talk) 21:41, 11 September 2022 (UTC)
Thanks.

Wikidata weekly summary #537

Wikidata weekly summary #538

Wikidata weekly summary #539

New Abstract Wikipedia and Wikifunctions updates are out

There are new updates for Abstract Wikipedia and Wikifunctions! Please, come and read them!

On September 23, an update regarding the state of Abstract Wikipedia's Natural Language Generation workstream has been published. This entry is also available as a Wikimedia Diff blog post.

On September 27, an update regarding how to create a Wikifunctions guideline for staff contribution has been published. Please, take some time to contribute to the discussion.

Enjoy the reading!

Subscribe · Translate this message

-- MediaWiki message delivery (talk) 10:53, 29 September 2022 (UTC)

Wikidata weekly summary #540

Duplicate item for scholarly article?

I happened to notice that A case study of the Hirsch index for 26 non‐prominent physicists (Q114267509) and A case study of the Hirsch index for 26 non-prominent physicists (Q114267510) had the same title, and after digging a bit I discovered that Q114267510 has a DOI pointing to https://onlinelibrary.wiley.com/doi/10.1002/andp.20075190903, which has the title "A more direct representation for complex relativity". I think Research Bot somehow has a bug here that caused the second item to end up with the same name as the first, despite intending to represent a different article entirely?

Also, thank you for running Research Bot, I've been following it since I've been hoping we can hit 100 million items in time for Wikidata's 10th birthday :) Just want to make sure we do so without adding bad data.

Best, Nicereddy (talk) 13:54, 28 September 2022 (UTC)

@Nicereddy: This one was a bit complicated, but here is what I think happened under the hood: in short, it's again (just as with the HTML tags discussed in the thread below) a problem originating from the Crossref record, and I do not see an automated way to handle such cases at the moment while reusing their metadata. I'll think about it, though, and leave the two items alone for the moment. I did leave a comment at Talk:Q114267510. --Daniel Mietchen (talk) 21:40, 9 October 2022 (UTC)

Wikidata weekly summary #541

New Abstract Wikipedia and Wikifunctions updates are out

There are new updates for Abstract Wikipedia and Wikifunctions! Please, come and read them!

On September 30, an update regarding Cory Massaro's arts residency in Istanbul has been published.

On October 5, a comprehensive update about the last two months of Abstract Wikipedia and Wikifunctions technical developments has been published.

Enjoy the reading!

Subscribe · Translate this message

-- MediaWiki message delivery (talk) 10:59, 13 October 2022 (UTC)

adding references

Up to now, to find articles missing references, I simply excluded every article that had any references at all. But having recently seen articles to which you added very few refs (Q91562400, Q22809915), I have to reconsider. Out of curiosity, why don't you use Crossref, Scholar or Semantic Scholar? SCIdude (talk) 10:28, 16 October 2022 (UTC)

@SCIdude: Sometimes, I am just working on citations to a particular set of items rather than citations from them, which might explain the patchiness in cases like these. Another thing to keep in mind is that the references cited from a particular item may well not all be in Wikidata at a given time. That said, I have not used wd-cli much, but would be happy to give it a try for handling citations. Can you point me to the code underlying some of your relevant sample edits? Apart from that, perhaps it's time for us to have some synchronous exchanges to coordinate our respective Wikidata activities. --Daniel Mietchen (talk) 14:01, 16 October 2022 (UTC)
Actually, most of what I'm doing is semi-automatic. For example, the script [6] needs manual creation of missing articles (e.g. run, copy DOIs into sourcemd, wait, continue) before writing the commands with all citations, which can then be used as input for `wd ee`. In general, I see no other way. Using the Unix command line, where previous commands can easily be edited and rerun, helps tremendously. If you browse the repo given above, you'll see that small scripts, as parts of workflows, do everything. In my opinion, the modular workflow paradigm is unavoidable for complex tasks; monolithic software will lead you into a cul-de-sac. SCIdude (talk) 14:33, 16 October 2022 (UTC)

Wikidata weekly summary #542

Wikidata weekly summary #543

Italics in paper names

Anastasia Denisova. Internet Memes and Society: Social, Cultural and Political Contexts (Q114394609), Fossil Folklore from India: The Siwalik Hills and the Mahâbhârata (Q114394606), and H.M. ARMED SHIP VIGILANT, 1777–1780 (Q114394603) all were imported with "i" HTML tags in their titles. These should probably be fixed, along with any others that were created recently with this problem (there are a few I saw that I didn't link here). Nicereddy (talk) 02:07, 4 October 2022 (UTC)

And a similar problem with Multilayer nanogranular films (Co40Fe40B20)50(SiO2)50/α-Si:H and (Co40Fe40B20)50(SiO2)50/SiO2: Magnetic properties (Q114394646). Nicereddy (talk) 02:10, 4 October 2022 (UTC)
To me it looks more like a limitation of MediaWiki than a problem with the item. But I've seen a lot of similar problems, and I wonder whether it is even technically possible to convert all current cases into en:Unicode subscripts and superscripts. Lockal (talk) 03:09, 4 October 2022 (UTC)
@Nicereddy, Lockal: Thanks both for looking into this. It is a problem I have on my radar, though it is not specific to my bot (for examples from other bots, see Tripfordines A−C, Sesquiterpene Pyridine Alkaloids from Tripterygium wilfordii, and Structure Anti-HIV Activity Relationships of Tripterygium Alkaloids (Q113960369), Functional Characterization of the Lin28/let-7 Circuit During Forelimb Regeneration in Ambystoma mexicanum and Its Influence on Metabolic Reprogramming (Q104488932) or Taxonomy, ecology, and conservation status of Philippine Rafflesia (Rafflesiaceae) (Q54250206)), as the titles are actually stated with the markup in the databases queried, be that Crossref (example) or Europe PMC (example). A fix is not straightforward, since a number of edge cases complicate things, and I am using Wikidata Integrator (Q31743627), which does the heavy lifting. There is also the issue of how best to handle such corrections in terms of deprecations or preferred statements and the suitability of the cited source (see my notes). In any case, I am doing regular cleanup rounds for titles containing HTML tags like italics, bold or line breaks, and others for which title (P1476) has constraint statements. This is not yet the case for <sub> and <sup>, so we should probably add them there too, especially since there are many more of them (currently 3k for sup and 2k for sub). Help with any of that would be appreciated. --Daniel Mietchen (talk) 11:20, 9 October 2022 (UTC)
@Nicereddy, Lockal: I wrote a maintenance query that produces QuickStatements (V1) commands that do the following (sample edits):
  • search for articles with <sub> or </sub> tags in their title
  • create a new title and English label in which the tags are replaced with single spaces
  • move the old titles to title in HTML (P6833) (qualifier) statements
Some things I am not sure about:
  • whether <sub> or </sub> tags should be replaced by nothing or by single spaces (I went for single spaces here but this may actually be more appropriate for <i> or </i> tags)
  • how to expand this to automatically detect and cover all languages in which the titles and/ or labels contain the respective target tags
  • whether the references under the title statements should be kept, given that they state the title with tags, while we then use it without
  • whether we should not delete the old title and instead better use deprecated/ preferred ranks, which I think QuickStatements cannot currently handle
Apart from that, the approach seems applicable for other HTML tags too.
--Daniel Mietchen (talk) 21:03, 9 October 2022 (UTC)
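The transformation in the list above can be sketched in Python roughly as follows. This is a hypothetical illustration, not the actual maintenance query: the function names are made up, titles are assumed to contain no double quotes or tabs (which would need escaping), and the QuickStatements V1 layout used here (tab-separated fields, a leading `-` to remove a statement, `Len` for the English label, qualifiers appended after the main value) follows my understanding of that syntax and should be checked against the QuickStatements documentation before use.

```python
import re

# Matches opening or closing <sub> tags, case-insensitively.
SUB_TAG = re.compile(r"</?sub>", re.IGNORECASE)


def strip_sub_tags(title: str) -> str:
    """Replace each <sub>/</sub> tag with a single space, then tidy whitespace.

    Collapsing repeated spaces and trimming the ends is an extra assumption,
    not part of the workflow described above.
    """
    spaced = SUB_TAG.sub(" ", title)
    return re.sub(r" {2,}", " ", spaced).strip()


def qs_v1_commands(qid: str, title: str, lang: str = "en") -> list[str]:
    """Build tab-separated QuickStatements V1 lines for one item (sketch).

    Assumes titles contain no double quotes or tabs, which would need escaping.
    """
    new_title = strip_sub_tags(title)
    if new_title == title:
        return []  # nothing to do for titles without <sub> tags
    return [
        # remove the old title statement
        "\t".join([f"-{qid}", "P1476", f'{lang}:"{title}"']),
        # add the cleaned title, keeping the tagged version in a P6833 qualifier
        "\t".join([qid, "P1476", f'{lang}:"{new_title}"', "P6833", f'{lang}:"{title}"']),
        # update the label in the same language
        "\t".join([qid, f"L{lang}", f'"{new_title}"']),
    ]
```

With a title like `CO<sub>2</sub> uptake`, the space replacement yields `CO 2 uptake`, which illustrates the open question above of whether <sub> tags are better replaced by single spaces or by nothing.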
I think it would be much better to use unicode subscripts and superscripts instead of just removing all tags, e. g.: Multilayer nanogranular films (Co₄₀Fe₄₀B₂₀)₅₀(SiO₂)₅₀/α-Si:H and (Co₄₀Fe₄₀B₂₀)₅₀(SiO₂)₅₀/SiO₂: Magnetic properties. At least for numbers and some basic Latin characters. I bet it would cover 99% of titles with subscripts and superscripts. And for the rest decide case by case. Lockal (talk) 02:54, 10 October 2022 (UTC)
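The Unicode approach suggested above can be sketched in Python as follows. This is an illustration under assumptions: the mapping covers only digits and the symbols + - = ( ), letters are left out, and any <sub> or <sup> span containing an unmapped character is kept unchanged for case-by-case review. The function name is hypothetical.

```python
import re

# Digits and a few symbols that have Unicode subscript/superscript forms.
# Letters are deliberately left out of these tables.
SUB = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")
SUP = str.maketrans("0123456789+-=()", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾")
TAG = re.compile(r"<(sub|sup)>(.*?)</\1>", re.IGNORECASE)


def to_unicode_scripts(title: str) -> str:
    """Replace <sub>/<sup> spans with Unicode characters where every character maps.

    Spans containing characters without a mapping are left unchanged, so they
    can be reviewed case by case.
    """
    def convert(match: re.Match) -> str:
        kind, body = match.group(1).lower(), match.group(2)
        table = SUB if kind == "sub" else SUP
        converted = body.translate(table)
        # translate() leaves unmapped characters as-is: if any remain, keep the tag.
        if any(ch == conv for ch, conv in zip(body, converted)):
            return match.group(0)
        return converted

    return TAG.sub(convert, title)
```

For example, `(Co<sub>40</sub>Fe<sub>40</sub>B<sub>20</sub>)<sub>50</sub>` becomes `(Co₄₀Fe₄₀B₂₀)₅₀`, while `H<sub>a</sub>` is left as-is for manual review.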
There are 66k labels with HTML tags at this moment: https://quarry.wmcloud.org/query/67990 Lockal (talk) 04:28, 10 October 2022 (UTC)
Came across this thread as I am facing a similar issue (https://www.wikidata.org/wiki/Topic:X5q2qm0bxy8mujam). If there is a consensus on how best to handle HTML tags (not limited to sub/superscript numbers), I'm happy to learn! AdrianoRutz (talk) 17:24, 26 October 2022 (UTC)

Research Bot is adding a written language to paintings

This edit doesn't make sense: English added to a painting. Jane023 (talk) 18:45, 26 October 2022 (UTC)

@Jane023: Thanks for the hint. Fixed. --Daniel Mietchen (talk) 02:24, 28 October 2022 (UTC)

New Abstract Wikipedia and Wikifunctions updates are out

There are new updates for Abstract Wikipedia and Wikifunctions! Please, come and read them!

On October 14, we welcomed Stef Dunlap in our team as Quality and Test Engineer.

On October 20, an update regarding a new Wikifunctions demo (also available here) has been published.

On October 27, we shared a summary of what we achieved thanks to the efforts of six Google Fellows, whose fellowship has come to an end.

Enjoy the reading!

Subscribe · Translate this message

-- MediaWiki message delivery (talk) 11:17, 28 October 2022 (UTC)

empty items

It would be nice if you would add at least one statement to items that you create. Like you made hydraulic trait (Q113024585) and linked a ton of things to it but you didn't add any statements. Generally items without statements aren't desired. BrokenSegue (talk) 17:43, 26 September 2022 (UTC)

Thanks for the note, BrokenSegue. I completely agree that items should have at least one statement. Not sure why I left out this one — thanks for the fix. --Daniel Mietchen (talk) 11:20, 27 September 2022 (UTC)

This happened again with impact of disasters on cultural heritage (Q108829176). It's not helpful to have items with no statements. BrokenSegue (talk) 16:44, 31 October 2022 (UTC)

Wikidata weekly summary #544

wrong main subject (P921) statement

Hi Daniel, this edit is wrong, please fix. Hint: the membranes that are fused here are different from those at viral entry. SCIdude (talk) 07:16, 22 October 2022 (UTC)

Here is another: 2

@SCIdude: Thanks for catching that. I see what you mean and will think about a fix. --Daniel Mietchen (talk) 14:59, 4 November 2022 (UTC)

John Cupitt

Hi Daniel,

John R. Cupitt (Q110741611) has no information. Maybe it's a duplicate of John R. Cupitt (Q102256335)? Jørund Viktoria Alme (talk) 11:04, 3 November 2022 (UTC)

@Jørund Viktoria Alme: Thanks for the pointer. After taking a closer look, the two turned out to be indeed about the same person. Note that each page here also has a "What links here" link in the sidebar, which may well provide additional information relevant to improving the item at hand. For people specifically, it is also often worthwhile to check out their Scholia profiles, in this case toolforge:scholia/author/Q102256335. --Daniel Mietchen (talk) 14:58, 4 November 2022 (UTC)

Empty item

Hello,

could you add more information to Xiangguo Qiu (Q66680131)? -- Aunty Gormint (talk) 20:20, 6 November 2022 (UTC)

@Aunty Gormint: Done. See also their Scholia profile at toolforge:scholia/author/Q66680131. --Daniel Mietchen (talk) 23:16, 6 November 2022 (UTC)

Wikidata weekly summary #545

Research Bot adds number of pages twice

Hi! I noticed that in some cases Research Bot adds the number of pages twice, once with and once without a reference; see e.g. Q64368710#P1104. Could you have a look at it? Thanks! --Jahl de Vautban (talk) 17:44, 10 November 2022 (UTC)

@Jahl de Vautban: Thanks for the hint. This was due to a missing "DISTINCT" in the query I used to generate the edit commands. This is now fixed, and I will look into cleaning up the leftovers. --Daniel Mietchen (talk) 19:48, 10 November 2022 (UTC)
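In a SPARQL query, `DISTINCT` removes repeated result rows, so each (item, property, value) combination yields only one edit command. The query and command format used by the bot are not shown in this thread. The Python sketch below reproduces the same safeguard on the command-generation side. It assumes tab-separated QuickStatements-style lines, and the item IDs and values in the example are hypothetical.

```python
def to_commands(rows):
    """Turn (item, property, value) rows into one tab-separated command per unique row.

    dict.fromkeys keeps the first occurrence of each row and preserves order,
    which has the same effect as DISTINCT on the query results.
    """
    unique_rows = dict.fromkeys(rows)
    return [f"{item}\t{prop}\t{value}" for item, prop, value in unique_rows]


# Hypothetical rows: the first two are identical, as a query without DISTINCT may return.
rows = [("Q1", "P1104", "100"), ("Q1", "P1104", "100"), ("Q2", "P1104", "50")]
for command in to_commands(rows):
    print(command)
```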

New Abstract Wikipedia and Wikifunctions updates are out

There are new updates for Abstract Wikipedia and Wikifunctions! Please, come and read them!

On November 4, an essay from Denny about the possible evolutions of natural language generation tasks on Wikifunctions was published.

On November 9, an update about a new browser-based tool, Form Checker, was published (instructional video available here).

Also, we have a new channel on IRC, called "wikipedia-abstract-tech", which is also bridged with Telegram. Find out more about (and maybe join) our official channels here.

Enjoy the reading!

Subscribe · Translate this message

--MediaWiki message delivery (talk) 10:19, 11 November 2022 (UTC)

Wikidata weekly summary #546

Wikidata weekly summary #547

New Abstract Wikipedia and Wikifunctions updates are out

There are new updates for Abstract Wikipedia and Wikifunctions! Please, come and read them!

On November 17, we announced the beginning of the consultation about Wikifunctions' Code of Conduct. You are invited to express your opinions on the talk page.

On November 23, we posted a brief newsletter announcing the interview with former Google Fellow and returning Wikipedian Ori Livneh. You can read his full interview on the Diff blog.

Also, we remind you that we have a new channel on IRC, called "wikipedia-abstract-tech", which is also bridged with Telegram. Find out more about (and maybe join) our official channels here.

Enjoy the reading!

Subscribe · Translate this message

-- MediaWiki message delivery (talk) 10:11, 28 November 2022 (UTC)

Wikidata weekly summary #548

Wikimedians for Sustainable Development - November 2022 Newsletter

This is our twenty-first newsletter, covering November 2022. This issue has news related to SDGs 3, 5, 11, 13, 14, 15, 16 and 17.

Activities

  • Current: Women in Climate Change editathon in various languages (1 December 2022 - 31 January 2023) (SDG 5 & 13) [17]
  • Past: Wikipedia and sustainability, how to increase knowledge on climate change? (SDG all) [1]
  • Past: Wikidata editathon in Swedish during COP 27 (SDG 13) [2]

News

  • Wikimedia Nederland signed the Wikimedia Affiliates Environmental Sustainability Covenant (SDG all) [3]
  • WikiProject Govdirectory was featured as a Digital Public Good (SDG 16) [11]
  • The theme for Wiki Loves Africa 2023 has been revealed: Climate and weather (SDG 13) [16]

Resources

  • Let's Connect learning report - pilot phase 2022 (SDG 17) [15]

Research

  • Recommendations for use of annotations and persistent identifiers in taxonomy and biodiversity publishing (SDG 15) [7]
  • A longitudinal analysis of gender diversity in Wikipedia articles (SDG 5) [13]

Videos

  • 10 Anos de Wikidata: Wikidata for Open Access to Cultural Heritage, by Susanna Ånäs (AvoinGLAM) (SDG 11) [14]

New WikiProjects

  • Myosotis Pilot (SDG 15) [12]

New Wikidata properties

  • Bank of information on the historical and cultural heritage of the Republic of Belarus (SDG 11) [4]
  • Louisiana Plant ID (SDG 15) [5]
  • DrugCentral ID (SDG 3) [8]
  • Probes And Drugs ID (SDG 3) [9]

New Wikidata query examples

  • Map of water bodies in France (SDG 14) [6]
  • Gender balances of winners of literary awards (SDG 5) [10]

Links

This message was sent with Global message delivery by Ainali (talk) 22:05, 1 December 2022 (UTC) · Contribute · Manage subscription

A new Abstract Wikipedia and Wikifunctions update is out

There is a new update for Abstract Wikipedia and Wikifunctions. Please, come and read it!

In this issue: news about Natural Language Generation (NLG) meetings, a new Mastodon account for the project, and the latest development updates.

Also, we remind you that:

Enjoy the reading! -- MediaWiki message delivery (talk) 10:11, 3 December 2022 (UTC)

Subscribe to the newsletter

ResearchBot: Book review items

Hello, the bot created several items with corrupted DOI codes:

The items lack links or names that would allow the entities to be clearly identified. The instance of (P31) values also look wrong. Could you fix or delete the items? — Ivan A. Krestinin (talk) 11:10, 3 December 2022 (UTC)

@Ivan A. Krestinin: Thanks for checking and for the notification. I slightly reformatted your list and am looking into the matter. --Daniel Mietchen (talk) 13:52, 3 December 2022 (UTC)
Looks like the journal changed ownership, and the new owners discarded the original DOIs in favour of new ones. --Daniel Mietchen (talk) 15:19, 3 December 2022 (UTC)
Thank you! DOIs usually do not contain the "&" symbol, so most probably the values are incorrect rather than deprecated. — Ivan A. Krestinin (talk) 15:47, 3 December 2022 (UTC)

Wikidata weekly summary #549

Wikidata weekly summary #550

Please respond

Please respond to the comments at Wikidata:Administrators'_noticeboard#Lexeme_creations_by_Daniel_Mietchen. Thank you. BrokenSegue (talk) 21:50, 16 December 2022 (UTC)

@BrokenSegue: Thanks for the notification - will do. --Daniel Mietchen (talk) 09:29, 17 December 2022 (UTC)

Wikidata weekly summary #551

Hi Daniel,

are the obituaries correctly placed? Seditious Joe (talk) 17:20, 20 December 2022 (UTC)

@Seditious Joe: Yes, I think adding a described by source (P1343) statement to the person's item and using the item for the obituary as value of that statement seems correct. You can also add a main subject (P921) statement to the obituary's item, linking back to the person, and the obituary's item can be marked as an instance of an obituary. --Daniel Mietchen (talk) 20:14, 20 December 2022 (UTC)
PS: For Australians, there is also Obituaries Australia ID (P9232). --Daniel Mietchen (talk) 20:43, 20 December 2022 (UTC)

Undefined lexemes

Please improve the quality of the lexemes you're adding. Adding lexemes with only forms, and no senses, identifiers or even backlinks to help people understand the meaning of the lexeme and then leaving them for other people to fill in for you is not fair on other editors, particularly when done repeatedly and/or in large numbers. - Nikki (talk) 02:38, 2 December 2022 (UTC)

@Nikki: Saw this only now by way of this comment. Will address things over there first and then come back here for anything else that may need addressing. --Daniel Mietchen (talk) 09:34, 17 December 2022 (UTC)
@Nikki: I am moving my lexeme-related comments from there to here, so that we can work out next steps here too.
  • I agree that lexemes should be linked to structured information as much as possible. If you follow my edits in the lexeme namespace, then you will have noticed that I am doing senses, etymology, compounds, pronunciation and similar on a regular basis. I am not sure how best to give an overview here (the list of my contributions in the lexeme namespace currently times out), but perhaps things like the NavelGazer for P5137 might serve as an indication. I am not doing much in terms of identifiers, simply because I am not aware of good workflows for that; pointers appreciated.
  • When I am adding structured information to lexemes, I do not pay much attention to who created the respective pages, other than sending an occasional "Thank you" to contributors who are new to me in lexeme contexts. This is partly because I work on lexemes as I come across them in other contexts (e.g. while reading a paper or working on a Wikipedia article) rather than systematically, and partly because my workflows for creating lexemes (e.g. via Ordia's text to lexemes) are different from those for annotating lexemes (e.g. via MachtSinn), and when I am doing one, I concentrate on that.
  • For lots of lexemes, it is not immediately possible to link them to a sense (e.g. verbs, or nouns with a meaning more granular than the Wikidata items in this space), compound, etymology or the like, and immediately going off to create these would again disrupt the lexeme creation workflows, so I leave that for later. This seems to be the core of Nikki's criticism, and having cleaned up or annotated lots of entities set up by others in the various namespaces myself, I have an idea of the frustration that this can cause. However, I think it is important for the project to allow this kind of incompleteness as an entry point for contributions and contributors. There may well be people who prefer to annotate existing lexeme entries rather than create new ones (verbs have many forms in many languages, nouns or adjectives in many languages can have many forms too, and a user's knowledge of the language in question might not be sufficient to get all the forms right), and we should be open to this way of contributing, which rests on complementary contribution pathways, e.g. "simply" creating the lexemes and their respective forms.
  • Working on automation myself both on Wikidata and beyond, I am aware of the benefits of using identifiers and other aspects of structured data, but as stated above, I am not aware of relevant workflows for lexemes. Via things like Scholia's curation pages (example), I help build curation workflows, and closer integration with lexemes (both missing and existing) is being considered. While that has not seen much progress lately, I am planning to get back to this in the new year, and testing the existing workflows that I know is part of that, so that they can be more closely integrated with the new ones.
  • I am aware of some bots active in the lexeme space (e.g. User:Elhuyar Fundazioa bot or User:Uzielbot) but not for German, and I am not sure what the workflows are that Nikki uses to add, say, Duden lexeme ID (P8376) statements to lexeme entries. In any case, I would welcome more bot jobs that assist in curating lexeme entries, both existing and missing, and ideally in ways that work across languages.
  • As an aside, I think the lexeme equivalent of an empty item is a lexeme without forms, and while I do not like empty items either, I think it's better to have them than to have no entry for a given concept, since queries can list all entities with no statements at all but cannot list concepts that have no entry.
To sum up, I think your criticism is justified but the solution is not for me to stop creating bare lexeme entries. Instead, we should coordinate on improving the workflows in this space.
--Daniel Mietchen (talk) 11:50, 22 December 2022 (UTC)

I had no further information and merged this duplicate just by its name. Correct? Seditious Joe (talk) 17:39, 20 December 2022 (UTC)

In general, merging duplicates just by the name is not advisable for people, and before any such merge, it is important to check both items as well as their "what links here" pages. In the case of Special:WhatLinksHere/Q115365471, this gave 106 publications, but since they all seemed to fall into the realm of ophthalmology and thus just the field that Michael F. Chiang (Q60651503) is active in, I have moved them over from Q115365471 to Q60651503 too. A good tool for that — and generally for author disambiguation — is the WD:Author Disambiguator, and a companion tool to check out the individual merge candidates (or indeed anyone known to Wikidata as an author) is WD:Scholia. Both tools are linked in various ways, e.g. via Scholia's curation pages (example), which complement the profile pages (for the same example) for most of the ca. 30 profile types that Scholia offers. --Daniel Mietchen (talk) 20:48, 20 December 2022 (UTC)
By the way, it is also very bold to create items like Q115365471 without any statement about the person. Don't let it get you down, --Seditious Joe (talk) 21:01, 20 December 2022 (UTC)
This is about linked data, after all, so not everything always has to be on one page; tools like the ones mentioned above or WD:Reasonator can piece the information together from different pages. --Daniel Mietchen (talk) 21:29, 20 December 2022 (UTC)
I do find it rude, even cheeky, to create data objects and then let others do the work. Especially just a few days after a partial block of one's own account. --Emu (talk) 21:37, 21 December 2022 (UTC)
It is not my intention to create extra work for others by creating such objects, and I contribute extensively to the further development of various classes of data objects, regardless of who created the individual pages. --Daniel Mietchen (talk) 11:42, 22 December 2022 (UTC)

Wikidata weekly summary #552