User talk:Matěj Suchánek

About this board

Archives of very old discussions are available:

Label additions by MatSuBot

3 comments • 17:13, 28 April 2024 4 days ago

Data Consolidation Officer (talkcontribs)

Hi Matěj Suchánek, you probably know this, but in most Wikipedias (including the German one), additions in brackets are used for mere disambiguation and shouldn’t be part of the label.

Your bot doesn’t appear to take this into account, though; here, for example, the label should just have been “Tare” (without the “(Würzsauce)” part). Would it be possible to change the bot so that it strips trailing additions in brackets from sitelinks when creating labels? --Data Consolidation Officer (talk) 10:35, 21 April 2024 (UTC)

Reply 10:35, 21 April 2024 12 days ago

Matěj Suchánek (talkcontribs)

Hi. Yes, I have been aware of this ever since I started these imports. Unfortunately, it's complicated: there are also many cases where the trailing brackets should be kept. Most recently: (German Wikipedia). More: .

Therefore, my strategy is to import the label with the disambiguator and then apply some rules to determine whether it's redundant.

In your case, I kept it because there was no German description. The idea is to motivate people to insert a description while also fixing the label manually. If there had been a German description, I would have removed the disambiguator if either "Würzsauce" was found in the description or most other languages with labels starting with "Tare" didn't include any disambiguator.
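A minimal sketch of the heuristic described above, written in Python purely for illustration. It is not MatSuBot's actual code: the function names, the regular expression, and the reading of "most other languages" as a simple majority are all assumptions.

```python
import re

# Hypothetical pattern: a base label followed by one trailing "(...)" group.
TRAILING_BRACKETS = re.compile(r"^(?P<base>.+?)\s*\((?P<disambig>[^()]+)\)$")


def split_disambiguator(label):
    """Return (base, disambiguator); disambiguator is None if there is none."""
    match = TRAILING_BRACKETS.match(label)
    if match is None:
        return label, None
    return match.group("base"), match.group("disambig")


def is_redundant(label, description, other_labels):
    """Decide whether the trailing disambiguator may be dropped.

    label        -- imported label, e.g. "Tare (Würzsauce)"
    description  -- description in the same language, or None if missing
    other_labels -- labels of the same item in other languages
    """
    base, disambig = split_disambiguator(label)
    if disambig is None:
        return False  # nothing to strip
    if not description:
        return False  # no description: keep the disambiguator
    if disambig.casefold() in description.casefold():
        return True  # the description already carries the information
    # Assumption: "most" means more than half of the labels that start
    # with the same base text carry no disambiguator.
    related = [other for other in other_labels if other.startswith(base)]
    if not related:
        return False
    plain = sum(1 for other in related if split_disambiguator(other)[1] is None)
    return plain * 2 > len(related)
```

With these assumptions, `is_redundant("Tare (Würzsauce)", None, [])` is false (the disambiguator is kept), while passing a description that contains "Würzsauce" makes it true.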

Reply 08:27, 28 April 2024 5 days ago

Data Consolidation Officer (talkcontribs)

Yes, that seems like a reasonable heuristic. I don’t agree that the bracketed part should have been kept in District 1 (Düsseldorf) (Q551600) (Düsseldorf is already present in the description and that’s sufficient, imho), but if that’s the consistent handling of Düsseldorf’s districts, then it’s OK for now. I’d still say that such cases are outliers rather than being usual, at least for German Wikipedia. Yet, I can’t come up with a better solution for now, at least where descriptions are missing (automatically importing those is obviously not so easy).

Reply 17:13, 28 April 2024 4 days ago

translated text

3 comments • 15:41, 13 April 2024 20 days ago

EncycloPetey (talkcontribs)

Your bot just made a HUGE number of edits like this one that go against agreed-upon standards at WikiProject:Books. For all of these uses of instance of (P31), the value should instead be version, edition or translation (Q3331189).

For example Prometheus Bound (Q24063711) is a translation, but it's also an edition of that translation. Likewise, Śakoontalá; or, The Lost Ring (Q51107450) is both a translated text and is the fourth edition of that translation.

I'm not sure how many of these are the result of incorrect values being added in the past, but version, edition or translation (Q3331189) is now the agreed standard for instance of (P31). --EncycloPetey (talk) 05:43, 13 April 2024 (UTC)

Reply Edited 05:51, 13 April 2024 20 days ago

Matěj Suchánek (talkcontribs)

Q39811647 was merged and redirected to Q21112633, my bot just fixes the redirects. If the change is not appropriate, the items shouldn't have been merged (and therefore claimed equivalent).

Reply 06:26, 13 April 2024 20 days ago

EncycloPetey (talkcontribs)

The merge is correct for the two items, but its use was not. Part of the reason for the merge was its frequent misuse on multiple data items.

Reply Edited 15:41, 13 April 2024 20 days ago

ISNI format abuse filter 110

18 comments • 14:44, 10 March 2024 1 month ago

CV213 (talkcontribs)

Special:AbuseFilter/history/110 - you created it. In 2023-12 ISNI format changed to no spaces. Can you adjust the filter and re-enable?

Will it only give a warning when changing from no-space format to space format? If so, could another filter also be made that warns users not to insert ISNIs with spaces?

Reply 12:07, 2 February 2024 3 months ago

Matěj Suchánek (talkcontribs)

I flipped that condition. I will have it run without warning for some time. (Remind me if I forget to reinstate that.)

I don't think it's necessary to have another filter for that. We have bots automatically fixing that, or we can also extend that filter.

Reply 11:23, 3 February 2024 3 months ago

CV213 (talkcontribs)

Thank you, I tried it, it works!

Can it catch more, any change away from correct regex? See .

Regarding warning when adding it in spaced format:

If people don't get a warning, they may go on forever, causing avoidable edits by bots. On top of that, they are less likely to notice that they are creating duplicates, which can be detected if the ISNI is already on another item. This may result in extended work on an item and work for an uninvolved editor checking duplicates.

The primary source has the ISNI easily available without spaces, and people who add an ISNI manually should always check the primary source anyway.

There is also unnecessary clutter on DB CV reports (example) interfering with the work of editors checking the diffs and working on removing violations, not necessarily these violations, but any.

Reply Edited 11:48, 3 February 2024 3 months ago

CV213 (talkcontribs)

The regex for ISNI is /[0-9]{15}[0-9X]/. In P213 the current regex is /[0]{7}[0-9]{8}[0-9X]/, which is probably OK for some time, as position 8 in new ISNI is currently "5" so requiring 7 zeros shouldn't be an issue for several months. But for the abuse filter which requires an admin to edit, it is likely better to be less restrictive - as long as no abuse is seen.
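To make the difference between the two patterns concrete, here is a small Python sketch. The constant and function names are made up for this example, and full-string matching is an assumption about how a format constraint is applied.

```python
import re

# General ISNI shape quoted above: 15 digits plus a digit or "X".
ISNI_GENERAL = re.compile(r"[0-9]{15}[0-9X]")
# Stricter pattern quoted above as currently used on P213: seven leading zeros.
ISNI_P213 = re.compile(r"[0]{7}[0-9]{8}[0-9X]")


def compact(value):
    """Remove the spaces of the grouped format (0000 0004 2348 6330)."""
    return value.replace(" ", "")


def check(value):
    """Report which of the two patterns the whole value matches."""
    return {
        "general": ISNI_GENERAL.fullmatch(value) is not None,
        "p213": ISNI_P213.fullmatch(value) is not None,
    }
```

For instance, `check("0000000423486330")` passes both patterns, the spaced form `"0000 0004 2348 6330"` fails both until it is passed through `compact()`, and a made-up value with only six leading zeros such as `"0000001234567890"` passes the general pattern but not the P213 one.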

Reply Edited 11:57, 3 February 2024 3 months ago

Matěj Suchánek (talkcontribs)

“I will have it run without warning for some time. (Remind me if I forget to reinstate that.)” Oops, I didn't realize the warning was still enabled, only the filter had been turned off. Thanks for testing, I guess the filter is safe.

In my opinion (and experience), the problems with setting filters for individual properties are:

(As you said) You need an admin to modify the filter, especially in situations when the format changes.
They are getting more complex (due to the diff structure) when you want to cover 100% cases (modification with qualifier addition, etc.).
We cannot really cover every property.

But let me see.

Reply 17:47, 3 February 2024 2 months ago

CV213 (talkcontribs)

"[0-9]{15}[0-9X]" is defined in an ISO standard, and used in millions of links, I don't expect that format to change soon.

Finally I found mw:Extension:AbuseFilter/Rules_format which contains regex/rlike. Maybe something like:

& string(removed_lines) regex "[0-9]{15}[0-9X]"
&! string(added_lines) regex "[0-9]{15}[0-9X]"

But I don't know if it would prevent removal of a claim. Then one would have to test if string(added_lines) is not empty.
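The logic of that proposed condition, including the extra emptiness check, can be emulated in Python as a rough sketch. It assumes that the AbuseFilter `regex` operator looks for a match anywhere in the string, and it treats `removed_lines` and `added_lines` as plain strings holding only the ISNI value, which is a simplification of the real diff variables. The function name is hypothetical.

```python
import re

ISNI = re.compile(r"[0-9]{15}[0-9X]")


def would_trip(removed_lines, added_lines):
    """Emulate: a well-formed ISNI was removed, none was added back,
    and the edit did add something (so plain removals are let through)."""
    had_valid = ISNI.search(removed_lines) is not None
    has_valid = ISNI.search(added_lines) is not None
    return had_valid and not has_valid and added_lines != ""
```

Under these assumptions, replacing `0000000423486330` with `0000 0004 2348 6330` trips the condition, while removing the value entirely (empty `added_lines`) does not.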

Reply Edited 18:57, 3 February 2024 2 months ago

Matěj Suchánek (talkcontribs)

Okay, I changed the filter to be more restrictive, yet robust.

“But I don't know if it would prevent removal of a claim.” It wouldn't, there is a check for edit summary.

Reply 19:04, 3 February 2024 2 months ago

CV213 (talkcontribs)

Thank you, much better protection against changes away from correct format now. Not sure about "novalue" and "somevalue".

Insertions of format violating strings are still possible.

Reply 11:33, 5 February 2024 2 months ago

Matěj Suchánek (talkcontribs)

“Not sure about "novalue" and "somevalue".” The current regex ([0]{7}[0-9]{8}[0-9X]|) matches an empty string; this is how "somevalue/novalue allowed" is indicated.
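The effect of the trailing `|` can be checked with a short Python sketch. It only demonstrates the regex behaviour; how the constraint system maps "somevalue"/"novalue" onto an empty string is taken from the comment above, and the names below are illustrative.

```python
import re

# Pattern with an empty alternative, as quoted above.
WITH_EMPTY = re.compile(r"([0]{7}[0-9]{8}[0-9X]|)")
# The same pattern without the empty alternative.
WITHOUT_EMPTY = re.compile(r"([0]{7}[0-9]{8}[0-9X])")


def accepts(pattern, value):
    """True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None
```

Here `accepts(WITH_EMPTY, "")` is true and `accepts(WITHOUT_EMPTY, "")` is false, while both patterns accept a well-formed value such as `0000000423486330`.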

Reply 17:03, 11 February 2024 2 months ago

CV213 (talkcontribs)

I changed the regex to ([0]{7}[0-9]{8}[0-9X]). Better they end up in the CV reports. I have seen some of these claims, but they had no qualifier or reference.

Reply 22:32, 11 February 2024 2 months ago

CV213 (talkcontribs)

https://www.wikidata.org/w/index.php?title=Q19750837&diff=prev&oldid=2071715235 has the wrong format, and the same ID was already there in non-spaced format. A little over 10 hours later, a bot first reformatted the value and then removed it because it was a duplicate. The value appears at https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P213&oldid=2072720311#%22Format%22_violations

Reply 16:54, 9 February 2024 2 months ago

Matěj Suchánek (talkcontribs)

I am not fully convinced we really need a filter because of that. (Imagine the report was generated after the bot run. Imagine they duplicated the statement right away, without spaces.)

But I gave it a try. What's strange, though, I was able to make it catch this edit, this edit, but not this edit. The filter reads the present, valid ISNI...

Reply Edited 17:04, 11 February 2024 2 months ago

CV213 (talkcontribs)

Today's list in the CV report: https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P213&oldid=2074639676 - I would prefer to give anyone inserting a spaced ISNI at least a warning. I didn't look into the items yet. I have already seen http://isni.org/isni/0000000097580195 at User:DeltaBot/fixClaims/maintenance/P213format - not sure why DeltaBot lists the spaced ones.

The last item has the spaced ISNI because of mixnmatch, maybe some of the others too https://www.wikidata.org/w/index.php?title=Q124489180&oldid=2074561219. Reported to Magnus: Topic:Xyr5b6zcavxhideq. All other tools I know of are fixed now.

Reply 22:28, 11 February 2024 2 months ago

Teslaton (talkcontribs)

Hi Matěj. Regarding Special:AbuseLog/28575880: am I missing something? It seems to be a perfectly valid ISNI code (https://isni.org/isni/0000000423486330) and I've tried both the compact (0000000423486330) and the grouped (0000 0004 2348 6330) format, both leading to a filter hit. Any idea?

(edit: ok, so it eventually went away later, although I'm not aware that I would have changed anything... :D)

Reply Edited 01:36, 21 February 2024 2 months ago

Matěj Suchánek (talkcontribs)

You had a right-to-left mark as the last character of the input.
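Such invisible characters are easy to miss in a rendered diff. A minimal Python sketch for spotting and removing them before saving a value, assuming that all Unicode format characters (general category `Cf`, which includes U+200F RIGHT-TO-LEFT MARK) are unwanted in an identifier; the function names are illustrative:

```python
import unicodedata


def invisible_characters(value):
    """List (position, code point, name) for every format character."""
    return [
        (index, f"U+{ord(char):04X}", unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(value)
        if unicodedata.category(char) == "Cf"
    ]


def strip_invisible(value):
    """Return the value without any format characters."""
    return "".join(
        char for char in value if unicodedata.category(char) != "Cf"
    )
```

For example, `invisible_characters("0000000423486330\u200f")` reports one hit at position 16, and `strip_invisible()` returns the plain 16-character value.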

Reply 08:46, 21 February 2024 2 months ago

Teslaton (talkcontribs)

Yeah, indeed, good point! It can't be seen in the rendered diff (and actually, at first glance, not much even in the dump itself... :D). Thanks.

Reply 08:59, 21 February 2024 2 months ago

Arlo Barnes (talkcontribs)

Could the warning link to this thread? It's not clear from the current text what is inappropriate about the with-spaces version.

Reply 03:15, 25 February 2024 2 months ago

Matěj Suchánek (talkcontribs)

Customized warning: MediaWiki:Abusefilter-disallowed-isni.

Reply 09:21, 25 February 2024 2 months ago

Make exception in abuse filter please

3 comments • 13:44, 10 February 2024 2 months ago

Swpb (talkcontribs)

Can you please modify Special:AbuseFilter/87 to allow the property to be used as a value of Wikidata property (P1687) on marriage location (Q124222019)? Thanks. Swpb (talk) 15:41, 11 January 2024 (UTC)

15:41, 11 January 2024 3 months ago

Matěj Suchánek (talkcontribs)

Yes, Special:AbuseFilter/history/87/diff/prev/2015. Sorry for the inconvenience!

10:12, 13 January 2024 3 months ago

Swpb (talkcontribs)

Thanks!

18:02, 13 January 2024 3 months ago

MatSuBot: preserve labels

One comment • 12:08, 10 February 2024 2 months ago

Pommée (talkcontribs)

In French please preserve (simple dames), (double dames), (simple messieurs) and (double messieurs). Pommée (overleg) 12:08, 10 February 2024 (UTC)

Reply Edited 12:08, 10 February 2024 2 months ago

Bot is adding Russian labels not written in Cyrillic

2 comments • 13:45, 29 January 2024 3 months ago

Summary by Koavf

Looks like it's conventional to not Cyrillicize Latin names in Russian. Thanks Ymblanter.

Koavf (talkcontribs)

Maybe I'm just dumb but additions of Russian labels like this: https://www.wikidata.org/w/index.php?title=Q120401703&diff=prev&oldid=2064574184 are Russian labels written in the Latin alphabet, but Russian is written in Cyrillic. Is there something I'm missing here or is the bot publishing inaccurate labels?

13:22, 29 January 2024 3 months ago

Ymblanter (talkcontribs)

Russian is of course written in Cyrillic but some things including most names of music albums never get translated/transliterated and just appear in Latin. You can check that some articles of the Russian Wikipedia just have Latin names. (On the other hand, books and films usually get translated).

13:31, 29 January 2024 3 months ago

AbuseFilter 39

6 comments • 07:23, 15 January 2024 3 months ago

KaiKemmann (talkcontribs)

Hej Matej,

what is the rationale behind [[Special:AbuseFilter/39]]?

Was there a discussion about it or what is the usual procedure when new filters are established?

best regards,

~~~~

Reply 16:03, 2 December 2023 5 months ago

Matěj Suchánek (talkcontribs)

Hej! The rationale behind Special:AbuseFilter/39 is buried somewhere in archives. Wikidata does not support links to userpages, even though it is technically possible to save such a link (that's why the filter exists). They are also considered invalid by the notability policy (along with other kinds of pages).

Reply 16:41, 2 December 2023 5 months ago

KaiKemmann (talkcontribs)

I found the most recent discussion, although not the previous ones. It seems to be mostly a case of "personal preference" as not many reasons were given.

Could I ask you to look at this discussion (Commons:Administrators' noticeboard/Archive 94 on Wikimedia Commons), which refers to this topic at Commons, and tell me what you think?

thank you,

~~~~

Reply 11:33, 9 December 2023 4 months ago

KaiKemmann (talkcontribs)

PS

What's wrong with the usual Wiki-formatting on this page?

Reply 11:35, 9 December 2023 4 months ago

Matěj Suchánek (talkcontribs)

If you want to create an item for c:Category:Raimund_Liebert, just go ahead. But links to user pages are not supported by Wikidata (by the rules, that is; technically they are possible, which is why the abuse filter was made).

As for formatting on this page, you just need to switch to wikitext in the bottom right corner.

Reply 16:45, 9 December 2023 4 months ago

KaiKemmann (talkcontribs)

Thanks again for your explanation.

I noticed now that links between user pages are handled quite differently on the various Wikimedia projects (and most often don't appear at all).

It would be nice to be able to switch between them (and a user's commons category, in case it exists) easily and consistently on all project pages. As there apparently is a consensus against doing this through Wikidata it would probably have to be implemented through the Mediawiki software. Another story again ..

~~~~

Reply 13:58, 10 December 2023 4 months ago

How to change the order of values already listed for a property?

2 comments • 07:23, 15 January 2024 3 months ago

Kusurija (talkcontribs)

For example, the item Neman lists a number of tributaries that are sorted neither alphabetically nor by the river kilometre of their mouths (which, by the way, is usually not even given). Note: I removed the Minija from the tributaries of the Neman, because it is not a tributary of the Neman but of the Atmata (one of the side arms of the Neman delta; none of the numerous arms of the Neman delta bears the name Neman, so all tributaries below the first bifurcation are no longer tributaries of the Neman, although they are in the Neman basin). Is it possible to give more than one drainage basin for a tributary? (For example, large tributaries of the Neman already have their own first two digits for their basin in the hydrological order code, and that basin is part of the Neman basin.)

Reply Edited 20:23, 1 December 2023 5 months ago

Matěj Suchánek (talkcontribs)

Some users use various scripts that can change the order, but I am against that, because there is no such thing as an order of data on Wikidata (it is not guaranteed in any way). The order is determined by the consumer of the data.

Oh, and I noticed that the property drainage basin (P4614) is filled in with the basin item directly, for example Neman basin (Q13370189), not Neman (Q5622). (See this exclamation mark.)

Reply 09:01, 2 December 2023 5 months ago

Same as (P - property)

8 comments • 07:23, 15 January 2024 3 months ago

Kusurija (talkcontribs)

A property like this would need to be created, for identical objects that are named differently under different circumstances. Because of these different circumstances, the related objects are also tied to only one of the names and not to the other (for the same object). For example, the river Akmena is called Danė in its lower course. This river has (many) tributaries; some flow into the Akmena and others into the Danė. If I state that river X flows into ..., the statement will be untrue if I link to a name different from the one used at the confluence.

Reply 22:28, 1 December 2023 5 months ago

Matěj Suchánek (talkcontribs)

said to be the same as (P460), different from (P1889), partially coincident with (P1382). However, if it is the same river and it is merely called something different in different sections, the statement about the tributary is not wrong from the data point of view. I certainly wouldn't create a new item because of that. I think the properties intended for this are applies to name of subject (P5168) and applies to name of value (P8338), respectively.

Reply 09:07, 2 December 2023 5 months ago

Kusurija (talkcontribs)

applies to name of subject (P5168) And what is the procedure? I create a label with the name of the river (or it already exists). I activate the property mouth of the watercourse (P403) and insert what? First applies to name of subject (P5168), or the Qxy of the parent watercourse (and if so, how and where does applies to name of subject (P5168) go, and what next)? And in that parent watercourse Qxy, how do I delimit from where to where the name "A" applies and from where to where the name "B" applies (and from where to where the name "C" applies)? For example, I have the river kilometre available for this: "A applies from the source to river kilometre 53.6" and "B applies from river kilometre 53.6 to the mouth". Or possibly "A applies from the source to the confluence with river Qwz" and "B applies from the confluence with river Qwz to the mouth"?

Reply Edited 10:28, 2 December 2023 5 months ago

Matěj Suchánek (talkcontribs)

I think it would be best to show this with an example. Which of these rivers flows into which "identity"?

Reply 10:25, 2 December 2023 5 months ago

Kusurija (talkcontribs)

The Šlaveita flows into the Akmena (Akmena, river km 50.0), the Eketė into the Danė (synonym: Dangė, not Akmena, river km 16.2); it is better to look at the table of tributaries in the article w:cs:Danė. At the confluence of the Akmena with the Tenžė (river km 27.1), the name changes to Danė (synonym: Dangė). That is exactly why I asked elsewhere how to move the tributary entries to WD.

Reply 10:44, 2 December 2023 5 months ago

Matěj Suchánek (talkcontribs)

I was suggesting something like this: Q3512881#P974, Q10407818#P403. A warning does show up there, but I would ignore that for the purposes of the demonstration. If this were to be used far more widely, it can be adjusted here in time.

Reply 13:43, 2 December 2023 5 months ago

Kusurija (talkcontribs)

In Lithuania alone I estimate about 300 cases; worldwide there will be hundreds of thousands (especially many in Japan, where they often have up to ten different names for different sections of the same watercourse, naturally each with its own tributaries). And yes, it would be nice to adjust it, but I don't feel up to it yet; besides, my clumsiness could earn me my colleagues' disfavour or even a sanction.

Reply Edited 14:19, 2 December 2023 5 months ago

Kusurija (talkcontribs)

Another problem is the Mūša-Lielupe. It is the same river with different names, like the Akmena-Danė, but on cs.wp, for example, the former was forcibly split despite protests (on some other Wikipedias it was not), while the latter is not split (thank God). That is why it would be appropriate to introduce "Same as" (P - property) for the former. The situation is identical for both rivers (different names, same watercourse), but the fate of the article(s) differs. Another example: Jara-Šetekšna, and so on.

Reply 11:03, 2 December 2023 5 months ago