User talk:Matěj Suchánek

Archives of very old discussions are available:

Data Consolidation Officer (talkcontribs)

Hi Matěj Suchánek, you probably know this, but in most Wikipedias (including the German one), additions in brackets are used for mere disambiguation and shouldn’t be part of the label.

Your bot doesn’t appear to take this into account, though; here, for example, the label should just have been “Tare” (without the “(Würzsauce)” part). Would it be possible to change the bot so that it strips trailing additions in brackets from sitelinks when creating labels? --Data Consolidation Officer (talk) 10:35, 21 April 2024 (UTC)

Matěj Suchánek (talkcontribs)

Hi. Yes, I have been aware of this ever since I started these imports. Unfortunately, it's complicated: there are also many cases where the trailing brackets should be kept. Most recently: (German Wikipedia). More: .

Therefore, my strategy is to import the label with the disambiguator and then apply some rules to determine whether it's redundant.

In your case, I kept it because there was no German description. The idea is to motivate people to add a description while also fixing the label manually. If there had been a German description, I would have removed the disambiguator if either "Würzsauce" was found in the description or most labels in other languages starting with "Tare" didn't include any disambiguator.
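
The rules described in this reply can be sketched roughly as follows. This is only an illustration of the stated heuristic, not MatSuBot's actual code: the function names, the regular expression, and the reading of "most" as a simple majority are all assumptions.

```python
import re

# Matches a trailing parenthesised addition, e.g. "Tare (Würzsauce)".
TRAILING_BRACKETS = re.compile(r"^(?P<base>.+?)\s*\((?P<disambig>[^()]+)\)$")


def split_label(label):
    """Return (base, disambiguator); the disambiguator is None if absent."""
    match = TRAILING_BRACKETS.match(label)
    if match is None:
        return label, None
    return match.group("base"), match.group("disambig")


def choose_label(sitelink_title, description, other_labels):
    """Decide whether to keep the bracketed part of an imported label.

    description: description in the same language, or None if missing.
    other_labels: labels of the same item in other languages.
    """
    base, disambig = split_label(sitelink_title)
    if disambig is None:
        return sitelink_title
    # No description yet: keep the disambiguator.
    if not description:
        return sitelink_title
    # The description already contains the disambiguating word.
    if disambig.lower() in description.lower():
        return base
    # Assumption: "most" means a simple majority of the labels that
    # start with the base form.
    similar = [lab for lab in other_labels if lab.startswith(base)]
    plain = [lab for lab in similar if split_label(lab)[1] is None]
    if similar and len(plain) * 2 > len(similar):
        return base
    return sitelink_title
```

With these assumptions, `choose_label("Tare (Würzsauce)", None, ["Tare"])` keeps the full label because the description is missing, which matches the case discussed here. The prefix test is deliberately crude and would also count unrelated labels that merely begin with the same letters.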

Data Consolidation Officer (talkcontribs)

Yes, that seems like a reasonable heuristic. I don’t agree that the bracketed part should have been kept in District 1 (Düsseldorf) (Q551600) (Düsseldorf is already present in the description and that’s sufficient, imho), but if that’s the consistent handling of Düsseldorf’s districts, then it’s OK for now. I’d still say that such cases are outliers rather than being usual, at least for German Wikipedia. Yet, I can’t come up with a better solution for now, at least where descriptions are missing (automatically importing those is obviously not so easy).

Reply to "Label additions by MatSuBot"
EncycloPetey (talkcontribs)

Your bot just made a HUGE number of edits like this one that go against agreed-upon standards at WikiProject:Books. For all of the instance of (P31) usages, the value should instead be version, edition or translation (Q3331189).

For example Prometheus Bound (Q24063711) is a translation, but it's also an edition of that translation. Likewise, Śakoontalá; or, The Lost Ring (Q51107450) is both a translated text and is the fourth edition of that translation.

I'm not sure how many of these are the result of incorrect values being added in the past, but version, edition or translation (Q3331189) is now the agreed standard for instance of (P31). --EncycloPetey (talk) 05:43, 13 April 2024 (UTC)

Matěj Suchánek (talkcontribs)
EncycloPetey (talkcontribs)

The merge is correct for the two items, but its use was not. Part of the reason for the merge was its frequent misuse on multiple data items.

Reply to "translated text"
CV213 (talkcontribs)

Special:AbuseFilter/history/110 - you created it. In 2023-12 ISNI format changed to no spaces. Can you adjust the filter and re-enable?

Will it only give a warning when changing from no-space format to space format? If yes, could another filter also be made that warns users not to insert ISNI with spaces?

Matěj Suchánek (talkcontribs)

I flipped that condition. I will have it run without warning for some time. (Remind me if I forget to reinstate that.)

I don't think it's necessary to have another filter for that. We have bots automatically fixing that, or we can also extend that filter.

CV213 (talkcontribs)

Thank you, I tried it, it works!

Can it catch more, i.e. any change away from the correct regex? See .

Regarding warning when adding it in spaced format:

If people don't get a warning, they may go on forever, causing avoidable edits by bots. On top, they are less likely to notice the creation of duplicates, which can be detected if the ISNI is already on another item. This may result in extended work on an item and work of an uninvolved editor checking duplicates.

The primary source has the ISNI easily available without spaces; people who add ISNI manually should always check the primary source anyway.

There is also unnecessary clutter on DB CV reports (example) interfering with the work of editors checking the diffs and working on removing violations, not necessarily these violations, but any.

CV213 (talkcontribs)

The regex for ISNI is /[0-9]{15}[0-9X]/. In P213 the current regex is /[0]{7}[0-9]{8}[0-9X]/, which is probably OK for some time, as position 8 in new ISNI is currently "5" so requiring 7 zeros shouldn't be an issue for several months. But for the abuse filter which requires an admin to edit, it is likely better to be less restrictive - as long as no abuse is seen.
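
To make the difference between the two patterns concrete, here is a small sketch that checks values against both. The helper names are made up for illustration, and whole-string matching is an assumption about how the format constraint is applied.

```python
import re

# Format as quoted in this thread: 15 digits followed by a digit or "X".
ISNI_GENERAL = re.compile(r"[0-9]{15}[0-9X]")
# Stricter pattern quoted for P213: seven leading zeros are required.
ISNI_P213 = re.compile(r"[0]{7}[0-9]{8}[0-9X]")


def compact(value):
    """Remove spaces, turning '0000 0004 2348 6330' into the compact form."""
    return value.replace(" ", "")


def is_valid(value, pattern=ISNI_GENERAL):
    """True only if the whole string matches the pattern."""
    return pattern.fullmatch(value) is not None
```

A spaced value such as `0000 0004 2348 6330` fails both patterns until it is passed through `compact`. A value with fewer than seven leading zeros would pass the general pattern but fail the P213 one, which is the restriction mentioned above.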

Matěj Suchánek (talkcontribs)

"I will have it run without warning for some time. (Remind me if I forget to reinstate that.)" Oops, I didn't realize the warning was still enabled; only the filter had been turned off. Thanks for testing, I guess the filter is safe.

In my opinion (and experience), the problems with setting filters for individual properties are

  • (As you said) You need an admin to modify the filter, especially in situations when the format changes.
  • They are getting more complex (due to the diff structure) when you want to cover 100% cases (modification with qualifier addition, etc.).
  • We cannot really cover every property.

But let me see.

CV213 (talkcontribs)

"[0-9]{15}[0-9X]" is defined in an ISO standard, and used in millions of links, I don't expect that format to change soon.

Finally I found mw:Extension:AbuseFilter/Rules_format which contains regex/rlike. Maybe something like:

& string(removed_lines) regex "[0-9]{15}[0-9X]"
&! string(added_lines) regex "[0-9]{15}[0-9X]"

But I don't know if it would prevent removal of a claim. Then one would have to test if string(added_lines) is not empty.
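
The proposed condition can be tried out outside AbuseFilter with a rough Python equivalent. This is a sketch under assumptions: `removed_lines` and `added_lines` are treated as plain strings, the regex is searched anywhere in the string, and the function name is hypothetical. It also includes the extra "added_lines is not empty" test mentioned above.

```python
import re

ISNI = re.compile(r"[0-9]{15}[0-9X]")


def should_trigger(removed_lines, added_lines):
    """Mirror the proposed condition on the two diff strings."""
    had_valid = ISNI.search(removed_lines) is not None
    has_valid = ISNI.search(added_lines) is not None
    # An empty added_lines is treated as removal of the claim and let through.
    return had_valid and added_lines != "" and not has_valid
```

Under these assumptions, replacing a compact ISNI with a spaced one triggers the check, while removing the claim or swapping one valid ISNI for another does not.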

Matěj Suchánek (talkcontribs)

Okay, I changed the filter to be more restrictive, yet robust.

"But I don't know if it would prevent removal of a claim." It wouldn't; there is a check for the edit summary.

CV213 (talkcontribs)

Thank you, much better protection against changes away from correct format now. Not sure about "novalue" and "somevalue".

Insertions of format violating strings are still possible.

Matěj Suchánek (talkcontribs)

Re "Not sure about 'novalue' and 'somevalue'": the current regex ([0]{7}[0-9]{8}[0-9X]|) matches an empty string; this is how "somevalue/novalue allowed" is indicated.

CV213 (talkcontribs)

I changed the regex to ([0]{7}[0-9]{8}[0-9X]). Better they end up in the CV reports. I have seen some of these claims, but they had no qualifier or reference.
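
The effect of dropping the empty alternative can be checked with a short sketch. It assumes, as described in this thread, that "somevalue" and "novalue" claims are tested as an empty string; the names are illustrative only.

```python
import re

# Pattern with an empty alternative: also accepts the empty string.
WITH_EMPTY = re.compile(r"([0]{7}[0-9]{8}[0-9X]|)")
# Pattern after the change: a full ISNI is required.
WITHOUT_EMPTY = re.compile(r"([0]{7}[0-9]{8}[0-9X])")


def accepts(pattern, value):
    """True only if the whole string matches the pattern."""
    return pattern.fullmatch(value) is not None
```

Both patterns accept a compact ISNI such as `0000000423486330`, but only the first accepts `""`, so with the second pattern such claims would show up as format violations.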

CV213 (talkcontribs)
Matěj Suchánek (talkcontribs)

I am not fully convinced we really need a filter because of that. (Imagine the report was generated after the bot run. Imagine they duplicated the statement right away, without spaces.)

But I gave it a try. What's strange, though, I was able to make it catch this edit, this edit, but not this edit. The filter reads the present, valid ISNI...

CV213 (talkcontribs)

The list of today in the CV report: https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P213&oldid=2074639676 - I would prefer to give the spaced ISNI inserter at least a warning. I didn't look into the items yet. The http://isni.org/isni/0000000097580195 I have seen already at User:DeltaBot/fixClaims/maintenance/P213format - not sure why DeltaBot lists the spaced ones.

The last item has the spaced ISNI because of mixnmatch, maybe some of the others too https://www.wikidata.org/w/index.php?title=Q124489180&oldid=2074561219. Reported to Magnus: Topic:Xyr5b6zcavxhideq. All other tools I know of are fixed now.

Teslaton (talkcontribs)

Hi Matěj. Regarding Special:AbuseLog/28575880: am I missing something? It seems to be a perfectly valid ISNI code (https://isni.org/isni/0000000423486330) and I've tried both the compact (0000000423486330) and the grouped (0000 0004 2348 6330) format, both leading to a filter hit. Any idea?

(edit: ok, so it eventually went away later, although I'm not aware that I changed anything... :D)

Matěj Suchánek (talkcontribs)
Teslaton (talkcontribs)

Yeah, indeed, good point! It can't be seen in the rendered diff (and actually, at first glance, not much even in the dump itself... :D). Thanks.

Arlo Barnes (talkcontribs)

Could the warning link to this thread? It's not clear in the current text what is inappropriate about the with-spaces version.

Matěj Suchánek (talkcontribs)
Reply to "ISNI format abuse filter 110"

Make exception in abuse filter please

Swpb (talkcontribs)
Matěj Suchánek (talkcontribs)
Swpb (talkcontribs)

Thanks!

Pommée (talkcontribs)

In French please preserve (simple dames), (double dames), (simple messieurs) and (double messieurs). Pommée (overleg) 12:08, 10 February 2024 (UTC)

Reply to "MatSuBot: preserve labels"

Bot is adding Russian labels not written in Cyrillic

Summary by Koavf

Looks like it's conventional to not Cyrillicize Latin names in Russian. Thanks Ymblanter.

Koavf (talkcontribs)
Ymblanter (talkcontribs)

Russian is of course written in Cyrillic but some things including most names of music albums never get translated/transliterated and just appear in Latin. You can check that some articles of the Russian Wikipedia just have Latin names. (On the other hand, books and films usually get translated).

KaiKemmann (talkcontribs)

Hej Matej,

what is the rationale behind [[Special:AbuseFilter/39]]?


Was there a discussion about it or what is the usual procedure when new filters are established?


best regards,

~~~~

Matěj Suchánek (talkcontribs)

Hej! The rationale behind Special:AbuseFilter/39 is buried somewhere in archives. Wikidata does not support links to userpages, even though it is technically possible to save such a link (that's why the filter exists). They are also considered invalid by the notability policy (along with other kinds of pages).

KaiKemmann (talkcontribs)

I found the most recent discussion, although not the previous ones. It seems to be mostly a case of "personal preference" as not many reasons were given.


Could I ask you to look at [Commons:Administrators' noticeboard/Archive 94 - Wikimedia Commons this discussion] referring to this topic at commons and tell me what you think?


thank you,


~~~~

KaiKemmann (talkcontribs)

PS

What's wrong with the usual Wiki-formatting on this page?

Matěj Suchánek (talkcontribs)

If you want to create an item for c:Category:Raimund_Liebert, just go ahead. But links to user pages are not supported by Wikidata (by rules, technically they are, this is what the abuse filter was made for).

As for formatting on this page, you just need to switch to wikitext in the bottom right corner.

KaiKemmann (talkcontribs)

Thanks again for your explanation.

I noticed now that links between user pages are handled quite differently on the various Wikimedia projects (and most often don't appear at all).

It would be nice to be able to switch between them (and a user's commons category, in case it exists) easily and consistently on all project pages. As there apparently is a consensus against doing this through Wikidata it would probably have to be implemented through the Mediawiki software. Another story again ..


~~~~

Reply to "AbuseFilter 39"

How to change the order of values already listed under a property?

Kusurija (talkcontribs)

For example, the item Němen has a number of tributary values that are sorted neither alphabetically nor by the river kilometre of the mouth (which, by the way, is usually not even given). Note: I removed the Minija from the tributaries of the Němen, because it is not a tributary of the Němen but of the Atmata (one of the side branches of the Němen delta; none of the numerous branches of the Němen delta bears the name Němen, so all tributaries after the first bifurcation are no longer tributaries of the Němen, although they are in the Němen drainage basin). Is it possible to give more than one drainage basin for a tributary? (For example, the large tributaries of the Němen already have, in their hydrological order code, their own first two digits for their basin, which is part of the Němen basin.)

Matěj Suchánek (talkcontribs)

Some users use various scripts that can change the order, but I am against that, because on Wikidata there is no such thing as an order of data (it is not guaranteed in any way). The order is determined only by the data consumer.

Oh, and I noticed that the property drainage basin (P4614) takes the drainage basin item directly. That is, e.g., povodí Němenu (Q13370189), and not Němen (Q5622). (See this exclamation mark.)

Reply to "How to change the order of values already listed under a property?"
Kusurija (talkcontribs)

A property like this would need to be created, for identical objects that are named differently under different circumstances. Because of these different circumstances, the related objects are associated with only one of the names and not with the other (for the same object). For example, the river Akmena is called Danė in its lower course. This river has (many) tributaries; some flow into the Akmena and others into the Danė. If I state that river X flows into..., the statement will be false if I link to a name other than the one used at the confluence.

Matěj Suchánek (talkcontribs)
Kusurija (talkcontribs)

applies to name of subject (P5168). And what is the procedure? I create an item labelled with the river's name (or one already exists). I activate the property mouth of the watercourse (P403) and insert what? First applies to name of subject (P5168), or the Qxy of the parent watercourse (if so, how and where does applies to name of subject (P5168) go, and what comes next)? And in that parent watercourse Qxy, how do I delimit from where to where the name "A" applies and from where to where the name "B" applies (and from where to where the name "C" applies)? For example, I have the river kilometre available for this: "A applies from the source to river kilometre 53.6" and "B applies from river kilometre 53.6 to the mouth". Alternatively, "A applies from the source to the confluence with the river Qwz" and "B applies from the confluence with the river Qwz to the mouth"?

Matěj Suchánek (talkcontribs)

I think it will be best to show it with an example. Which of these rivers flows into which "identity"?

Kusurija (talkcontribs)

The Šlaveita into the Akmena (Akmena, river km 50.0), the Eketė into the Danė (synonym: Dangė, not Akmena; river km 16.2); it is better to look at the table of tributaries in the article w:cs:Danė. At the confluence of the Akmena with the Tenžė (river km 27.1), the name changes to Danė (synonym: Dangė). That is exactly why I asked elsewhere how to move the tributary values on WD.
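
The figures in this reply can be turned into a small lookup that maps a confluence's river kilometre to the name valid at that point. This is only a sketch of the idea, not a Wikidata data model: it assumes river kilometres are counted upstream from the mouth, and it arbitrarily counts the name-change point itself as Akmena.

```python
# Name-change point taken from this thread: the confluence with the
# Tenžė at river km 27.1.
NAME_CHANGE_KM = 27.1


def name_at(river_km):
    """Return the name of the watercourse at a given river kilometre.

    River kilometres are assumed to be counted from the mouth upstream.
    """
    # Assumption: the name-change point itself is counted as Akmena.
    return "Akmena" if river_km >= NAME_CHANGE_KM else "Danė"


# Tributaries and confluence kilometres as given in this thread.
tributaries = {"Šlaveita": 50.0, "Eketė": 16.2}
mouths = {name: name_at(km) for name, km in tributaries.items()}
```

With these inputs, `mouths` assigns the Šlaveita to the Akmena and the Eketė to the Danė, in line with the statement above.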

Matěj Suchánek (talkcontribs)

I was suggesting something like this: Q3512881#P974, Q10407818#P403. A warning does show up there, but I would ignore that for the sake of the demonstration. If this were to be used much more widely, it could be adjusted here in time.

Kusurija (talkcontribs)

In Lithuania alone I estimate about 300 cases; worldwide there will be hundreds of thousands (especially many in Japan, where they often have up to ten different names for different sections of the same watercourse, each naturally with its own tributaries). And yes, adjusting it would be nice, but I don't feel up to it yet; besides, my clumsiness could earn me the disfavour of colleagues, or even sanctions.

Kusurija (talkcontribs)

Another problem is Mūša-Lielupe. It is the same river with different names, like Akmena-Danė, but on cs.wp, for example, the former was split forcibly and despite protests (on some other Wikipedias it was not), while the latter is not split (thank God). For the former it would therefore be appropriate to introduce "Same as" (P - property). The situation of both rivers is identical (different names, the same watercourse), but the fate of the article(s) differs. Other examples: Jara-Šetekšna, etc.

Reply to "Same as (P - property)"

Would appreciate your input on deleting of data about Peter C. Gøtzsche's website

Fabius byle (talkcontribs)
Reply to "Would appreciate your input on deleting of data about Peter C. Gøtzsche's website"