Help talk:Property constraints portal/Format

From Wikidata

trailing pipes for novalues in format as a regular expression (P1793)

There are plenty of format as a regular expression (P1793) qualifiers on this constraint that end with a trailing pipe character (|, query). KrBot interprets the trailing pipe as permitting novalue claims; without it, KrBot reports novalue claims as violations.

In my experience, the gadget behaves a little differently and accepts novalues without the pipe. Is this correct? Should the behaviour be harmonized? Should it be described on this page? Side question: what about somevalue snaks?

@Ivan A. Krestinin,

Lucas Werkmeister (WMDE)
Jarekt - mostly interested in properties related to Commons
MisterSynergy
John Samuel
Sannita
Yair rand
Jon Harald Søby
Pasleim
Jura
PKM
ChristianKl
Sjoerddebruin
Fralambert
Manu1400
Was a bee
Malore
Ivanhercaz
Peter F. Patel-Schneider
Pizza1016
Ogoorcs
ZI Jony
Eihel
cdo256
Epìdosis
Dhx1
99of9
Mathieu Kappler
Lectrician1
SM5POR
Infrastruktur

Notified participants of WikiProject property constraints. MisterSynergy (talk) 13:48, 31 May 2018 (UTC)

@MisterSynergy: yes, WikibaseQualityConstraints currently doesn’t check the format constraint on unknown value and no value at all. --Lucas Werkmeister (WMDE) (talk) 10:49, 1 June 2018 (UTC)
KrBot represents both somevalue and novalue as an empty string. This allows these values to be rejected if needed. -- Ivan A. Krestinin (talk) 23:49, 1 June 2018 (UTC)
Thanks. To my understanding, unknown value and no value are always permitted, and obviously they do not need to be checked for format issues. In that sense, we do not need a mechanism to reject them. Do you know of a situation where this does not hold? Anyway, I’d suggest treating both as never violating a format constraint, which would also mean that we could remove all trailing pipes once your bot is able to ignore these snaks while checking formats. Thoughts? --MisterSynergy (talk) 07:24, 2 June 2018 (UTC)
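For illustration, the two behaviours described so far can be sketched in Python. This is only a minimal sketch under assumptions, not the actual KrBot or WikibaseQualityConstraints code: the snak model, the function names and the use of Python's `re` module as a stand-in for each tool's regex engine are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical snak types, mirroring the three kinds discussed in this thread.
VALUE, SOMEVALUE, NOVALUE = "value", "somevalue", "novalue"


def check_empty_string_style(pattern: str, snaktype: str,
                             value: Optional[str] = None) -> bool:
    """Assumed KrBot-like check: somevalue/novalue are matched as ""."""
    text = value if snaktype == VALUE and value is not None else ""
    return re.fullmatch(pattern, text) is not None


def check_skip_style(pattern: str, snaktype: str,
                     value: Optional[str] = None) -> bool:
    """Assumed gadget-like check: somevalue/novalue are never checked."""
    if snaktype != VALUE:
        return True
    return re.fullmatch(pattern, value or "") is not None


plain = r"[1-9]\d*"       # example format without trailing pipe
with_pipe = r"[1-9]\d*|"  # same format with the trailing pipe

for name, check in [("empty-string", check_empty_string_style),
                    ("skip", check_skip_style)]:
    for pattern in (plain, with_pipe):
        print(name, repr(pattern), check(pattern, NOVALUE))
```

Under these assumptions, the empty-string style accepts a novalue snak only when the pattern has the trailing pipe, while the skip style accepts it either way. Because both somevalue and novalue become the same empty string, the empty-string style cannot accept one and reject the other.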
Not sure. Should we really have "unknown" or "novalue" with every conceivable identifier on items?
--- Jura 07:29, 2 June 2018 (UTC)[reply]
Well, there are always situations where a given item is expected to have a particular identifier, but for whatever reason there is none. no value is a perfect way to indicate this fact; there are ~6000 such cases. This does not indicate bad data quality, and I think it also does not need to be permitted explicitly. unknown value is rather uncommon for identifiers. --MisterSynergy (talk) 07:44, 2 June 2018 (UTC)
Isn't it "always" the same identifier? [1]. It does seem helpful for the ones with massive uses, but not necessarily for the ones with less than 300 potential uses.
--- Jura 08:04, 2 June 2018 (UTC)[reply]
I don’t think so. Typically these cases are exceptions, so only a relatively small number of items are affected. An absolute limit seems inapplicable anyway, since the absolute numbers of uses differ widely between identifiers.
Besides this, I know that users tend to remove no value snaks when they are listed on Ivan’s covi lists, as they often don’t know about the undocumented trailing pipe hack. Formally the trailing pipe is not correct anyway, as it allows empty strings. It only works because we cannot add empty strings, so we can re-interpret it as no value or unknown value. --MisterSynergy (talk) 08:12, 2 June 2018 (UTC)
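The point that a trailing pipe formally allows empty strings can be checked directly, and the same check shows what removing the pipe would change. The following is a rough sketch using Python's `re` module as a stand-in engine; the helper names are hypothetical, and it assumes a valid pattern that does not use verbose mode or engine-specific quoting.

```python
import re


def has_trailing_pipe(pattern: str) -> bool:
    """True if the pattern ends with an unescaped '|' (an empty last alternative)."""
    if not pattern.endswith("|"):
        return False
    body = pattern[:-1]
    # An odd number of backslashes right before the pipe means it is a literal '\|'.
    backslashes = len(body) - len(body.rstrip("\\"))
    return backslashes % 2 == 0


def strip_trailing_pipe(pattern: str) -> str:
    """Return the pattern without its trailing empty alternative, if it has one."""
    return pattern[:-1] if has_trailing_pipe(pattern) else pattern


pattern = r"[1-9]\d*|"  # example format with the trailing pipe
stripped = strip_trailing_pipe(pattern)

print(re.fullmatch(pattern, "") is not None)     # does the empty string match with the pipe?
print(re.fullmatch(stripped, "") is not None)    # and without it?
print(re.fullmatch(stripped, "42") is not None)  # ordinary values are unaffected
```

A helper like this would only be safe to run over existing qualifiers once every checker ignores somevalue and novalue snaks; otherwise stripping the pipe would make an empty-string-based checker start reporting them.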
Well, exception handling isn't really optimal under the current system. Some users tend to think that every constraint needs to be set to mandatory. As for the input format of constraints, that's yet another question. BTW, for the main results on the above queries (imdb, viaf), afaik, these are set as people checked them and found entries to be absent.
--- Jura 08:32, 2 June 2018 (UTC)[reply]
somevalue and novalue should always be allowed options for all identifiers. Nobody should remove these snaks only because they are showing up on a report. It is natural that we don't have all-encompassing and complete knowledge of the world, and for this reason somevalue and novalue were introduced. --Pasleim (talk) 08:48, 2 June 2018 (UTC)
I can't think of a meaningful use of "somevalue" for VIAF and imdb. Why should we add that to hundreds or thousands of items?
--- Jura 08:52, 2 June 2018 (UTC)[reply]
Contrary to what I said earlier, there are substantial numbers of unknown value snaks for identifiers: 1870 results. Whether they make sense or not is beyond my knowledge. Anyway, this is a separate issue, as the trailing pipe hack does not really help us to prevent those cases if we consider them undesired. --MisterSynergy (talk) 09:00, 2 June 2018 (UTC)
I came across a user that systematically set a series of properties to "unknown" when they created an item for a living person (e.g. place of birth, nationality, date of birth, etc.). Maybe some do the same for identifiers. It seems we are lacking a way to disallow "unknown" while allowing novalue.
--- Jura 09:07, 2 June 2018 (UTC)[reply]
Correct. There is some uncertainty about unknown value anyway: some users use it “because they checked thoroughly and didn’t find any (publicly available) information” and others use it only when “explicit consensus in sources/literature/etc. is that this information is generally unknown”. --MisterSynergy (talk) 09:13, 2 June 2018 (UTC)

regular expression syntax parameter

@Dhx1: Where does this regular expression syntax (P4240) parameter that you added (diff) come from? I’m certain WikibaseQualityConstraints doesn’t support it; do you know if it’s supported by KrBot? --Lucas Werkmeister (WMDE) (talk) 16:20, 3 August 2020 (UTC)

@Dhx1: But this should be discussed on Wikidata talk:WikiProject property constraints, in my opinion, and if everyone agrees, then the parameter should be added to the documentation – and even then, only with a note about what it actually does. Right now, I think the documentation gives the impression that regular expressions can be specified in any number of formats and it’ll somehow work, when that really isn’t the case. (I also have other reservations about this proposal, but I’ll hold those for the discussion on the project talk page.) --Lucas Werkmeister (WMDE) (talk) 10:54, 5 August 2020 (UTC)
@Lucas Werkmeister (WMDE): Agreed that further discussion is necessary. I've added a discussion thread at Property_talk:P1793#Mandatory_qualifier_regular_expression_syntax_(P4240)? and reverted the changes to the Help page in the interim. --Dhx1 (talk) 15:26, 5 August 2020 (UTC)

Subject type constraint class missing on P1793?

The most common type triggering constraint violations when using format as a regular expression (P1793) is Wikidata property with datatype string that is not an external identifier (Q21099935), with 18 cases. This class contains 105 properties. Only 33 of them actually use format as a regular expression (P1793), however. It looks like a property class "Wikidata property with datatype string that uses regex-verified syntax" or something is missing. Should such a class be created, or could the subject type constraint be amended to include all of Wikidata property with datatype string that is not an external identifier (Q21099935) as well?

Regardless of which property class is chosen for this purpose, relative position within image (P2677) should probably be added to it. --SM5POR (talk) 20:56, 21 January 2023 (UTC)