Property talk:P8966


Documentation

URL match pattern
regex pattern of a URL from which an external ID may be extracted. The qualifier "URL match replacement value" can override the default \1. Use non-capturing groups where needed, e.g. "(?:www)?"
Represents: Uniform Resource Identifier (Q61694), regular expression (Q185612)
Has quality: case sensitive (Q257869)
Data type: String
Allowed values: .*(?!>\\)\((?!\?:).*|(?!.+\bwd\b).*
Example
According to this template:
  • IMDb ID (P345) → (one of multiple values) https:\/\/www\.imdb\.com\/(?:title|name|news)\/([a-z0-9]+)(\/.*)?
<replacement value> \1
<replacement value> \1
  • ISNI (P213) → https?:\/\/www\.isni\.org\/(\d{4})(| |%20)(\d{4})(| |%20)(\d{4})(| |%20)(\d{4})
<replacement value> \1 \3 \5 \7
  • ZVG number (P679) → http:\/\/gestis-en\.itrust\.de\/nxt\/gateway\.dll\/gestis_en\/0+([1-9]\d+)\.xml.*
<replacement value> \1
<replacement value> \1
<replacement value> \1:\3
<replacement value> \1
  • VIAF ID (P214) → (ignoring the s after http and ignoring www.) ^https?:\/\/(?:www\.)?viaf\.org\/viaf\/([1-9]\d(?:\d{0,7}|\d{17,20}))($|\/|\?|#)
<replacement value> \1
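These examples can be applied mechanically: a consumer runs the stored pattern against a URL and expands the replacement value (default \1) to obtain the external ID. A minimal Python sketch using the IMDb pattern from the example above (the helper name is illustrative, not part of any tool):

```python
import re

# URL match pattern for IMDb ID (P345), copied from the example above
PATTERN = r"https:\/\/www\.imdb\.com\/(?:title|name|news)\/([a-z0-9]+)(\/.*)?"
REPLACEMENT = r"\1"  # the "URL match replacement value" (P8967) qualifier


def extract_id(url, pattern=PATTERN, replacement=REPLACEMENT):
    """Return the external ID extracted from the URL, or None on no match."""
    m = re.search(pattern, url)
    return m.expand(replacement) if m else None
```

A real consumer would iterate over all (pattern, replacement) pairs stored on a property, since several can coexist.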
According to statements in the property:
ISBN-13 (P212) → ^https?:\/\/(?:www\.wikidata|[a-z]{2,}\.wikipedia)\.org\/wiki\/Special:BookSources\/(\d[\d\-]{15}\d)$
IATA airport code (P238) → ^https?:\/\/(?:www\.)?iata\.org\/en\/publications\/directories\/code-search\/\?airport\.search=([A-Za-z]{3})
Library of Congress authority ID (P244) → ^https?:\/\/id\.loc\.gov\/authorities\/(?:(?:name|subject)s\/)?((?:n|nb|nr|no|ns|mp|sh)(?:[4-9][0-9]|00|20[0-2][0-9])[0-9]{6})
Library of Congress authority ID (P244) → ^https?:\/\/lccn\.loc\.gov\/((?:n|nb|nr|no|ns|mp|sh)(?:[4-9][0-9]|00|20[0-2][0-9])[0-9]{6})
When possible, data should only be stored as statements
Formatter URL: https://regex101.com/?regex=$1
See also: formatter URL (P1630), URL match replacement value (P8967), web page title extract pattern (P10999)
Lists
Proposal discussion
Current uses
Total: 6,914
Main statement: 6,891 (99.7% of uses)
Qualifier: 22 (0.3% of uses)
Reference: 1 (<0.1% of uses)
Search for values
Scope is as main value (Q54828448): the property must be used in the specified way only (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P8966#Scope, SPARQL
Item “formatter URL (P1630)”: Items with this property should also have “formatter URL (P1630)”. (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P8966#Item P1630, search, SPARQL
Allowed entity types are Wikibase property (Q29934218): the property may only be used on a certain entity type (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P8966#Entity types
Format “.*(?!>\\)\((?!\?:).*|”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P8966#Format, SPARQL
Format “(?!.+\bwd\b).*”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P8966#Format, SPARQL
Required qualifier “web page title extract pattern (P10999)”: this property should be used with the listed qualifier. (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P8966#mandatory qualifier, SPARQL
Item “class of non-item property value (P10726)”: Items with this property should also have “class of non-item property value (P10726)”. (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P8966#Item P10726, search, SPARQL
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P8966#Item P2302, search, SPARQL

Query

Here is how you get all the regexes with their property and replacement pattern. --Shisma (talk) 09:10, 20 December 2020 (UTC)[reply]

SELECT ?p ?s ?r WHERE {
  ?stat ps:P8966 ?s.
  OPTIONAL { ?stat pq:P8967 ?r. }
  ?prop  p:P8966 ?stat.
  BIND(REPLACE(STR(?prop), 'http://www.wikidata.org/entity/', '')  AS ?p ).
} ORDER BY STRLEN(str(?s))
Try it!

Case sensitivity

@Shisma: I was thinking maybe we should make an additional qualifier for the regex specifying whether it should be treated as case sensitive. Maybe something like has characteristic (P1552) case insensitive (Q55121297)? We could make all regexes case insensitive manually, but that's a pain. I also would like to see properties marked like this (e.g. Twitter usernames are case insensitive), but that probably requires a broader discussion. Thoughts? BrokenSegue (talk) 18:26, 4 January 2021 (UTC)[reply]

What is the concrete problem you are trying to solve? How about a qualifier for regex flags? --Shisma (talk) 18:45, 4 January 2021 (UTC)[reply]
@Shisma: yeah, regex flags would also work, but I'm not sure which ones other than "i" I want. I mean, currently a lot of the URL match pattern (P8966) values are overly restrictive, since domain names are not case sensitive (and often other parts of URLs aren't either). In many cases we really should be using the "i" flag, but not in all cases. Further, I cannot tell if this user's Twitter handle is already linked if I don't know that Twitter handles are case insensitive. BrokenSegue (talk) 19:13, 4 January 2021 (UTC)[reply]

@Shisma: I went ahead and just implemented it like this, though there's still the issue of marking the regexes themselves as case-insensitive. BrokenSegue (talk) 03:52, 10 January 2021 (UTC)[reply]

@BrokenSegue, Shisma: You can use (?i) within the regular expression. While the standard regular expression in JS does not support it, the one meant to be canonical is the one defined in the SPARQL standard, and
select ?s where {
  values ?s { "ab" "AB" "A" "c" "a" "B" }
  filter regex(?s, "(?i)a")
}
Try it!
does work. Such flags within the regex itself are already used in other regular expressions on Wikidata. --CamelCaseNick (talk) 20:48, 11 February 2021 (UTC)[reply]
@CamelCaseNick: "the one meant to be canonical is the one defined in the SPARQL" are you sure? I would be very unhappy if people used non-standard regex features in this property given I use it in javascript. I do notice lots of competing regex standards in use here. Maybe we need a qualifiier for this? BrokenSegue (talk) 20:52, 11 February 2021 (UTC)[reply]
@BrokenSegue: You are right, it is more complex than that: format as a regular expression (P1793) says PCRE, and Help:Property constraints portal/Format says that KrBot uses PCRE, while the Wikibase constraint system is based on the WDQS one, which is currently Java-flavored regex and therefore does not fulfill the SPARQL standard, which requires XPath 2.0 regex. I've skimmed the XPath 2.0 regex standard and couldn't find those flags within the regex itself. The other flavors support it. It would've been great to decide to stick only to the common denominator, but the choice has been made in favor of different definition(s). With qualifier, do you mean regular expression syntax (P4240)? --CamelCaseNick (talk) 21:07, 11 February 2021 (UTC)[reply]
@CamelCaseNick: yeah, I guess that's what I meant. Didn't know it already existed. BrokenSegue (talk) 21:11, 11 February 2021 (UTC)[reply]
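For illustration, the inline flag discussed above works in PCRE and in Python's `re` module, while classic JavaScript RegExp literals reject it. A small sketch, using a simplified form of the VIAF pattern from the documentation examples:

```python
import re

# "(?i)" at the start of the pattern enables case-insensitive matching in
# PCRE and Python's re; plain JavaScript RegExp does not accept this flag.
PATTERN = r"(?i)^https?:\/\/(?:www\.)?viaf\.org\/viaf\/(\d+)"


def viaf_id(url):
    """Return the captured VIAF ID, or None if the URL does not match."""
    m = re.match(PATTERN, url)
    return m.group(1) if m else None
```

A JS consumer would instead have to strip the flag and pass `"i"` to `new RegExp(source, "i")`, which is one argument for a separate qualifier rather than inline flags.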

regex for URLs or after escaping?

Is the value meant to match the URL as it is defined in RFC 3986, or an escaped variant? It seems still to be undecided: it was not discussed in the property proposal (and neither was it here), and the example for ISNI (P213) would match both an escaped space and the literal space character, as if the URL were unescaped before matching against the regular expression. And this query suggests why we should discuss it:

select ?regex where {
  [wdt:P8966 ?regex]
  filter contains(?regex, "%")
}
Try it!

--CamelCaseNick (talk) 00:43, 16 February 2021 (UTC)[reply]

agreed, this should be decided. Optimally we would include both as regexes somehow and give preference to one. I think it really depends on what the property expects, unfortunately. BrokenSegue (talk) 01:35, 16 February 2021 (UTC)[reply]
I am only using decoded URLs so far. What actually is a problem is this scenario:
The Enemy Within (Q2984502) → Fandom article ID (P6262) → de.memory-alpha:Kirk_:_2_=_?
cannot be resolved from this URL: https://memory-alpha.fandom.com/de/wiki/Kirk_:_2_%3D_%3F
--Shisma (talk) 15:39, 16 February 2021 (UTC)[reply]
@Shisma: That is not exactly true. You still can do the unescaping afterwards – after matching. I never suggested skipping de-escaping. --CamelCaseNick (talk) 19:22, 16 February 2021 (UTC)[reply]
  • It's also occasionally a problem with values added with described at URL (P973) that should eventually be converted to property values.
    As the bot checks against the regex, such conversion won't happen, even when the same bot fixes values in the target property that include escaped characters.
    Practically, what would be needed is a function that de-escapes before P973 is applied. Other useful functions would be ones that convert values to lowercase or uppercase.
    I know this is not directly linked to P8966, but it is the same problem. @Ivan_A._Krestinin: --- Jura 17:06, 16 February 2021 (UTC)[reply]
@Jura1: The question is not if, but merely when to do the de-escaping. --CamelCaseNick (talk) 19:22, 16 February 2021 (UTC)[reply]
  • P973 conversion currently assumes it's done before, but KrBot (in some cases only) does it afterwards. Uppercasing/lowercasing might have the same problem. --- Jura 19:30, 16 February 2021 (UTC)[reply]

I've just started playing with this property for WD:EE, and have hit this issue. I guess the de-escaped versions look cleaner? Can all the cases on the query provided by User:CamelCaseNick be reduced to de-escaped versions without problems? --99of9 (talk) 13:27, 4 October 2021 (UTC)[reply]
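The ordering being discussed can be made concrete: match the raw (still percent-encoded) URL first, then de-escape only the captured value. A Python sketch using the Fandom URL from the example above (the pattern itself is illustrative, not a stored P8966 value):

```python
import re
from urllib.parse import unquote

# Match against the raw URL, then percent-decode the captured ID afterwards.
url = "https://memory-alpha.fandom.com/de/wiki/Kirk_:_2_%3D_%3F"
pattern = r"^https:\/\/memory-alpha\.fandom\.com\/de\/wiki\/([^?#]+)"

m = re.match(pattern, url)
article_id = unquote(m.group(1))  # "Kirk_:_2_=_?"
```

Decoding the whole URL before matching would also work here, but then patterns containing literal `%XX` sequences (as the query above finds) would stop matching, which is why the order needs to be fixed one way or the other.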

Applying changes to IDs for stylistic reasons (?)

@UWashPrincipalCataloger: thinks at least two external IDs (ERIC Thesaurus ID (P8539) and Discogs style ID (P9219)) should be altered from how they are used in the URIs of the service providing the identifier. For example:

https://www.discogs.com/style/progressive+metal → Discogs style ID (P9219) → progressive metal (not progressive+metal)

I.e. any occurrence of a plus (+) should be replaced by a space ( ) because “The + is unnecessary”. This would unfortunately break URL match pattern (P8966) because, as of yet, it doesn't handle arbitrary stylistic changes to the ID. The resolution only works one way 😕.

@CamelCaseNick:, @BrokenSegue: do you think we should support arbitrary stylistic changes with a qualifier? --Shisma (talk) 08:02, 28 February 2021 (UTC)[reply]

@Shisma, Jura1: Rather than solving the conversion for adding statements here, I think a better approach would be to do it in two distinctly separated steps using Template:Autofix on the properties in question. This, however, only solves the problem where this property is used to add statements for identifiers, not where it is used to detect identifiers in order to find the existing entries in Wikidata. I have no elegant approach that would make sure nobody constantly adds progressive+metal using Wikidata for Firefox, triggering KrBot to replace it with progressive metal, only to have KrBot then remove the duplicate statement. And another question for @Jura1: Could you maybe link here to the automatic conversion of described at URL (P973)? Currently, for this property we don't state whether it is a first-party or third-party identifier. For the detection purpose the latter are of importance, but maybe we wouldn't want them for the former? --CamelCaseNick (talk) 18:47, 28 February 2021 (UTC)[reply]
  • To enable conversion, I changed the property constraint on P8939, then added an autofix to P973 that includes the "+" [2] and another one that removes "+" [3]. This would be a bit silly if it didn't allow one to keep adding everything with P973 and leave formatting to Ivan's KrBot. BTW, there is some discussion on project chat to figure out whether it's P973 that should be used for that or P2888, and what the purpose of P973 actually is (oddly, P973 always had autofixes whereas P2888 did not). --- Jura 19:07, 28 February 2021 (UTC)[reply]
  • BTW, if the conclusion is that finally the "+" should be in the value, please change the autofix accordingly. --- Jura 19:21, 28 February 2021 (UTC)[reply]

I don't think there's a great solution here. If multiple non-identical identifiers map to the same value in a database, our only option is to find a way to canonicalize on our end. There are a few options there.

  1. We could include all variants of the identifier and mark one as canonical/preferred using a bot
  2. We could create a property that tells us how to canonicalize the value
    1. This could be done with a series of sed expressions
    2. This could be done with a series of character replacement expressions
    3. We could create a pseudo programming language using a series of ordered properties with values/qualifiers like "replace characters" / "a-z -> A-Z"?

At the end of the day we need some kind of means to express canonicalization logic (which could be Turing-complete t_t). What about wikilambda/wikifunctions? We could also consider just embedding a short program in a qualifier? It's a mess. Fortunately the easiest "case" is case sensitivity, and we have that one covered. Maybe there aren't that many other cases? BrokenSegue (talk) 23:45, 28 February 2021 (UTC)[reply]

Out of those, I like the idea of – not reinventing the wheel – sed expressions as a main statement on the target property; we would then suggest that any data user of this property run them, and additionally the property template can include the Autofix for KrBot. That would lead to no duplicate content and no potential mismatch. (Note: tools like Wikidata for Firefox trying to find matches based on the URL can look up both before and after the change.) @BrokenSegue, Shisma, Jura1: Do you see any drawbacks yet? --CamelCaseNick (talk) 04:06, 1 March 2021 (UTC)[reply]
I mean, unfortunately sed is not standardized (BSD vs. GNU versions) and, most critically, cannot be run from JavaScript/Python where it would probably be most useful. Unless we wanna write a sed interpreter in JS or somehow use a restricted subset? BrokenSegue (talk) 04:27, 1 March 2021 (UTC)[reply]
As I mostly rely on KrBot, I probably have to keep adding autofix templates in one way or the other. So for this property, feel free to do whatever suits you best. I think adding an item that describes a given conversion method (possibly including code) would be possible; it's something I ended up doing for number of words (P6570). --- Jura 11:10, 1 March 2021 (UTC)[reply]
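A data consumer that cannot run sed could still interpret an ordered list of (pattern, replacement) rules, which is the "restricted subset" idea above. A rough Python sketch (the rule list is hypothetical, modeled on the Discogs example):

```python
import re

# Ordered, portable replacement rules: a restricted "sed s///" subset that
# any client language with a regex engine can interpret the same way.
RULES = [
    (r"\+", " "),  # e.g. Discogs style: "progressive+metal" -> "progressive metal"
]


def canonicalize(value, rules=RULES):
    """Apply each (pattern, replacement) rule in order to the raw value."""
    for pattern, replacement in rules:
        value = re.sub(pattern, replacement, value)
    return value
```

Storing only the pattern/replacement pairs (e.g. as qualifier values) sidesteps the BSD/GNU sed divergence, at the cost of expressiveness.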

@Uzume, ArthurPSmith, Shisma, BrokenSegue, Jura1, Lockal: WikiProject Properties has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

Although I've participated in many prop proposals, I wasn't familiar with this prop.

@Btcprox: was so kind to explain its purpose to me: "URL match pattern is more for helping of automated scraping of IDs from URLs. It doesn't automatically convert a URL string into the corresponding ID if you input that URL directly into the property's field on Wikidata. However, there is potential for tools to automatically extract the proper IDs to append to the items. Currently right now, Wikidata for Web (Q99894727) uses this to a great degree."

I thought there's got to be a strong link between URL match pattern (P8966) and formatter URL (P1630), and the latter should be usable to generate (a draft version of) the former. Here is a sample of 50 rows:

select ?x ?xLabel ?fmt ?re ?pat {
  ?x wikibase:propertyType wikibase:ExternalId.
  ?x wdt:P1630 ?fmt; p:P2302/pq:P1793 ?re
  filter not exists {?x wdt:P8966 []}
  bind(replace(?fmt,"([^a-zA-Z0-9$])","\\\\$1") as ?fmt1) # escape regex-special chars
  bind(replace(?re,"\\\\","\\\\\\\\") as ?re1) # double up backslashes
  bind(replace(?fmt1,"\\$1",concat("(",?re1,")")) as ?pat)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} limit 50
Try it!

If several ?fmt exist for the same prop, I think we should use all of them, not only the best-ranked ?fmt.
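The escaping steps in the query above can be mirrored outside SPARQL. A rough Python sketch of the same derivation (simplified: one formatter, no rank handling, and no attempt at the backslash-doubling the query needs for SPARQL string semantics):

```python
import re


def formatter_to_match_pattern(formatter_url, value_regex):
    """Derive a draft URL match pattern from a formatter URL + value regex."""
    # Escape every character that is not alphanumeric or "$" (as in the
    # query's first BIND), then swap the $1 placeholder for a capturing
    # group built from the property's format regex.
    escaped = re.sub(r"([^a-zA-Z0-9$])", r"\\\1", formatter_url)
    return escaped.replace("$1", "(" + value_regex + ")")
```

As the replies below note, this only yields a draft: slugs, redirects, and case-insensitive hosts mean the generated pattern still needs human review.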

Here are some counts:

select (count(*) as ?props) (sum(?pat) as ?pats) (sum(?fmt_re) as ?fmt_res) {
  ?x wikibase:propertyType wikibase:ExternalId
  bind(if(exists {?x wdt:P8966 []},1,0) as ?pat)
  bind(if(exists {?x wdt:P1630 []} && exists {?x p:P2302/pq:P1793 []},1,0) as ?fmt_re)
}
Try it!

Of 5997 ExternalId props, 5097 have formatter+regex but only 817 have a pattern. So we can generate 84% more patterns. --Vladimir Alexiev (talk) 08:59, 5 April 2021 (UTC)[reply]

I would refrain from adding trivially computable statements (because they don't add anything new; they are just a burden, and all editors would have to update them to keep them in sync). Entity Explosion already knows how to build these expressions. Also, there are 2 important things:
  1. "wd" letters in a formatter usually mean that a part of the URL contains a random slug (non-conventional). There are 88 such formatters.
  2. A lot of URLs created with URL formatters redirect to other canonical addresses. The most common pattern is https://example.com/artist/123 -> https://example.com/artist/123/joe-smith. In some cases the link depends on the host language. It is possible to find such properties with curl, but there is no generic formula for all 5097 statements. --Lockal (talk) 09:22, 5 April 2021 (UTC)[reply]
  1. @Lockal, Salgo60, 99of9: Then "wd" should be replaced with a minimal non-greedy non-capturing group: (?.*?)
  2. Is it possible to suggest such a computed URL match pattern (P8966) value when a property is being edited, and ask the editor to check it? --Vladimir Alexiev (talk) 06:54, 12 April 2021 (UTC)[reply]
Isn't the minimal non-capturing group (?:.*?)? If so, I've made this change in one property, and it seems to be working for my purposes. --99of9 (talk) 23:05, 4 October 2021 (UTC)[reply]
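To confirm the correction: `(?.*?)` is not valid regex syntax, while `(?:.*?)` compiles and matches lazily. A quick Python check (the example pattern and URL are illustrative, standing in for a formatter with a trailing slug):

```python
import re

# The minimal non-greedy non-capturing group is "(?:.*?)"; "(?.*?)" is a
# syntax error in every common flavor.
try:
    re.compile(r"(?.*?)")
    bad_syntax_accepted = True
except re.error:
    bad_syntax_accepted = False

# "(?:.*?)" swallows a random slug after the captured ID without capturing it.
m = re.match(r"^https:\/\/example\.com\/id\/(\d+)(?:.*?)$",
             "https://example.com/id/123/some-random-slug")
```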
As far as I can see, values for this statement can be derived easily from other existing statements. This is what User:BrokenSegue probably did in their recent batch where they added P8966 to thousands of properties. Having this property looks like an unnecessary maintenance burden indeed. 2001:7D0:81DA:F780:412D:66E4:AA02:A795 08:22, 14 September 2021 (UTC)[reply]

You are mistaken. This property is needed. BrokenSegue (talk) 14:34, 14 September 2021 (UTC)[reply]

Mistaken about what? I'm sure you can make use of this property. The question rather is why not instead make use of other properties that were already there. 2001:7D0:81DA:F780:890E:3DF4:4520:31C8 18:01, 14 September 2021 (UTC)[reply]
As was explained above, you cannot perfectly infer the match pattern from the formatter string. BrokenSegue (talk) 18:58, 14 September 2021 (UTC)[reply]

Could we get some "walk-through" examples in the documentation section?

I spotted a "quick win" for P5297 where it's just extracting one variable from the middle of a URL. Could we have a step-by-step example for simple ones like that? I tried and couldn't work it out for myself! Back ache (talk) 08:00, 10 September 2021 (UTC)[reply]

There are some tutorials around the web, but I can figure this out for you if you like. --Shisma (talk) 08:38, 10 September 2021 (UTC)[reply]
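A generic walk-through for the "one variable in the middle of a URL" case might look like this (the URL shape below is hypothetical, not the real P5297 formatter):

```python
import re

# Step-by-step sketch:
# 1. take a sample URL:            https://example.org/companies/12345/profile
# 2. escape regex metacharacters:  https:\/\/example\.org\/companies\/12345\/profile
# 3. replace the ID with a group:  https:\/\/example\.org\/companies\/(\d+)\/profile
pattern = r"^https:\/\/example\.org\/companies\/(\d+)\/profile"

m = re.match(pattern, "https://example.org/companies/12345/profile")
```

The capturing group `(\d+)` is what the default replacement value `\1` extracts; anything before and after it stays literal (escaped) text.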

Not escaping pluses when passing through to regex101

There is a nice feature in URL match pattern where, if you click on the regex it holds, it passes you over to regex101's debugger. However, when doing so it does not escape the plus symbol in the URL it creates, so a valid piece of regex shows as invalid in regex101.

Presumably it's formatter URL (P1630) that needs altering to do a substitution of + with %2B, but I don't know how to do that.

Back ache (talk) 08:54, 14 September 2021 (UTC)[reply]
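The underlying issue is that `+` in a query string is decoded as a space, so the regex would need to be percent-encoded before being placed into the formatter URL. A sketch of the needed transformation (the helper is hypothetical; formatter URLs themselves cannot run code like this):

```python
from urllib.parse import quote

# Percent-encode the regex (turning "+" into "%2B", "\" into "%5C", etc.)
# so regex101 receives it intact via its ?regex= query parameter.
def regex101_link(regex):
    return "https://regex101.com/?regex=" + quote(regex, safe="")

link = regex101_link(r"progressive\+metal")
```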

Blue versus black text

I have just created the URL matches for stock ticker P249. As you'd expect, several sites use it, so I have made several entries. Initially only the first was in blue and the others black (they are all blue now). What does this mean? Back ache (talk) 11:28, 9 December 2021 (UTC)[reply]

Regex303?

What is this URL and why is it not working? Is it a mirror of https://regex101.com or is it a spam link? Pinging @Tatupiplu: who changed to this formatter URL. Regards Kirilloparma (talk) 18:50, 23 December 2021 (UTC)[reply]

See also other properties using this URL. Regards Kirilloparma (talk) 19:03, 23 December 2021 (UTC)[reply]
doesn't really explain why it was changed... BrokenSegue (talk) 21:41, 23 December 2021 (UTC)[reply]
Hi, I found this site on the web. It looks like one of the developer's sites. If you think it doesn't work as intended, feel free to revert the edit to the previous one. Thanks :) -Tatupiplu (talk) 05:18, 24 December 2021 (UTC)[reply]
Alright, thanks for the clarification. Regards Kirilloparma (talk) 22:00, 24 December 2021 (UTC)[reply]

Items needing URL match adding

This SPARQL needs refining so as to filter out troublesome URLs, such as those that need transformation (using something on Toolforge), but it'll do for now:

select ?x ?xLabel  ?fmt {
  ?x wikibase:propertyType wikibase:ExternalId.
  ?x wdt:P1630 ?fmt; 
  filter not exists {?x wdt:P8966 []}
  FILTER (!regex(?fmt, "web.archive.org","i"))
  FILTER (!regex(?fmt, "toolforge.org","i"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} limit 50
Try it!

Back ache (talk) 09:25, 1 February 2022 (UTC)[reply]

very helpful 👌 – Shisma (talk) 07:33, 14 September 2022 (UTC)[reply]

Choose your wildcards with a world view

Just had to fix a URL match I created because I found the site in question used accents. This meant that using \w as a wildcard failed to pick up Télécoms Sans Frontières; instead I switched to using [^\/] (anything but a forward slash) and that did the trick.

Back ache (talk) 14:29, 11 February 2022 (UTC)[reply]

I think that’s legit👌 Shisma (talk) 17:48, 12 February 2022 (UTC)[reply]
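The failure is flavor-dependent: Python 3's `\w` is Unicode-aware, but JavaScript's `\w` (where many consumers of this property run) is ASCII-only, so the negated character class is the portable choice. A quick check, using `re.ASCII` to mimic the JavaScript behavior:

```python
import re

name = "Télécoms Sans Frontières"

# ASCII \w (the JavaScript behavior) fails on accented letters...
ascii_match = re.match(r"[\w ]+$", name, flags=re.ASCII)

# ...while "anything but a forward slash" matches regardless of script.
class_match = re.match(r"[^\/]+$", name)
```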

Strictly speaking it should be in CAPS

I have come across this a few times: the unique ID itself should be in capitals, but the websites that use it don't care. For example, IATA airport code (P238) should be three uppercase letters; however, "flight radar" uses lowercase: https://www.flightradar24.com/airport/bqh

How can we make sure we store it in the case it strictly should be in, whilst making sure it matches all sites?

I have asked on the discussion page of "URL match replacement value" (P8967) to see if it could do the conversion (for storage), but we also need to find a way to let things like "Wikidata for Web" (Q99894727) know to do a case-insensitive comparison.


Back ache (talk) 02:01, 9 March 2022 (UTC)[reply]

The way I have found to do it is in the property itself: for P238 in the earlier example, what is set is "has quality" (P1552) "all caps" (Q3960579).
What's not clear is what the other case-related values should be; for example, should it be "lowercase" (Q4444253) or "lowercase text" (Q65048529)?
Back ache (talk) 11:19, 11 November 2022 (UTC)[reply]
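One consumer-side approach matching this thread: match the code case-insensitively, then uppercase the capture before storing. A sketch (the flightradar24 pattern is illustrative, not a stored P8966 value):

```python
import re

# Match lowercase-using sites but normalize to the canonical uppercase form.
PATTERN = r"^https?:\/\/(?:www\.)?flightradar24\.com\/airport\/([A-Za-z]{3})$"


def iata_code(url):
    """Return the IATA code in its canonical uppercase form, or None."""
    m = re.match(PATTERN, url)
    return m.group(1).upper() if m else None
```

Tools would still need a machine-readable signal (like the "all caps" quality above) to know that uppercasing is the right normalization for this particular ID.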

URL encode/decode

I was looking into creating a pattern for "Wolfram Language entity code" (P4839), but it uses encoded values. How can we convert these for use/storage by Wikidata?

A typical URL:

https://www.wolframalpha.com/input?i=Entity%5B%22MusicWork%22%2C+%22Imagine%3A%3A462b2%22%5D

What Wikidata needs to see: Entity["MusicWork", "Imagine::462b2"]

The pattern I first tried:

^https:\/\/www.wolframalpha.com\/input\?i=(Entity%5B.+%5D)

My hacky way around it:

^https:\/\/www.wolframalpha.com\/input\?i=Entity%5B%22(.+)%22%2C\+%22(.+)%3A%3A462b2%22%5D

(with a URL match replacement value of)

Entity["\1", "\2"]

Is there a better, less hacky way of doing it?

Back ache (talk) 11:10, 6 June 2022 (UTC)[reply]
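One less hacky variant: keep the pattern simple by capturing the whole encoded parameter (the first pattern above), and leave the decoding — including `+` to space — to a post-processing step rather than encoding every literal into the regex. A Python sketch:

```python
import re
from urllib.parse import unquote_plus

url = ("https://www.wolframalpha.com/input"
       "?i=Entity%5B%22MusicWork%22%2C+%22Imagine%3A%3A462b2%22%5D")

# Capture the raw encoded value with the simple pattern...
m = re.match(r"^https:\/\/www\.wolframalpha\.com\/input\?i=(Entity%5B.+%5D)$", url)

# ...then decode it in one step ("+" becomes a space, %XX sequences decode).
entity_code = unquote_plus(m.group(1))
```

This still assumes the consuming tool knows to run the decode step, which P8966 alone cannot express; a qualifier would have to carry that instruction.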

Why is platform not a valid qualifier?

In an entry with a lot of these (for example Hashtag), logging which pattern corresponds to which platform is useful (as it isn't always obvious), so why is platform flagged as not a valid qualifier? Back ache (talk) 07:56, 14 October 2022 (UTC)[reply]

I don't think platform really makes sense here. What use case are you imagining? Seems niche. Also this isn't really how the platform property is used across wikidata (mainly used for video games). BrokenSegue (talk) 19:51, 14 October 2022 (UTC)[reply]
Makes sense. So, using hashtag as an example again, the third-party formatter section uses "operator (P137)" for the same purpose. So perhaps if we can hunt down the uses of "platform" for this, we can switch them over to operator. I wonder if there is a way to put hint text against the error message, so instead of saying "don't use platform" we say "don't use platform, use operator" instead. Back ache (talk) 13:46, 19 October 2022 (UTC)[reply]

Replacing underscores in captured IDs

What would be the best way to replace underscores (an encoded space) in an ID extracted from a URL?

For example extracting the ID from

https://commons.wikimedia.org/wiki/Category:Refrigerator_magnets

is easy enough, but we need the result to be "Refrigerator magnets", not "Refrigerator_magnets", for it to match up with Q121569.

Back ache (talk) 10:14, 10 November 2022 (UTC)[reply]

Probably some kind of qualifier which the application would need to know how to interpret. BrokenSegue (talk) 20:08, 10 November 2022 (UTC)[reply]
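As a concrete sketch of such a qualifier-driven step (the underscore-to-space rule here is an assumption about what the application would apply after matching, not an existing qualifier):

```python
import re

url = "https://commons.wikimedia.org/wiki/Category:Refrigerator_magnets"

# Extract the category name, then apply the extra normalization step
# (underscore -> space) that the match pattern alone cannot express.
m = re.match(r"^https:\/\/commons\.wikimedia\.org\/wiki\/Category:([^?#]+)$", url)
category = m.group(1).replace("_", " ")
```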

Language

I've added language of work or name (P407) as allowed qualifiers constraint (Q21510851), because of this. If there are any problems, please reply in the other thread. Horcrux (talk) 09:32, 3 January 2023 (UTC)[reply]

Revision "Undo revision 1807927904 by Lectrician1 (talk): Nobody is going to create and maintain 5000 items for each external identifier with URL match pattern" by Lockal

@User:Lockal The purpose of the constraint is so that this progressively happens. Nobody has to fix all 5000 constraints at once. This is similar to how we've been progressively adding stability of property value (P2668) to properties. Lectrician1 (talk) 15:29, 13 January 2023 (UTC)[reply]

stability of property value (P2668) is different: first of all, it was added as a suggestion constraint (Q62026391), but more importantly, it does not create a modeling burden. As I can see, you have already started creating items like VK ID (Q116153597), but they serve no purpose other than to be a burden. There won't be a page in Wikipedia for such items, nobody is going to link to such items except for the same-named properties, and models are not invented for such items. If you attempt to invent a model for VK ID (Q116153597), you will quickly discover that it has exactly everything that is already present in VK ID (P3185). That's why I'm strongly against this constraint. Lockal (talk) 15:51, 13 January 2023 (UTC)[reply]
@Lockal It was decided that we should create items for all external identifiers. Please see the reasoning here: Wikidata:Project chat/Archive/2022/12#Creation of items for all external identifier identifers Lectrician1 (talk) 18:45, 15 January 2023 (UTC)[reply]
Ouch... That's an interesting discussion; I hadn't seen that one. I've reverted myself, but marked it with "suggestion constraint". --Lockal (talk) 19:48, 15 January 2023 (UTC)[reply]