Wikidata:Property proposal/name (string)
name (string)
Originally proposed at Wikidata:Property proposal/Generic
Description | name that identifies the subject (unlike P2561, which has the data type "Monolingual text", this property has the data type "string") |
---|---|
Data type | String |
Domain | items |
Example 1 | keywords: WHERE (Q2668363) → "WHERE", LIMIT (Q12032401) → "LIMIT", etc. |
Example 2 | HTML tag names: h1 (Q94098812) → "h1", form (Q5863890) → "form", etc. |
Example 3 | HTTP header names: Do Not Track (Q692672) → "DNT", HTTP/1.1 Upgrade header (Q5636136) → "Upgrade" |
Example 4 | domain names: .com (Q159371) → "com", .org (Q32131) → "org", example.com (Q306656) → "example.com", etc. |
Example 5 | files: hosts file (Q1149007) → "/etc/hosts" |
Example 6 | commands: yes (Q306574) → "yes" |
Example 7 | functions: strlen (Q1952837) → "strlen" |
Example 8 | hooks: SkinVectorStyleModules (Q21675668) → "SkinVectorStyleModules" |
See also | name (P2561), short name (P1813) |
Motivation
Keywords in programming languages, tags in SGML/XML/HTML, headers in HTTP or emails and domain names are all identified by a unique name.
We currently do not have a dedicated property for such strings. Instead, the current practice appears to be to use short name (P1813), which is wrong for two reasons:
- semantics: These strings often aren't short names, e.g. "SELECT" is not a short name and neither is "form" or "Upgrade".
- data type: short name (P1813) has the data type "monolingual text", meaning a language has to be associated with the value. This does not make any sense for these strings, e.g. "h1" and "strictfp" are certainly not English and, strictly speaking, neither is "Referer", which is a misspelling of the English word "referrer".
Note that sometimes no linguistic content (Q22282939) is used as a language value for name (P2561); however, a plain string is still preferable for data consumers. Compare:
{"datavalue":{"value":{"text":"WHERE","language":"zxx"},"type":"monolingualtext"},"datatype":"monolingualtext"}
with
{"datavalue":{"value":"WHERE","type":"string"},"datatype":"string"}
While data consumers might not know what "language": "zxx" means, a plain string is self-explanatory.
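To make the comparison concrete, here is a minimal Python sketch of what a data consumer has to do with each of the two JSON shapes quoted above. The helper name `snak_text` is hypothetical and not part of any Wikibase client library; the sketch only assumes the two structures shown in this proposal.

```python
import json

# The two shapes quoted above, copied verbatim.
MONOLINGUAL = '{"datavalue":{"value":{"text":"WHERE","language":"zxx"},"type":"monolingualtext"},"datatype":"monolingualtext"}'
PLAIN = '{"datavalue":{"value":"WHERE","type":"string"},"datatype":"string"}'


def snak_text(snak: dict) -> str:
    """Return the textual value of a string or monolingualtext value.

    Hypothetical helper for illustration only.
    """
    value = snak["datavalue"]["value"]
    if snak["datatype"] == "monolingualtext":
        # The consumer must unwrap an object and decide what to do with
        # the language code ("zxx" stands for "no linguistic content").
        return value["text"]
    if snak["datatype"] == "string":
        # The value is already the plain string.
        return value
    raise ValueError(f"unsupported datatype: {snak['datatype']}")


print(snak_text(json.loads(MONOLINGUAL)))
print(snak_text(json.loads(PLAIN)))
```

Both calls yield the same text; the difference is only in how much structure the consumer has to interpret along the way.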
I therefore propose the introduction of a new "name (string)" property of data type string so that we can avoid this misuse of name (P2561)/short name (P1813). Aliases for this new property would be:
- tag name
- header name
- keyword name
- has tag name
- has header name
- has keyword name
This new property would be the superproperty of Unicode character name (P9382).
--Push-f (talk) 05:44, 8 November 2022 (UTC)
Discussion
- WikiProject Informatics has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. --Push-f (talk) 06:01, 8 November 2022 (UTC)
- Support but is it true that these keywords can only have a single string value? At least with some languages shortened versions of keywords are also understood in some cases... ArthurPSmith (talk) 19:31, 8 November 2022 (UTC)
- While I cannot think of such an example (and would be curious to see some), I'd argue that these are in fact separate keywords with the same semantics, so I think it would make sense to model them as separate keyword (Q1072684) instances. --Push-f (talk) 19:50, 8 November 2022 (UTC)
- I agree about the distinct semantics and data types, but I don't think the single-value constraint is true or even necessary to warrant a new property, so I would suggest dropping "unique" from the property label. I even find "name" a bit misleading, as it suggests something alphabetic. Is "H1" really a name of anything? I'd rather call it a "label" or a "token", as the latter could even be completely non-alphabetic, such as "(*" (the alternative begin-comment delimiter in some Pascal variants, equivalent to "{" in standard Pascal).
- Would extending the property to cover also non-alphabetic string tokens interfere with its intended use? SM5POR (talk) 08:57, 17 November 2022 (UTC)
- I have dropped the single-value constraint and renamed the property accordingly. In the computer domain, names can certainly contain numbers and sometimes even special characters or arbitrary Unicode characters. And yes, according to the HTML specification, "h1" is a tag name.
- I don't consider tokens such as "(*" or "{" to be in the scope of this proposal (besides they probably don't meet WD:N).
- --Push-f (talk) 12:00, 17 November 2022 (UTC)
- Citing your description, "name that uniquely identifies the subject within its class", why is it important that the name (or string) uniquely identifies the subject? Isn't the unique identity implied already by the subject itself, such as the .org (Q32131) top-level domain? In contrast to the item, the string "org" actually isn't unique but may be written in lowercase, uppercase ("ORG"), or any of a number of mixed-case variants ("Org", "oRg" etc which are all valid but often normalized to one form or another).
- While letter case is insignificant in the domain name system, this is not so in many programming languages, so you should probably add a qualifier indicating how letter case is interpreted, and you may want to indicate which of the several forms is this unique name.
- Can you provide an example showing why you couldn't use this property as planned if the string value isn't truly unique?
- Do you consider using this property as a qualifier in certain contexts, say, sorted → input set (P1851) → boolean data type (Q520777), with the qualifier unique name → "reverse", or will it be applicable only as a main value on an item belonging to the domain(s) mentioned? Why such limited use? SM5POR (talk) 11:20, 17 November 2022 (UTC)
- I have also dropped the domain constraint since as you pointed out the property is really of general use. The property is only meant to be used as a main value. --Push-f (talk) 12:04, 17 November 2022 (UTC)
- I'm afraid we are talking past each other. I still don't see what you need the property for, other than replacing current use of short name (P1813), the purpose of which is also a bit unclear. Is it really only meant to say that the programming language keyword "WHERE" is spelled "WHERE" when it appears in a computer program? Then I can almost understand why short name (P1813) has been used, because that limited use certainly doesn't warrant the creation of a new property.
- But if we broaden its purpose to "how to express a particular concept in a programming language", regardless of whether that concept is a reserved word, an operator, a function name, a variable name, a pre-processor directive, or a comment, then it begins looking somewhat useful. Say, an AI engine could match various strings in a computer program with these concepts and determine what programming language has been used and what the program is supposed to do.
- The value data type of the proposed property is string, not Wikibase item.
- Notability is however for items, not strings. The in-line program comment (Q1141067) class is apparently notable enough to warrant a new item, and a programming language item may then contain statements explaining how to write function definitions, external library imports, and in-line comments, say
- Pascal (Q81571) → has part(s) (P527) → program comment (Q1141067), with the qualifier string token → "(*"
- which is similar to what I suggested in the Wikidata talk:WikiProject Informatics discussion regarding function parameter keywords, as I hadn't then yet read the proposal above in detail. In order to allow this, you need to expand the subject type of the property to include functions and programming languages, and also to let it appear as a qualifier to indicate what part of the function call or language it's used to express. SM5POR (talk) 13:33, 17 November 2022 (UTC)
- Question Is this just official name (P1448) for concepts? "Official" understood as regulated by a standard or specification. --Tinker Bell ★ ♥ 03:10, 9 November 2022 (UTC)
- Kind of. But note that e.g. the Upgrade HTTP header is always referred to as "Upgrade header" by the official RFC but the value of the proposed property should of course still only be "Upgrade". Also note that the intended scope also includes domain names which usually aren't regulated by a standard or specification, e.g. if we considered the domain name "wikidata.org" to be notable enough to get its own Wikidata item it would be eligible for this property with the value "wikidata.org".
- I have added the following note to the proposed description: "(and is defined in either a technical specification (use P1343 or P973 to state where) or a name system)". I hope that clarifies things. Push-f (talk) 06:10, 9 November 2022 (UTC)
- I would say that a token should not include multiple sub-tokens, but constitute an atomic lexical unit. Therefore, "wikidata.org" would not be a token of its own, but rather a concatenation of the three tokens "wikidata", ".", and "org".
- I would drop the requirement that the keyword/token is defined in a technical specification or a name system. In a computer program, every function or variable name defined becomes a token that is recognized by the compiler along with any words or strings reserved by the language specification. SM5POR (talk) 08:59, 17 November 2022 (UTC)
- I have dropped that requirement, so that the property can also be used for function and file instances.
- The property is meant for names. While names are often tokens, they don't have to be; case in point, "example.com" is a domain name. Domain names can have many levels, and we certainly don't want "foo.bar.baz.example.com" to require 9 values, because that would be a nightmare to edit and to consume; besides, Wikibase does not even let you rearrange values by default. --Push-f (talk) 12:10, 17 November 2022 (UTC)
- Seriously, how many domain names are notable enough to have their own items? But I can drop that suggestion if the label isn't going to be "token" anyway. Instead I'd like to know how you intend to use this property for any practical purpose when its only value will be inherent in the description of the item it's attached to, and therefore static. By excluding it from being used as a qualifier, you eliminate its potential use for identifying anything that doesn't already have its own (notable) item, such as a named function parameter.
- And please pick a more distinctive label than "name (string)". I previously considered "computer keyword", but adding system files and sample domains makes that a bit awkward. Maybe the latter two could get their own filename and domain name properties, to indicate what context they belong in (such as enclosing them in quotes to distinguish them from regular keywords in a program)? SM5POR (talk) 15:14, 17 November 2022 (UTC)
- Oppose That "data consumers" are too stupid to understand what language codes mean seems to me like a bad reason for creating a new property. Having two different name properties that differ in the data type means it's more confusing for users to choose which one to pick. The Wikidata team is also working on 'mul' which might make sense to use in this context. ChristianKl ❪✉❫ 10:41, 20 November 2022 (UTC)
- Withdrawn Fair enough. --Push-f (talk) 01:16, 21 November 2022 (UTC)
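Two of the technical points raised in the discussion above, splitting a domain name into atomic tokens and the difference in letter-case handling between domain names and keywords, can be sketched in Python. This is only an illustration under stated assumptions: the function names `domain_tokens` and `same_domain` are hypothetical, and the comparison covers plain ASCII names only.

```python
import re


def domain_tokens(name: str) -> list[str]:
    """Split a domain name into its labels and "." separators.

    Hypothetical helper: treats every label and every dot as one token.
    """
    # The capturing group keeps the "." separators in the result;
    # empty strings (from leading or trailing dots) are dropped.
    return [token for token in re.split(r"(\.)", name) if token]


def same_domain(a: str, b: str) -> bool:
    """Compare two ASCII domain names while ignoring letter case."""
    return a.lower() == b.lower()


tokens = domain_tokens("foo.bar.baz.example.com")
print(tokens)
print(len(tokens))  # 9: five labels plus four dots

print(same_domain("ORG", "org"))  # True: letter case is insignificant here
print("WHERE" == "where")  # False: a case-sensitive comparison tells them apart
```

Storing the whole name as one string value avoids the nine separate values, at the cost of leaving case normalization to the consumer.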