Property talk:P11292

From Wikidata

Documentation

Format “[^/]+\.[^/]+”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P11292#Format, SPARQL
Scope is as main value (Q54828448): the property must be used by specified way only (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P11292#Scope, SPARQL
Allowed entity types are Wikibase item (Q29934200): the property may only be used on a certain entity type (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P11292#Entity types
Single best value: this property generally contains a single value. If there are several, one would have preferred rank (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P11292#single best value, SPARQL
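
As an illustration of the format constraint listed above, the following minimal sketch checks candidate values against the pattern. It assumes the pattern must match the whole value; the function name is hypothetical and not part of any Wikidata tooling.

```python
import re

# Pattern copied from the format constraint above.
# Assumption: the constraint has to match the entire value, hence fullmatch().
MAN_PAGE_FORMAT = re.compile(r"[^/]+\.[^/]+")


def is_valid_man_page_value(value: str) -> bool:
    """Return True if the whole value matches the format constraint."""
    return MAN_PAGE_FORMAT.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("cd.1posix", "cpuid.4", "cd", "man1/cd.1"):
        print(candidate, is_valid_man_page_value(candidate))
```

Values without a dot-separated section, or containing a slash, are rejected by this check.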

Move the section into a qualifier

I propose removing the section from the format constraint. Instead, the section should be given in a qualifier, enforced with a required qualifier constraint (Q21510856), resulting in statements of the form cd (Q283438) → man page (P11292): cd, with the qualifier section, verse, paragraph, or clause (P958): 1posix.

This would allow importing content from sources where a manpage is indicated without an explicit section. This is the case for most manpage links in tldr-pages, for example, and manned.org is able to resolve them to a default section: according to their about page, the section is optional, and "if no section is given and multiple sections are available, the lowest section number is chosen". Of course, we should prefer complete data, but partial (section-less) info is nevertheless valuable, and we shouldn't discard it. The manned.org links in tldr-pages, in particular, are added manually, so a human will have checked that they resolve to the right page; they are not URLs built mechanically from the command name.

Decoupling the section from the format would also allow matching the data to sources that use different ways to indicate the section, e.g. man/<section>/<page> or man<section>/<page> or <page>(<section>).

Pinging the participants in the property proposal discussion: @Push-f, Jsamwrites, Laftp0, Dexxor. --Waldyrious (talk) 23:58, 13 December 2022 (UTC)

 Oppose For many reasons:
  1. formatter URL (P1630) cannot use values of qualifiers, so the man pages could no longer be linked reliably to a specific section (without implementing property-specific logic).
  2. The headings within man pages are called sections (according to the DESCRIPTION section of man.7), so using section, verse, paragraph, or clause (P958) for the section number would mean we could no longer use it to reference a section within a man page. It would also be too ambiguous (which section is meant?), so we would have to introduce a "man page section" property.
  3. I do not consider section-less man page references to be valuable enough to warrant importing or supporting them, because doing so would make both data consumption and data entry more difficult. I'd rather have us create a mix'n'match catalog via the distro-specific package identifiers, see #Matching software instances to their man pages.
  4. Strings in that format can absolutely be matched to "<page>.<section>" ... it just requires a tiny bit of logic ...
--Push-f (talk) 03:35, 14 December 2022 (UTC)[reply]
Fair enough, good points. I have an OpenRefine project with ~200 entries manually matched between tldr-pages and Wikidata, but most of the man page references don't contain a section number. I wonder if it would be possible to automatically obtain the resolved section that manned.org produces, and then use the specific link in a data import. WDYT? --Waldyrious (talk) 23:16, 14 December 2022 (UTC)
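
The "tiny bit of logic" from point 4 above could look roughly like the following sketch, which maps the three notations mentioned in this thread (man/<section>/<page>, man<section>/<page> and <page>(<section>)) to "<page>.<section>". The regular expressions and the function name are assumptions made for illustration; real sources may need additional handling, for example for file extensions.

```python
import re
from typing import Optional

# Hypothetical patterns for the notations mentioned in this thread.
_PATTERNS = [
    re.compile(r"^man/(?P<section>[^/]+)/(?P<page>[^/]+)$"),     # man/<section>/<page>
    re.compile(r"^man(?P<section>[^/]+)/(?P<page>[^/]+)$"),      # man<section>/<page>
    re.compile(r"^(?P<page>[^/()]+)\((?P<section>[^/()]+)\)$"),  # <page>(<section>)
]


def to_page_dot_section(ref: str) -> Optional[str]:
    """Convert a man page reference to "<page>.<section>", or return None
    if the reference matches none of the known notations."""
    ref = ref.strip()
    for pattern in _PATTERNS:
        match = pattern.match(ref)
        if match:
            return f"{match['page']}.{match['section']}"
    return None


if __name__ == "__main__":
    for ref in ("man/1/cd", "man1/cd", "cd(1posix)", "cd"):
        print(ref, "->", to_page_dot_section(ref))
```

A section-less reference such as "cd" yields None here, which matches the point that such references cannot be converted without resolving the section some other way.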

Matching software instances to their man pages

For instances of software (Q7397) with a distro-specific package identifier such as Debian stable package (P3442) we could find the related man pages via the files contained in the distribution package, for example:

We of course don't want to do an HTTP request for every package, so we would instead download the Contents index files from the distributions, e.g.:

I think the idea would be to write a bot that downloads all Contents files from all distributions, then downloads all QID-to-package-name mappings from the WDQS, and then attempts to match that data. Sidenote: comparing the contents of packages across distributions could even let us add missing package identifier claims: if e.g. A has Debian stable package (P3442) B, and the Arch Linux package X contains the same files as package B, then we can also infer that A has Arch Linux package (P3454) X ... though this would take a bit more effort than simply inferring man pages.

--Push-f (talk) 05:21, 14 December 2022 (UTC)[reply]
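
The first half of the matching idea in this section (finding the man pages shipped by each package) might be sketched as follows. This is not an existing bot. It assumes Debian-style Contents lines, where a file path is followed by whitespace and a comma-separated list of section/package names, and it assumes man pages are installed as usr/share/man/man<section>/<page>.<section>, optionally compressed. The function names and sample lines are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set

_COMPRESSION_SUFFIXES = (".gz", ".bz2", ".xz", ".zst")
_VALUE_FORMAT = re.compile(r"[^/]+\.[^/]+")  # format constraint of P11292


def man_page_value(path: str) -> Optional[str]:
    """Return "<page>.<section>" for a path like usr/share/man/man1/ls.1.gz.
    Localized pages (usr/share/man/<lang>/man1/...) are skipped."""
    prefix = "usr/share/man/"
    if not path.startswith(prefix):
        return None
    parts = path[len(prefix):].split("/")
    if len(parts) != 2 or not parts[0].startswith("man"):
        return None
    name = parts[1]
    for suffix in _COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name if _VALUE_FORMAT.fullmatch(name) else None


def man_pages_by_package(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map package names to the man page values found in Contents lines."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        # Paths may contain spaces, so split off only the last field.
        fields = line.rstrip("\n").rsplit(None, 1)
        if len(fields) != 2:
            continue
        path, packages = fields
        value = man_page_value(path)
        if value is None:
            continue
        for qualified in packages.split(","):
            # "section/package" -> "package"
            result[qualified.rsplit("/", 1)[-1]].add(value)
    return dict(result)


if __name__ == "__main__":
    sample = [  # made-up sample lines, not real index data
        "usr/bin/ls                          utils/coreutils",
        "usr/share/man/man1/ls.1.gz          utils/coreutils",
        "usr/share/man/man5/crontab.5.gz     admin/cron,admin/systemd-cron",
    ]
    print(man_pages_by_package(sample))
```

The resulting package names could then be joined against the QID-to-package-name mappings obtained from the WDQS; that step is left out here.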

Allowing the property to be used as a reference in conjunction with reference URL

I intentionally restricted the scope of the property to main statements: the page name alone is too ambiguous for a reference, since man pages can vary greatly from distribution to distribution and from version to version. However, I just had the idea that we could allow man page (P11292) to be used as a reference in addition to reference URL (P854), for example:

reference URL (P854): https://manned.org/man/debian-wheezy/doc/manpages/3.44-1/cpuid.4 could be extended with:

Ideally these additional reference properties could be added automatically by a bot. However:

The problem with this is that I don't think property constraints currently let us express "this property may be used as a reference but only in conjunction with these other properties". I explicitly do not want to allow man page (P11292) to be used as a reference without a reference URL (P854), or software version identifier (P348) to be used as a reference without man page (P11292). So I think we have to table this until WikibaseQualityConstraints (Q54812269) supports such constraints.

--Push-f (talk) 04:13, 14 December 2022 (UTC)[reply]