Property talk:P11292
Documentation
man page that describes the subject
List of violations of this constraint: Database reports/Constraint violations/P11292#Format, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P11292#Scope, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P11292#Entity types
List of violations of this constraint: Database reports/Constraint violations/P11292#single best value, SPARQL
Move the section into a qualifier
I propose removing the section from the format constraint. Instead, a qualifier should be required, using a required qualifier constraint (Q21510856), resulting in statements of the form cd (Q283438) man page (P11292) cd
This would allow importing content from sources where a man page is indicated without an explicit section. This is the case for most man page links in tldr-pages, for example, and manned.org is able to resolve them to a default section: its about page states that the section is optional and that "if no section is given and multiple sections are available, the lowest section number is chosen". Of course, we should prefer complete data, but partial (section-less) information is nevertheless valuable, and we shouldn't discard it. The manned.org links in tldr-pages, in particular, are added manually, so a human will have checked that they resolve to the right page; they are not URLs built mechanically from the command name.
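The fallback rule quoted from manned.org could be approximated as below. This is only a sketch: the list of available sections is hypothetical input (it would have to come from some man page index), and ranking sections without a leading number (such as "n") after the numbered ones is an assumption, since the quoted rule only mentions section numbers.

```python
import re
from typing import List, Tuple


def section_sort_key(section: str) -> Tuple[int, str]:
    """Order sections by leading number, then suffix: "1" < "3" < "3p" < "8"."""
    match = re.match(r"(\d+)(.*)", section)
    if match is None:
        # Assumption: sections without a number (e.g. "n") sort after numbered ones.
        return (10**9, section)
    return (int(match.group(1)), match.group(2))


def resolve_default_section(page: str, available_sections: List[str]) -> str:
    """Return "<page>.<section>" for the lowest available section number."""
    if not available_sections:
        raise ValueError(f"no sections known for {page!r}")
    lowest = min(available_sections, key=section_sort_key)
    return f"{page}.{lowest}"


print(resolve_default_section("printf", ["3", "1", "3p"]))  # printf.1
```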
Decoupling the section from the format would also allow matching the data to sources that use different ways to indicate the section, e.g. man/<section>/<page> or man<section>/<page> or <page>(<section>).
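Matching those notations to the "<page>.<section>" form could look like the following sketch. The three regular expressions are assumptions that only cover the examples named above with a bare page name (not a file name such as rg.1.gz); they are deliberately loose, so an unrelated path segment that happens to start with "man" could produce a false positive.

```python
import re
from typing import Optional

# Hypothetical patterns for the notations listed above; any URL prefix is ignored.
PATTERNS = [
    re.compile(r"^(?:.*/)?man/(?P<section>[^/]+)/(?P<page>[^/]+)$"),  # man/<section>/<page>
    re.compile(r"^(?:.*/)?man(?P<section>[^/]+)/(?P<page>[^/]+)$"),   # man<section>/<page>
    re.compile(r"^(?P<page>[^()\s]+)\((?P<section>[^()\s]+)\)$"),     # <page>(<section>)
]


def to_page_dot_section(reference: str) -> Optional[str]:
    """Convert e.g. "man/1/ls", "man1/ls" or "ls(1)" to "ls.1"; None if unrecognized."""
    text = reference.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return f"{match['page']}.{match['section']}"
    return None


for example in ["man/1/ls", "man8/mount", "printf(3)", "ls"]:
    print(example, "->", to_page_dot_section(example))
```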
Pinging the participants in the property proposal discussion: @Push-f, Jsamwrites, Laftp0, Dexxor. --Waldyrious (talk) 23:58, 13 December 2022 (UTC)
- Oppose, for many reasons:
- formatter URL (P1630) cannot use the values of qualifiers, so man pages could no longer be linked reliably to a specific section (without implementing property-specific logic).
- The headings within man pages are called sections (according to the DESCRIPTION section of man.7), so using section, verse, paragraph, or clause (P958) for the section number would not only mean that we could no longer use it to reference a section within a man page, but it would also be too ambiguous (which section is meant?). So we would have to introduce a "man page section" property.
- I do not consider section-less man page references to be valuable enough to warrant importing or supporting them, because doing so would make both data consumption and data entry more difficult. I'd rather have us create a mix'n'match catalog via the distro-specific package identifiers, see #Matching software instances to their man pages.
- Strings in those formats can absolutely be matched to "<page>.<section>"; it just requires a tiny bit of logic.
- --Push-f (talk) 03:35, 14 December 2022 (UTC)
- Fair enough, good points. I have an OpenRefine project with ~200 entries manually matched between tldr-pages and Wikidata, but most of the man page references don't contain a section number. I wonder if it would be possible to automatically obtain the resolved section that manned.org produces, and then use the specific link in a data import. WDYT? --Waldyrious (talk) 23:16, 14 December 2022 (UTC)
Matching software instances to their man pages
For instances of software (Q7397) with a distro-specific package identifier such as Debian stable package (P3442), we could find the related man pages via the files contained in the distribution package, for example:
- ripgrep (Q91665055) Debian stable package (P3442) ripgrep → https://packages.debian.org/stable/amd64/ripgrep/filelist → /usr/share/man/man1/rg.1.gz → man page (P11292) rg.1
- ripgrep (Q91665055) Ubuntu package (P3473) ripgrep → https://packages.ubuntu.com/lunar/amd64/ripgrep/filelist → /usr/share/man/man1/rg.1.gz → man page (P11292) rg.1
- ripgrep (Q91665055) Arch Linux package (P3454) ripgrep → https://archlinux.org/packages/community/x86_64/ripgrep/files/json/ → usr/share/man/man1/rg.1.gz → man page (P11292) rg.1
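Going from a file path in such a file list to a man page value is mostly string manipulation. A minimal sketch, assuming that only untranslated pages directly below usr/share/man/man<section>/ count (localized directories such as usr/share/man/de/ are skipped) and that .gz, .bz2, .xz and .zst are the only compression suffixes to strip:

```python
import re
from typing import Optional

# Accepts paths with or without a leading slash: the Debian and Ubuntu
# file lists show "/usr/...", the Arch Linux file list "usr/...".
MAN_PATH = re.compile(
    r"^/?usr/share/man/man(?P<section>[^/]+)/(?P<file>[^/]+?)(?:\.(?:gz|bz2|xz|zst))?$"
)


def man_page_from_path(path: str) -> Optional[str]:
    """Map e.g. "/usr/share/man/man1/rg.1.gz" to the man page value "rg.1"."""
    match = MAN_PATH.match(path)
    if match is None:
        return None
    file_name = match["file"]
    # Sanity check: the file extension should start with the directory's section
    # (this also accepts suffixed sections such as "3pm" inside man3).
    if not file_name.rsplit(".", 1)[-1].startswith(match["section"]):
        return None
    return file_name


print(man_page_from_path("/usr/share/man/man1/rg.1.gz"))  # rg.1
```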
We of course don't want to do an HTTP request for every package, so we would instead download the Contents index files from the distributions, e.g.:
- https://ftp.debian.org/debian/dists/stable/contrib/Contents-all.gz for Debian
- https://mirror.webworld.ie/ubuntu/dists/bionic/Contents-amd64.gz for Ubuntu
- https://geo.mirror.pkgbuild.com/extra/os/x86_64/extra.files.tar.gz for Arch Linux
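A minimal sketch of building a package-to-man-page index from a Debian or Ubuntu Contents file, assuming the usual layout of one path per line, followed by whitespace and a comma-separated list of section/package locations. The Arch Linux files database is a tarball with a different layout and would need its own parser.

```python
import gzip
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Paths in Contents files have no leading slash; compression suffixes are stripped.
MAN_FILE = re.compile(r"^usr/share/man/man[^/]+/(?P<page>[^/]+?)(?:\.(?:gz|bz2|xz))?$")


def man_pages_by_package(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Build a package name -> man page values index from Contents lines."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        # The path may contain spaces, the location list does not: split once from the right.
        parts = line.rstrip("\n").rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        path, locations = parts
        match = MAN_FILE.match(path)
        if match is None:
            continue
        for location in locations.split(","):
            # Locations look like "utils/ripgrep"; the package is the last component.
            package = location.rsplit("/", 1)[-1]
            index[package].add(match["page"])
    return dict(index)


def read_contents(path: str) -> Dict[str, Set[str]]:
    """Index a downloaded Contents-*.gz file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return man_pages_by_package(handle)
```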
I think the idea would be to write a bot that downloads all Contents files from all distributions, then downloads all QID-to-package-name mappings from the WDQS, and then attempts to match the two. Sidenote: comparing the contents of packages across distributions could even let us add missing package identifier claims. If e.g. A Debian stable package (P3442) B, and the Arch Linux package X contains the same files as package B, then we can also infer A Arch Linux package (P3454) X. This would take a bit more effort than simply inferring man pages, though.
--Push-f (talk) 05:21, 14 December 2022 (UTC)
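The matching step of such a bot could be sketched as follows. The QID-to-package pairs are assumed to come from a WDQS query like the one in the QUERY string (which the code does not run) and the package index from a Contents parser. The cross-distribution inference uses exact equality of file lists; that is an assumption, since file lists probably differ slightly between distributions in practice, so some similarity threshold would likely be needed.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# SPARQL for the Wikidata Query Service; not executed here.
QUERY = "SELECT ?item ?package WHERE { ?item wdt:P3442 ?package . }"


def propose_man_pages(item_packages: Iterable[Tuple[str, str]],
                      man_pages: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return sorted (QID, man page value) candidates for items whose package ships man pages."""
    proposals = set()
    for qid, package in item_packages:
        for page in man_pages.get(package, set()):
            proposals.add((qid, page))
    return sorted(proposals)


def infer_package_pairs(debian_files: Dict[str, FrozenSet[str]],
                        arch_files: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Pair Debian and Arch packages whose (non-empty) file lists are identical."""
    arch_by_files: Dict[FrozenSet[str], List[str]] = {}
    for name, files in arch_files.items():
        arch_by_files.setdefault(files, []).append(name)
    pairs = []
    for debian_name, files in sorted(debian_files.items()):
        candidates = arch_by_files.get(files, [])
        # Only accept unambiguous matches.
        if files and len(candidates) == 1:
            pairs.append((debian_name, candidates[0]))
    return pairs
```

If item A has Debian stable package (P3442) B and infer_package_pairs returns the pair (B, X), then A would be a candidate for Arch Linux package (P3454) X, ideally subject to review.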
Allowing the property to be used as a reference in conjunction with reference URL
I intentionally restricted the scope of the property to main statements, because the page name alone is too ambiguous for a reference: man pages can vary greatly from distribution to distribution and from version to version. However, I just had the idea that we could allow man page (P11292) to be used in a reference in addition to reference URL (P854), for example:
reference URL (P854) https://manned.org/man/debian-wheezy/doc/manpages/3.44-1/cpuid.4 could be extended with:
- publisher (P123) Debian (Q7715973)
- Debian source package (P9128) manpages
- software version identifier (P348) 3.44-1
- man page (P11292) cpuid.4
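If a bot were to derive these reference properties from the URL, a sketch could look like this. The path layout (man/<distribution>-<release>/<archive section>/<source package>/<version>/<page>) is an assumption inferred from this single example, and the sketch only handles Debian URLs, because Debian source package (P9128) only applies there.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

DEBIAN = "Q7715973"  # Debian, used as publisher (P123)


def reference_from_manned_url(url: str) -> Optional[Dict[str, str]]:
    """Derive reference properties from a Debian manned.org URL, or return None."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # Assumed layout: man/<distribution>-<release>/<archive section>/<package>/<version>/<page>
    if parsed.netloc != "manned.org" or len(parts) != 6 or parts[0] != "man":
        return None
    _, system, _archive_section, package, version, page = parts
    if system.split("-", 1)[0] != "debian":
        return None  # Debian source package (P9128) would not apply
    return {
        "P854": url,       # reference URL
        "P123": DEBIAN,    # publisher
        "P9128": package,  # Debian source package
        "P348": version,   # software version identifier
        "P11292": page,    # man page
    }
```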
Ideally these additional reference properties could be added automatically by a bot. However:
- Debian source package (P9128) would need to be expanded to allow usage in the reference scope
- software version identifier (P348) would need to be expanded to allow usage in the reference scope
- man page (P11292) would need to be expanded to allow usage in the reference scope
The problem is that I don't think property constraints currently let us express "this property may be used in a reference, but only in conjunction with these other properties". And I explicitly do not want to allow man page (P11292) to be used in a reference without reference URL (P854), nor software version identifier (P348) to be used in a reference without man page (P11292). So I think we have to table this until WikibaseQualityConstraints (Q54812269) supports such constraints.
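Until then, the rule could at least be checked outside the constraint system, e.g. by a bot or a database report. A minimal sketch, where a reference is reduced to the set of property IDs it uses and the REQUIRES table encodes only the two rules stated above (both the representation and this formalization of the rule are assumptions):

```python
from typing import Dict, Iterable, List, Set

# A property used in a reference requires all listed properties in the same reference.
REQUIRES: Dict[str, Set[str]] = {
    "P11292": {"P854"},  # man page needs reference URL
    "P348": {"P11292"},  # software version identifier needs man page
}


def violations(reference_properties: Iterable[str]) -> List[str]:
    """List co-occurrence violations for one reference, in a stable order."""
    present = set(reference_properties)
    problems = []
    for prop in sorted(present):
        for missing in sorted(REQUIRES.get(prop, set()) - present):
            problems.append(f"{prop} used without {missing}")
    return problems


print(violations(["P11292", "P348"]))  # ['P11292 used without P854']
```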