Wikidata talk:WikiProject Data Quality/Issues/P642

Qualifier value class

I'm somewhat confused as to how the Qualifier value class column is supposed to be used in the different tables; is it descriptive in trying to cover current usage of of (P642), or is it prescriptive in defining how the replacement qualifier should be used?

In any case, the query added to the TOPS-20 (Q1481861) example is wrong as it doesn't catch the two use cases given in that item; the identified parts of this software are both classes only, not instances of classes, and the instance of (P31) property path components should thus be removed. I would also argue that this is the typical situation in most software, as the individual parts of a (named) program seldom come with their own names and identities. Thus Microsoft Word (Q11261) comes with its own graphical user interface (Q782543), but that interface doesn't have its own name (as far as I know) and is therefore unlikely to have its own item in Wikidata. Therefore instance of (P31) should be an optional part of the property path if it's included at all, and I doubt it will see much use.
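One way to read the suggestion above is as a SPARQL property path in which the instance of (P31) step is optional, so that both classes and named instances match. The sketch below only assembles the triple pattern as text; the helper names and the `part` variable are illustrative assumptions, not taken from the project's actual queries.

```python
def class_path(instance_optional: bool = True) -> str:
    """Return a SPARQL property path leading from a value to a class.

    'wdt:P31?' matches zero or one instance-of step, so a value that is
    itself a class (no P31 step needed) is matched as well.
    """
    first_step = "wdt:P31?" if instance_optional else "wdt:P31"
    return f"{first_step}/wdt:P279*"


def part_pattern(var: str, class_qid: str, instance_optional: bool = True) -> str:
    """Build one triple pattern restricting ?var to the given class."""
    return f"?{var} {class_path(instance_optional)} wd:{class_qid} ."


# Hypothetical usage: restrict a qualifier value to graphical user interface (Q782543).
print(part_pattern("part", "Q782543"))
print(part_pattern("part", "Q782543", instance_optional=False))
```

With the optional step, an item that is directly a subclass of the target class matches; with the mandatory step, only instances of that class (or of its subclasses) do.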

Since nobody but me seems to have been using of (P642) with programmer (P943) it's a bit of a fringe example anyway, but I suppose it could serve as a model for similar properties such as author (P50), composer (P86), librettist (P87), or performer (P175), where several people of the same profession may have contributed different parts (some named, others unnamed) of a major creative work such as a movie, an encyclopedia, or a work of architecture. --SM5POR (talk) 08:27, 16 February 2022 (UTC)[reply]

The tables are both descriptive and prescriptive: the idea is to capture current uses and map them to replacements. Each row is meant to capture all uses for which the proposed handling is appropriate, and (more importantly) none for which it is not. This is easier said than done, and help with the use case definitions and corresponding queries is very much appreciated. Swpb (talk) 16:18, 16 February 2022 (UTC)[reply]
Thank you for the clarification. I added a table for creator (P170) with subproperties, aiming at handling them more or less the same for consistency. However, I seem to easily get timeouts when author (P50) is included, so I have split some of the queries to process them separately. I'd also like to make programmer (P943) a subproperty of (P1647) developer (P178) to include it in the creator (P170) tree. Are there any particular precautions or community interactions that should be observed before playing around with subproperty of (P1647)? --SM5POR (talk) 09:42, 17 February 2022 (UTC)[reply]
Thanks for the new table! Yeah, the timeout limit is tough; I like your solution. I'm not aware of any issues around subproperty of (P1647), but I'm probably the wrong person to ask. Swpb (talk) 20:40, 17 February 2022 (UTC)[reply]
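The splitting approach mentioned above can be sketched as generating one small query per property instead of a single query over the whole tree. This is only an illustration: the query shape is an assumption, and the property list simply repeats the properties named in this thread.

```python
# Assumed query shape: count statements of one property that carry a P642 qualifier.
QUERY_TEMPLATE = """SELECT (COUNT(?statement) AS ?count) WHERE {{
  ?item p:{pid} ?statement .
  ?statement pq:P642 ?qval .
}}"""


def per_property_queries(pids):
    """Return a dict mapping each property ID to its own stand-alone query."""
    return {pid: QUERY_TEMPLATE.format(pid=pid) for pid in pids}


# creator, author, composer, librettist, performer, programmer
queries = per_property_queries(["P170", "P50", "P86", "P87", "P175", "P943"])
print(queries["P50"])
```

Each generated query can then be run separately, so that a heavily used property such as author (P50) does not drag the others into a timeout.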

Help page

I was going to fix the has use (P366) claim for the BARK (Q4384247) computer which I added some time ago, and ended up wondering how I should express calculation (Q622821) of ballistics (Q184631) (a specific application) without using the of (P642) qualifier. So, I went to the Help:Qualifiers#Examples help page section and found the suggestion ... of (P642) (a specific place for the position of mayor).

Obviously, we ought to fix the documentation first. But which qualifier to use in this case then? Is there a list of generic qualifiers somewhere, or do we have to create one as part of this effort? --SM5POR (talk) 19:18, 25 February 2022 (UTC)[reply]

Searching the Property namespace for advisory (?) mentions of "P642" currently yields 197 hits. They will have to be replaced, too. --SM5POR (talk) 04:17, 28 February 2022 (UTC)[reply]

Statistics

These complex queries keep timing out, so I started digging around to see what could be done about it. I get the impression that there are quite a few redundant qualifiers that could simply be dropped. Also, there seems to be less variation among the qualifier values than among the properties, which suggests it may be more efficient to search for common values across property space in order to identify the largest sets of qualified statements and get them out of our way.

I therefore added two initial table sections, one to obtain statistics to find out where the bulk of the redundancy is, and one for processing those redundancies. When that has been done, I hope it will be easier to develop more complex queries and begin assigning replacement qualifiers for the remaining cases.

You bring out a shovel in order to remove that annoying piece of rock in your front lawn, and end up uncovering the top of the bedrock beneath. Like, some 10,000 claims of "Category:Films directed by NN" and qualified by film directed by this person (Q29017630). Some items even have two copies of the same claim. --SM5POR (talk) 21:49, 27 February 2022 (UTC)[reply]

Well, the qualifiers for related category (P7084) perhaps technically aren't redundant, but the constructs do look a bit clumsy, and anyhow getting 13,000+ statements out of the way (preferably eliminating duplicate claims first) would probably ease things up a little. Maybe position held (P39) or subject has role (P2868)? --SM5POR (talk) 04:02, 28 February 2022 (UTC)[reply]
@Swpb: Fixing related category (P7084) is likely going to be a major deal, and I have notified the parties involved in the discussion at Property talk:P7084 to hopefully obtain their feedback and cooperation. --SM5POR (talk) 12:02, 28 February 2022 (UTC)[reply]
I was wrong; related category (P7084) is hardly to blame for the timeouts. Instead, it's proper motion (P2215) (look at the statistics section). Was Wikidata really intended for this..? --SM5POR (talk) 21:17, 1 March 2022 (UTC)[reply]
@Swpb: I have constrained the redundant nationality qualifier destined for removal to subproperties of creator (P170) again, to avoid catching things like Celine Dion winning the Eurovision Song Contest for Switzerland while being a Canadian citizen (hypothetical claim not actually found), or similar situations at sporting events. I also enumerated the properties and classes implied, to work around the timeout issue. Until the timeouts are gone, I can't even analyze the remaining claims fully, so I hope some of the bulkier issues (see examples mentioned above or listed in the statistics section) can be resolved early on. --SM5POR (talk) 11:11, 28 February 2022 (UTC)[reply]
@SM5POR: Fantastic work! Yeah, certainly better to make the queries too narrow than too broad, if we're to have any hope of automating parts of the migration. I've been a bit busy IRL, but I'll try to absorb your stats work when I can. Swpb (talk) 18:01, 28 February 2022 (UTC)[reply]

Performance and database design

@Swpb: Okay, after spending several hours on this timeout issue I have a weird feeling of Wikidata approaching the technological singularity (Q237525) and that it will appear to us in the image of the .NORM Normal File Format (Q66498719)...

When you search for any of (P642) qualifier without specifying either its value or the statement it belongs to, the search also matches claims that have values other than Wikibase items, including the spherical vectors representing the proper motion (P2215) of over four million stars. As each vector consists of two scalar values, and some stars come with multiple vectors, the total number of statements is over ten million, and this is what I believe is causing those annoying timeouts.

Therefore, if you need to do such a search, try to eliminate those ten million matches as quickly as possible (unless you are actually looking for them, of course). I don't know which is the most efficient method, but you could use either one of the following lines:

  • FILTER(?qval NOT IN (wd:Q13442, wd:Q76287))
  • FILTER(?qval != wd:Q13442 && ?qval != wd:Q76287)
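As a minimal sketch of how such an exclusion might be dropped into a query, the helper below builds the first of the two FILTER forms from a list of item IDs and places it right after the qualifier pattern. The surrounding query shape, the function names and the LIMIT are assumptions made for illustration; only the FILTER line itself comes from the comment above.

```python
def exclude_values_filter(var, qids):
    """Build a FILTER clause that rejects the listed items as values of ?var."""
    values = ", ".join(f"wd:{qid}" for qid in qids)
    return f"FILTER(?{var} NOT IN ({values}))"


def p642_query(excluded_qids, limit=100):
    """Assemble a small query over P642 qualifiers with the exclusion applied."""
    return "\n".join([
        "SELECT ?statement ?qval WHERE {",
        "  ?statement pq:P642 ?qval .",
        f"  {exclude_values_filter('qval', excluded_qids)}",
        "}",
        f"LIMIT {limit}",
    ])


print(p642_query(["Q13442", "Q76287"]))
```

Whether this form or the `!=` form is faster on the query service is not something this sketch can tell; it would have to be measured.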

That said, I also want to comment on the very idea of representing the vectors in this way. I'm not an expert on database design, but I have a general background in computer science, and this representation intuitively seems just ... weird. It may also be a performance hog even beyond causing us trouble when searching for of (P642) qualifiers (which admittedly is not what Wikidata users typically do), but I'm not sure.

It might be acceptable for just a handful of objects entered for demonstration purposes, but converting an entire database of four million stars into this format is like, I don't know what, maybe wrapping up a small airplane in paper packaging, attaching a label with an address across your own street and putting it in the mail? Just because it can be done doesn't mean it's a good idea!

My point here is that in this case I don't think it will be enough to just replace of (P642) with a different qualifier, even if a new one is defined exclusively for use with proper motion (P2215); this astronomical database seems to call for a substantially different approach.

I'll stop here, because this goes way beyond the scope of the of (P642) deprecation effort, but I would suggest referring this back to the Wikidata community as a separate issue of technical performance rather than data quality, perhaps to be resolved through a Phabricator (Q16509734) ticket to create another vector datatype (or something like that).

The next item on our laundry list is another database, this time of sporting achievements, though fortunately it's not quite as large... --SM5POR (talk) 07:00, 2 March 2022 (UTC)[reply]

The tables are still hard to read

I'd love to understand these tables but they're just too big.

--Push-f (talk) 06:14, 10 December 2022 (UTC)[reply]

So true! We should convert them to queries like I've done at the bottom of the page instead. Lectrician1 (talk) 06:33, 10 December 2022 (UTC)[reply]
A simple list of queries does not contain the information needed to identify, distinguish, and track the use cases. We can look into displaying the tables differently, but all the information currently in them is necessary. Swpb (talk) 20:00, 12 December 2022 (UTC)[reply]
One thing we can do, which I just did, is repeat the table headers every 10 rows or so, so you don't have to scroll back up to see them. Is there any other aspect of the tables you're currently having trouble understanding? Swpb (talk) 20:10, 12 December 2022 (UTC)[reply]
Shouldn't "scope of use case" be the very first column? https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality/Issues/P642#Use_cases_by_property_being_qualified Lectrician1 (talk) 20:24, 12 December 2022 (UTC)[reply]
The main aspect that makes these tables so unreadable to me is that they have so many columns. (I am not saying that this information isn't necessary ... but simply displaying everything as a column just makes the information too hard to grasp for me). --Push-f (talk) 03:36, 13 December 2022 (UTC)[reply]

I suggest that instead of:

Scope of use case
Case ID | Query | Quantity | Statement property | Statement value class | Qualifier value class | Proposed handling | Status | Example item
c2 | Query (limited) | 11 | creator (P170) | human (Q5) (instances only) | creative work (Q17537576) (classes only) | Verify, then qualifier applies to part (P518) | | Grosshaus (Q19362549)

We format the tables as follows:

Case | Query | Quantity | Proposed handling
S creator (P170) O of (P642) X; S instance of (P31) human (Q5); X instance of (P31) creative work (Q17537576) | Query | 11 | S creator (P170) O applies to part (P518) X

--Push-f (talk) 04:00, 13 December 2022 (UTC)[reply]

looks good. put quantity before query though Lectrician1 (talk) 16:52, 14 December 2022 (UTC)[reply]
  1. Case ID, status, and example item are useful pieces of information - the first provides a means to refer to the case without spelling it out, the second is vital for tracking progress, and the third is vital when no successful query has yet been written for the case. And I think the "example" column should stay next to the query and quantity columns, logically.
  2. Putting the whole case definition in one cell might work (although it makes things a lot wordier), but we need to be careful to get it right. For one thing, in your example, O, not S, would be instance of human, and X would be subclass of, not instance of, creative work. Pardon the offense, but with two errors in one straightforward example, I'm not confident you're ready to convert the tables in this way. If you can be patient, I can try my hand in the next few days. Swpb (talk) 18:35, 14 December 2022 (UTC)[reply]
Can we simplify the instance of (P31) and subclass of (P279) selectors in the case column somehow? When I edited these tables earlier this year, I found them quite cumbersome to specify and/or determine from what was written, especially when the "instances/classes only" selector seemed to deviate from the default specified in the column heading (and I think I wrote that very sample case you quoted above). These two selectors (maybe also the combined form, "instance or subclass of") will effectively be part of every case and should be easy to recognize but not occupy too much horizontal space. Abbreviations "I", "C" or "IC" (or maybe lowercase not to look like the variables)?
This suggestion does not apply when the property being qualified actually is instance of (P31) or subclass of (P279) and is specified in the first line, but applies only to the S, O and X variable definitions below.
Do you plan to convert the preparatory statistics table too? I would suggest not spending too much effort on it, as it was just intended to give an initial assessment of where the bulk of the use cases were, back when we had to deal with millions of proper motion statements. Most of the columns read "any" or "n/a" anyway. I recently added a few queries to that table, suggesting that "position held" will be a chore, maybe some 25 percent of the remaining cases. SM5POR (talk) 19:47, 14 December 2022 (UTC)[reply]
Sorry, I misremembered; I was referring to subject has role (P2868), not position held (P39). But position held (P39) comes next. SM5POR (talk) 20:47, 14 December 2022 (UTC)[reply]
I have split the preparatory statistics table into two, based on what columns are left unspecified. Now the first table could be simplified a lot more, while the three entries in the latter could be broken down into more specific cases used later on (according to the new table format being defined). SM5POR (talk) 21:27, 14 December 2022 (UTC)[reply]
Putting the case definition in one column will save a lot of horizontal real estate (at the expense of vertical), so I don't know if the abbreviations do much for us, and I don't want to confuse people further by using a C for "subclass of" which starts with an s. It might be good though to have a shorthand, maybe an asterisk, that says S or X must be a direct member of the class, as opposed to its subclasses (which I've been doing with the word "exactly"). Swpb (talk) 21:34, 14 December 2022 (UTC)[reply]
I picked "C" merely to avoid confusion with the S (subject) variable, but if we stick to lowercase we could use "sc" as an abbreviation for "subclass". Since an asterisk is used in SPARQL to indicate an arbitrary number of the same property in a path, I'd rather use that as a shorthand for the indirect class than for the exact one. Consider:
Especially the i/sc* expansion will be pretty long to spell out in full. Making the table row fit in the browser window is just one part of the idea; another is not allowing the reader's visual or mental space to be dominated by repetitious constant expressions. But of course we can use a less drastic shorthand, such as:
Just make sure to specify before the tables which shorthand you are using, so that readers won't have to guess what each expansion is. SM5POR (talk) 06:17, 15 December 2022 (UTC)[reply]
Fair, I'll use the asterisk in the SPARQL sense. I'll also use the slash in the SPARQL sense, meaning stringing together properties -- so for "or", I'll have to use the word "or". Swpb (talk) 15:24, 16 December 2022 (UTC)[reply]
I have also added a couple of queries for statistics per property and per property type, respectively. SM5POR (talk) 00:09, 15 December 2022 (UTC)[reply]
Since the first case field line is mostly constant, maybe put S P O of (P642) X in the column heading and define P in the case field?
Also, consider replacing O (object) with V (value) to allow for non-item property types (string etc which is neither a class nor an instance of anything). SM5POR (talk) 21:02, 14 December 2022 (UTC)[reply]
I agree with both of the above suggestions. Swpb (talk) 21:35, 14 December 2022 (UTC)[reply]
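To make the shorthand agreed on in this thread concrete, here is a small sketch that expands expressions such as i/sc* into SPARQL property paths. It assumes "i" stands for instance of (P31) and "sc" for subclass of (P279), with the slash and the asterisk (and plus sign) used in the SPARQL sense; the function name and the choice of the wdt: prefix are illustrative assumptions.

```python
import re

# Assumed mapping of the shorthand to truthy-statement predicates.
ABBREVIATIONS = {"i": "wdt:P31", "sc": "wdt:P279"}

# One path step: an abbreviation, optionally followed by a SPARQL path modifier.
STEP = re.compile(r"^(i|sc)([*+]?)$")


def expand_shorthand(expr):
    """Expand e.g. 'i/sc*' into 'wdt:P31/wdt:P279*'."""
    steps = []
    for token in expr.split("/"):
        match = STEP.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized step: {token!r}")
        steps.append(ABBREVIATIONS[match.group(1)] + match.group(2))
    return "/".join(steps)


for shorthand in ["i", "sc", "i/sc", "i/sc*", "i/sc+"]:
    print(shorthand, "->", expand_shorthand(shorthand))
```

Since the slash already means path concatenation here, alternatives would still have to be spelled out with the word "or", as noted above.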

Trying it out

@SM5POR, Push-f, Lectrician1: I've converted the section Qualifying instance of (P31) or subclass of (P279) (expand to is a list of (P360)?) as worked out above. Please take a look and see if you consider this easier to understand than separate columns. To me, the differences between i, sc, i/sc, i/sc*, i/sc+, etc. seem easy to miss - maybe they should be uppercase and/or bold? Swpb (talk) 16:16, 16 December 2022 (UTC)[reply]

I agree about the meta-syntax being easy to miss, but I doubt uppercase will make much of a difference. However, bold and italic plus perhaps underlined or a surrounding pair of brackets like [i/sc*] ? Also for or, any and = (but without brackets). Not sure about TBD; if jammed into the property expressions it should probably be TBD as well, but when standing alone in a field it doesn't matter. Suggested new properties can be set in italics only (not bold) to indicate they will be replaced with the real thing. SM5POR (talk) 19:21, 16 December 2022 (UTC)[reply]
Considering the content of the is3 case ID it should probably be split into multiple cases anyway (to group similar classes together but put very different ones well apart) even if the resolution happens to be the same. That should reduce the complexity of the expression (and the actual query) quite a bit. SM5POR (talk) 19:31, 16 December 2022 (UTC)[reply]
By similar classes I mean:
I haven't checked the actual items in those classes or what the statements intend to convey. Speaking of which, I see no property (P=) definitions in those use cases. Lost in table format conversion? SM5POR (talk) 20:29, 16 December 2022 (UTC)[reply]
Never mind, I forgot this was about instance of (P31) and subclass of (P279). That however brings me to the point I made below: Those properties preferably shouldn't have qualifiers, but possibly main value statements on the same item (if necessary). Another reason to split that case entry. SM5POR (talk) 20:47, 16 December 2022 (UTC)[reply]
There's a lot for me to read and absorb here and below, but I want to clarify one thing: the proposed handlings don't qualify instance of (P31) or subclass of (P279). They replace qualifiers on those properties with main statements. The confusion is probably that "X" is a qualifier value in the statement being replaced, but (with the exception of is5), it's a main value in the replacement statement. I don't know if I agree that instance of (P31) and subclass of (P279) should never be qualified, but that's the opposite of what we're doing here. Swpb (talk) 17:00, 19 December 2022 (UTC)[reply]

Discrete quantities

Is is a number of (P11279) intentionally restricted to discrete quantities only? The property labels and constraints seem to indicate that, but when I search for subclasses of number of entities (Q614112) (the subject type constraint) I find items such as natural density (Q752723), quantity of information (Q2070563), Schnirelmann density (Q2482753) and pulse count (Q115536778), all of which I believe may sometimes evaluate as non-integers (possibly rounded to a number given with zero decimals, such as pulse count (Q115536778), if that's actually equivalent to pulse (Q37723634) or heartbeats per minute; I find this one unclear).

Since these items are inconsistent with certain property values assigned to number of entities (Q614112), something should be done about those items anyway, but I want to check whether this will affect the ongoing transition from of (P642) use, and if so, how. Should the constraint be changed or upheld as is, and should of (P642) in some cases be retained for replacement with a different qualifier? Are there any properties associated with the quantities mentioned? SM5POR (talk) 18:51, 14 December 2022 (UTC)[reply]

Positions and occupations

I have inserted a new table section before "other" properties since there is a substantial number of positions and occupations. I include roles as well since they may overlap, but they may also concern non-human items (corporations etc). Diplomats require special attention as they have relations to multiple countries, and of (P642) may refer to different attributes depending on which government they serve (the Holy See vs regular countries). This concerns entry o1 which I think must be further subdivided, and possibly also o2. SM5POR (talk) 12:03, 15 December 2022 (UTC)[reply]

Are "assigned to" and "in service of" new properties to be proposed? Swpb (talk) 18:05, 15 December 2022 (UTC)[reply]
Tentatively; I'm not writing a formal proposal yet, as I would like to see what other replacements for of (P642) may be desired. Today I have searched for similarly labelled properties or items just in case something may already exist, but I haven't found any.
In any case, there will be a need for two different qualifiers as they may be attached to the same position held (P39) statement in the case of diplomats ("in service of" one country or intergovernmental organization such as the United Nations or the European Union, "assigned to" another), delegates from an organization to an official conference where they represent their home entity, and similar.
If they are created (or found), besides being used jointly the qualifiers may also be useful individually, in particular any public office (emperor, president, prime minister, prosecutor general, ombudsman, mayor, speaker) or corporate management role (CEO, CIO) "in service of" a country, a municipality, a private business enterprise, an intergovernmental organization, an NGO, a fraternity or other not-for-profit organization. It's not meant to replace employer, but rather to cover specific situations where naming an employer is insufficient or inappropriate (such as for an elected but unpaid member of a corporate board, or when the entity being served is not identical to the one paying the person's salary).
Likewise, a judge or an employee of a law firm may be "assigned to" a legal case, a member of parliament as a delegate to a legislative committee, a business consultant or a field service engineer to a particular customer with a long-term service contract etc without either of them stated to be "in service of" somebody or something for that particular position, occupation or other role.
I initially thought of these as being qualifiers only (to qualify position held (P39) and possibly other properties), but then I noticed that there are several instances of offices tied to specific countries such as "Ambassador to the United States of America", in which case "assigned to" could be used as a main statement on that item, and thereby be redundant with respect to the current holder of that office ("Ambassador to Chile" of (P642) "Chile" should simply have that qualifier dropped, no need for a replacement).
While I'm not sure about "in service of" qualifying anything besides position held (P39), "assigned to" could definitely be combined with "subject has role" (presiding judge, defense attorney, sales representative etc) together with start time and end time, possibly even occupation (though I have no good examples of that; look for instance at Franz-Josef Sehr (Q22075663) for an extreme case of over-use of "occupation", the majority of those statements instead ought to use position held (P39), in my opinion).
When considering the diplomats, I came up with a potential third qualifier, "stationed at", such as when an ambassador assigned to Tuvalu is actually stationed at the embassy in Tonga, but I doubt it will be necessary merely to phase out of (P642) (and there may even already exist an appropriate qualifier for that; I haven't checked). Or, a (redacted) in service of the United States Department of Justice being temporarily stationed at the (redacted) in the Hague while assigned to the January 6th (redacted) case, among other things (trying not to attract undue attention by search robots)... SM5POR (talk) 21:49, 15 December 2022 (UTC)[reply]
I do support "assigned to" and "in service of" as new qualifiers.
One remark about the handling: 'diplomat' is an occupation rather than a position, whereas ambassador is the specific position which does need the qualifiers "assigned to" and "in service of". Wikipeter-HH (talk) 11:21, 6 February 2023 (UTC)[reply]

Qualifying classes

Looking at case ID is3, where I find various public offices relevant to position held (P39), in general I don't think it's a good idea to stuff arbitrary qualifiers on the instance of (P31) or subclass of (P279) properties, as that may lead to ridiculously complex property constraint definitions. Taking shah (Q184299) as an example, it now has a statement shah (Q184299) subclass of (P279) ruler (Q1097498) of (P642) Iran (Q794). That qualifier could be replaced with a main statement shah (Q184299) in service of Iran (Q794), leaving a clean, unconditional subclass of (P279) statement.
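The replacement described above can be sketched on a toy in-memory model of a statement. The dictionary layout, the function name and the placeholder ID "IN_SERVICE_OF" are assumptions for illustration only; "in service of" is a tentative property in this discussion and has no property ID here.

```python
def promote_qualifier(statement, qualifier_pid, new_property):
    """Drop one qualifier from a statement and re-express it as main statements.

    Returns the cleaned statement plus a list of new main statements on the
    same subject. The input statement is left unmodified.
    """
    kept = [(pid, val) for pid, val in statement["qualifiers"] if pid != qualifier_pid]
    moved = [val for pid, val in statement["qualifiers"] if pid == qualifier_pid]
    cleaned = {**statement, "qualifiers": kept}
    new_statements = [
        {"subject": statement["subject"], "property": new_property,
         "value": val, "qualifiers": []}
        for val in moved
    ]
    return cleaned, new_statements


# shah (Q184299) subclass of (P279) ruler (Q1097498), qualified by of (P642) Iran (Q794)
shah = {
    "subject": "Q184299",
    "property": "P279",
    "value": "Q1097498",
    "qualifiers": [("P642", "Q794")],
}

# "IN_SERVICE_OF" is a placeholder, not a real property ID.
cleaned, added = promote_qualifier(shah, "P642", "IN_SERVICE_OF")
print(cleaned)
print(added)
```

The same helper could take country (P17) as the target property instead, if that turns out to be the preferred main statement for offices.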

Qualifiers on instance of (P31) or subclass of (P279) statements should be limited to situations where the class relationship of the item is somehow conditioned or dependent on other factors not inherent in the item, say, the statement is either disputed or subject to legislation.

I also question the routine use of the country (P17) main statement, in particular when it comes to diplomatic missions. Does it indicate which country the diplomats come from (in service of) or which country they are assigned to? Besides being ambiguous in that way, applies to jurisdiction (P1001) has the additional problem of referring to a legal concept to define the relation to a territory. The Supreme Court has a natural connection to a jurisdiction, while the schools or social services of a municipality do not. And "mayor applies to jurisdiction"? I believe not; I wouldn't use that property even with a judge.

It's probably a bad idea to try to find replacement properties for "any" statement value class, as that class is likely to be more varied than the qualifier value class. SM5POR (talk) 09:09, 16 December 2022 (UTC)[reply]

I'm having second thoughts about using in service of with offices established; they could rather use country (P17) as given by the main statement. Then in service of wouldn't belong on anything but human (Q5) items as those are temporary roles, given using the qualifier S in service of Iran (Q794). In contrast, the office belongs to the government throughout its entire existence, and is therefore specified using main statement shah (Q184299) country (P17) Iran (Q794).
The property assigned to may however be applied either as a qualifier on a human (Q5) or as a main statement on an office aimed at a different country (such as an ambassador/embassy, consulate or other diplomatic mission). SM5POR (talk) 07:31, 17 December 2022 (UTC)[reply]
@SM5POR: Since you've put a lot of thought into how to break down and refine is3/o1/o2/j*, I'd be very happy for you to take the lead on that cluster of use cases; I don't want to own this effort. I want to make a couple of points though:
  1. I'm largely not advocating qualifiers on instance of (P31) and subclass of (P279); in fact in almost all cases using those properties, the proposed handling is to remove a qualifier and replace it with a main statement. (One exception is i20, which I think is pretty idiosyncratic.)
  2. My personal approach, just for expediency, has been to try to use existing properties and expand their scope where reasonable, since the bar for new properties can be high. Obviously some new properties will be needed though.
  3. I agree that for instance of (P31) and subclass of (P279), we have to be quite careful about writing cases where the statement value class is "any", which is why the only cases that currently do so, i7 and i9, have question marks on them to suggest more thought is needed. But in the "Qualifying other specific properties" table, I think the properties being qualified are narrow enough in scope that the value class can be left open without much risk of surprise - are there any particular ones there that worry you? (There's an odd duck in x3, where both the property and the value class are "any", but in that case, the relationship between the statement value and the (original) qualifier value constrains things to where the proposed handling is sure to make sense.)
  4. I appreciate the depth of your thinking on these matters, but I wonder if we should all try to keep our comments somewhat concise. I'm imagining someone coming into this effort fresh and wanting to use the talk page to get up to speed, and being pretty overwhelmed. Just a thought, no offense intended -- obviously this comment is fairly long! Cheers, Swpb (talk) 20:27, 19 December 2022 (UTC)[reply]
    Concerning your final point, I definitely agree, even as I have a habit of writing long essays (I had just prepared another argument, but reading your comments above I think it may be largely redundant, so it will be reworked now).
    But still, this talk page is likely to continue to grow anyway, and I wonder whether it would be a good idea to create a separate Wikidata:WikiProject Data Quality/Issues/P642/Strategy sub-page or something, to summarize our general plan with motivations, and refer the more elaborate ideas to its corresponding talk page (we could even move some of our past discussions above to it, such as my fuming over the proper motion 800-pound gorilla now gone). Anyone interested in helping out just a little bit shouldn't need to read those discussions. Also, the strategy page should be subdivided by problem area to function as a lookup reference, not adhere to some sequential timeline. --SM5POR (talk) 23:27, 19 December 2022 (UTC)[reply]
Yes, sub-talkpages for organization/archiving are a good idea. Swpb (talk) 15:28, 20 December 2022 (UTC)[reply]
@Swpb, @Push-f, @Lectrician1: Well, I created that strategy page and provided it with an initial section layout based on topic. As for our ongoing discussion here, I think it belongs on Wikidata talk:WikiProject Data Quality/Issues/P642/Strategy from section 3 "Statistics" (inclusive) and onward. Can we move those sections in their entirety, do you think, and continue this conversation there? --SM5POR (talk) 12:11, 21 December 2022 (UTC)[reply]
@Swpb, @Push-f, @Lectrician1: ... or maybe move this entire Talk page down one level (to become the new Strategy talk page before we create it by other means), to have its history moved along with it (that's how it works, right)? The detailed history function is my favorite feature in the Mediawiki software, and this page doesn't contain much of interest anyway to new editors who may be joining the deprecation effort hereafter. The main P642 page could then remain a compact status page, and the corresponding talk page should be limited to comments on what actually is on the status page. Your approval, suggestions or potential objections are much appreciated, as I'm still a bit unfamiliar with those "drastic" changes. --SM5POR (talk) 19:25, 21 December 2022 (UTC)[reply]
For the record, the idea to move this entire talk page has been reconsidered. We may shorten the page in other ways. --SM5POR (talk) 15:16, 15 January 2023 (UTC)[reply]

Process schedule

In step 3 of the migration process (seeking consensus) I'd like to see a pause after the new properties have been created, to allow for them to be properly labelled in the major languages and the new labels verified not to cause confusion, in case some last-minute adjustment might be warranted. It would also allow for some manual intervention before the automated conversion process is started, for instance to handle changes to properties using the of (P642) qualifier (I'm a bit wary of letting a robot do properties as well, but I don't know what the plan is for them).

Several of these use cases are being worked on in parallel anyway. When the new properties have been created, can we set dates in the status column (say, two or three weeks into the future) telling when processing is planned to commence? Or will that result in too much delay in total?

There may also be editors working on complex items who should be allowed to process them manually according to their own plans, once they get access to the new qualifiers they need (I have been talking to one so far). --SM5POR (talk) 15:13, 15 January 2023 (UTC)[reply]

I found 124 cases of of (P642) used on property items themselves. I believe we can process these early on and thereby get them out of the way. --SM5POR (talk) 19:16, 15 January 2023 (UTC)[reply]
Seems reasonable, added to process. A fixed waiting period might not be needed if there's confidence that the labels are good, but in general, it's best to take things slow and get them right. Swpb (talk) 18:57, 17 January 2023 (UTC)[reply]
@Swpb: Thanks! Per our earlier understanding, I'm now placing comments about the translations on Wikidata talk:WikiProject Data Quality/Issues/P642/Property labels.
When going through some of the Property namespace entities, some of them disappeared from my view for other reasons before I had worked out how to handle them. I therefore left the use case entry in place with a "Closed" rather than "Migrated" status label.
I added a special query for other mentions of the qualifier, and found Category:Pages using Wikidata property P642 (Q35489153) with some Wikipedia links. Now, have a look at, say, the code in sv:Modul:Cycling race. Who is supposed to assume responsibility for that, and when (in relation to our migration schedule)? Who should be pinged?
Most of the items I found through that query are property constraints. There may be properties supporting of (P642) without them having ever been used (we can deal with those when all the actual use cases have been migrated), as well as use cases that aren't supported but already resulting in constraint violations (wherefore I see no point in adding even more constraints to discourage such use). --SM5POR (talk) 12:05, 18 January 2023 (UTC)[reply]

how-to instructions?


Is there a concise user guide - how to do it without "of"? After reading these discussions I still don't have a clue. E.g. motor-generator has use transformation of energy, frequency et al. What is the better alternative(s)? Retired electrician (talk) 06:24, 12 February 2023 (UTC)[reply]

A guide needs to be written, but it would look like this:
1. Identify the four parts of the statement for which you're trying to replace "of" – in this case:
subject: motor-generator (Q1784738)
property: has use (P366)
statement value: transformation (Q65757353) (note, this is the item you meant. transformation (Q461499) is a term in biology.)
qualifier value: energy (Q11379), frequency (Q11652)
2. Go to Use cases by property being qualified and look for a table covering the property in your statement. If there is no table for that property, see if there are use cases for that property in the table Qualifying other specific properties. If not, go to the last table, Qualifying any property. (In most cases, the property is the most important factor in determining how a statement should be handled, which is why the tables are organized by property. In this case, there are no use cases based on the property has use (P366), so we end up at the last table, which has use cases based on the statement value and qualifier value.)
3. Look for a use case in the table that matches the statement value and qualifier value. In this case, we note that the statement value, transformation (Q65757353), is (by inference) an occurrence (Q1190554). There are two use cases matching that, x4 and x5. The qualifier values, energy (Q11379) and frequency (Q11652), are properties of the system being transformed, so x4 applies well: motor-generator (Q1784738) has use (P366) transformation (Q65757353), with qualifier criterion used (P1013) energy (Q11379).
There are two very important things to remember here:
1. In many cases, no alternative has yet been identified/agreed on/created, in which case it is fine to keep using "of". That is why the property isn't actually deprecated yet. Where replacements have been settled on, we are implementing constraints and property use statements to direct users to those replacements, so they won't have to search these tables. In some cases the replacement can be automated, with e.g. {{Autofix}}.
2. In most cases, the migration of statements will be done using the SPARQL queries for each case – in other words, going from case definition to instances, not the other way around. So there shouldn't be much need to search these tables for a use case that matches a particular statement, which is always going to be tedious given the nature of the problem. Swpb (talk) 16:28, 13 February 2023 (UTC)[reply]
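
The three lookup steps above can be sketched in Python. This is only an illustration under stated assumptions: the `Statement` and `UseCase` types, the `find_use_case` and `migrate` functions, and the table contents are all hypothetical and are not taken from the project page. Only the x4 row from the worked example is filled in (x5 is left out because its definition is not given here), and class membership is a hand-written lookup rather than real inference over the class hierarchy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    subject: str       # e.g. "Q1784738" (motor-generator)
    prop: str          # main property, e.g. "P366" (has use)
    value: str         # statement value, e.g. "Q65757353" (transformation)
    of_values: tuple   # values of the "of" (P642) qualifier


@dataclass(frozen=True)
class UseCase:
    case_id: str
    value_class: str   # class the statement value must belong to
    replacement: str   # qualifier that replaces P642


# Hypothetical stand-ins for the tables on the project page.
TABLES_BY_PROPERTY = {}        # tables dedicated to a single property
SPECIFIC_PROPERTY_TABLE = {}   # "Qualifying other specific properties"
ANY_PROPERTY_TABLE = [UseCase("x4", "Q1190554", "P1013")]  # "Qualifying any property"

# Hypothetical class lookup: transformation (Q65757353) is an occurrence (Q1190554).
VALUE_CLASSES = {"Q65757353": "Q1190554"}


def find_use_case(st: Statement) -> Optional[UseCase]:
    """Step 2: pick the table for the property; step 3: match the value class."""
    candidates = (
        TABLES_BY_PROPERTY.get(st.prop)
        or SPECIFIC_PROPERTY_TABLE.get(st.prop)
        or ANY_PROPERTY_TABLE
    )
    value_class = VALUE_CLASSES.get(st.value)
    for case in candidates:
        if case.value_class == value_class:
            return case
    return None


def migrate(st: Statement) -> list:
    """Return one (subject, property, value, qualifier, qualifier value) row per "of" value."""
    case = find_use_case(st)
    # No agreed replacement yet: keep using "of" (P642).
    qualifier = case.replacement if case is not None else "P642"
    return [(st.subject, st.prop, st.value, qualifier, v) for v in st.of_values]
```

Calling `migrate(Statement("Q1784738", "P366", "Q65757353", ("Q11379", "Q11652")))` yields two rows that use criterion used (P1013) in place of "of". A statement with no matching use case keeps P642, in line with the first reminder above.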

Qualifying statements on Lexemes


@SM5POR and Swpb: Although they are rare, how should the 33 Lexeme usages of P642 be handled? Liuxinyu970226 (talk) 00:53, 10 August 2023 (UTC)[reply]

@Liuxinyu970226: Until I learn about a situation requiring otherwise, I think the most useful selector for how to deal with of (P642) is the main statement property this qualifier is attached to, not which namespace (Property, Lexeme etc) this statement is found in.
Therefore I suggest using the same queries already tabulated on the main page to identify use cases, but modify them to search the Lexeme namespace only (or any other namespace you are interested in). --SM5POR (talk) 15:01, 27 August 2023 (UTC)[reply]
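
A restriction of this kind could look roughly like the sketch below, which builds the query text in Python. The function name `p642_on_lexemes_query` is hypothetical, and the queries tabulated on the main page are not reproduced here. The sketch assumes the prefixes `ontolex:`, `wikibase:`, `p:` and `pq:` are predefined, as they are on the Wikidata Query Service, and it only covers statements on the lexeme itself, not on its forms or senses.

```python
from typing import Optional


def p642_on_lexemes_query(main_property: Optional[str] = None) -> str:
    """Build a SPARQL query for statements on lexemes that carry a P642 qualifier.

    main_property: optional property ID (e.g. "P31") to limit the search to one
    main statement property; if omitted, any main property is matched.
    """
    lines = [
        "SELECT ?lexeme ?statement ?of WHERE {",
        "  ?lexeme a ontolex:LexicalEntry .",   # restrict to the Lexeme namespace
    ]
    if main_property is None:
        lines.append("  ?lexeme ?claim ?statement .")
        lines.append("  ?mainProperty wikibase:claim ?claim .")
    else:
        lines.append(f"  ?lexeme p:{main_property} ?statement .")
    lines.append("  ?statement pq:P642 ?of .")  # the "of" qualifier
    lines.append("}")
    return "\n".join(lines)


print(p642_on_lexemes_query())
```

The only namespace-specific part is the `ontolex:LexicalEntry` line, so the same idea should carry over to an existing use-case query by adding that one pattern for the subject variable.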

instance of: P1056 may be applicable


instead of "instance of production process of X" Binarycat32 (talk) 22:45, 23 December 2023 (UTC)[reply]

That's a good rule, but there doesn't appear to be anything to migrate. Here's the query for statements using P642 that way: [1]. As of this morning, there was only one result, which you created: [2]. I just removed that "of". Swpb (talk) 14:15, 3 January 2024 (UTC)[reply]

in service of


I work a lot with items for local officials (e.g. mayor), and I would like to see a qualifier "in service of".

I have created the request page: Wikidata:Property proposal/in service of, but have not yet inserted it into the discussion page. If you want to complete the description or the examples, I'll wait another day. Pallor (talk) 15:00, 9 February 2024 (UTC)[reply]

Looks like all the examples given would already be well covered by employer (P108). I'd recommend not proceeding with the proposal unless there's a good reason that property won't work. Swpb (talk) 22:03, 9 February 2024 (UTC)[reply]
Swpb: employer (P108) does not work here, since a significant share of the values belonging to the qualifier are settlements. A settlement can never be an employer, even though it is referred to as such in everyday speech. A well-defined institution of the settlement can be the employer (for example, the mayor's office). But even this is not always the right solution: for example, a leader appointed by the state to head an administrative unit (e.g. a prefect) does not work in the institution of the settlement, but in the state administration.
On the other hand, the term "in service of" accurately covers the system of relations that connects the specific position and the settlement. This relationship can be different, for example, between the elected leader of a county and the leader appointed by the state to head the county, but on Wikidata we can describe it as "serving" the county for both, but the position is already different.
Thank you for raising the question, I will explain this in the paper as well, and maybe I will also remove the examples representing the employment relationship from the examples. Pallor (talk) 22:49, 9 February 2024 (UTC)[reply]
@Pallor, @Swpb: I'm sorry I wasn't paying attention when you brought this up earlier this year. At that time I was occupied with moving from one place of residence to another, primarily for medical reasons, so I had no chance to participate in this discussion. I discovered only today that "in service of" had already been formally proposed, and I added my comments to the proposal discussion in case you find them useful (the proposal currently looks like it was rejected due to lack of support; maybe it could be revived somehow). --SM5POR (talk) 11:20, 10 May 2024 (UTC)[reply]
My reply is here. Swpb (talk) 16:47, 10 May 2024 (UTC)[reply]

Assigned to


@Swpb, @HotMess: The "assigned to" qualifier is sometimes meant to be used in conjunction with "in service of", so their definitions contrast with each other. I suggest we compare them here, in addition to the comparisons provided on the Property labels talk page (which is mainly intended for label translation issues). --SM5POR (talk) 13:16, 15 May 2024 (UTC)[reply]