Wikidata:Requests for comment/Current or contemporary location and country for events

An editor has requested the community to provide input on "Current or contemporary location and country for events" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.

If you have an opinion regarding this issue, feel free to comment below. Thank you!

What might seem simple requests, "Show me all battles that occurred within the current boundaries of Poland" and "Show me all battles on Prussian soil", are fraught with problems.

Events that occur at a particular place can be characterised by coordinate location (P625), country (P17) and location (P276).

  • The coordinates are unambiguous, but as far as I know there is no SPARQL feature that allows country or any other level of admin area at a particular date to be deduced from it.
  • Should the country be the country at the time, or now? The former makes more sense in context, but disconnects it from contemporary geography. You could have separate entries for a battle and its battlefield, but that would normally be overkill.
  • So you could always quote a location for an event, and choose to use that to get the current country indirectly. We have no SPARQL query that returns historical country for a location at a point in time, but the use case for "Roman battles on Prussian soil" is rare. But this requires the location to be recognisable in current geography, when again a contemporary location might make more sense in context. But we are creating a dataset for modern people, so modern locations seem best.
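
A minimal sketch of the indirect lookup described in the last bullet, assuming the event carries location (P276) and the location item carries country (P17). It only builds a SPARQL string for the Wikidata Query Service and sends nothing over the network. The function name and the choice of battle (Q178561) and Poland (Q36) as inputs are illustrative assumptions, not part of the proposal.

```python
# Sketch only: builds a query string; it is not executed here.
BATTLE = "Q178561"  # battle (assumed event class for illustration)
POLAND = "Q36"      # Poland (assumed present-day country for illustration)


def events_by_location_country(event_class: str, country: str, limit: int = 100) -> str:
    """Return a query for events whose location (P276) has `country` as its country (P17)."""
    return f"""
SELECT DISTINCT ?event ?eventLabel ?place WHERE {{
  ?event wdt:P31/wdt:P279* wd:{event_class} ;   # instance of (a subclass of) the event class
         wdt:P276 ?place .                      # location of the event
  ?place wdt:P17 wd:{country} .                 # country recorded on the location item
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    print(events_by_location_country(BATTLE, POLAND))
```

Because the wdt: prefix returns only best-rank statements, a location that lists historical countries at normal rank alongside a preferred current one would match on the current one only. Whether location items are ranked that way in practice is something the query cannot guarantee.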

Historians tend to talk about "modern day X", but we don't currently handle that in WD. I suggested a new property "contemporary_country" to go with country of origin (P495) and country of citizenship (P27), as country (P17) is often abused, but that wasn't popular (Property_talk:P17#Possible_new_property_contemporary_country).

My proposal is that events have the country at the time, that a contemporaneous warning be put in place, and that best practice suggest the location used be both modern and accurate enough not to straddle multiple current countries (unlike Gaul (Q38060)). I don't know if that can be encoded as a constraint.

I'm conscious that my focus is narrowly on battles, so I would appreciate input on other sorts of events. Sports seem to use contemporary country, for example. Vicarage (talk) 09:01, 3 February 2024 (UTC)

So you are proposing the value for country (P17) correspond to the time of the event, while location (P276) should be a fine-grained modern location, is that it? ArthurPSmith (talk) 21:36, 5 February 2024 (UTC)
Yes. The former is the bigger change; my concern with the latter is that every location should have a modern country, which is nearly always the case. Only in rare cases does a location have all its historical countries recorded, and even for those a filter could reject them. Vicarage (talk) 21:41, 5 February 2024 (UTC)
First: this does not only concern events like battles, but also destroyed buildings, lost villages and more.
Since the beginning of Wikidata, P17 has been used for the current country and sometimes also for the past country. This applies to existing objects as well as objects destroyed before the foundation of a country. For example, a Roman villa from about 2000 years ago has its location in a present-day country. We all know that the Netherlands, for example, did not exist at the time of the Romans, yet we still say Romans in the Netherlands (Q2272797).
Further, the concept of countries is only a few centuries old. The Roman Empire was not a country. Before the concept of countries, there were duchy (Q154547), countship (Q353344), and so on.
Also, in the past year a successful project ran to give every single building, destroyed and existing, the country where it is located now.
But yes, battles also get the present-day country. For example, the Battle of Waterloo (Q48314) took place in 1815. Where? In Belgium, a country founded in 1830. In Belgium and elsewhere it is common to say it took place in Belgium. If anyone wants to know which battles took place in Belgium since its founding, excluding the ones before the country was founded, this can be indicated in a query in multiple ways. Putting a contemporary constraint in place would basically mean that data gets removed from thousands of items, and a query can't run on data that isn't there. Putting a contemporary constraint in place for geographical indicators is a bad idea. Romaine (talk) 01:04, 20 February 2024 (UTC)
The vague use of country (P17) sometimes for current, sometimes for contemporary country, is what I'm trying to address.
I agree with you that existing objects, whether they be castles or paintings, should have current country. But the contemporary constraint applies to end time (P582), which applies to events, not objects and institutions, and they should be using dissolved, abolished or demolished date (P576). Indeed the 12 castles with the former should be moved to join the 484 with the latter. The 2 terms need better guidance for applicability, perhaps constraints themselves.
Romans in the Netherlands (Q2272797) is always going to be anomalous and raise flags, that's why we call them anachronism (Q189203). They are few in number and can be marked as such.
If you wish to find the current location of a battle, it can be done indirectly via location (P276); the country at the time is of much more use historically. If we ever get the SPARQL ability to ask for country at a point in time, this will be done by location or coords. It's easy to query battles in Belgium (Q31); querying battles fought in the United Kingdom of the Netherlands (Q15864) is currently very hard.
The concept of historical country (Q3024240) is well understood, and goes back to pre-history.
Country boundaries change, asserting that an event occurred in Germany when at the time it was France is confusing, especially for political items.
I am not proposing removing existing country records, but nudging people to add the historically correct and more useful value. If country is missing, then the location based fallback is trivial.
I still propose adding the contemporary constraint for occurrence (Q1190554). Vicarage (talk) 07:57, 20 February 2024 (UTC)
The constraint you added also applied to objects.
You say Romans in the Netherlands (Q2272797) raises flags, but this example (of many) is how this is done in daily life. Anachronisms can be queried, data that is removed can't. I don't see a reason to mark them as such, but if you want to add a qualifier to indicate the anachronism I am fine with it. (Adding data (qualifiers) is better than removing data.)
The concept of historical country (Q3024240) is an anachronism (Q189203), as the concept of countries is relatively new.
I understand it can be confusing, but a contemporary constraint (aka removing data) is not a solution here. If you add a constraint, you basically ask people to remove this data. That does not take the confusion away. The Battle of Waterloo (Q48314) (as an example) took place in Belgium, say the history books, media, etc., even though that is an anachronism. So if you want to take the possible confusion away, it is better to add a qualifier to indicate the anachronism. Romaine (talk) 16:51, 20 February 2024 (UTC)
I don't want people to remove data, I want them to replace it with the correct values. Constraints have never been used to remove data, just to give people cause to reconsider it.
Country is a shorthand for sovereign state, and throughout history people have understood the chain of control over their region, and we have identified a set of high level control bodies, and called them historical countries. And even if someone quotes the Spanish Netherlands rather than the Holy Roman Empire or Spanish Crown, that's still better than claiming an event in 1600 occurred in Belgium. The political situation at the time is the relevant thing to use when using country (P17) for events. Qualifiers are rarely a good solution to problems, as they are too dependent on the whim of those entering the data, so in practice prove impossible to query.
And if we are to be consistent with your approach, we should change all the claims of country of citizenship. Was Beethoven born in Germany, because that's where Bonn is now? He's ethnically German, but that's a shorthand for a very braided history. I see that country of citizenship (P27) has a contemporary constraint, as does country of origin (P495). Vicarage (talk) 17:11, 20 February 2024 (UTC)
Constraints mark two things: either something is missing or something is wrong. If you add a constraint, and users don't see a solution, users will remove data, period. That is the practical answer users have if they see a constraint. Constraints may be there only to "reconsider" it, that is the theory; practically it results in removing. I do not see you bringing any arguments why this is not the case. Even worse, you say that users should "reconsider it", and you write here above that a country that did not exist at the time should not be added, so you basically ask users to remove the data!
The concept of nationality is even more recent than the concept of country. Comparing it with country of citizenship (P27) does not help.
The strict perspective you describe is not the common way this is approached in society. I literally have books here on the shelf with titles such as "The Romans in the Netherlands", "Netherlands in the Bronze Age", etc., books that are considered quality sources. On Wikimedia platforms we base ourselves on sources.
I have described why this constraint (as it was added) is a problem and not a solution. I recognise that there are anachronisms, but it seems you consider that a problem, while I don't see this as a problem, as this is done in various sources in this way. Based on what you describe, having anachronisms is not a problem for building queries. Therefore you haven't provided a reason why adding a more modern country is to be considered a problem. Still I am willing to think about alternative ways to add/mark this data. Therefore I have suggested using qualifiers, as those are often used when it concerns suboptimal data. I can imagine a situation in which we establish a specific way to mark anachronisms with a qualifier, together with a constraint that only marks it as problematic if the qualifier is missing, with the suggestion to add the qualifier. Alternatively we can think of a qualifier to mark a country as being nowadays and not thenadays. You rejected the use of qualifiers purely because users are not consistent in what they add as data, even though qualifiers are massively used. It is true users are not consistent, but this is the case with everything, and with the help of constraints, queries and bots this can be dealt with.
If you stick with your idea that you want a constraint and nothing else is possible, please consider your constraint proposal as rejected. If you do want to find a solution in which both perspectives are taken into consideration, I am happy to think and work with you towards a solution. Romaine (talk) 07:02, 25 February 2024 (UTC)
Opinions sought from others. Paging @Jheald, @abi, @VIGNERON, @Jarekt, @Jmabel, @Infovarius, @Oravrattas, @SilentSpike, @Ghouston, @gerardM, @Yair rand, @jura1, @Syced, @Fralambert who contributed to a country (P17) discussion in 2019. Vicarage (talk) 09:43, 25 February 2024 (UTC)
I suspect the solution to this is going to have to be to entirely rethink our concept of "country" in all the properties that refer to it, and probably replace most or all of those properties with better ones. "Country" is one of those concepts that seems useful, but it's too poorly-defined, especially over time. Everyone thinks they instinctively understand it, but that starts to break down really badly at all sorts of edge cases. We've had many discussions about lots of those edge cases over the years, and usually there's no real resolution, largely, IMO, because there can't be, as the underlying concepts we're working with are fundamentally flawed. Oravrattas (talk) 10:01, 25 February 2024 (UTC)
My two cents here. I concur with Oravrattas that country is simply too general and easy to misuse. Vicarage's query highlights a common concern that I also had on more than one occasion.
In general I think in our discussion we should clearly distinguish between at least two broad categories:
  • Point-in-Time Events (battles, uprisings, treaties, sports events...), which are tied to a specific historical "country", and, on the other side,
  • Evolving Items (villages, buildings...), which may change national affiliations over time.
For "point-in-time events" I would prefer the country (P17) to be as it was historically at the time of the event, to maintain historical accuracy and context (can be very useful on other wiki projects as well) and use location (P276) for the modern (fine-grained?) location to provide a connection to contemporary geography. If this is clearly established, then it would also be "easy" to query these events via SPARQL or other means.
For items subject to change over time I find the approach of country + qualifiers to clearly indicate the relevant time frame still the best we have. Nastoshka (talk) 10:24, 25 February 2024 (UTC)
For cases with a location, I would use located in the present-day administrative territorial entity (P3842). As a main value or as a qualifier, I leave open for discussion. RVA2869 (talk) 10:39, 25 February 2024 (UTC)
@Vicarage: thanks for raising this topic, but I'm not sure I fully understand where the problem is. Yes, location is hard (even coordinates often carry a lot of ambiguity) but we have many ways (and properties) to manage it more-or-less (or at least well enough). PS: SPARQL is good but it's not the solution to everything and it's only a tool, we should not model the data based on SPARQL alone. Cheers, VIGNERON (talk) 11:20, 25 February 2024 (UTC)
@VIGNERON I don't think this is an issue about location only. You're totally right, there are many ways to query and handle location (I've been playing myself with SPARQL after reading this thread, and yes, it's cool but just a tool).
I think Vicarage's issue is more about what is the correct value for historical events like battles, given that as per definition accepted values are both modern day countries and also historical country (Q3024240) (or even historical region (Q1620908)) and it's heavily used also on e.g. wikipedias for the Infoboxes. That there is a bit of confusion and need for clarification is clear to me based on the recurring questions on this matter in both Project Chat (March 2019) and P17 Talk page (ex.1, ex.2, ex.3) and this thread of course. Nastoshka (talk) 11:59, 25 February 2024 (UTC)
If we want to model this right, we probably need to be able to talk about both political entities of the time and political entities now (and "now" probably should be qualified with a date). The present notion of a "country" only begins to form as the Middle Ages end and, with a few exceptions, only really begins to mean much even in Europe roughly in the Enlightenment era. Then it becomes predominant rather quickly, and was particularly consolidated when most of Latin America followed Europe in organizing itself that way.
The Roman Empire was not a "country". The Duchy of Holstein was not a "country". And, arguably, the polities of most of the North American "First Nations" weren't even "states". - Jmabel (talk) 19:37, 25 February 2024 (UTC)
  • In my view it is exceedingly useful for all places and events, including historical ones, to have country (P17) = <present-day country>.
Therefore Hard Oppose any proposal that would remove statements of this kind; also Hard Oppose any contemporaneity constraint that would discourage people from adding such statements.
This was already my view in 2019. If it is correct, as I believe, that since then projects have systematically been adding country (P17) based on geographical coordinates, to achieve near 100% coverage for particular parts of the world, that IMO is an excellent thing, and only strengthens my opinion. If users have gained the expectation that all places and events within a country's present-day territory can be returned by querying on P17, that seems to me very very useful, as well as simple and intuitive, and we should honour it.
IMO location (P276) is not a substitute. One could make the argument that country (P17) is always redundant, because one could always trace up from P17 and the located in the administrative territorial entity (P131) chain to identify the present-day country. But in practical experience, that is not the case. Having P17 turns out practically to be an enormously useful time-saving way to short-circuit indefinitely long P131* path queries that can be significantly slow and very large (in terms of their unwanted intermediate solution sets -- which is what makes them slow, often fatally slow if one is also trying to do another tree-walk, eg on type of item; or quality-control of some aspect over all items in the geographical area of a country). Being immediately able to pull everything with a given P17 and then filter it (or vice-versa) is a time-saving feature which can make or break whether a query, particularly a big one, can complete or not in the available 60 seconds.
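
To make the two query shapes being contrasted concrete, here is a minimal sketch that builds both as SPARQL strings. The function names are hypothetical, the exact path used for the chain walk is an assumption, and nothing is executed or timed here, so how much slower the second shape is on a given class of items is left to measurement on the query service.

```python
# Sketch only: two SPARQL query shapes as strings; neither is executed or timed here.

def by_country_direct(country: str) -> str:
    """One triple pattern: items whose own country (P17) is `country`."""
    return f"SELECT ?item WHERE {{ ?item wdt:P17 wd:{country} . }}"


def by_country_via_admin_chain(country: str) -> str:
    """Property-path walk: climb the P131 chain, then read P17 off an ancestor area."""
    return (
        "SELECT DISTINCT ?item WHERE { "
        "?item wdt:P131+ ?area . "          # one or more 'located in' hops
        f"?area wdt:P17 wd:{country} . "    # the ancestor area records the country
        "}"
    )


if __name__ == "__main__":
    print(by_country_direct("Q31"))           # Belgium (Q31), as used earlier in the thread
    print(by_country_via_admin_chain("Q31"))
```

The first shape matches a single indexed triple per item, while the second has to expand every intermediate administrative area before it can test the country, which is where the unwanted intermediate solution sets mentioned above come from.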
(Additional considerations: for some events it may not even be possible to accurately specify a location (P276), although one can specify a present-day country. Also, as others have noted above, it may only be in the modern period that it is possible to specify a "country" with near-universality; therefore, given the choice, it makes sense that it is the present-day that should be specified wherever possible, since the present-day country is what most usually can be specified wherever possible).
For all these reasons, therefore, my hard oppose to anything that would undermine the strong expectation (which IMO we should on the contrary be encouraging) that it should be possible to retrieve the very largest number of events and places via country (P17) = <present day country> in a query. But some thoughts on flagging the issues of non-contemporaneity, to follow. -- Jheald (talk) 19:12, 27 February 2024 (UTC)
What I do think would be valuable would be to flag with suitable qualifiers any cases where the event or place or territory is not contemporaneous with the present-day country. As per what I wrote in 2019, some possibilities:
determination method (P459) = present-day boundaries (Q61912379)
valid in period (P1264) = present (Q193168)
object has role (P3831) = present-day country
... or some other qualifier = value combination
IMO we should standardise on one of these qualifier + value combinations, and roll it out systematically for all the non-contemporaneous cases. This would (i) confirm that the non-contemporaneity was intentional, not a mistake; (ii) warn the non-suspecting; (iii) allow the values to be filtered out from queries or templates if not desired.
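
As an illustration of point (iii), here is a minimal sketch of how a query could filter out flagged values, assuming the first of the listed possibilities, determination method (P459) = present-day boundaries (Q61912379), were the one standardised on. That choice, the function name and the parameter names are assumptions for the example; the thread has not settled on any qualifier. The code only builds a SPARQL string.

```python
# Sketch only: assumes P459 = Q61912379 is the agreed flag, which is NOT established practice.
PRESENT_DAY_BOUNDARIES = "Q61912379"  # present-day boundaries, as quoted in the list above


def countries_of_events(event_class: str, skip_present_day: bool = True) -> str:
    """Return a query listing country (P17) values of events, optionally dropping flagged ones."""
    lines = [
        "SELECT ?event ?country WHERE {",
        f"  ?event wdt:P31/wdt:P279* wd:{event_class} ;",  # instance of (a subclass of) the class
        "         p:P17 ?statement .",                     # full statement node, not just the value
        "  ?statement ps:P17 ?country .",                  # the country value of that statement
    ]
    if skip_present_day:
        # Drop statements carrying the (assumed) present-day flag as a qualifier.
        lines.append(
            f"  FILTER NOT EXISTS {{ ?statement pq:P459 wd:{PRESENT_DAY_BOUNDARIES} . }}"
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(countries_of_events("Q178561"))  # battle (Q178561), chosen only for illustration
```

The p:, ps: and pq: prefixes are needed because the simpler wdt: form exposes only the value and not its qualifiers. Such a filter is only as reliable as the coverage of the qualifier, so it would return flagged and unflagged present-day values alike until the roll-out is complete.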
Here's a query https://w.wiki/9JrJ that returns the most common qualifiers on a random sample of 1,000,000 country (P17) statements. It would seem that none of the above patterns are widely used as yet, though start time (P580) / end time (P582) are widely used where subject and object are contemporaneous, but only before and/or after specific dates. reason for preferred rank (P7452) = currently valid value (Q71536244) is also found; I do think the present-day value should be the preferred one, if a historical value is also given, but this particular qualifier/value combination is perhaps not appropriate, if we are wanting to specifically highlight the non-contemporaneity.
It might be possible to develop a complex constraint to flag non-contemporaneity with no such qualifier present - I'm not sufficiently up to date on complex constraints to know if they can be made to flag up on the page; and if they could be made to suggest "add appropriate explanatory/warning qualifier" rather than "change statement, it's not appropriate".
Equally, if we go down this route, we should also standardise on a qualifier/value combination to flag a country (P17) value that is historic -- eg perhaps object has role (P3831) = original value (Q115018754) or something similar, so that these too can be readily identified by either a template or a query and extracted if present. But I would suggest avoiding anything containing or based on the word "contemporary", as it becomes too ambiguous as to whether that is signposting a value that is contemporary with the historical event -- or a value that is contemporary with us. Jheald (talk) 21:43, 27 February 2024 (UTC)
I don't see why you object to the single SPARQL P276/P17 combination, and then propose a qualifier one. As far as I can see qualifiers rarely work, as most contributors don't apply them, or when they do they pick them at whim. The fact that you can't suggest an obvious qualifier, and that the ones you mention are for present countries, not those at the time, shows the problem. I'm sure we could design a system, but I'm equally sure it would never be properly implemented or maintained.
To be consistent, we should do the same for country of citizenship (P27) and country of origin (P495), and we'd have double entries galore, but I've no idea why you'd want to say that Julius Caesar was born in the Roman Empire, led Legions from the Roman Empire who fought with the gladius, a Roman weapon, but his battles were in Italy.
But it seems I've failed to convince people, so I expect the muddle will remain. I just hope that there won't be a purge of historical information so I'll never be able to list battles in the Roman Empire. Vicarage (talk) 22:29, 27 February 2024 (UTC)