Wikidata:Requests for comment/How to deal with given names and surnames
An editor has requested the community to provide input on "How to deal with given names and surnames" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote nor add comments.
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Stale RFC. Not enough input for any sort of consensus to be developed. John F. Lewis (talk) 15:09, 21 July 2015 (UTC)
It has been brought to my attention that there are several interwiki link conflicts between articles related to names. These are problems that IMHO Wikidata may finally solve, but that at the same time cannot be solved, again IMHO, without a common discussion.
For example, Andrew and variants (Q389) links it:Andrea to en:Antero, instead of en:Andrew or even en:Andrea. Now, "Antero" is the Finnish version of "Andrea"/"Andrew", but on en.wp (and possibly on other wikis) we also have a separate "Antero" article. The same happens with almost every other name (or even surname) we have: Washington is both a name and a surname; Andrea is a feminine name in German, but it's masculine in Italian; Luis, Alois and Luigi are essentially the same name, but expressed in different languages; and so on.
The aim of this RFC is to discuss and lay down some general rules, given also that now we have four properties (given name (P735), family name (P734), P513 (P513), pseudonym (P742)) related to names and surnames. I have an idea about it, which is fairly simple: we use a single item for every kind of name we have, and then we link the similar names with a "same as" or "related to" property, i.e. Andrea, Antero, Andrew, Andy, Ondrej, Andrei, Andrzej, Andreas, Andrés... are each represented by an item, but are all linked with each other through a property.
I want to stress that it's just a proposal, and not the final proposal. It's just to start a discussion; I'd like to reach a common and shared proposal in a reasonable time span. So, please, have your say now. :) --Sannita - not just another it.wiki sysop 16:15, 9 February 2014 (UTC)
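The "one item per name, linked by a property" idea above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the item IDs (`N1`, `N2`, ...) are hypothetical placeholders, the "related to" links are invented, and `variants_of` is not an existing Wikidata function. Only the name labels come from the discussion. The sketch treats the links as an undirected graph and collects every name reachable from a starting item.

```python
from collections import defaultdict, deque

# Hypothetical item IDs; only the labels are taken from the discussion.
LABELS = {
    "N1": "Andrea",
    "N2": "Andrew",
    "N3": "Antero",
    "N4": "Andrzej",
    "N5": "Luigi",
    "N6": "Luis",
}

# Hypothetical "related to" statements between name items.
RELATED = [("N1", "N2"), ("N2", "N3"), ("N3", "N4"), ("N5", "N6")]


def variants_of(item_id, links):
    """Return the set of items reachable from item_id through the links."""
    graph = defaultdict(set)
    for a, b in links:
        # The relation is treated as symmetric, so store both directions.
        graph[a].add(b)
        graph[b].add(a)
    seen = {item_id}
    queue = deque([item_id])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


andrea_cluster = sorted(LABELS[i] for i in variants_of("N1", RELATED))
print(andrea_cluster)
```

With this layout each wiki article can stay attached to its own item (it:Andrea, en:Antero, en:Andrew), while a client that wants "all variants" follows the links instead of relying on a single merged item.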
- Note the discussion at Wikidata:Project chat/Archive/2014/02#Name of person and the proposal discussions of the existing name properties. There are several problems we need to discuss, I believe:
- Translation problems as outlined by Sannita above (incl. gender issues, spelling differences in transliterations and translations)
- People having several names, used under different contexts (qualifiers?)
- People with names that don't fall under the given name/family name dichotomy
- The datatype the name properties are to use (item or MultilingualText?)
- @Wylve: It is precisely because I noticed the discussion at the project chat that I opened this RfC. Since the discussion has already been archived, I think it's better to have a page that cannot be archived so easily, so that we can keep discussing. We can also transclude that discussion here, if you wish, to keep everything on one page. Sannita - not just another it.wiki sysop 10:59, 10 February 2014 (UTC)
- Note: If you talk about pseudonyms please also think about abbreviations ("not done"). This property might become useful later, for example if WP wants to cite a journal without using the official long title used by the WD item. --Kolja21 (talk) 18:44, 10 February 2014 (UTC)
- Support We have to switch from item datatype to monolingual datatype for properties dealing with names. Multilingual datatype will be more interesting for some cases, like names written in non-Latin scripts, but even in that case there are 2-5 different ways to write the names. For example, Novak Djokovic is the same in French or in English; the difference depends only on the alphabet used to write the name, not on the language. Snipre (talk) 19:42, 10 February 2014 (UTC)
- Comment The last name of the Serbian tennis player is "Đoković" or "Ђоковић". I don't see how monolingual datatype will solve this ambiguity. --Kolja21 (talk) 20:02, 10 February 2014 (UTC)
- We need to have the monolingual datatype with the alphabet tag and not with the language tag: then you can have three values for the same statement. With the alphabet there is no unique value. But the advantage is to be able to select the correct value according to the tag of the datatype. Snipre (talk) 10:09, 11 February 2014 (UTC)
- I'm afraid that the situation is more complex than this. Not all languages use an alphabetic writing system. I think we need to hear from the development team how the monolingual and multilingual datatypes are going to be implemented before any substantial discussion on the datatype continues. --Wylve (talk) 10:57, 11 February 2014 (UTC)
- My opinion is that given name (P735) and family name (P734) which link to items are different from the other properties which have string or monolingual datatypes. family name (P734) and given name (P735) should link to items for the given names or the surnames and these items should, in general, cover all the different spellings of those names, including any male/female variants. Where there are separate items in one language for various variants of the name then these should link to a more general item covering all the variants for that name. family name (P734) and given name (P735) link to items which are not necessarily spelled exactly the same as the spelling used by the subject.
- Where a person has a 'birth name' and a 'pseudonym' and an 'official name' then family name (P734) and given name (P735) can be used as a qualifier to those names. Filceolaire (talk) 17:56, 11 March 2014 (UTC)
- Oppose. This is valuable data which, unlike an equivalent Monolingual or Multilingual String-based property, is actually structured data. The name item conflicts, which are usually also interwiki conflicts, need to be resolved whether the properties are deleted or not, so that's not a reason for not using the item datatype. --Yair rand (talk) 03:59, 27 August 2014 (UTC)
- This data is structured, but unsourced and unreliable outside of its "native" language. Resolving the item and interwiki conflicts is a good plan, but it won't settle the problem that some persons' names are translated when used in other languages while other persons' names are used in the original. At the least, the statements should have a qualifier specifying the language(s) in which they are valid (maybe language of work or name (P407), since it has already been used as a qualifier for named after (P138)...), but it will still be a time bomb, because if anybody changes the label for one language (e.g. the English label for Milosz to Miłosz, Milos or Miloš), hundreds of statements have to be checked and corrected. I doubt anybody will do it... --Shlomo (talk) 13:51, 27 August 2014 (UTC)
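The script-tag idea raised in this thread (several values for one statement, selected by a tag for the writing system rather than the language) can be sketched as follows. This is only an illustration under assumptions: `ScriptTaggedName` and `pick_by_script` are hypothetical names, and the use of ISO 15924 script codes such as `Latn` and `Cyrl` is a choice made here, not something the discussion settled on. The two spellings are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptTaggedName:
    """One value of a name statement, tagged by script (assumed ISO 15924 code)."""
    text: str
    script: str


# The two spellings quoted in the discussion.
FAMILY_NAME_VALUES = [
    ScriptTaggedName("Đoković", "Latn"),
    ScriptTaggedName("Ђоковић", "Cyrl"),
]


def pick_by_script(values, preferred_scripts):
    """Return the text of the first value matching the preferred scripts, in order."""
    for script in preferred_scripts:
        for value in values:
            if value.script == script:
                return value.text
    return None  # no value available in any requested script


print(pick_by_script(FAMILY_NAME_VALUES, ["Cyrl", "Latn"]))
```

Note that the tag alone would not distinguish "Đoković" from "Djokovic", since both are written in Latin script, which is essentially the ambiguity pointed out in the comment above.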
These issues are being raised since given names and surnames are quite similar to dictionary entries. And a relational database (Q192588) like OmegaWiki (Q154436) is certainly the best way to handle relations between 'similar' names. Wikidata is not a relational database (Q192588), and that's why it fails with dictionary-like entries such as Wiktionary pages (in fact, Wiktionary was deliberately excluded from the Development plan).
We should adopt OmegaWiki and use it in such cases, whenever possible. --Ricordisamoa 00:17, 11 February 2014 (UTC)
- I don't understand, and I don't agree. Wikidata has relations, and there is no doubt we can represent OmegaWiki's data here. What do you like in OmegaWiki? We will discuss a way to do the same thing here. TomT0m (talk) 11:48, 11 February 2014 (UTC)
Before discussing the most appropriate datatype for the properties given name (P735), family name (P734), P513 (P513), pseudonym (P742), I think it would be useful to clarify what these properties are needed for. To list all people in Wikidata with en:Andrew as a given name or a surname: how does that make sense? And even if it makes any sense, the list will differ from one language to another according to the spelling used.
- I can't see much need for them myself. I mean, the name (translated into the appropriate language) is already included in the label. Pseudonyms seem most appropriate in the aliases section. Ajraddatz (Talk) 17:06, 16 February 2014 (UTC)
- Labels, as I see it, are only for the identification of the entity an item is describing. You also cannot differentiate the family name and the given name of an individual by just using the label (machines aren't that smart). --Wylve (talk) 19:06, 16 February 2014 (UTC)
- I fully understand the restrictions with labels. Yet, what do we need the properties for? What useful task could machines possibly do with given names and family names? As far as I am concerned, I cannot think of one relevant use of these properties. --Casper Tinan (talk) 19:34, 16 February 2014 (UTC)
- Say I want to query the number of people born with the name Andrew in the English language, in England between 1850 and 1990. This property will allow us to query just that. We can also find out the number of composers with the surname "Mozart". How properties could be used will not be fully realised until they are created and applied in clients both within Wikimedia and outside it. --Wylve (talk) 06:53, 17 February 2014 (UTC)
- I don't see much need for given name (P735), family name (P734) and P513 (P513), but I do use pseudonym (P742) especially as a qualifier. Example: Vanadisvägen (Q10712557) named after (P138) => Freyja (Q1647325) qualifier pseudonym (P742) => "Vanadis".
- I would also need a property "old name" or "previous name" for places that have changed names. Example: Globen metro station (Q1531757) has had three different names since it was built. This is reflected by the named after (P138) property, but it is quite silly that it is defined what it was named after, but not what the name was. /ℇsquilo 10:07, 10 March 2014 (UTC)
- @Esquilo: I think no new property is needed for this. You can use the property name with qualifiers for start date and end date to achieve the same objective and rank the last name as preferred. Casper Tinan (talk) 11:37, 10 March 2014 (UTC)
- There is no property called "name". /ℇsquilo 08:58, 11 March 2014 (UTC)
- But there is a proposed property 'official name' which is awaiting monolingual datatype. This can be used for three names of the metro station, with date qualifiers. Filceolaire (talk) 17:32, 11 March 2014 (UTC)
- Ok, I'll hold on and wait for the multilingual (I assume) datatype. /ℇsquilo 14:52, 18 March 2014 (UTC)
- Monolingual. The whole point is that it will show the official name of the item even if it is not in your language. If you have official names in three languages then 'official name' can have three values, each with references. If someone gets a 'Translation' or 'Transliteration' property approved, with multilingual datatype, then that can be used as a qualifier, with translations by users. Filceolaire (talk) 20:42, 18 March 2014 (UTC)
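The kind of query described earlier in this thread (counting people with a given name, born in a given place and period) can be sketched on a tiny in-memory data set. All of it is assumed for illustration: the `Person` records are invented, the identifiers such as `NAME_ANDREW` are hypothetical stand-ins for name items, and `count_by_given_name` is not an existing API. The point is only that the filter works on the structured name item, which a plain label cannot offer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    label: str
    given_name_item: str   # hypothetical ID of the given-name item
    family_name_item: str  # hypothetical ID of the family-name item
    birth_country: str
    birth_year: int


# Invented records, for illustration only.
PEOPLE = [
    Person("Andrew Example", "NAME_ANDREW", "NAME_EXAMPLE", "England", 1900),
    Person("Andrew Sample", "NAME_ANDREW", "NAME_SAMPLE", "Scotland", 1920),
    Person("Andrew Older", "NAME_ANDREW", "NAME_OLDER", "England", 1801),
    Person("Anna Example", "NAME_ANNA", "NAME_EXAMPLE", "England", 1950),
]


def count_by_given_name(people, name_item, country, first_year, last_year):
    """Count people with the given name item, born in country within the year range."""
    return sum(
        1
        for p in people
        if p.given_name_item == name_item
        and p.birth_country == country
        and first_year <= p.birth_year <= last_year
    )


print(count_by_given_name(PEOPLE, "NAME_ANDREW", "England", 1850, 1990))
```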
Now that we have the monolingual datatype I think we should take a fresh look at how to enter names, surnames, nicknames, and all the thousands of variations, combinations, uses, etc. It is a very complex issue that would require hundreds of properties. In my opinion a better alternative would be to have a few basic properties that come up often and a general monolingual "naming" property (perhaps "name"? or "human identifier"? or plainly "identifier"?), with a qualifier "type of identifier" to specify what it is. What do you think about it? --Micru (talk) 08:30, 20 August 2014 (UTC)
- @Micru: I think a few examples of what you have in mind would help to understand the scope of this discussion. TomT0m (talk) 14:30, 20 August 2014 (UTC)
- @TomT0m: Some examples; we would need to create some items for the type of name:
- ⟨ Georgios Grivas-Digenis (Q712817) ⟩ name (P2561) ⟨ Διγενής (Greek) ⟩
- type of name ⟨ nome de guerre ⟩
- ⟨ Joan Maragall (Q562402) ⟩ name (P2561) ⟨ Maragall i Gorina (Catalan) ⟩
- type of name ⟨ surname with conjunction ⟩
- And also for transliterations, historical names of places, official names, etc.
- ⟨ official names of India (Q3248913) ⟩ name (P2561) ⟨ Bhārtiya Gantāntrā (Gujarati) ⟩
- type of name ⟨ transliteration of Gujarati ⟩
- ⟨ official names of India (Q3248913) ⟩ name (P2561) ⟨ ભારત (Gujarati) ⟩
- type of name ⟨ official short name ⟩
- Pinging @Filceolaire, Sannita, Zolo: --Micru (talk) 15:14, 20 August 2014 (UTC)
- Conversation moved from the Project Chat. --Micru (talk) 12:40, 21 August 2014 (UTC)
- Looks like this would be equivalent to a "name" item with an instance of (P31) statement. Don't know why but it seems a little weird put like that. Just a feeling though. Seems there is too much "name" in the property/qualifier name. Also I think we need insight into what is called a name here. It's text after all, but a sound might also do the trick. Actually this could equally plead for items for some names, with monolingual properties like textual representation, pronunciation and so on. Or properties like . Just thoughts. TomT0m (talk) 20:55, 21 August 2014 (UTC)
The question is simple: do we want to delete all properties with an item datatype dealing with names and replace them with the new monolingual datatype? Snipre (talk) 18:20, 21 August 2014 (UTC)
- The answer is also simple (at least for me): the name properties with item datatype are bad, useless and shouldn't have been created. They should be deleted ASAP, before they start being used by Wikipedias/Wikisources etc. and getting a bad reputation among the communities for the whole Wikidata project. Of course we can have a legitimate discussion whether to use the monolingual or multilingual text datatype instead and how to use it; anyway we'll have to start from zero again. --Shlomo (talk) 19:53, 21 August 2014 (UTC)
- Mmm, that is not possible, there are articles about names. TomT0m (talk) 20:42, 21 August 2014 (UTC)
- Why not? I'm not calling for deleting the articles, not even the items. Just the properties responsible for curious statements like "First name of Joachim Löw is Jáchym (or Gioacchino, Хоакин - depends on settings...)". --Shlomo (talk) 22:18, 21 August 2014 (UTC)
- It makes sense to have those; I can't read kanji and appreciate a translation. TomT0m (talk) 09:11, 22 August 2014 (UTC)
- I agree! A better way of doing it could be like this:
- <Joachim Löw> name <Joachim Löw (DE)>
- instance of <birth name>
- given name <Joachim (Q4926961)>
- surname <Löw (Q1879904)>
- That way we would have it all condensed in a statement, pointing to the several linguistic components of the string (when they exist). Of course the best would be to have all these qualifiers in the label itself, but at the moment that option is not there, so... --Micru (talk) 09:13, 22 August 2014 (UTC)
- @Micru: ??? Sorry, you don't seem to agree with me, and I see only one language condensed in your statement. Can't make sense of this comment :) TomT0m (talk) 09:34, 22 August 2014 (UTC)
- @TomT0m: I was agreeing with Shlomo, not with you, but then there was an edit conflict :) Those properties only make sense for the language the name is written in. If you had a different language then it would be "instance of <transliteration>" and then "given name" would (or should) point to a different linguistic form. --Micru (talk) 09:45, 22 August 2014 (UTC)
- @Micru: then it would be great if you used indentation like that:
:msg1
::first repl to msg1
:
::second repl to msg1
- Apart from that I'm not sure I understand your idea. TomT0m (talk) 09:57, 22 August 2014 (UTC)
@TomT0m: Sure. And ok, I will try again from the beginning:
- <Joachim Löw> name <Joachim Löw (DE)>
- instance of <birth name>
- given name <Joachim (Q4926961)>
- surname <Löw (Q1879904)>
- <Joachim Löw> name <योआखिम ल्योव (MR)>
- instance of <transliteration>
- <Joachim Löw> name <Йоахім Лев (UK)>
- instance of <Ukrainian National transliteration (Q17039352)>
- given name <Йоахім>
Some articles have many languages, which is why I say that it would be better to have these qualifiers directly on the label itself (on top of the item page), perhaps as a sort of badge. For "mottos", as LaddΩ was asking in the project chat, the principle would be the same:
- <Nunavut> motto <ᓄᓇᕗᑦ ᓴᙱᓂᕗᑦ (Inuktitut)>
- <Nunavut> motto <Nunavut Sannginivut (Inuktitut)>
- instance of: <Latin script transliteration>
- <Nunavut> motto <Nunavut our strength (English)>
- instance of: <translation>
And the icing on the cake would be if instead of a string we could also enter links to item pages (like [[Q68027]]), because how else to link <India> with Satyameva Jayate (Q680277)? --Micru (talk) 11:28, 22 August 2014 (UTC)
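The statement-plus-qualifier layout proposed in this comment can be written down as a small data structure. This is a sketch under assumptions, not how Wikibase stores statements: `NameStatement` and `names_of_kind` are hypothetical names, and qualifiers are reduced to a plain dictionary. The values are the ones given in the examples above.

```python
from dataclasses import dataclass, field


@dataclass
class NameStatement:
    """A monolingual 'name' value with free-form qualifiers (illustrative model)."""
    subject: str
    text: str
    language: str
    qualifiers: dict = field(default_factory=dict)


# Values taken from the examples in the comment above.
STATEMENTS = [
    NameStatement(
        "Joachim Löw", "Joachim Löw", "de",
        {"instance of": "birth name", "given name": "Q4926961", "surname": "Q1879904"},
    ),
    NameStatement("Joachim Löw", "योआखिम ल्योव", "mr", {"instance of": "transliteration"}),
    NameStatement(
        "Joachim Löw", "Йоахім Лев", "uk",
        {"instance of": "Q17039352", "given name": "Йоахім"},
    ),
]


def names_of_kind(statements, kind):
    """Return the texts of all name values whose 'instance of' qualifier equals kind."""
    return [s.text for s in statements if s.qualifiers.get("instance of") == kind]


print(names_of_kind(STATEMENTS, "birth name"))
```

In this flat model a transliteration does not record which name it transliterates; capturing that would need an extra link between values.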
- @Micru: Not sure I like this. Transliteration applies to another name, and we lose that here. Imagine we want to transliterate a nom de guerre: this would imply putting a "transliteration of nom de guerre" as the instance of. I don't think it's a good idea to try to put all the information in statements. Conceptually, given the intended initial meaning of qualifiers, I would translate such a statement as [[«the warrior» has for name «a transliteration of one of its warname»] the previous snak is a transliteration of a warname giving snak ]; do not forget that the intended meaning of instance of (P31) is to link an object to a set of objects of the same kind. And the link to the transliterated name is lost. TomT0m (talk) 11:50, 22 August 2014 (UTC)
- @TomT0m: What is your preferred choice? If we had a sleeker method for creating items, I wouldn't mind using the item datatype, but with the current workflow it takes too much effort. --Micru (talk) 12:35, 22 August 2014 (UTC)