Wikidata:Bot requests/Italian Wikipedia person data
Italian Wikipedia person data
- General discussion: Wikidata:Project chat#Italian person data (now archived; consensus was found); original location: User_talk:Legobot/properties.js#Italian_person_data
- Category: w:it:Categoria:BioBot (more than 200,000 people, cf. [1])
- Properties: w:it:Template:Bio#Tabella_completa, to be fetched from the template parameters (not everything is reflected in categories)
- Examples: name and gender (mandatory), surname, place/date/year of birth/death, one out of 552 defined jobs
- More details will follow. Are you also interested in a mapping for the jobs? There are a lot of them and I doubt any other wiki has them in a structured format. --Nemo 19:27, 17 February 2013 (UTC)
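Since the properties are to be fetched from the template parameters rather than from categories, the parsing step could start from something like this minimal Python sketch. It assumes a single {{Bio}} call per page and only tracks nesting of {{...}} and [[...]]; comments, nowiki blocks and malformed markup are not handled, and all function names are hypothetical.

```python
from typing import Optional


def split_top_level(text: str) -> list[str]:
    """Split on '|' characters that are not inside {{...}} or [[...]]."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth = max(depth - 1, 0)
            buf.append(pair)
            i += 2
        elif text[i] == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(text[i])
            i += 1
    parts.append("".join(buf))
    return parts


def find_bio_body(wikitext: str) -> Optional[str]:
    """Return the text between the outer braces of the first {{Bio ...}} call."""
    start = wikitext.find("{{Bio")
    if start == -1:
        return None
    depth, i = 0, start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start + 2:i - 2]
        else:
            i += 1
    return None  # unbalanced braces: skip the page


def parse_bio(wikitext: str) -> dict[str, str]:
    """Map Bio parameter names (Nome, Cognome, Sesso, ...) to raw values."""
    body = find_bio_body(wikitext)
    if body is None:
        return {}
    params = {}
    for part in split_top_level(body)[1:]:  # part 0 is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params
```

Values are returned raw (links, refs and nested templates included), so each field still needs its own handling afterwards.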
- I will work on implementing template parsing, hopefully by this weekend.
- Yes, jobs would be great! We can use Property:P101 and Property:P106. I think the best way to map this out would be just a list like *[[:it:Politico]] -> [[Q82955]] or something. Legoktm (talk) 19:51, 17 February 2013 (UTC)
- That would be a smart way, but I don't know well enough how properties should be fed here (whether as a list or something else); I've asked others to join the discussion here and I hope someone will...
- Attività is the main occupation; Attività2, ..., and AttivitàAltre the additional ones. They only include the main occupations of the subject, because categorising people by secondary occupations (as en.wiki does) is strictly forbidden on it.wiki. --Nemo 08:49, 22 February 2013 (UTC)
- The conversion table is at User:ValterVB/Sandbox; where no entity is defined, one for the it.wiki article in question should be created or used. --Nemo 08:58, 23 August 2013 (UTC)
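Applying the conversion table to the occupation fields could look roughly like this sketch. OCCUPATION_TO_ITEM is a hypothetical excerpt (only the politico -> Q82955 pair comes from this discussion; the real table is at User:ValterVB/Sandbox). It reads Attività plus numbered fields Attività2, Attività3 and so on (the upper bound is arbitrary) and leaves AttivitàAltre aside, since its exact format is not described here. Unmapped values are returned for review rather than guessed.

```python
# Hypothetical excerpt of the conversion table; only this pair is taken
# from the discussion ([[:it:Politico]] -> [[Q82955]]).
OCCUPATION_TO_ITEM = {"politico": "Q82955"}

# Attività, Attività2, Attività3, ... (the upper bound 9 is arbitrary).
OCCUPATION_FIELDS = ["Attività"] + [f"Attività{n}" for n in range(2, 10)]


def occupation_items(params: dict[str, str],
                     table: dict[str, str] = OCCUPATION_TO_ITEM) -> tuple[list[str], list[str]]:
    """Return (items for P106, values with no entry in the table)."""
    items, unmapped = [], []
    for field in OCCUPATION_FIELDS:
        value = params.get(field, "").strip()
        if not value:
            continue
        qid = table.get(value.lower())
        if qid is None:
            unmapped.append(value)  # left for manual review, not guessed
        elif qid not in items:
            items.append(qid)  # avoid duplicate P106 statements
    return items, unmapped
```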
- Done (example). The map between Italian values and Wikidata items was done by Ladsgroup; Nemo checked the equivalences between Italian names.
- The task is now finished. Amir (talk) 13:39, 28 July 2014 (UTC)
- Great, I'll see if we can start using the data. --Nemo 11:16, 29 July 2014 (UTC)
- Sesso: P21 -> Q6581097 if M, Q6581072 if F (note that this field is used only for grammatical purposes, so "intersex" is not used; in non-trivial cases, it may or may not reflect the policy here on Wikidata)
- Done
- LuogoNascita (but LuogoNascitaLink should prevail if available): P19 -> the item corresponding to the page with that title
- Done
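The Sesso and LuogoNascita rules can be sketched as follows. The title_to_item dict is a hypothetical stand-in for resolving an it.wiki title to its item; the place helper takes the field prefix so the same logic could serve LuogoMorte/LuogoMorteLink for P20. The *Alt fields are not handled.

```python
from typing import Optional

# P21 values as given in the request.
SEX_TO_ITEM = {"M": "Q6581097", "F": "Q6581072"}


def sex_claim(params: dict[str, str]) -> Optional[tuple[str, str]]:
    """Sesso -> P21; anything other than M or F is skipped."""
    qid = SEX_TO_ITEM.get(params.get("Sesso", "").strip().upper())
    return ("P21", qid) if qid else None


def place_claim(params: dict[str, str], prefix: str, prop: str,
                title_to_item: dict[str, str]) -> Optional[tuple[str, str]]:
    """prefix + "Link" (e.g. LuogoNascitaLink) prevails over prefix (LuogoNascita).

    title_to_item is a hypothetical stand-in for a sitelink lookup.
    """
    for field in (prefix + "Link", prefix):
        title = params.get(field, "").strip()
        if title:
            qid = title_to_item.get(title)
            return (prop, qid) if qid else None  # no silent fallback
    return None
```

If the Link field is filled in but cannot be resolved, the sketch skips the statement instead of falling back to the plain field, so the "Link prevails" rule is never silently bypassed.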
- LuogoNascitaAlt: same as above, for complex cases with alternatives; maybe a secondary statement for P19? No other property is available.
- NoteNascita: pull sources for the Nascita statements from the ref tags in here.
- LuogoMorte, LuogoMorteLink and LuogoMorteAlt, NoteMorte: same as above but for P20
- Done
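Pulling sources from the ref tags in NoteNascita/NoteMorte could start from a small regex pass like this sketch. It assumes refs are not nested and deliberately ignores self-closing named refs (<ref name="x" />), which would need a second pass to resolve against their full definition elsewhere on the page.

```python
import re

# <ref ...>content</ref>; self-closing <ref name="x" /> tags are not matched.
REF_RE = re.compile(r"<ref(?:\s[^>]*?)?(?<!/)>(.*?)</ref>", re.DOTALL | re.IGNORECASE)


def ref_sources(note: str) -> list[str]:
    """Inner text of each <ref>...</ref> in a NoteNascita/NoteMorte value."""
    return [content.strip() for content in REF_RE.findall(note)]
```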
- Nazionalità: P27 -> linked country
- NazionalitàNaturalizzato: additional statement to P27
- Done for countries that are instances of a subclass of state (Q7275), except a few; see the list of articles not imported yet and the breakdown by their value.
- Info See the map from adjectives to countries. The local information is based on current sources.
Except for 4 entities still to be synced, all the values used are compatible with this property. See further discussion.
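Applying the adjective-to-country map to both citizenship fields could look roughly like this. ADJECTIVE_TO_COUNTRY is a hypothetical two-entry excerpt, not the actual map; values missing from it are simply skipped, which mirrors the list of articles not imported yet.

```python
# Hypothetical two-entry excerpt of the adjective -> country map.
ADJECTIVE_TO_COUNTRY = {"italiano": "Q38", "francese": "Q142"}


def citizenship_claims(params: dict[str, str],
                       table: dict[str, str] = ADJECTIVE_TO_COUNTRY) -> list[tuple[str, str]]:
    """Nazionalità and NazionalitàNaturalizzato -> P27 statements."""
    claims = []
    for field in ("Nazionalità", "NazionalitàNaturalizzato"):
        qid = table.get(params.get(field, "").strip().lower())
        if qid and ("P27", qid) not in claims:
            claims.append(("P27", qid))  # unmapped adjectives are skipped
    return claims
```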
- PostNazionalità: this field may contain sources for any of the previous statements (more general ones could also be right after the end of the template or in FineIncipit).
- FineIncipit: replaces the standard occupation etc.; maybe add it to the item description?
- Immagine: P18 -> image with this name (check whether it's on Commons; over 35k usages)
- For each statement: add Property:P143 with value Q11920 as a reference, example: cat (update: as discussed at project chat).
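A statement carrying the "imported from" (P143) reference to Italian Wikipedia (Q11920) could be assembled like this sketch, loosely following the Wikibase JSON statement layout. It only builds the data structure; sending it to the API is left out, and the helper names are hypothetical.

```python
def item_snak(prop: str, qid: str) -> dict:
    """A value snak pointing at an item, e.g. P143 -> Q11920."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
        },
    }


def sourced_statement(prop: str, qid: str, source_wiki: str = "Q11920") -> dict:
    """Statement prop -> qid with an 'imported from' (P143) reference."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": item_snak(prop, qid),
        "references": [{"snaks": {"P143": [item_snak("P143", source_wiki)]}}],
    }
```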
- First name (Nome): P735
- Done where it equals an it.wiki article and hence an entity.
- Last name (Cognome): P734
- Done (same); Info below on disambiguation pages, transliteration
- Day and month of birth (GiornoMeseNascita) + Year of birth (AnnoNascita): P569
- Day and month of death (GiornoMeseMorte) + Year of death (AnnoMorte): P570
- Do not add a date that conflicts with an Integrated Authority File (Q36578) statement, if available.
- Done in part by Dexbot, for dates after 1920. Info ViscoBot had started but stopped long ago.
- Question I have also written the code to import dates of birth and death, but I'm not running it yet because there is one important question: which calendar model do you use for dates of birth and death? In some places the Gregorian calendar wasn't common until 1912, so I can't add dates before 1912, because the bot can't be sure about the calendar model of these dates. – The preceding unsigned comment was added by Ladsgroup (talk • contribs).
- We're verifying; I'll let you know the final outcome. Past discussions seem to have all agreed on forcing the Gregorian calendar in the template, with the option to indicate the Julian calendar next to it with a warning. --Nemo 13:27, 27 April 2014 (UTC)
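Given the calendar question above, GiornoMeseNascita + AnnoNascita (or the Morte pair) could be turned into a Wikidata-style time value along these lines. This is a hedged sketch: it assumes values like "12 marzo" or "1º gennaio" plus a plain numeric year, marks everything as proleptic Gregorian (Q1985727), and only returns dates after a cutoff year (1920 here, mirroring the partial Dexbot run; the parameter is hypothetical). Anything else, including "circa" or BC years, returns None so it can be skipped. The comparison with Integrated Authority File statements is not included.

```python
import datetime
import re
from typing import Optional

ITALIAN_MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4, "maggio": 5, "giugno": 6,
    "luglio": 7, "agosto": 8, "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}
# Proleptic Gregorian calendar (Q1985727).
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def bio_date(day_month: str, year: str, cutoff: int = 1920) -> Optional[dict]:
    """GiornoMese* + Anno* -> Wikidata-style time value, or None to skip.

    cutoff is hypothetical: only years after it are returned, as a
    conservative stand-in for the open calendar-model question.
    """
    year = year.strip()
    if not year.isdecimal() or len(year) > 4:
        return None  # "?", "circa 1850", BC years, ...: skip
    y = int(year)
    if y <= cutoff:
        return None
    if not day_month.strip():
        # Year only: precision 9 (year).
        return {"time": f"+{y:04d}-00-00T00:00:00Z", "precision": 9, "calendarmodel": GREGORIAN}
    match = re.fullmatch(r"(\d{1,2})º?\s+(\w+)", day_month.strip())
    if match is None:
        return None
    day, month = int(match.group(1)), ITALIAN_MONTHS.get(match.group(2).lower())
    if month is None:
        return None
    try:
        datetime.date(y, month, day)  # rejects e.g. "31 febbraio"
    except ValueError:
        return None
    return {"time": f"+{y:04d}-{month:02d}-{day:02d}T00:00:00Z", "precision": 11,
            "calendarmodel": GREGORIAN}
```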
- Title to be used before the name, or after it in some languages other than Italian (Titolo): P511 (about 3k usages)
- Missing properties:
- Unrecognized citizenship (peoples without a state), e.g. Kurds (Cittadinanza)
- Free-text notes on dates of birth/death (NoteNascita, NoteMorte): some sources could be extracted from here. The content is very varied, but in 55% of cases it contains a URL that could be imported as a source.
- This should be it. --Nemo 08:49, 22 February 2013 (UTC)
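For the free-text notes, the URLs mentioned above could be pulled out with a simple pattern. A minimal sketch; the set of characters that ends a URL is an assumption tuned to wikitext link and template syntax.

```python
import re

# A URL ends at whitespace or at characters that close wikitext links/templates.
URL_RE = re.compile(r'https?://[^\s\[\]|<>"{}]+')


def note_urls(note: str) -> list[str]:
    """Distinct http(s) URLs in a NoteNascita/NoteMorte value, in order."""
    urls = []
    for url in URL_RE.findall(note):
        url = url.rstrip(".,;:)")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls
```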
- Info Local usage of the data has started and expansion is being discussed.
- A proposal on sourcing for Wikidata was moved to Wikidata:Project chat#Proposal: preventive control of imported data correctness
- As far as edit summaries go, the bot actually does send proper edit summaries, in the format
Bot: Setting [[Property:{pid}|{pid}]] to [[{target_qid}]]; using [[:{lang}:{source}]]; requested by [[User talk:{user}|{user}]]
but the software doesn't support them yet. It may be worth putting this run on hold until the software supports custom summaries.
- I do believe that at this point we may need to look at how to properly source these claims, since they are no longer "obvious". Maybe that should be a discussion on Project chat? I believe there are legitimate concerns to address before this request can go forward, as well as code that I need to work on. Legoktm (talk) 01:23, 23 February 2013 (UTC)
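The summary format quoted above is a plain string template; for illustration, filling it in with Python's str.format (the function name is hypothetical, not the bot's actual code):

```python
# Format string quoted in the discussion.
SUMMARY_FORMAT = (
    "Bot: Setting [[Property:{pid}|{pid}]] to [[{target_qid}]]; "
    "using [[:{lang}:{source}]]; requested by [[User talk:{user}|{user}]]"
)


def edit_summary(pid: str, target_qid: str, lang: str, source: str, user: str) -> str:
    """Fill in the per-edit summary for one imported statement."""
    return SUMMARY_FORMAT.format(pid=pid, target_qid=target_qid,
                                 lang=lang, source=source, user=user)
```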
- If edit summaries are a problem, we could just use a different username for the bot, like "Italian Wikipedia person data import bot".
- What fields are no longer obvious, specifically? Surely place of birth is more "obvious" and less controversial than gender, for instance. I think it makes sense to start only with the "obvious" ones: it seems to me that most worries are about nationalist controversies, so probably those are the only fields to exclude in the first run? Otherwise, sources exist, of course; you could pull them at the same time if people feel it can't be done later. --Nemo 08:50, 23 February 2013 (UTC)
- Ping. I have updated the data above; it seems to me that we no longer have anything to wait for? Were the easy parts like gender done already? --Nemo 08:58, 23 August 2013 (UTC)
- There is another problem with first and last names: see w:it:'Abd Allah al-Wafi; the first and last name have to be in Arabic, not Latin script. Amir (talk) 13:47, 27 April 2014 (UTC)
- The transliteration tables we use are at w:it:Categoria:Aiuto lingue straniere. The original full name is within one of the language templates, usually in the PostCognome parameter, so you could usually extract it from there or skip the articles which contain one of those templates. Isn't there a property for transliterated names? --Nemo 14:08, 27 April 2014 (UTC)
- Amir, let me know if there's anything we can help with to get the bot running again, as fast as it was or faster. :) I keep updating this page and there are no blockers anywhere AFAICS. --Nemo 05:04, 30 April 2014 (UTC)
- Nemo, sorry for the delay. I have started it again; I had stopped it in order to fix concerns raised by John on the mailing list. Once you confirm that the Gregorian calendar is used, I'll start it on dates too. As for P735: since it uses items as its datatype, that won't be a problem, but I checked it:Marc Jacobs and it:Alan Turing and both had problems: the data items of it:Marc and it:Jacobs are disambiguation pages, not given names (see Property talk:P735), and it:Turing redirects to the article on Alan Turing. So I'm quite confused about how we can harvest these data. Amir (talk) 07:00, 30 April 2014 (UTC)
- Amir, thanks! So, as long as this property uses items, there can't be a way to meaningfully store all names for all languages. We don't have articles on all variants of a name in all languages, but we still have thousands of given names. When a given name X is a disambiguation page, use "X (nome)" instead; when that doesn't exist either, just skip it. I see that ValterVBot and BetaBot previously added labels for all Italian names, so we should be able to import names at least for the 70k or so Italian people with Bio. --Nemo 10:45, 30 April 2014 (UTC)
- @Nemo_bis: Thank you for sharing this with me. Based on your comments I implemented a way to import them; is this okay? If it seems okay to you, I'll start on all people. BTW: if you use WIDAR or other tools to import all the names in the category you mentioned, the bot will do this task better, but even now it works quite well. 14:23, 30 April 2014 (UTC)
- Amir, yes, that's ok for us! I checked again with toollabs:wikidata-todo/creator.html, there were only 4 names missing. --Nemo 17:55, 30 April 2014 (UTC)
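The given-name fallback Nemo describes (use X if it is a name item; if X is a disambiguation page, use "X (nome)"; if that does not exist either, skip) can be sketched like this. The pages mapping is a hypothetical stand-in for whatever page-type lookup the bot actually performs, and skipping redirects such as it:Turing is an assumption based on the problem Amir reported.

```python
from typing import Optional

# Hypothetical page-type lookup: title -> (kind, item id), where kind is
# "article", "disambiguation" or "redirect".
PageInfo = dict[str, tuple[str, str]]


def name_item(name: str, pages: PageInfo, suffix: str = " (nome)") -> Optional[str]:
    """Item for a given name, following the fallback agreed above."""
    kind, qid = pages.get(name, (None, None))
    if kind == "article":
        return qid
    if kind == "disambiguation":
        kind, qid = pages.get(name + suffix, (None, None))
        if kind == "article":
            return qid
    return None  # missing page, redirect (e.g. it:Turing), or no fallback: skip
```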
- And we see results! :) http://ultimategerardm.blogspot.com/2015/02/where-people-died-perspective-on.html --Nemo 07:13, 20 February 2015 (UTC)
- I hear a new run was made, which found many new values to import (http://ultimategerardm.blogspot.nl/2015/03/the-italian-job-perspective-on-bias.html). Probably User:Rotpunkt's cleanups of the bio data in it.wiki helped; the local data gets better and better. :) --Nemo 10:06, 10 March 2015 (UTC)