Wikidata:Bot requests/Archive/2015/07

Remove "(disambiguation)" from English labels

Per autolist (slow), there are over 255k items whose label contains "(disambiguation)". However, this suffix should not be part of a label, so I'm asking someone to remove it from these labels. --Stryn (talk) 15:09, 2 July 2015 (UTC)


There are also quite a lot of items about people that have the disambiguator imported from Wikipedia articles. These should be fixed as well. --- Jura 16:24, 2 July 2015 (UTC)
Yeah, but that's not that easy. Often, the part in parentheses belongs to the title (e.g. song titles, places, ...), so it would be wrong to remove them all. (Some bots did not import them in the first place, which is why many labels are now actually missing some part in parentheses.) For specific labels like "(disambiguation)", especially if they can be cross-checked with an "is a disambiguation page" statement, automatic removal would be a good thing, though. --YMS (talk) 16:30, 2 July 2015 (UTC)
I can't follow: What's the link between "items about people" and song titles? --- Jura 16:35, 2 July 2015 (UTC)
Perhaps an example or two would make it obvious: these are false positives for a simple rule of "has parentheses with a phrase inside" or a more complex rule of "has such a phrase at the end of the title", as in en:Benzo(a)pyrene and en:V (The Final Battle). --Izno (talk) 17:02, 2 July 2015 (UTC)
Neither should have P31:Q5. If you have some others, maybe yes. --- Jura 17:04, 2 July 2015 (UTC)
BTW, here are 2,124 items that need fixing: [1] (takes 5 minutes to run). --- Jura 14:04, 4 July 2015 (UTC)
I have seen villages with names like "Adam, Eve and Cain" be shortened to only "Adam". Also, items about psalms with commas in their titles have become shorter than desired. There are also some Swedish villages named "Meatballs (eastern part)" by Statistics Sweden. I am not sure if we have any good practice for those yet.
But once we have finally added name properties to these items, it may not matter if the labels are sometimes inconsistent. -- Innocent bystander (talk) 14:27, 4 July 2015 (UTC)
Have a look at the autolist. If there are any items that shouldn't be there, please tell us.
If you are looking for labels that have stuff cut-off, that shouldn't have been, try DNB entries @ WS (sample: Q19048544). --- Jura 14:33, 4 July 2015 (UTC)
Do we have any good practice for the Wikisource items yet? -- Innocent bystander (talk) 14:41, 4 July 2015 (UTC)
We have Johannes who did some work on Pauly's?
For some of the other resources, it's currently easier to link and search entries that are mirrored elsewhere than those at Wikisource. But maybe it's just me who doesn't like P1343. Once the structure is fixed, it should be fairly easy to standardize all DNB entry labels and descriptions. --- Jura 14:58, 4 July 2015 (UTC)
The initial request is done. For the discussion about Wikisource items, please use a more suitable talk page. --Pasleim (talk) 14:27, 26 July 2015 (UTC)
This section was archived on a request by: --Pasleim (talk) 14:27, 26 July 2015 (UTC)
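
The cross-check suggested in the thread above (only strip "(disambiguation)" when the item is also marked as a disambiguation page) could be sketched roughly as follows. This is a minimal illustration, not the bot that was actually run; the function name `clean_label` is hypothetical, and Q4167410 is assumed here to be the item for "Wikimedia disambiguation page".

```python
import re

# Matches a trailing "(disambiguation)" together with surrounding whitespace.
DISAMBIG_SUFFIX = re.compile(r"\s*\(disambiguation\)\s*$")

# Assumption: Q4167410 is the "Wikimedia disambiguation page" item.
DISAMBIG_PAGE = "Q4167410"


def clean_label(label, instance_of):
    """Strip the suffix only if the item is a disambiguation page.

    `instance_of` is the set of P31 values of the item.
    """
    if DISAMBIG_PAGE not in instance_of:
        return label
    return DISAMBIG_SUFFIX.sub("", label)


print(clean_label("Mercury (disambiguation)", {"Q4167410"}))
print(clean_label("V (The Final Battle)", {"Q4167410"}))
```

Other parenthesised parts, such as "(The Final Battle)", are left alone because only the literal "(disambiguation)" suffix is matched.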

Connecting it.wiki's categories

There are a few hundred categories on it.wiki without a Wikidata item that could be linked to a category on fr.wiki (and possibly other wikis). I wonder if a bot could:

  • connect each "Categoria:Film giapponesi del <year>" to frwiki's "Catégorie:Film japonais sorti en <year>"
  • connect each "Categoria:Film portoghesi del <year>" to frwiki's "Catégorie:Film portugais sorti en <year>"
  • connect each "Categoria:Film brasiliani del <year>" to frwiki's "Catégorie:Film brésilien sorti en <year>"
  • connect each "Categoria:Film polacchi del <year>" to frwiki's "Catégorie:Film polonais sorti en <year>"
  • connect each "Categoria:Film sovietici del <year>" to frwiki's "Catégorie:Film soviétique sorti en <year>"
  • connect each "Categoria:Film svizzeri del <year>" to frwiki's "Catégorie:Film suisse sorti en <year>"
  • connect each "Categoria:Film hongkonghesi del <year>" to frwiki's "Catégorie:Film hongkongais sorti en <year>"
  • connect each "Categoria:Film messicani del <year>" to frwiki's "Catégorie:Film mexicain sorti en <year>"
  • connect each "Categoria:Film senegalesi del <year>" to frwiki's "Catégorie:Film sénégalais sorti en <year>"
  • connect each "Categoria:Film turchi del <year>" to frwiki's "Catégorie:Film turc sorti en <year>"
  • connect each "Categoria:Film israeliani del <year>" to frwiki's "Catégorie:Film israélien sorti en <year>"
  • connect each "Categoria:Film tunisini del <year>" to frwiki's "Catégorie:Film tunisien sorti en <year>"

obviously also adding the proper label. There may be some cases where the category on itwiki is already linked, but I believe that if it is not then the category has no Wikidata item (or frwiki's category does not exist; in both cases, there should be no need for merges). Thanks in advance--Dr Zimbu (talk) 11:50, 2 July 2015 (UTC)
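
As a rough sketch of how a bot might enumerate the candidate pairs listed above, the following generates (itwiki title, frwiki title) tuples for a range of years. The helper name `category_pairs` is hypothetical; checking that both pages exist and that neither is already linked would still have to be done against the wikis.

```python
# itwiki pattern -> frwiki pattern, taken from the list in the request.
CATEGORY_PATTERNS = {
    "Film giapponesi del {year}": "Film japonais sorti en {year}",
    "Film portoghesi del {year}": "Film portugais sorti en {year}",
    "Film brasiliani del {year}": "Film brésilien sorti en {year}",
    "Film polacchi del {year}": "Film polonais sorti en {year}",
    "Film sovietici del {year}": "Film soviétique sorti en {year}",
    "Film svizzeri del {year}": "Film suisse sorti en {year}",
    "Film hongkonghesi del {year}": "Film hongkongais sorti en {year}",
    "Film messicani del {year}": "Film mexicain sorti en {year}",
    "Film senegalesi del {year}": "Film sénégalais sorti en {year}",
    "Film turchi del {year}": "Film turc sorti en {year}",
    "Film israeliani del {year}": "Film israélien sorti en {year}",
    "Film tunisini del {year}": "Film tunisien sorti en {year}",
}


def category_pairs(years):
    """Yield (itwiki title, frwiki title) for every pattern and year."""
    for it_pattern, fr_pattern in CATEGORY_PATTERNS.items():
        for year in years:
            yield (
                "Categoria:" + it_pattern.format(year=year),
                "Catégorie:" + fr_pattern.format(year=year),
            )


for it_title, fr_title in category_pairs([1950]):
    print(it_title, "->", fr_title)
```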

In progress. Popcorndude (talk) 01:09, 26 July 2015 (UTC)
@Dr Zimbu: ✓ Done
This section was archived on a request by: --Pasleim (talk) 13:10, 28 July 2015 (UTC)

military rank (P410): Add gender to zhwiki items

They are likely all not female. If one wants to check: Autolist (1,532 items).

They are currently on Wikidata:Database_reports/Constraint_violations/P410#.22Item_sex_or_gender_.28P21.29.22_violations. --- Jura 07:29, 6 July 2015 (UTC)

I've done it. Let some Chinese-speaking users check the correctness. --Infovarius (talk) 21:27, 12 July 2015 (UTC)
Hmm .. I would have checked before. --- Jura 21:33, 12 July 2015 (UTC)
This section was archived on a request by: --Pasleim (talk) 13:40, 28 July 2015 (UTC)

Twitter and Instagram property migration

Values of website account on (P553) for X (Q918) and Instagram (Q209330) need to be migrated to X username (P2002) and Instagram username (P2003) respectively. For bonus points, the date of joining Twitter may be fetched via its API (rate limits may apply) and be added as a start time (P580) qualifier. Properties for other such sites are currently proposed; the task may need to be re-run for them, once they are created. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:37, 23 July 2015 (UTC)

Also Facebook username (P2013). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:16, 30 July 2015 (UTC)

It seems that User:Pasleim's User:PLbot has Twitter names in hand. Will it also be doing Instagram and Facebook? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:30, 31 July 2015 (UTC)

Instagram was done last night, Facebook will follow soon. --Pasleim (talk) 12:38, 31 July 2015 (UTC)
@Pasleim: That's great, thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:54, 31 July 2015 (UTC)
done --Pasleim (talk) 16:11, 31 July 2015 (UTC)
This section was archived on a request by: Sjoerd de Bruin (talk) 18:20, 3 August 2015 (UTC)

Aliases for baronets

The naming convention on enwiki is for articles on baronets to be titled Sir Somebody Somebody, Nth Baronet (sometimes Nth Baronet of Wherever). As a result, most Wikidata items now have this form as their label, which can be confusing for someone who is looking for Somebody Somebody and does not realise we have an item. Would it be possible to have a bot add corresponding aliases (in English, at least) to each item that matches this pattern?

Alternatively, using Somebody Somebody as the main title and moving the baronet form to an alias would work as well. Andrew Gray (talk) 19:41, 29 July 2015 (UTC)
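
A minimal sketch of the pattern match described above, assuming labels follow the enwiki convention exactly; the function name `baronet_alias` and the sample names are hypothetical, and this is not necessarily how the task was eventually carried out.

```python
import re

# "Sir <name>, <N>th Baronet" with an optional " of <place>" at the end.
BARONET_LABEL = re.compile(
    r"^Sir (?P<name>.+?), \d+(?:st|nd|rd|th) Baronet(?: of .+)?$"
)


def baronet_alias(label):
    """Return the plain-name alias for a baronet label, or None if no match."""
    match = BARONET_LABEL.match(label)
    return match.group("name") if match else None


print(baronet_alias("Sir John Example, 3rd Baronet"))
print(baronet_alias("Sir John Example, 1st Baronet of Somewhere"))
print(baronet_alias("John Example"))
```

Labels that do not match the pattern return `None`, so a bot could simply skip them.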

✓ Done --Pasleim (talk) 20:32, 5 August 2015 (UTC)
This section was archived on a request by: --Pasleim (talk) 20:32, 5 August 2015 (UTC)

Remove disambiguator from P31:Q5 items

As mentioned, there are about 2,100 items with labels such as "Mark Perry (English footballer)": [2] (takes up to 5 minutes to run). --- Jura 04:53, 30 July 2015 (UTC)

✓ Done There are 160 items left which can't be done by bot because there are other items with the same label and description. --Pasleim (talk) 20:41, 5 August 2015 (UTC)
This section was archived on a request by: --Pasleim (talk) 20:41, 5 August 2015 (UTC)

Ladino Wikipedia

Hi. I am an administrator on Ladino Wikipedia (lad:). I have an enormous number of items without Wikidata links. Some fall into pretty neat categories, so I'd like to start making some requests for help here. (On Ladino Wikipedia, our Template space is called "Xablón", while our Category space is called "Kateggoría".)

  • First request: Babel templates and categories. As you would expect, these have the format of:
    • Xablón:User langcode[-level code]
    • Kateggoría: User langcode[-level code]
where langcode is the two- or three-letter language code and level code is a digit between 0 and 5 for templates, 1 and 5 for categories. Level code is sometimes missing.
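
The title format described above can be expressed as a regular expression. The sketch below is only an illustration of that format; the function name `parse_babel_title` is hypothetical, and it deliberately accepts nothing beyond what the description states (two- or three-letter code, optional digit level).

```python
import re

BABEL_TITLE = re.compile(
    r"^(?P<ns>Xablón|Kateggoría):\s*User (?P<lang>[a-z]{2,3})(?:-(?P<level>[0-5]))?$"
)


def parse_babel_title(title):
    """Return (namespace, langcode, level or None), or None if not a Babel title."""
    match = BABEL_TITLE.match(title)
    if match is None:
        return None
    level = match.group("level")
    level = int(level) if level is not None else None
    # Categories only use levels 1 to 5; level 0 exists for templates only.
    if match.group("ns") == "Kateggoría" and level == 0:
        return None
    return (match.group("ns"), match.group("lang"), level)


print(parse_babel_title("Xablón:User en-3"))
print(parse_babel_title("Kateggoría:User lad"))
```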

Thank you. StevenJ81 (talk) 13:38, 29 July 2015 (UTC)

@StevenJ81: Are you familiar with the Babel extension (see meta:User language)? If you can convince the Ladino users to use that (it's already enabled on all Wikimedia projects), you don't need to have local templates, except for non-standard codes.
I've done all of the templates except for lad:Xablón:User kl; all of the categories appeared to be linked already. Some notes:
As far as I can tell, none of those are being used, so you could easily rename or delete them.
- Nikki (talk) 07:02, 31 July 2015 (UTC)
Oh and lad:Xablón:User en-n looks like a broken version of lad:Xablón:User en and lad:Xablón:User sh-2/w/w/w/index.php looks like some sort of mistake. I haven't linked either of those either. - Nikki (talk) 07:09, 31 July 2015 (UTC)
Thank you very much, @Nikki.
The problem with "convincing" everyone is that the wiki is not an extremely active one. It would not surprise me if many of the current user pages belong to people who don't come to ladwiki very often (if at all, any more). Getting in touch with them at all will be challenging. The only other option would be for me to import a bot to convert user pages myself, and I'm not so keen to do that. I'm reluctant to change things where there is not a problem, know what I mean?
I will deal with the issues you outlined. Again, thanks for your help. There are some others I will get at shortly (like {{=}} and similar). StevenJ81 (talk) 12:16, 31 July 2015 (UTC)
@Nikki: I have taken care of all of these things:
  1. ASL templates and categories all moved to ase
  2. lk templates and categories all moved to lkt
  3. ma templates and categories all moved to arz (I'm glad you knew this; I would have had no clue)
  4. en-me templates and categories all moved to enm
All of the above were dropped in by a bot about nine years ago. I chose to fix them, rather than delete them, so if you can run a bot to connect them correctly at this point I'd appreciate it.
I deleted Category:User ko-M and Template:User sh-2/w/w/w/index.php, and turned Template:User en-n into a redirect.
The one disadvantage about using the Babel extension, by the way, is that it does not automatically put people into user language categories. (Or, at least, it doesn't on ladwiki.) For me, as a non-Ladino-speaking administrator on ladwiki, those categories are helpful. StevenJ81 (talk) 15:15, 31 July 2015 (UTC)
@StevenJ81: No problem, linking user language templates/categories is one of the things I've been working on, so I'm glad I could help. :) Moving pages automatically updates the linked Wikidata items, so nothing needs to happen there. Regarding the Babel extension, it can put users in categories, so it should just be a matter of asking for the configuration to be changed (probably over at Phabricator). - Nikki (talk) 16:08, 31 July 2015 (UTC)
I'm going to mark as ✓ Done. I'll be back in touch later if I need more help. Thanks. StevenJ81 (talk) 16:51, 6 August 2015 (UTC)
This section was archived on a request by: --Pasleim (talk) 21:52, 19 August 2015 (UTC)

Update category labels

Could a bot adjust the labels of category items to their respective sitelinks? There are a lot of old labels out there. Sjoerd de Bruin (talk) 14:09, 1 July 2015 (UTC)

I work on category periodically, have you some example? --ValterVB (talk) 18:53, 1 July 2015 (UTC)
Someone left a message on my talk page about Category:Compositions by Hubert Parry (Q8406409). He didn't understand the situation on Wikidata because the label of the item was still "Categorie:Compositie van Parry" instead of "Categorie:Compositie van Hubert Parry". I think category labels should always be the same as the sitelinks. Sjoerd de Bruin (talk) 19:07, 1 July 2015 (UTC)
done for the main languages. Will repeat it weekly. --Pasleim (talk) 20:56, 30 July 2015 (UTC)
Strong oppose. If the categories of the different sitelinks don't mean the same, then your bot may destroy correct translations. 91.9.109.131 00:02, 31 July 2015 (UTC)

If you adjust labels to sitelinks, e.g. for this set of Q9535840:

enwiki Category:National Assembly (Nigeria)
ptwiki Categoria:Assembleia Nacional da Nigéria

you will create Wikidata internal label inconsistency. And this is just an example, where the meaning still can be guessed.

And already messy is "category's main topic : National Assembly" - National Assembly is a meaningless nonsense label.

Lucky those, that have the interface in French which has "Assemblée nationale du Nigeria". But they have to fear a bot walks around and changes the fr-label to "Assemblée nationale" based on frwiki-sitelink "Assemblée nationale (Nigeria)".

Wikidata is not Wikipedia. sitelink != label. 91.9.109.131 02:22, 31 July 2015 (UTC)

Another example, where the meaning would be hidden from de-label, if de-label=de-sitelink:

dewiki Kategorie:Moravske Toplice 
slwiki Kategorija:Občina Moravske Toplice

No guessing is needed if Wikidata makes the label more informative than German Wikipedia does:

dewiki Kategorie:Gemeinde Moravske Toplice
slwiki Kategorija:Občina Moravske Toplice

91.9.109.131 02:51, 31 July 2015 (UTC)

Dear IP (Tobias Conradi), I know that Wikidata and Wikipedia are not the same. However, this bot request is only about Wikimedia categories, which exist only in the Wikimedia universe and nowhere else. Inventing new labels in Wikidata for these categories (e.g. Category:National Assembly of Nigeria, Kategorie:Gemeinde Moravske Toplice) to remove "internal label inconsistencies" doesn't help at all. I will continue updating the labels unless another user comments on this. --Pasleim (talk) 12:42, 31 July 2015 (UTC)
This section was archived on a request by: --Pasleim (talk) 22:47, 16 September 2015 (UTC)

Persondata at Ukrainian Wikipedia

Hello. We would like to retire the outdated template w:uk:Template:Persondata from the Ukrainian Wikipedia. Before that, all of its data should be transferred to Wikidata. The parameters are described in the English-language version of the documentation; if you have any questions, feel free to ask. We would appreciate your help. --Максим Підліснюк (talk) 19:08, 28 July 2015 (UTC)

I want to verify, because that did not translate the best to English:
  • You want to deprecate uk.Persondata.
  • You want to verify that uk.Persondata is on Wikidata.
  • You want to add data from uk.Persondata that is not on Wikidata.
Is this all correct? Where is there local consensus on uk.wikipedia to deprecate the template? --Izno (talk) 20:13, 28 July 2015 (UTC)
You are right. Local consensus is not necessary so far, because transferring information to Wikidata is sometimes a forced process that does not require consensus. The template will be removed only after a community discussion. --Максим Підліснюк (talk) 20:24, 28 July 2015 (UTC)
  • It's much simpler just to remove the template from ukwiki. I doubt it has any actual information missing from Wikidata or from local infoboxes. Still, I have a framework for comparing such templates with Wikidata that can be used. But you will need Wikidata community consensus to batch-fill Wikidata with Persondata template data from ukwiki. -- Vlsergey (talk) 21:11, 28 July 2015 (UTC)

@Максим Підліснюк: All birth and death dates from w:uk:Template:Persondata are imported to Wikidata. --Pasleim (talk) 20:57, 27 August 2015 (UTC)

Birth and death places are imported too. --Pasleim (talk) 21:31, 27 August 2015 (UTC)
For #Import date_of_birth_.28P569.29.2Fdate_of_death_.28P570.29_from_Wikipedia (deaths 2000-2013), I just did an import from uk:Template:Особа. --- Jura 13:32, 7 September 2015 (UTC)
This section was archived on a request by: --Pasleim (talk) 15:16, 16 September 2015 (UTC)

DNB articles on Wikidata

Hello! Could a bot please remove the sitelinks to enwikisource from the items on this list? They are not appropriate as the items are all for persons but the sitelinks to enwikisource are articles from a biographical dictionary which should have their own item. See this discussion for further background. Thanks in advance, Jonathan Groß (talk) 09:41, 27 July 2015 (UTC)

Pinging Charles Matthews, who might want to have a look at this... (but it certainly looks suitable to me) Andrew Gray (talk) 19:37, 29 July 2015 (UTC)
Can we be sure that the wikisource links have been added to "human" articles, rather than "instance of human" being incorrectly added to an article properly created for a DNB article from enWS? As far as I know, both of these things happen. Charles Matthews (talk) 20:15, 29 July 2015 (UTC)
If they have a Oxford Dictionary of National Biography ID (P1415) they should be a "real" person, as they'll have been IDed through the matching. Andrew Gray (talk) 21:25, 29 July 2015 (UTC)

I lost track of this issue during my vacation. I'm still interested in fixing the matter. I found the results of the matching here. But there is still a mess to be cleaned up.

I think we can all agree that adding the DNB articles as sitelinks for person items is not ideal. It would be better to create separate items for the DNB articles and link those with the related person items via described by source (P1343). The DNB items should have the statements instance of (P31) > biographical article (Q19389637) and published in (P1433) > Dictionary of National Biography (Q1210343) (or something similar) as well as main subject (P921) with a link to the respective person.

The question is: How do we go about it? Jonathan Groß (talk) 12:21, 3 September 2015 (UTC)

This section was archived on a request by: --- Jura 19:31, 18 November 2015 (UTC)

100 000 localities of Mexico

Everything in https://sh.wikipedia.org/wiki/Kategorija:Naselja_u_Meksiku,_%C4%8Dlanci_u_za%C4%8Detku that is not in https://sh.wikipedia.org/wiki/Kategorija:Op%C5%A1tine_u_Meksiku should have

  • instance of = locality of Mexico (31:20202352)
  • English label = Spanish label = the part from the shWP article before the ","
  • English description = "in " + the part after ", "
  • Spanish description = "en " + the part after ", "
  • country = Mexico (17:96)

That is a preparation for placing them in the correct municipality and adding INEGI locality ID (P1976) to these items. Eldizzino (talk) 13:11, 7 July 2015 (UTC)
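
The label and description rules in the list above amount to splitting the shWP title at the first ", ". A minimal sketch, assuming titles of the form "Name, Region"; the function name `locality_terms` and the sample title are hypothetical, and the description wording follows the original request rather than the alternatives discussed further down.

```python
def locality_terms(sh_title):
    """Derive en/es labels and descriptions from a shWP article title."""
    # Split at the first ", "; sep is empty when the separator is missing.
    name, sep, region = sh_title.partition(", ")
    terms = {"labels": {"en": name, "es": name}, "descriptions": {}}
    if sep:
        terms["descriptions"] = {"en": "in " + region, "es": "en " + region}
    return terms


print(locality_terms("El Ejemplo, Oaxaca"))
```

Titles without a separator get labels only, so no description is invented for them.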

Swedish (sv) label = Spanish label
Swedish (sv) description = "ort i " + as above or "ort i Mexiko" if separator is missing
-- Innocent bystander (talk) 13:39, 7 July 2015 (UTC)
@Eldizzino: The descriptions are looking odd. What about "locality in " + the part after ", " and "localidad en " + the part after ", "? --Pasleim (talk) 09:15, 29 July 2015 (UTC)
@Innocent bystander, Pasleim:
German (de) label = Spanish label
German (de) description = "in " + as above. I would not repeat the type in the description.
91.9.109.131 00:17, 31 July 2015 (UTC)
It's standard to include the type in the description, or at least "location in", "place in", "settlement in". See also our guidelines about descriptions. The only descriptions of geographical features starting with "in" were made by sock puppets of Tobias Conradi, see [3], [4], [5]. --Pasleim (talk) 12:37, 31 July 2015 (UTC)
This section was archived on a request by: User blocked --- Jura 12:24, 4 December 2015 (UTC)

Adding SOC job codes

http://www.bls.gov/soc/#materials

I've noticed that SOC Code (2010) (P919) didn't have much data (https://tools.wmflabs.org/wikidata-todo/translate_items_with_property.php?prop=919), while a canonical list from the Bureau of Labor Statistics (thus PD and freely reusable, cf http://www.bls.gov/bls/linksite.htm) does exist (we're talking 7K items).

The data is tabulated as follows:

SOC Codes and Job Titles

2010 SOC Code   2010 SOC Title                    2010 SOC Direct Match Title
11-1011         Chief Executives                  CEO
11-1011         Chief Executives                  Chief Executive Officer
11-1011         Chief Executives                  Chief Operating Officer
11-1011         Chief Executives                  Commissioner of Internal Revenue
11-1021         General and Operations Managers   Department Store General Manager

It is available at: http://www.bls.gov/soc/soc_2010_direct_match_title_file.xls (with more related files at http://www.bls.gov/soc/#materials)

What I think would be nice is adding the SOC codes based on the 2010 SOC Direct Match Title, since so many outside services and pages use the SOC codes (which seem to be a requirement for a lot of job offers in the US).

I've already manually added the French equivalent of the SOC code to Wikidata, so being able to match national codes and job titles through Wikidata would be cool. Teolemon (talk) 18:59, 29 November 2014 (UTC)

SOC is based on a UN standard, ISCO. Each country's bureau of national statistics has its own translation/adaptation of ISCO. There is a standard for historical occupations too, HISCO. In Denmark the national adaptation of this code system is called DISCO, in Norway STYRK, and so on. Adding all these codes will be messy, since they may be non-overlapping. But it may also become a source for statisticians to get translations from one code to another. H@r@ld (talk) 23:12, 19 January 2015 (UTC)
I have created Wikidata:WikiProject_Occupations_and_professions. The idea is to have them all in Wikidata, so that we can display templates in all the Wikipedias and provide public information, both for statisticians and for job seekers doing research. The end game would be to help both groups compare or research professions in any country. --Teolemon (talk) 21:17, 17 July 2015 (UTC)

Adding Q20651139 to source section

After some discussion with Innocent bystander, we came to the conclusion that it's preferable to use an item such as Q20651139 to indicate that a statement was derived from the inverse or symmetric property. Thus the following changes would need to be made.

For capital of (P1376), sample change: here

(itemA) P1376 (itemB) ref: P248 (itemB)
=>
(itemA) P1376 (itemB) ref: P143 Q20651139

For diplomatic relation (P530), sample change: here

(itemA) P530 (itemB) ref: P248 (itemB)
=>
(itemA) P530 (itemB) ref: P143 Q20651139

For spouse (P26), sample change: here

(itemA) P26 (itemB) ref: P143 (itemB)
=>
(itemA) P26 (itemB) ref: P143 Q20651139

For child (P40), sample change: here

(itemA) P40 (itemB) ref: P143 (itemB)
=>
(itemA) P40 (itemB) ref: P143 Q20651139

If there is already a bot that could do that, that would be most helpful. --- Jura 14:18, 9 July 2015 (UTC)
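
The four sample changes above share one rule: a reference that merely points back at the statement's own target item is replaced by P143 with Q20651139. A minimal sketch of that rule on plain (property, value) pairs, with a hypothetical function name `rewrite_reference`; a real bot would of course operate on Wikibase reference objects instead.

```python
# Item proposed in the request as the marker for derived statements.
INFERRED = "Q20651139"

# Reference property currently used for each statement property,
# as shown in the sample changes.
OLD_REF_PROP = {"P1376": "P248", "P530": "P248", "P26": "P143", "P40": "P143"}


def rewrite_reference(statement_prop, target, reference):
    """Return the rewritten (property, value) reference pair.

    Only references that point back at the statement's own target item
    are changed; everything else is returned untouched.
    """
    old_prop = OLD_REF_PROP.get(statement_prop)
    if old_prop is not None and reference == (old_prop, target):
        return ("P143", INFERRED)
    return reference


print(rewrite_reference("P1376", "Q155", ("P248", "Q155")))
```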

Sorry, I believe that using specific item is better (as a source) than single Q20651139. --Infovarius (talk) 12:06, 10 July 2015 (UTC)
"Stated in" Brazil (Q155) does not look like a good option. See my diff here. "Imported from Q20651139" "page:Q155" is then maybe an alternative? -- Innocent bystander (talk) 12:17, 10 July 2015 (UTC)
Yes, I like some variant of "imported from Q155". --Infovarius (talk) 21:29, 12 July 2015 (UTC)

Change qualifier type of P1343

According to the discussion at Wikidata:Project chat#Change described by source (P1343) qualificator for Wikisource articles, please change all stated in (P248) qualifiers of the described by source (P1343) property to statement is subject of (P805). Documentation and Lua modules will be updated after the bot work is complete. -- Vlsergey (talk) 20:54, 21 June 2015 (UTC)

@Vlsergey: On Wikidata:Project chat/Archive/2015/06#Change described by source (P1343) qualificator for Wikisource articles I don't see a consensus for this task. Were there any further discussions? --Pasleim (talk) 10:03, 29 July 2015 (UTC)
@Pasleim: There was some discussion before, and a couple of users were against the use of stated in (P248) as a qualifier. But if you see no consensus, feel free to close the request as "no consensus" and we will continue to use stated in (P248) as a qualifier. That sounds much simpler to me. -- Vlsergey (talk) 10:54, 29 July 2015 (UTC)

missing pairs items

  • here item A (column 1) has P1344:B, but B (column 2) doesn't have P710:A. There are more than 141,000 cases!

I collected a list of queries that show missing pair items. Please also run the bot on its queries regularly (every month). Yamaha5 (talk) 22:23, 27 July 2015 (UTC)

OK, please remove the cases that do not hold in general. Yamaha5 (talk) 09:29, 28 July 2015 (UTC)
If you don't mind, can we annotate it? (I hesitate editing pages in people's userspace). --- Jura 09:34, 28 July 2015 (UTC)
You're welcome. You or others can edit this page (add or remove). Yamaha5 (talk) 10:08, 28 July 2015 (UTC)

new suggestion

We can separate the cases at User:Yamaha5/List of missingpairs querys into 2 or 3 groups:

  1. completely possible, like (Item A: father (P22) > B exists but not B: child (P40) > A. Should add child (P40) > Query), which are always true
  2. conditionally possible, like (Item A: P9 (P9) > B exists but not B: P7 (P7) or P9 (P9) > A. Should add P7 (P7) or P9 (P9) > Query)
  3. needs more checks, like (Item A: shares border with (P47) > B exists but not B: shares border with (P47) > A. Should add shares border with (P47) > Query). For this example, the bot should check the geographical level and not link a country with a city!

Yamaha5 (talk) 10:19, 28 July 2015 (UTC)
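
For the first group above (pairs that are always valid), finding the missing inverse statements is a simple set lookup. The sketch below works on plain (subject, property, object) tuples and is only an illustration; the function name `missing_inverses` and the sample item IDs are hypothetical.

```python
def missing_inverses(triples, forward, inverse):
    """List the (B, inverse, A) statements missing for each (A, forward, B)."""
    existing = set(triples)
    missing = []
    for subject, prop, obj in triples:
        if prop == forward and (obj, inverse, subject) not in existing:
            missing.append((obj, inverse, subject))
    return missing


statements = [
    ("Q1", "P22", "Q2"),  # Q1 has father Q2
    ("Q2", "P40", "Q1"),  # Q2 has child Q1: pair is complete
    ("Q3", "P22", "Q4"),  # Q3 has father Q4, but Q4 lacks child Q3
]
print(missing_inverses(statements, "P22", "P40"))
```

The same call with "P1344" and "P710" would cover the case from the original request; the conditional and "needs more checks" groups need extra tests before anything is added.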

I separated the list. any comments? Yamaha5 (talk) 10:57, 28 July 2015 (UTC)
Good work! --- Jura 11:00, 28 July 2015 (UTC)
BTW, the other day I added missing P22/P25 based on P40/P21. Prior to doing so, I checked that all involved items had P31:Q5. I know it can apply to others, but this approach avoids most potential problems. --- Jura 11:10, 28 July 2015 (UTC)

Population of French communes

Can any bot migrate data about the population of French communes from fr.wiki to Wikidata? The data for a particular commune is in the template Modèle:Données/X/évolution population, where X is the name of the commune. Mati7 (talk) 18:30, 20 July 2015 (UTC)

I actually came here today to request a bot to import information from fr-WP, including population. It is better to get the information from a reliable source. The French government agency responsible for censuses (INSEE) has published census information since 1962 in XLS format. The only restriction on reuse (in French) is to acknowledge the source, which Wikidata would do anyways in the form of a reference. Spreadsheets with population data from the 2012 census can be found on this page. In addition to population, the spreadsheets contain additional data which I think can be added easily to Wikidata, even if it is not very useful.

For reference, the hierarchy of administrative districts in France is:

  1. région
  2. département
  3. arrondissement
  4. canton - this is primarily a district for elections (ie. an electoral district)
  5. commune

Communes can be split between multiple cantons, cantons can span multiple arrondissements, but other levels cannot be split (eg. a commune cannot belong to multiple arrondissements). There are also "associated communes" (Wikipedia article: en:w:Associated communes of France) which are recognized districts within communes.

INSEE codes

Every administrative district in France has an INSEE code. The INSEE code is also used for other purposes wherever a code is needed. The INSEE code for départements is widely used, such as on vehicle license plates and in the names of websites, even when not necessary...for example the website of en:w:Haut-Rhin (Q12722) contains numerous subpages with titles that incorporate its INSEE code (68). Since the population spreadsheets contain the codes for all administrative districts, they should be added while adding the populations.

I believe all communes already have the INSEE municipality code (Property:P374). However, the few that I viewed have the Dutch Wikipedia as the source, so the commune INSEE codes should be checked with the INSEE codes in the population spreadsheets and then change the reference to INSEE (the population spreadsheet). A property should be created for "INSEE department code" for departments (French: départements). I don't know if it's necessary to create a property for every administrative level, but the other levels should have "INSEE code" (Q156705) added. The INSEE code should be added to other levels as well (the linked INSEE municipality code is only for communes).

The INSEE codes for arrondissements, cantons, & communes all begin with the two-digit department code. The master file in the next section contains the codes for arrondissements, cantons, & communes without the department prefix. For example, the INSEE code for Colmar in the Haut-Rhin department is 68066 (68 is the INSEE code for Haut-Rhin department), but in the master file there are columns for the department (which has 68) and for the commune (066). The first column in the spreadsheet in the "Older populations" section contains the complete INSEE code.
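
Rebuilding the complete commune code from the two master-file columns is a matter of concatenation, with zero-padding in case a spreadsheet reader has turned "066" into the number 66. This is a minimal sketch; the function name `insee_commune_code` is hypothetical, and it assumes that complete commune codes are always five characters long.

```python
def insee_commune_code(department, commune):
    """Join the department and commune columns into the full INSEE code.

    Assumption: a complete commune code is five characters, so the commune
    part is zero-padded to fill whatever the department code leaves.
    """
    department = str(department)
    commune = str(commune).zfill(5 - len(department))
    return department + commune


print(insee_commune_code("68", "066"))  # Colmar, from the example above
print(insee_commune_code("68", 66))     # same commune, read as a number
```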

2012 Population

There are two population values:

  • Population municipale is the number of people who have their usual residence in the district, including people in penitentiaries, homeless people present in the commune at the time of the census, and people in mobile homes.
  • Population totale includes the population municipale plus people residing in the district who usually have a home elsewhere (eg. students living away from their usual home, people without a fixed residence).

The master file for the whole of France is here, using data from the 2012 census (reference date: 1 January 2012). It is produced and published by INSEE (Q156616). It contains 9 sheets:

  1. Regions
  2. Départements
  3. Arrondissements
  4. Cantons - ignore (Canton boundaries were adjusted in 2015 so this is no longer relevant)
  5. Communes
  6. Fractions cantonales - ignore (for communes that are divided between multiple cantons or for multiple cantons in one commune, this lists the population that lies in each canton. However, canton boundaries were adjusted in 2015 so this information is no longer relevant)
  7. Communes associées ou déléguées - "associated communes" (explained above), some may not have a Wikidata page
  8. Collectivités d'outre-mer - populations of communes in overseas territories (collectivities). Unlike the rest of France, the entire area of an overseas territory is not divided into communes.
  9. Documentation

New boundaries were created for cantons effective in 2015. A spreadsheet with the 2012 population of the cantons based on the 2015 boundaries can be found here.

Older populations

I think it is most important to add the most recent population (2012 census). The populations from 1962, 1968, 1975, 1982, 1990, 1999, 2007, & 2012 for each commune are contained in this spreadsheet. It is produced by INSEE. It has three sheets:

  1. Métropole - European France
  2. DOM - Overseas departments, which have the same status as departments in the Métropole (eg. like Hawaii is a US state with the same status as a state in the continental US). Note that the first three censuses were in 1961, 1967, & 1974.
  3. Arm (populations in the arrondissements of Paris)

Discussion

Please add comments below, not in the above text. If I do not respond to a comment for a few days, please leave a message on my English Wikipedia talk page en:w:User talk:AHeneen. AHeneen (talk) 07:33, 24 July 2015 (UTC)

Are we allowed to publish the census data under Creative Commons CC0 License (Q6938433)? --Pasleim (talk) 15:33, 24 July 2015 (UTC)
The only restriction on reuse is to mention the source (similar to CC-by). AHeneen (talk) 05:06, 25 July 2015 (UTC)

Hello. Before anyone starts the job, here are some clarifications (I am the one who updates the census data on the French Wikipedia). You should know that, unlike most countries, where censuses are carried out periodically over the entire territory, in France since 2004 a legal population is produced for each municipality every year, but the census type varies according to the municipality:

  • Municipalities with fewer than 10 000 inhabitants are recorded every 5 years by complete census
  • For those with more than 10 000, a sample of the population is counted every year. The annual collection covers a sample of addresses drawn randomly and representing about 8% of the population.

Then every year, there are three types of data:

  • complete census: municipalities with fewer than 10,000 inhabitants that are the subject of an actual census that year
  • estimated: municipalities with fewer than 10,000 inhabitants that aren't the subject of an actual census that year
  • sampling: municipalities with more than 10,000 inhabitants

In the French WP, the choice was made to display in graphs and tables only the data from actual censuses and the data for towns of over 10,000 inhabitants. So in Wikidata, it is essential to have a qualifier characterizing the census type.

Unless I am mistaken, Wikidata currently has Q39825 ("census"), to which the following qualifiers should be added:

  • complete census
  • estimate census
  • sample census

Without this information, the graphs and tables of the French Wikipedia will never use Wikidata data. I can give this qualifier for each municipality and for each year (before 2006 all censuses are actual ones), but I want to see these new qualifiers first, to be sure that we are speaking the same language. Roland45 (talk) 05:53, 21 October 2015 (UTC)

@Roland45: Do you mean that like presently for instance in Urt (Q842706) and Arles (Q48292), population (P1082) would be used with the qualifier determination method (P459) and an appropriate value? This value could be
  • before 2004, or after 2004 under pop. 10,000 every 5 years, the real (full) census: census (Q39825)
  • after 2004 under pop. 10,000 the rest of the time, the estimate (from previous years?) without a census taken that year: estimation (Q791801)
  • after 2004 over pop. 10,000, an estimate (for the full population if I understand correctly?) from a restricted sample: maybe sample (Q49906)?
Oliv0 (talk) 12:35, 21 October 2015 (UTC)
That's exactly it. To know the type of census for each municipality, simply look at the calendar.
You can see:
Urt is a municipality under 10,000 inhabitants - collection year: 2017 (and 2012, 2022, 2027, etc.)
Arles is a municipality above 10,000 inhabitants - collection year: every year, by sampling
And then we have the following data:

Year | Urt (Q842706) value | Urt type | Arles (Q48292) value | Arles type
1999 | 1702 | census | 50467 | census
2006 | 1988 | estim. | 51970 | sample
2007 | 2028 | census | 52197 | sample
2008 | 2092 | estim. | 52729 | sample
2009 | 2183 | estim. | 52979 | sample
2010 | 2195 | estim. | 52661 | sample
2011 | 2208 | estim. | 52510 | sample
2012 | 2220 | census | 52439 | sample

2006 is the first year of publication under the new method. I should point out that even if the type of census can differ from year to year, all values are legal populations. For each commune and each year, I can give the data from 1999 to 2012 and the corresponding type of census (by cross-referencing this spreadsheet, this one and this other one). Roland45 (talk) 16:53, 21 October 2015 (UTC)
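The rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 2004 cut-off constant, and the treatment of a population of exactly 10,000 as "large" are hypothetical choices, not something INSEE or Wikidata defines. The collection year comes from the calendar (2017 for Urt).

```python
# Assumptions (not from INSEE): names, the exact cut-off year, and how a
# population of exactly 10,000 is handled are illustrative only.
ROLLING_CENSUS_START = 2004   # the new method applies "since 2004"
THRESHOLD = 10_000            # inhabitants


def census_type(year, population, collection_year=None):
    """Return 'census', 'estimation' or 'sample' for one commune and year."""
    if year < ROLLING_CENSUS_START:
        # Before the new method, every published figure is a full census.
        return "census"
    if population >= THRESHOLD:
        # Large communes: a sample of addresses is surveyed every year.
        return "sample"
    if collection_year is None:
        raise ValueError("collection_year is required for small communes")
    # Small communes: a full census every 5 years, an estimate otherwise.
    if (year - collection_year) % 5 == 0:
        return "census"
    return "estimation"


# Values taken from the Urt / Arles table above.
urt = {1999: 1702, 2006: 1988, 2007: 2028, 2008: 2092,
       2009: 2183, 2010: 2195, 2011: 2208, 2012: 2220}
arles = {1999: 50467, 2006: 51970, 2012: 52439}

urt_types = {y: census_type(y, p, collection_year=2017) for y, p in urt.items()}
arles_types = {y: census_type(y, p) for y, p in arles.items()}
```

As noted in the discussion, a commune whose population crosses the 10,000 threshold would need special handling that this sketch does not attempt.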

Sources are good to know, and here with their URLs; does each value use all three of them? Oliv0 (talk) 19:01, 21 October 2015 (UTC)
Sources are:

Year | Definition of the population (in French) | Source 1 (data) | Source 2 (calendar)
1962-1999 | Population sans doubles comptes années 1962, 1968, 1975, 1982, 1990, 1999 | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2014/zip/HIST_POP_COM_RP12.zip |
2006 | Populations légales des communes en vigueur au 1er janvier 2009 - Date de référence statistique : 1er janvier 2006 - limites territoriales en vigueur au 1er janvier 2008 | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2008/xls/ensemble.xls |
2007 | Populations légales des communes en vigueur au 1er janvier 2010 - Date de référence statistique : 1er janvier 2007 - limites territoriales en vigueur au 1er janvier 2009 | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2009/xls/ensemble.xls |
2008 | Populations légales des communes en vigueur au 1er janvier 2011 - Date de référence statistique : 1er janvier 2008 - limites territoriales en vigueur au 1er janvier 2010 | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2010/xls/ensemble.xls |
2009 | Populations légales des communes en vigueur au 1er janvier 2012 - Date de référence statistique : 1er janvier 2009 - limites territoriales en vigueur au 1er janvier 2011 | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2011/xls/ensemble.xls |
2010 | Populations légales des communes en vigueur au 1er janvier 2013 - Date de référence statistique : 1er janvier 2010 - limites territoriales en vigueur au 1er janvier 2012 | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2012/xls/ensemble.xls |
2011 | Populations légales des communes en vigueur au 1er janvier 2014 - Date de référence statistique : 1er janvier 2011 - limites territoriales en vigueur au 1er janvier 2013 | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2013/xls/ensemble.xls | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/resultats/doc/annee-collecte-2015-commune.xls
2012 | Populations légales des communes en vigueur au 1er janvier 2015 - Date de référence statistique : 1er janvier 2012 - limites territoriales en vigueur au 1er janvier 2014 | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2014/xls/ensemble.xls | http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/resultats/doc/annee-collecte-2016-commune.xls

From 1962 to 1999, there is no specific calendar because the whole census was done in one year (1962, 1968, etc.). For 2006 to 2010, I no longer have the source of the calendar. In fact the qualifier is easy to deduce from the other sources. The only differences can come from municipalities whose population crosses the threshold of 10,000 inhabitants. Roland45 (talk) 05:10, 22 October 2015 (UTC)

So the "calendar" means the table showing (like in the smaller Urt/Arles table above) which year in the 5-year cycle a given municipality has a "census" and not an "estimation". Oliv0 (talk) 06:29, 22 October 2015 (UTC)
That's right. In fact, with these sources you can upload the data and qualifiers, going year by year. The one problem you may run into concerns the municipalities which disappeared between 2006 and 2012: you won't have the qualifier for these municipalities. Roland45 (talk) 11:31, 22 October 2015 (UTC)
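As a rough illustration of what such a year-by-year upload could look like, the sketch below builds tab-separated lines in the QuickStatements (version 1) text format for population (P1082) with the qualifier determination method (P459). Several things here are assumptions rather than anything agreed in this discussion: the function name, the use of point in time (P585) with day precision for the 1 January reference date, and the optional reference pairs (the example uses publisher, S123, pointing to INSEE). It should be checked against the QuickStatements documentation before any real use.

```python
def quickstatements_row(item, population, year, method_qid, reference_pairs=()):
    """Build one tab-separated QuickStatements (v1) line for a population claim.

    Hypothetical helper: P585 (point in time) and the reference layout are
    assumptions; only P1082 and P459 come from the discussion.
    """
    # Statistical reference date is 1 January of the year; /11 = day precision.
    point_in_time = f"+{year}-01-01T00:00:00Z/11"
    fields = [item, "P1082", str(population),
              "P585", point_in_time,
              "P459", method_qid]
    # Optional reference parts, e.g. ("S123", "Q156616") to credit INSEE.
    for prop, value in reference_pairs:
        fields.extend([prop, value])
    return "\t".join(fields)


# Two rows for Urt (Q842706), using values and types from the table above.
rows = [
    quickstatements_row("Q842706", 2208, 2011, "Q791801"),   # estimation
    quickstatements_row("Q842706", 2220, 2012, "Q39825",     # census
                        reference_pairs=[("S123", "Q156616")]),
]
```

Communes that disappeared between 2006 and 2012 would simply be left out of the input, since no qualifier can be derived for them.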

License

I would like to add, in case this was felt as a problem here, that as Roland45 said last autumn on his French talk page, INSEE population data such as the 2010 data is published as "Open Data" on data.gouv.fr under "Open Licence" which needs only to attribute the data to the "name of the Producer" (here INSEE), meaning on Wikidata a reference that mentions The National Institute of Statistics and Economic Studies (Q156616). Oliv0 (talk) 07:40, 23 January 2016 (UTC)

Thus, it seems to be incompatible with the CC-0 license. - Bzh-99 (talk) 10:27, 26 February 2016 (UTC)
Why? WD:CC-0 does not forbid us to do in the reference field the only thing asked for by "Open Licence": "acknowledging the source (at least by the name of the Producer)" = INSEE. Oliv0 (talk) 11:37, 26 February 2016 (UTC)
I see how it can seem incompatible to some people, but I'm pretty sure it's compatible (and if it were incompatible, we would have to delete all information about France on Wikidata, since all data - even trivial data like the names of the cities - are in the INSEE database). Regardless of the Open Licence and its requirement, the source should always be indicated per Wikidata rules. Cdlt, VIGNERON (talk) 11:56, 26 February 2016 (UTC)
Since we are all French speakers here, I am switching to French.
The Open Licence requires citing the author's name. If I am not mistaken, that is not required by CC-0. - Bzh-99 (talk) 17:33, 26 February 2016 (UTC)
The Open Licence (here in French) does not require using the same licence on derivatives, so we only have to indicate INSEE and the update date as requested, and Wikidata users can then do what they want, as I understand it; but it would be good to have confirmation from people who know about this. Is there no equivalent of c:COM:VPC here? Oliv0 (talk) 18:51, 26 February 2016 (UTC)
No, reuse always entails mentioning the name, whether on WD or elsewhere. Whoever takes the data from WD must, under the Open Licence, also meet this condition. The reuse condition applies to all reuses. Keeping the same licence is not required, but you cannot use that as a pretext to drop one of the few conditions set by the original licence. Snipre (talk) 22:06, 2 March 2016 (UTC)
No, whoever takes the data from WD does not see it under the Open Licence but under CC-0. Only the WD bot has to put INSEE in the reference to meet the condition of the Open Licence, and it has the right to change the licence to CC-0. But is there no specialised forum here for these legal questions? Oliv0 (talk) 20:26, 3 March 2016 (UTC)
@Oliv0: Please read the licence: the Open Licence states that it is compatible with a CC-BY licence. A CC-BY licence is not a CC0 licence, and it implies that every user and reuser must credit the author of the data. You cannot reduce the author's rights; you can, however, modify the data or put them under a more restrictive licence, for example CC-BY-NC. Snipre (talk) 15:48, 11 March 2016 (UTC)
@Snipre: Read it yourself. "Compatible" does not mean that this licence imposes the conditions of CC-BY (since the required conditions are different) but that the two can be used together without incompatibility (no contradiction between the required conditions), just as, for example, GFDL is compatible with CC-BY-SA. According to the licence, only the "Reuser" as defined (here Wikidata) must mention the name of the producer and the date of last update (or a link to the source); it is not required to keep this licence (unlike CC-SA) nor to make it more restrictive. It is a licence designed to make use as easy as possible, not to create difficulties. Oliv0 (talk) 16:11, 11 March 2016 (UTC)
Not on the question of crediting the author, where CC-BY and the Open Licence make the same demand:
* Open Licence: "Attribute the "Information": its source (at a minimum the name of the Producer) and the date of its last update."
* CC-BY: "This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation by citing your name".
I do not see where the two licences differ on this point. None of the other points mentioned by CC-BY are required by the Open Licence, but this point is similar. And it is precisely this point that CC0 lacks. Does it not seem odd to you that, of two licences cited by the author of the data, one could skip one of the only points the two have in common?
And a reuser of Wikidata is a reuser within the meaning of the Open Licence. A licence makes no distinction between the first user of the data (here WD) and a reuser who uses data from Wikidata. The licence is not limited according to where the data come from: whether you use the data directly from the INSEE site or via another site such as WD, the Open Licence applies, or at the very least a compatible licence such as CC-BY. You will have to prove that there is a distinction between user, reuser and re-reuser (and why not re-re-reuser?) in order to put these data under a licence that diminishes the rights of the original licence. Any user of INSEE data who takes those data from WD falls under the Open Licence as a reuser. Snipre (talk) 16:51, 11 March 2016 (UTC)
A reuser within the meaning of the Open Licence is "any natural or legal person who reuses the "Information" in accordance with the freedoms and conditions of this licence". Nothing says that this licence applies only to primary users of the INSEE site. A user of WD data falls under this definition because, once again, whether the data are taken directly from INSEE or via a third-party source is not mentioned in the licence text, quite simply because that distinction is meaningless. Snipre (talk) 17:08, 11 March 2016 (UTC)
The question is not CC-BY but the Open Licence, which is a permission granted by a "Producer" (INSEE) to a "Reuser" (Wikidata) under certain conditions, which do not include using the same licence. Once these conditions are met, Wikidata can therefore grant its users whatever licence it wants; they are then not a "Reuser" within the meaning of the Open Licence, since they are no longer reusing "in accordance with the freedoms and conditions of this licence" but those of another one. Oliv0 (talk) 18:26, 11 March 2016 (UTC)
@Bzh-99, Snipre, Oliv0: the question of attribution seems to me a false problem insofar as, on the one hand, whatever the licence, the law in any case generally requires attribution (among others, see in particular art. 121-1 of the CPI for France, but similar provisions exist in most countries of the world, which is exactly why licence texts resemble each other on this point) and, on the other hand, it is customary in Wikimedia projects to indicate the source. In any case the right of attribution is part of moral rights, and the CC0 licence clearly states 1. Copyright and Related Rights. A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). (which goes on to cite moral rights).
In short, the situation is not crystal clear (is it ever, when it comes to licences and their compatibility?) but I do not really see any obstacle to importing the INSEE data (all the less so since INSEE is committed to opening up its data). And as I said above, given the amount of INSEE data that we reuse on Wikidata, if we consider the import impossible, there would be several properties (at least INSEE municipality code (P374), INSEE canton code (P2506), INSEE department code (P2586), INSEE region code (P2585)) and thousands of items to delete (roughly, all the communes and cantons of France).
Cdlt, VIGNERON (talk) 15:00, 17 March 2016 (UTC)

fr:WP:Legifer/mars 2016#Changement de licence and c:Commons:Village pump/Copyright/Archive/2016/03#"Open License" and CC-0 also conclude that everything is OK and the imports can be done by a bot owner. Oliv0 (talk) 19:09, 24 March 2016 (UTC)

I added a question to Open Source Stackexchange where the answer is that incorporating CC-By content (and here similarly licensed content) is incompatible with CC-0: http://opensource.stackexchange.com/questions/4094/can-wikidata-which-runs-under-cc-zero-incorporate-cc-by-content – The preceding unsigned comment was added by ChristianKl (talk • contribs).

Some stats and info about INSEE codes

Hi,

I've quickly looked into the use of INSEE municipality code (P374). Currently, it's used on 37,196 items (Query: claim[374]) (for information, there were 36,658 on 1 January 2015, but there were more in the past), all with country (P17) = France (Q142), with instance of (P31) = commune of France (Q484170) (or = municipal arrondissement (Q702842)), with coordinate location (P625), with located in the administrative territorial entity (P131), etc. (see the constraint violations of INSEE municipality code (P374) for more information).

To go further:

For information, there is also INSEE canton code (P2506). It could be useful for checking consistency: the INSEE municipality code (P374) of a commune and the INSEE canton code (P2506) of the canton containing that commune should begin with the same 2 or 3 digits (these are the INSEE department code, for which there is no known exception, and which is also the end of the ISO 3166-2 code (P300) for Metropolitan France (Q212429)).

tldr; I did it quickly and further inspection should be done but everything seems pretty fine.

Cdlt, VIGNERON (talk) 12:51, 26 February 2016 (UTC)
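The prefix consistency check suggested above could be sketched as follows. The helper names are hypothetical, and the rule used to decide between a 2- and a 3-character prefix (3 characters when the code starts with 97 or 98, i.e. overseas) is an assumption that should be verified against INSEE's documentation. The sample codes in the usage lines are illustrative inputs, not values looked up on Wikidata.

```python
def department_prefix(code):
    """Return the leading department part of an INSEE commune or canton code.

    Assumption: overseas codes start with 97 or 98 and use a 3-character
    prefix; all other codes (including Corsican 2A/2B) use 2 characters.
    """
    code = code.strip().upper()
    return code[:3] if code.startswith(("97", "98")) else code[:2]


def same_department(commune_code, canton_code):
    """True when both codes begin with the same department prefix."""
    return department_prefix(commune_code) == department_prefix(canton_code)


# Illustrative inputs only.
checks = [
    same_department("64547", "6421"),    # same 2-digit prefix "64"
    same_department("13004", "6421"),    # "13" vs "64": mismatch to review
    same_department("97105", "97101"),   # same 3-digit overseas prefix "971"
]
```

Any pair for which the check returns False would be a candidate for manual review rather than an automatic correction.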

 Not done due to incompatible license --Pasleim (talk) 13:32, 11 July 2016 (UTC)

This section was archived on a request by: --Pasleim (talk) 13:32, 11 July 2016 (UTC)