User talk:Ivan A. Krestinin/To merge

Disambiguation pages

Possible improvement: there would be fewer false matches if you could discard items with instance of (P31)=Wikimedia disambiguation page (Q4167410) when the other item has instance of (P31)="something that is not Wikimedia disambiguation page (Q4167410)". See the false match of Air Mali (Q407498) and Air Mali (Q1199298) for example. Thanks - LaddΩ chat ;) 12:56, 6 May 2014 (UTC)
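The suggested filter could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the item dictionaries, the function names and the placeholder class ID "Q_SOME_CLASS" are assumptions made for the example; only Q4167410 and P31 come from the comment above.

```python
DISAMBIGUATION = "Q4167410"  # Wikimedia disambiguation page (from the comment above)


def is_disambiguation(item):
    """True if the item's instance-of (P31) values include the disambiguation class."""
    return DISAMBIGUATION in item["P31"]


def is_type_mismatch(item_a, item_b):
    """True when exactly one item is a disambiguation page and the other
    item declares some other P31 value, so the pair should be discarded."""
    a_dis = is_disambiguation(item_a)
    b_dis = is_disambiguation(item_b)
    if a_dis == b_dis:
        return False  # both or neither are disambiguation pages
    other = item_b if a_dis else item_a
    return len(other["P31"]) > 0  # an item without P31 stays in the report


# Hypothetical items; "Q_SOME_CLASS" is a placeholder, not a real class ID.
disambig = {"id": "item-1", "P31": {DISAMBIGUATION}}
typed = {"id": "item-2", "P31": {"Q_SOME_CLASS"}}
untyped = {"id": "item-3", "P31": set()}

print(is_type_mismatch(disambig, typed))    # mismatch: discard the pair
print(is_type_mismatch(disambig, untyped))  # other item has no P31: keep the pair
```

Pairs where the other item has no P31 at all are deliberately kept, since the request only covers items that state a different type.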

page size

Hi Ivan, this is a brilliant resource. However, the page is too large at the moment (almost 1 MB). It would be easier if it were split into separate subpages, say: part 1, dewiki, eswiki, itwiki, etc. --Marcol-it (talk) 10:27, 7 May 2014 (UTC)

cswiki

Hello, could you add more languages? sk-cs, sk-en, cs-en...? JAn Dudík (talk) 06:05, 12 May 2014 (UTC)

jawiki #

Hello, jawiki (zh, ko) articles # are about numbers, not years, so you can exclude them. JAn Dudík (talk) 19:54, 12 May 2014 (UTC)

Exclude items already fully linked

Hi Ivan, the merge of city of Japan (Q494721) and list of cities in Japan (Q735757) is proposed on fr-en ("rating 37"). Both items have links to Wikipedia articles in both fr and en - merging is unnecessary and inappropriate. Could you exclude such item pairs? Thanks - LaddΩ chat ;) 00:19, 14 May 2014 (UTC)
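A check for such "fully linked" pairs might look like the sketch below. It assumes each item's sitelinks are available as a plain dictionary from wiki code to page title; that layout, the function name and the page titles are all hypothetical and not taken from the bot.

```python
def fully_linked(sitelinks_a, sitelinks_b, wiki_1, wiki_2):
    """True when both items already have a sitelink in both wikis,
    so merging them would only produce sitelink conflicts."""
    return all(
        wiki in links
        for links in (sitelinks_a, sitelinks_b)
        for wiki in (wiki_1, wiki_2)
    )


# Hypothetical page titles, for illustration only.
item_a = {"frwiki": "Page A (fr)", "enwiki": "Page A (en)"}
item_b = {"frwiki": "Page B (fr)", "enwiki": "Page B (en)"}
item_c = {"enwiki": "Page C (en)"}

print(fully_linked(item_a, item_b, "frwiki", "enwiki"))  # exclude this pair
print(fully_linked(item_a, item_c, "frwiki", "enwiki"))  # still a possible candidate
```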

Exclude disambiguation items with mismatching labels

Hi, another suggestion. To_merge/frwiki#enwiki has a section "'# (homonymie)' — '# (disambiguation)' (rating 43)" that proposes a number of merges of disambiguation items with other disambiguation items, like Morocco (Q488261) and Maroc (Q3294558); however, according to Wikidata:Disambiguation pages task force, Wikimedia disambiguation page (Q4167410) items should only link to WP pages with identical labels ("The item should only contain links to Wikipedia disambiguation pages with the exact same spelling..."). "Morocco" and "Maroc" must remain distinct, even if in English/French they represent the same country. There would be fewer false positives if you could exclude such item pairs. Thanks - LaddΩ chat ;) 01:26, 14 May 2014 (UTC)
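One possible way to apply the "exact same spelling" rule is to strip the trailing parenthetical qualifier from each page title and compare what remains. The sketch below is an assumption about how this could be done, not a description of the bot; the helper names are made up and the example titles are illustrative.

```python
import re

# Matches a trailing qualifier such as " (disambiguation)" or " (homonymie)".
_QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def base_title(title):
    """Return the title without its trailing parenthetical qualifier."""
    return _QUALIFIER.sub("", title)


def same_spelling(title_a, title_b):
    """True when both disambiguation titles have exactly the same spelling."""
    return base_title(title_a) == base_title(title_b)


# Illustrative titles only.
print(same_spelling("Maroc (homonymie)", "Morocco (disambiguation)"))  # different spelling
print(same_spelling("Maroc (homonymie)", "Maroc (disambiguation)"))    # same spelling
```

Pairs of disambiguation items for which `same_spelling` is false would then be dropped from the report.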

nl?

Could you include nlwiki with your merge candidates? Would be nice :) Lymantria (talk) 05:23, 14 May 2014 (UTC)

Checked items

Hi Ivan, where should we note if items are checked and the result is "don't merge"?

Example: 1919 in broadcasting (Q346668) and 1919 in radio (Q16831988)
#1: broadcasting (radio and television), #2: only radio

--Kolja21 (talk) 15:58, 17 May 2014 (UTC)

Hi, the short answer to your question is: nowhere, currently. But the situation is interesting. The bot uses existing items to find disconnected items. The "rating" is the number of existing connections. For example, in your case there are 95 items like 1986 in radio (Q1299002), 1987 in radio (Q926278), 1988 in radio (Q923214) where #1 and #2 are connected. So to remove the discussed pair from the list we need to split 95 items or merge 1 item... — Ivan A. Krestinin (talk) 16:33, 17 May 2014 (UTC)
Is it possible to use a list like User:Pasleim/whitelist to filter out false positives on the list here? There is probably overlap between items listed there and those found by your bot. Rigadoun (talk) 03:34, 17 July 2014 (UTC)
✓ Done, the bot now uses the same page as its whitelist. — Ivan A. Krestinin (talk) 09:42, 19 July 2014 (UTC)
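The rating and the whitelist described in this thread can be sketched as below. This is a guess at the general idea, not the bot's real implementation: the digit-to-"#" title patterns, the tuple layout, the function names and the example titles are assumptions; only the two item IDs come from the discussion.

```python
import re
from collections import Counter


def pattern(title):
    """Replace every run of digits with '#', e.g. '1919 in radio' -> '# in radio'."""
    return re.sub(r"\d+", "#", title)


def ratings(connected_pairs):
    """Count, per title-pattern pair, how many existing items already connect
    a page from wiki 1 with a page from wiki 2."""
    return Counter((pattern(a), pattern(b)) for a, b in connected_pairs)


def candidates(unconnected, connected_pairs, whitelist):
    """Return (qid_1, qid_2, rating) for unconnected pairs whose title pattern
    matches existing connections, skipping whitelisted pairs."""
    rating = ratings(connected_pairs)
    result = []
    for qid_1, title_1, qid_2, title_2 in unconnected:
        if frozenset((qid_1, qid_2)) in whitelist:
            continue  # already checked: "don't merge"
        score = rating[(pattern(title_1), pattern(title_2))]
        if score > 0:
            result.append((qid_1, qid_2, score))
    return result


# Illustrative titles; three already-connected pairs give a rating of 3.
connected = [(f"{year} in broadcasting", f"{year} in radio") for year in (1986, 1987, 1988)]
unconnected = [("Q346668", "1919 in broadcasting", "Q16831988", "1919 in radio")]

print(candidates(unconnected, connected, whitelist=set()))
print(candidates(unconnected, connected, whitelist={frozenset(("Q346668", "Q16831988"))}))
```

Storing each whitelisted pair as a `frozenset` makes the lookup independent of the order in which the two items are listed.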

svwiki

... is my wish. Would it be possible?

Also, I will link the lists from Help:Merge to make more users find them. Matěj Suchánek (talk) 12:09, 18 May 2014 (UTC)

Updating

Hello, there is a problem when #1 and #2 are about the same item in the linked languages, but the other links in #2 are about something else. When I move the link to the correct item, the pair is not removed from the list in the next update.

Example for cs-sk:

Item 1
  • cs:1999, en:1999
Item 2
  • en:1998, es:1998, sk:1999

Luxembourg at the 1992 Winter Olympics (Q144061) - Luxembourg at the 1992 Summer Olympics (Q144968) is one such case. JAn Dudík (talk) 05:51, 28 May 2014 (UTC)

Indeed. 2012 Guinea-Bissau coup d'état (Q621827) and United Nations Security Council Resolution 2048 (Q15718627) is another example. Kind regards, Lymantria (talk) 09:09, 28 May 2014 (UTC)
The bot uses dumps. New dumps appear twice a month. Between dumps the bot removes only deleted items. — Ivan A. Krestinin (talk) 20:33, 4 June 2014 (UTC)
No problem, your bot does a great job! Lymantria (talk) 12:58, 10 June 2014 (UTC)
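The cs-sk example from this thread can be replayed on plain dictionaries to show why the pair stops being a candidate once the misplaced link is moved, even though the report only reflects this after the next dump. The dictionary layout and both function names are assumptions made for this sketch.

```python
def move_sitelink(source, target, wiki):
    """Move the sitelink for `wiki` from one item to another."""
    if wiki in target:
        raise ValueError(f"target already has a {wiki} sitelink")
    target[wiki] = source.pop(wiki)


def is_candidate(item_a, item_b, wiki_1, wiki_2):
    """True when one item has only the wiki_1 link and the other only the wiki_2 link."""
    def split(first, second):
        return (wiki_1 in first and wiki_2 not in first
                and wiki_2 in second and wiki_1 not in second)
    return split(item_a, item_b) or split(item_b, item_a)


# Sitelinks from the example above.
item_1 = {"cs": "1999", "en": "1999"}
item_2 = {"en": "1998", "es": "1998", "sk": "1999"}

print(is_candidate(item_1, item_2, "cs", "sk"))  # listed in the cs-sk report
move_sitelink(item_2, item_1, "sk")              # sk:1999 belongs with item 1
print(is_candidate(item_1, item_2, "cs", "sk"))  # no longer a candidate
```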

Category: YYYY works and Category: YYYY are, in general, not the same thing. E.g. take Category:1307 works (Q8090261), Category:1307 (Q6583731); you can see that w: has a page for both of those, suggesting that they aren't the same category. It Is Me Here t / c 18:11, 4 June 2014 (UTC)

dawiki

Would it be possible to add dawiki? --Steenth (talk) 12:31, 10 June 2014 (UTC)

Suggestion: nationality adjectives

Perhaps it is an idea to let a bot search for merge candidates involving nationality adjectives in order to find couples like nl:Categorie:Thais saxofonist and en:Category:Thai saxophonists? Lymantria (talk) 06:56, 17 June 2014 (UTC)

Good idea, but it is not simple to implement... — Ivan A. Krestinin (talk) 18:24, 17 June 2014 (UTC)
@User:Lymantria: There have been some lists with countries by User:Byrial, but unfortunately he has been inactive for almost a year now. - FakirNL (talk) 12:24, 4 August 2014 (UTC)

False positives for Category:XYZ and XYZ

Hello,

Q200794 (en:Eurovision Song Contest 1961) - Q9014598 (en:Category:Eurovision Song Contest 1961) made it to the list. There's probably a way to avoid such false positives. Place Clichy (talk) 11:06, 29 July 2014 (UTC)

This pair is listed correctly: one of the items had links mixing the main [1] [2] and category namespaces. And as Ivan wrote above, such pairs are removed only when a new dump is available. JAn Dudík (talk) 11:41, 29 July 2014 (UTC)
OK then, I had not seen this past edit. Place Clichy (talk) 17:41, 29 July 2014 (UTC)

elwiki done

@FakirNL: Hello,

I believe I have corrected everything at elwiki, and added the few false positives to User:Pasleim/whitelist. You may clean up this page, or run another dump. Place Clichy (talk) 16:59, 30 July 2014 (UTC)

Good and thanks! Question to Ivan: does Pasleim's whitelist have any effect on your merge project? Another thing I realize now: maybe my timing wasn't ideal, suggesting this project to several users just as the operator seems to be less active (in his Q160169 maybe?) :-) - FakirNL (talk) 17:12, 30 July 2014 (UTC)
@FakirNL: #Checked items, User talk:Magnus Manske#Merge game. He also stated that he would be more active in two weeks. Matěj Suchánek (talk) 17:35, 30 July 2014 (UTC)
Thanks Matěj! - FakirNL (talk) 17:40, 30 July 2014 (UTC)

Show links possible merges are based upon

Sometimes your project gives false positives and that could have to do with other false links, especially when it's based on only 3 or 4 links. How hard would it be to have a button "based upon similar items here" above every caption? Just another idea. - FakirNL (talk) 22:20, 7 August 2014 (UTC)

✓ Done Ivan A. Krestinin (talk) 20:32, 9 August 2014 (UTC)
Hi User:Ivan A. Krestinin, could you, in your next run, perhaps list slightly more of these "based upon" links? How about 6 or 10 instead of 3? If a false positive is based upon 4 or 5 links, it's sometimes hard to find the fourth and fifth, though they need to be changed too. - FakirNL (talk) 19:01, 21 September 2014 (UTC)
Hello, the reports are too large already... Is this situation really that common? — Ivan A. Krestinin (talk) 19:15, 21 September 2014 (UTC)
Well, the reports are getting smaller every time! Polish had a lot of false positives and was decreased from over 1 MB to 105 KB last time. In the last run French was the largest, but I decreased it from 378 KB to 147 KB, and I believe Russian is now the largest at 167 KB, but it has a lot of items fixed so it will be much smaller next time. Let's say 6 items (not 10) as a workable compromise? - FakirNL (talk) 20:05, 21 September 2014 (UTC)
I would like to repeat my request to give six "based on"s instead of three, and I'll explain why. If there is a false positive based on five items, only three are listed. Now if you handle those three, two false links remain; in the next run the pair will not be listed, though the two false links are still around and might be hard to find. If six items are listed, it's much easier to catch all those false links in one fix. So, if possible, listing six items would be an improvement. - FakirNL (talk) 09:57, 31 October 2014 (UTC)
✓ Done, increased to 6 items. ~3 hours are needed for the reports update. — Ivan A. Krestinin (talk) 19:04, 31 October 2014 (UTC)
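The arithmetic behind the request can be illustrated with a tiny sketch. The function name and the "link-N" identifiers are hypothetical; the limits 3 and 6 and the five supporting links are the figures discussed above.

```python
def based_upon(supporting_items, limit=6):
    """Return the supporting items shown in the report and how many stay hidden."""
    shown = supporting_items[:limit]
    hidden = len(supporting_items) - len(shown)
    return shown, hidden


# A false positive supported by five existing (wrong) links.
supporting = ["link-1", "link-2", "link-3", "link-4", "link-5"]

print(based_upon(supporting, limit=3))  # two links never appear in the report
print(based_upon(supporting, limit=6))  # all five links can be fixed in one pass
```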

Wikinews

Please add Wikinews to the merging process. JAn Dudík (talk) 05:20, 21 August 2014 (UTC)

✓ Done, the bot will create the reports in 4-5 hours, if it does not crash :-). — Ivan A. Krestinin (talk) 17:57, 21 August 2014 (UTC)
There are no Wikinews links in the latest Wikidata dump. We need to wait for a new dump. — Ivan A. Krestinin (talk) 18:47, 21 August 2014 (UTC)

Commons

Would it be possible to add Commons to the merging process? --Steenth (talk) 08:04, 21 August 2014 (UTC)

A request

Would you please add guwiki, mrwiki, newiki, newwiki, sawiki, bhwiki, maiwiki and piwiki? I will try to cover all of them.--Vyom25 (talk) 17:49, 3 May 2015 (UTC)

  • Hello, the bot processes these wikis but finds nothing to merge. The bot will create a report if it finds something. Also please note that the bot uses existing interwiki links to find additional unlinked pairs. — Ivan A. Krestinin (talk) 19:34, 3 May 2015 (UTC)
Okay... Thank you.--Vyom25 (talk) 06:19, 4 May 2015 (UTC)

Hello, can you generate reports for tawiki, mlwiki and knwiki? Thanks in advance.--Vyom25 (talk) 07:17, 11 October 2015 (UTC)

A blacklist for false positives?

Hi Ivan, I was made aware of some false positives on your otherwise very useful lists: User:Ivan A. Krestinin/To merge/frwiki#enwiki, then all items in section Catégorie:Rameur (aviron) aux Jeux olympiques d'été de # — Category:Rowers at the # Summer Olympics. Do you provide a way to exclude these items via a blacklist? Otherwise Wikidata users tend to merge them, although they should not be merged. Thanks and regards, MisterSynergy (talk) 21:01, 15 December 2015 (UTC)

@MisterSynergy: See #Checked items. Matěj Suchánek (talk) 14:03, 16 December 2015 (UTC)
Thanks, I’ve just added these cases to the whitelist. —MisterSynergy (talk) 15:42, 16 December 2015 (UTC)

IsRedirect check

Since updating occurs only once a week I would propose to add IsRedirect check info to the list (like 3rd column here). --Wintik (talk) 18:24, 23 August 2016 (UTC)

If you want to see whether a page is already a redirect, add this to your stylesheet. Matěj Suchánek (talk) 18:45, 23 August 2016 (UTC)
Thanks, good for me, nice point to create my own common.css --Wintik (talk) 19:20, 23 August 2016 (UTC)