Wikidata:Requests for permissions/Bot/Tmdbzhbot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Ymblanter (talk) 18:49, 3 June 2021 (UTC)
Tmdbzhbot
Operator: Tomyang001
Task/s: Add TMDB person ID (P4985) and TMDB movie ID (P4947) to movie/TV-related items that have an associated IMDb ID (P345) but are missing the TMDb IDs.
Code: I wrote a simple Python script using the "requests" library. It gets the "imdb_id" of a given person from TMDb via the API, searches for haswbstatement:"P345=tt0068646" to locate the item, then adds the TMDb IDs if they are missing.
import time
import requests

# S (a logged-in requests.Session), URL, TOKEN and tmdb_key are assumed to be defined beforehand.
for tmdb_id in range(1, 3000000):
    # Look up the person on TMDb and read the IMDb ID, if any
    imdb_id = requests.get("https://api.themoviedb.org/3/person/{}?api_key={}".format(tmdb_id, tmdb_key)).json().get('imdb_id', '')
    if imdb_id:
        value = '"'+str(tmdb_id)+'"'
        # Find the Wikidata item that carries this IMDb ID
        result = requests.get('https://www.wikidata.org/w/api.php?action=query&format=json&list=search&srsearch=haswbstatement%3A%22P345%3D{}%22'.format(imdb_id)).json().get('query').get('search')
        if result:
            wiki_id = result[0].get('title')
            # Skip items that already have a TMDB person ID
            check = requests.get('https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity={}&property=P4985'.format(wiki_id)).json().get('claims')
            if not check:
                while 1:
                    R = S.post(URL, data={'action':"wbcreateclaim", 'format':"json", 'entity':wiki_id, 'snaktype':"value", 'property':"P4985", 'value':value, 'bot': 1, 'token':TOKEN, 'maxlag': 5})
                    # Wait and retry when the servers report replication lag
                    if R.json().get('error', {}).get('code') == "maxlag":
                        time.sleep(5)
                        continue
                    GUID = R.json()['claim']['id']
                    R = S.post(URL, data={'action':"wbsetreference", 'format':"json", 'statement':GUID, 'snaks':"{\"P248\":[{\"snaktype\":\"value\",\"property\":\"P248\",\"datavalue\":{\"type\":\"wikibase-entityid\",\"value\":{\"id\":\"Q20828898\"}}}],\"P345\":[{\"snaktype\":\"value\",\"property\":\"P345\",\"datavalue\":{\"type\":\"string\",\"value\":\""+imdb_id+"\"}}]}", 'bot': 1, 'token':TOKEN})
                    print(tmdb_id)
                    break
Function details:
- Add TMDB person ID (P4985) and TMDB movie ID (P4947) to movie/TV related items.
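The item lookup described above could be sketched as follows. This is only an illustration, not the operator's script: the helper names build_search_params and first_title are hypothetical, and the item ID in the usage note is a placeholder. Passing the query as parameters lets the HTTP library handle the percent-encoding that the script writes out by hand.

```python
from typing import Optional

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_params(imdb_id: str) -> dict:
    """Query parameters for a haswbstatement search on IMDb ID (P345)."""
    return {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": 'haswbstatement:"P345={}"'.format(imdb_id),
    }


def first_title(response_json: dict) -> Optional[str]:
    """Return the title of the first search hit, or None when nothing matched."""
    hits = response_json.get("query", {}).get("search", [])
    return hits[0].get("title") if hits else None
```

With the requests library this would be used roughly as requests.get(WIKIDATA_API, params=build_search_params("tt0068646")).json(), with the parsed response passed to first_title.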
Is the source code available? Looking at the edits you've made so far, it would be nice if you added a reference indicating that the value was fetched from the TMDb API using the IMDb ID. BrokenSegue (talk) 02:32, 26 May 2021 (UTC)
- @BrokenSegue: I added Python 3 code for dealing with TMDB person ID (P4985). Please take a look, thanks. --Tomyang001 (talk) 01:00, 27 May 2021 (UTC)
- @BrokenSegue: Regarding the reference idea, I'm curious what format works best in this case. Josh404 (talk) 01:49, 28 May 2021 (UTC)
- @Josh404: I would suggest minimally adding stated in (P248) The Movie Database (Q20828898) and maybe also inferred from (P3452) IMDb ID (P345). Bulk edits really should include references even if it's just of this form. BrokenSegue (talk) 03:18, 28 May 2021 (UTC)
- Nice, inferred from (P3452) was pretty much what I was looking for. Definitely going to add that to my edits. Josh404 (talk) 03:44, 28 May 2021 (UTC)
- @BrokenSegue thanks for the help so far. It seems like the value of inferred from (P3452) is supposed to be another Q item, not a property. So IMDb ID (P345) doesn't work, maybe Internet Movie Database (Q37312) instead? Here's a sample reference edit for what I'm thinking. Josh404 (talk) 18:01, 28 May 2021 (UTC)
- @Josh404: oh right. Yeah, that looks reasonable. Very similar to how I source some of my manual imports [1]. I think my additional use of a retrieved date is maybe excessive. BrokenSegue (talk) 19:53, 28 May 2021 (UTC)
- @BrokenSegue @Josh404 thanks for your help. I've improved my code so that it respects maxlag and adds two references (see above). I might need some time to look into whether haswbstatement filters out deprecated statements, but I don't think it'll be a major problem. Tomyang001 (talk) 04:14, 29 May 2021 (UTC)
- Cool. Then I Support, though you should try to coordinate with the other two people doing this. BrokenSegue (talk) 05:42, 29 May 2021 (UTC)
- I see that the bot has made over 7,000 edits and is still going strong. Is this still in the phase of "test run of between 50 and 250 edits"? Bovlb (talk) 19:47, 26 May 2021 (UTC)
- @Tomyang001: That question was for you. Bovlb (talk) 00:30, 27 May 2021 (UTC)
- @Bovlb: Sorry, it's far beyond testing, and I can be sure that the bot works without any error as long as the associated IMDb ID (P345) is correct. I just wanted it to keep running rather than do nothing. Should I stop now and wait for approval? --Tomyang001 (talk) 01:00, 27 May 2021 (UTC)
@Tomyang001: Thanks for the code sample. I think you should probably pause the bot while this request is reviewed. There's at least one issue with your bot: it doesn't seem to respect maxlag, which we require for all bots. Also, I'm not sure whether haswbstatement filters out deprecated statements. Do you know? Do you intend to add references to your bot as I suggested? BrokenSegue (talk) 01:28, 27 May 2021 (UTC)
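The maxlag handling asked for here could be sketched as a small retry wrapper. This is a minimal illustration under stated assumptions, not a required implementation: the function name post_with_maxlag, the attempt limit and the wait time are all hypothetical choices, and the caller is assumed to send 'maxlag' as a request parameter and to pass in a function that returns the parsed JSON response.

```python
import time
from typing import Callable


def post_with_maxlag(post: Callable[[], dict],
                     max_attempts: int = 5,
                     wait_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call post() until the API stops answering with a maxlag error.

    post is any zero-argument function returning the decoded JSON body.
    Other API errors are raised instead of being retried.
    """
    for _ in range(max_attempts):
        body = post()
        error = body.get("error", {})
        if error.get("code") == "maxlag":
            # Servers are lagging: back off, then try the same request again
            sleep(wait_seconds)
            continue
        if error:
            raise RuntimeError("API error: {}".format(error.get("code")))
        return body
    raise RuntimeError("gave up after {} maxlag responses".format(max_attempts))
```

Bounding the number of attempts avoids an endless loop if the lag never clears, and raising on other errors avoids reading a claim ID from a response that does not contain one.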
@Tomyang001: awesome that you're interested in working on this task too. I already have a bot approved and running on the same task: Wikidata:Requests for permissions/Bot/Josh404Bot 1 and Wikidata:Requests for permissions/Bot/Josh404Bot 2. I'm not sure what the policy is around this. My implementation is similar to yours, using the TMDb API for cross-references. Maybe we can work together? Josh404 (talk) 21:46, 27 May 2021 (UTC)
- @Josh404: oh, looks like your bot is also not respecting maxlag. Oops. BrokenSegue (talk) 03:33, 28 May 2021 (UTC)
- I've only been writing via QuickStatements. I've assumed they implemented rate-limiting correctly? Josh404 (talk) 03:43, 28 May 2021 (UTC)
- Ah, ok, never mind. BrokenSegue (talk) 04:17, 28 May 2021 (UTC)
- @Josh404: My bot follows the order of IMDb id. For TMDB person ID (P4985), it goes from 1 (George Lucas) to the last id, and for now it's at around 70,000. I don't see your bot following this order. Maybe we should work together; at least I don't think my bot will cause duplicates or conflicts. Tomyang001 (talk) 10:34, 29 May 2021 (UTC)
- Mostly looks good to me. If you want to copy my reference format, you could also add a line for:
S.post(URL, data={'action':"wbsetreference", 'format':"json", 'statement':GUID, 'snaks':"{\"P345\":[{\"snaktype\":\"value\",\"property\":\"P345\",\"datavalue\":{\"type\":\"string\",\"value\":\"" + imdb_id + "\"}}]}", 'bot': 1, 'token':TOKEN})
- It seems nice to have when there are multiple IMDb IDs on a given item. You can tell where it's sourced from. TMDb people can also be looked up by Freebase MID. Josh404 (talk) 17:50, 29 May 2021 (UTC)
- @Tomyang001 It looks like this code might be creating two separate references rather than bundling both snaks into the same one. I'm not entirely sure what the low-level wbsetreference call is supposed to be. I think you can maybe put both snaks into the one wbsetreference like:
R = S.post(URL, data={'action':"wbsetreference", 'format':"json", 'statement':GUID, 'snaks':"{\"P248\":[{\"snaktype\":\"value\",\"property\":\"P248\",\"datavalue\":{\"type\":\"wikibase-entityid\",\"value\":{\"id\":\"Q20828898\"}}}],\"P345\":[{\"snaktype\":\"value\",\"property\":\"P345\",\"datavalue\":{\"type\":\"string\",\"value\":\""+imdb_id+"\"}}]}", 'bot': 1, 'token':TOKEN})
- I haven't tested any of these. But that should also reduce your API call overhead. Josh404 (talk) 23:34, 29 May 2021 (UTC)
- It works. Thanks. Tomyang001 (talk) 00:05, 30 May 2021 (UTC)
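The bundled reference discussed above can also be built from a plain dictionary and serialized with json.dumps, which avoids the hand-escaped quotes that are easy to get wrong. This is a sketch only: the function name build_reference_snaks is hypothetical, while the property and item IDs are the ones used in the calls above.

```python
import json


def build_reference_snaks(imdb_id: str) -> str:
    """Serialize one reference holding both snaks: stated in + IMDb ID."""
    snaks = {
        # stated in (P248): The Movie Database (Q20828898)
        "P248": [{
            "snaktype": "value",
            "property": "P248",
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {"id": "Q20828898"},
            },
        }],
        # IMDb ID (P345): the identifier the match was made on
        "P345": [{
            "snaktype": "value",
            "property": "P345",
            "datavalue": {"type": "string", "value": imdb_id},
        }],
    }
    return json.dumps(snaks)
```

The result would be passed as the 'snaks' value of a single wbsetreference request, in place of the escaped string literal, for example 'snaks': build_reference_snaks(imdb_id).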
- As chance would have it, it looks like we now have another user doing the same thing - Special:Contributions/Epigdog - again with no references. Is there any connection between you guys? CC @Tomyang001, Josh404, Epigdog Bovlb (talk) 01:02, 29 May 2021 (UTC)
- Heh, this property got popular all of a sudden. But that user isn't me. All my edits are coming from either me or my bot User:Josh404Bot. I'm in the process of trying to backfill some of these missing references. Josh404 (talk) 01:07, 29 May 2021 (UTC)
- Is there some kind of news or event that would cause a bunch of random people to all try to do this at once? BrokenSegue (talk) 02:13, 29 May 2021 (UTC)
- Nothing recent I'm aware of. I did recall some interest from the TMDb admins in adding support for QIDs on their side. Josh404 (talk) 03:24, 29 May 2021 (UTC)
- Looks good to me now. Support. I'll also keep an eye out for any duplicate statements. Josh404 (talk)
- I am going to approve the bot in a couple of days provided no objections have been raised. --Ymblanter (talk) 18:39, 31 May 2021 (UTC)