Wikidata:Requests for permissions/Bot/TolBot 7
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Lymantria (talk) 07:19, 21 January 2022 (UTC)
TolBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Tol (talk • contribs • logs)
Task/s: Create items for taxa based on data from GBIF
Code: Python; not entirely finished yet
Function details: Upon being run manually (by me) and given either a GBIF species or genus ID, it will fetch data from GBIF on that species, or on the species in that genus. For each species, it will first search Wikidata for haswbstatement:P846=GBIF ID (an item that has that GBIF ID). If no item with that ID already exists, it will search for the scientific name. If there are results, it will ask for my input on whether one of them is the same species; if it is, it will add GBIF taxon ID (P846) to that item. If neither search returns results, or the results of the second search are not the same species, then there is probably no item on Wikidata for this taxon, and it will create a new item with an English label, the description "species of <generic common name> in genus <genus>" (where I manually set the generic common name for each run), instance of (P31): taxon (Q16521), taxon name (P225), taxon rank (P105): species (Q7432), parent taxon (P171), and GBIF taxon ID (P846).
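A minimal Python sketch of the lookup order described above, assuming the two searches are wrapped in a `search(query)` callable that returns matching QIDs and the manual check is a `confirm(qid, name)` callable. Those names, and the `decide`/`search_url` helpers, are hypothetical and not taken from TolBot's code. Injecting the callables keeps the decision logic testable offline; `search_url` only shows how the haswbstatement query could be passed to the MediaWiki search API (`list=search`).

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(query: str) -> str:
    """Build a MediaWiki full-text search URL (no request is sent here)."""
    params = {"action": "query", "list": "search", "srsearch": query, "format": "json"}
    return WIKIDATA_API + "?" + urlencode(params)


def decide(
    gbif_id: str,
    name: str,
    search: Callable[[str], List[str]],
    confirm: Callable[[str, str], bool],
) -> Tuple[str, Optional[str]]:
    """Return (action, qid) for one GBIF species.

    search(query) returns a list of matching QIDs; confirm(qid, name) is the
    operator's yes/no answer. Both are hypothetical, injected for testability.
    """
    # 1. Is there already an item carrying this GBIF taxon ID (P846)?
    hits = search(f"haswbstatement:P846={gbif_id}")
    if hits:
        return ("skip", hits[0])
    # 2. Otherwise search for the scientific name; the operator judges each hit.
    for qid in search(name):
        if confirm(qid, name):
            return ("add_P846", qid)
    # 3. Nothing matched: there is probably no item yet, so create one.
    return ("create", None)
```

In a live run, `search` would presumably fetch `search_url(query)` and collect the `title` of each hit under `query.search`, which for items in Wikidata's main namespace is the QID.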
In case anybody is wondering about the task number, TolBot is already operating on the English Wikipedia, and I keep track of task numbers globally; this is its first task on Wikidata. This task will also have cross-wiki components; here I am only explaining what it does on Wikidata.
Sincerely, Tol (talk | contribs) @ 21:53, 17 September 2021 (UTC)
- would like to see some sample edits but generally LGTM BrokenSegue (talk) 03:14, 19 September 2021 (UTC)
- @BrokenSegue: Am I allowed to do test edits without administrator/community approval? Tol (talk | contribs) @ 17:41, 19 September 2021 (UTC)
- We allow (and encourage) small test batches as part of the approval process. BrokenSegue (talk) 22:31, 19 September 2021 (UTC)
- @BrokenSegue: Ah, alright! At English Wikipedia, one has to get approved for a trial. I'll try a test run soon. Tol (talk | contribs) @ 03:03, 20 September 2021 (UTC)
- Because I run the bot on Google Cloud, which is globally IP-blocked, I've requested IP block exemption at Wikidata:Requests for permissions/Other rights#TolBot. Tol (talk | contribs) @ 03:32, 20 September 2021 (UTC)
Test
Here are 50 test edits (25 creations and 25 GBIF taxon ID (P846) additions):
They look fine to me. @BrokenSegue, what do you think? Tol (talk | contribs) @ 23:54, 21 September 2021 (UTC)
- wow so organized. yeah so it generally looks good. personally I'd prefer to see references being added when you import data. something like this. When you are doing connections based on the label of the taxon a reference like this might be appropriate. for extra credit you could do this but it's a bit much. anyways generally LGTM. BrokenSegue (talk) 00:05, 22 September 2021 (UTC)
- @BrokenSegue: Alright, I'll implement those references and qualifiers. Thanks for letting me know! Tol (talk | contribs) @ 00:33, 22 September 2021 (UTC)
- I'll probably run another test in a few days with these modifications: references added to parent taxon (P171) & taxon name (P225); based on heuristic (P887) as a qualifier to added GBIF taxon ID (P846); subject named as (P1810) as a qualifier to GBIF taxon ID (P846). Tol (talk | contribs) @ 20:44, 22 September 2021 (UTC)
- looks good. Maybe you want to include nl edit as well. @Succu: what do you think? --- Jura 08:43, 24 September 2021 (UTC)
- I don't think it's a good idea to create items based on the results of data aggregators like GBIF, EoL or CoL because they are not curated. A lot of GBIF entries have issues or are marked as deleted. How do you ensure that a parent name belongs to the correct kingdom? There are a lot of homonyms too. I don't think based on heuristic (P887) and subject named as (P1810) are a good idea. I'm missing stated in (P248) and retrieved (P813). BTW: my own bot adds missing GBIF taxon ID (P846) from time to time. --Succu (talk) 13:33, 24 September 2021 (UTC)
- out of curiosity why do you think we should not add based on heuristic (P887) and subject named as (P1810)? BrokenSegue (talk) 17:24, 24 September 2021 (UTC)
- Why should we? When should an external ID be qualified at all? --Succu (talk) 18:13, 24 September 2021 (UTC)
- (ec) I hope the match of the ID is based on taxon name (P225) and not inferred from name or label (Q84423633). --Succu (talk) 18:18, 24 September 2021 (UTC)
- References are not qualifiers and have independent value. I'd argue that optimally every statement would be referenced. And plenty of external IDs are suggested to be qualified? When else would subject named as (P1810) be used even? BrokenSegue (talk) 18:17, 24 September 2021 (UTC)
- The value (ID) of GBIF taxon ID (P846) is the object of the triple, so object named as (P1932) would be the correct qualifier, but nonetheless superfluous. --Succu (talk) 18:36, 24 September 2021 (UTC) PS: Of course stated in (P248) and retrieved (P813) should be added as references, not qualifiers. --Succu (talk) 18:40, 24 September 2021 (UTC)
- @Succu: Thanks for the feedback. I hope to mitigate this by only using taxa which GBIF marks as "accepted". I'll work on getting a schematic of references/qualifiers which I will use. Tol (talk | contribs) @ 19:55, 25 September 2021 (UTC)
- Can you also address the other point Succu mentioned in his initial comment? If he thinks it's not a suitable source for creating new items, then I'd not do it. --- Jura 08:07, 26 September 2021 (UTC)
- @Jura1: For curation, I use GBIF's "taxon status" and only use those which are "accepted"; taxa with issues are "doubtful", synonyms are "synonym" or "homotypic_synonym", and deleted taxa are usually "synonym" or "doubtful" and are never "accepted". As for belonging to the correct kingdom, I don't understand how that's an issue: it is given a genus QID and creates items for species in that genus. It gets the GBIF genus ID from the Wikidata item and uses those. As for homonyms, again, it is manually given the parent genus and only creates items for that genus. Tol (talk | contribs) @ 00:36, 27 September 2021 (UTC)
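Pulling together the points raised in this thread (creating items only from GBIF records marked "accepted", referencing parent taxon (P171) and taxon name (P225), and Succu's request for stated in (P248) and retrieved (P813) as references), a new-item payload in the Wikibase JSON format accepted by the `wbeditentity` API might be assembled roughly as below. This is a sketch under assumptions, not TolBot's implementation: the GBIF field names `key`, `canonicalName` and `taxonomicStatus` (with the value `"ACCEPTED"`) are assumed from GBIF's species API, the English label is assumed to be the scientific name, and the QID of the GBIF item used in `stated in` is passed in as a hypothetical placeholder rather than hard-coded.

```python
from datetime import date
from typing import Optional

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # proleptic Gregorian calendar


def item_value(qid: str) -> dict:
    return {"type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": int(qid[1:])}}


def string_value(s: str) -> dict:
    return {"type": "string", "value": s}


def snak(prop: str, datavalue: dict) -> dict:
    return {"snaktype": "value", "property": prop, "datavalue": datavalue}


def build_item(record: dict, parent_qid: str, common: str, genus: str,
               gbif_qid: str, retrieved: date) -> Optional[dict]:
    """Return a wbeditentity 'data' payload, or None for non-accepted records."""
    # Only GBIF records whose (assumed) taxonomicStatus is ACCEPTED become items.
    if record.get("taxonomicStatus") != "ACCEPTED":
        return None
    # One shared reference: stated in (P248) GBIF, retrieved (P813) on a given day.
    reference = {"snaks-order": ["P248", "P813"], "snaks": {
        "P248": [snak("P248", item_value(gbif_qid))],
        "P813": [snak("P813", {"type": "time", "value": {
            "time": "+" + retrieved.isoformat() + "T00:00:00Z",
            "timezone": 0, "before": 0, "after": 0,
            "precision": 11,  # 11 = day precision
            "calendarmodel": GREGORIAN}})]}}

    def statement(prop: str, datavalue: dict, referenced: bool) -> dict:
        claim = {"type": "statement", "rank": "normal", "mainsnak": snak(prop, datavalue)}
        if referenced:
            claim["references"] = [reference]
        return claim

    name = record["canonicalName"]
    return {
        "labels": {"en": {"language": "en", "value": name}},
        "descriptions": {"en": {"language": "en",
                                "value": f"species of {common} in genus {genus}"}},
        "claims": [
            statement("P31", item_value("Q16521"), referenced=False),   # taxon
            statement("P225", string_value(name), referenced=True),    # taxon name
            statement("P105", item_value("Q7432"), referenced=False),  # rank: species
            statement("P171", item_value(parent_qid), referenced=True),  # parent taxon
            statement("P846", string_value(str(record["key"])), referenced=False),
        ],
    }
```

Returning `None` for anything other than an accepted record mirrors Tol's filter on GBIF's taxon status; whether P846 itself should carry a reference or qualifier was left open in the discussion, so the sketch leaves it bare.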
Info: More than 1,000,000 accepted species alone are missing a GBIF taxon ID (P846) or an item; around 150,000 of them are plants. --Succu (talk) 17:44, 27 September 2021 (UTC)
- @Succu: If you think creating items based on GBIF is unreliable, how about just adding GBIF taxon ID (P846)? Tol (talk | contribs) @ 18:01, 27 September 2021 (UTC)
@Tol: This seems to be stale; is this still active? Perhaps @Ymblanter, Lymantria: could comment? Thanks. Mike Peel (talk) 22:15, 18 January 2022 (UTC)
- @Mike Peel, yes, this is somewhat inactive, but I'd like to continue with it. Tol (talk | contribs) @ 22:43, 18 January 2022 (UTC)
- I'm okay with this request. I will approve in a couple of days, provided that no objections are raised. Lymantria (talk) 06:27, 19 January 2022 (UTC)