Wikidata:Requests for permissions/Bot/TAMISBot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
Not done, no progress for a while. Feel free to re-open this if work resumes. Thanks. Mike Peel (talk) 18:40, 24 September 2022 (UTC)[reply]
@Mike Peel: I'm not sure I understand the process or your decision. I came on June 9th to ask for an update; on July 23rd, Lymantria said he felt OK with going forward; and while we were waiting for other interactions, you came on Sept. 24th and closed the request. But the author still wants to go forward… I feel this process could lower one's motivation. What is still missing to go forward? Regards, Antoine2711 (talk) 20:42, 24 October 2022 (UTC)[reply]
Bot: TAMISBot
Operator: ChristianBRoy
Task/s: TAMISBot will import data about books using the WikiProjet Livres model.
Code: n/a
Function details:
- The bot parses data from an ONIX data source (ONIX is the metadata standard for the book industry)
- The data from the ONIX source is provided by the book publishers, who gave their agreement for data usage and upload to Wikidata
- The bot tries to find existing Wikidata items based on ISBN, title and author
- A human operator reviews the matches (and non matches), makes corrections if needed, and confirms that the bot can do its job without creating duplicates
- The bot creates new items when needed, or adds claims to existing items
- It does not duplicate claims when they already exist
- The created or updated items are all written work (Q47461344), version, edition or translation (Q3331189) and author (Q482980)
--ChristianBRoy (talk) 13:33, 27 August 2021 (UTC)[reply]
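The workflow described in the function details above can be sketched roughly as follows. This is only an illustration, not the bot's actual code (none was published): the in-memory `store`, the `Item` class, the record fields, and the helpers `normalize_isbn`, `find_match`, `add_claims` and `upsert` are all hypothetical, and a real bot would go through the Wikidata API rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Simplified stand-in for a Wikidata item (hypothetical structure)."""
    label: str
    claims: dict = field(default_factory=dict)  # property name -> set of values


def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces so differently formatted ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def find_match(store: dict, isbn: str):
    """Return the id of the item that already carries this ISBN, or None."""
    wanted = normalize_isbn(isbn)
    for item_id, item in store.items():
        if wanted in item.claims.get("isbn", set()):
            return item_id
    return None


def add_claims(item: Item, new_claims) -> int:
    """Add only claims the item lacks; return how many were actually added."""
    added = 0
    for prop, value in new_claims:
        values = item.claims.setdefault(prop, set())
        if value not in values:
            values.add(value)
            added += 1
    return added


def upsert(store: dict, record: dict, approved: bool):
    """Create or update an item, but only after a human has approved the match."""
    if not approved:
        return None  # nothing is written without operator confirmation
    item_id = find_match(store, record["isbn"])
    if item_id is None:
        item_id = f"NEW{len(store) + 1}"  # placeholder id, assumption for the sketch
        store[item_id] = Item(
            label=record["title"],
            claims={"isbn": {normalize_isbn(record["isbn"])}},
        )
    add_claims(store[item_id], [("publisher", record["publisher"])])
    return item_id
```

Running `upsert` twice with the same ISBN in two different formats should leave a single item with a single publisher claim, which is the "no duplicates" behaviour described above.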
- How many new items? What about which books? --- Jura 13:58, 30 August 2021 (UTC)[reply]
- As an estimate, in the short to mid term (end of 2021 or beginning of 2022): potentially approx. 500 books (I believe 95%+ will be new items), two editions each (one paper and one ebook edition) so ~1000 more new items for editions, and one or two authors for each book (a lot of the authors already exist, so that part should not create that many items). The books are published by Canadian publishers, all of whom publish in French. They include publishers of fiction (for instance Éditions du Boréal (Q3579629), Éditions Alto (Q16684371) and Hurtubise (Q3579355)), non-fiction (Éditions du Septentrion (Q3579664), Guy Saint-Jean éditeur (Q3122128)), and a university press (Presses de l'Université de Montréal (Q21428306)). --ChristianBRoy (talk) 18:06, 30 August 2021 (UTC)[reply]
- How are they selected? We already had a proposal for a Quebec dépôt légal import, but Wikidata isn't quite equipped for that. --- Jura 22:28, 30 August 2021 (UTC)[reply]
- Books will be related to Québec City (either because of their author or publisher, or because the content of the book is linked to the city by main subject (P921) or narrative location (P840)). The motivation is the city's recognition as a City of Literature (Q3467764), part of the UNESCO Creative Cities Network. The scope is thus a lot smaller than the dépôt légal. --ChristianBRoy (talk) 13:16, 31 August 2021 (UTC)[reply]
- I'm not really convinced by using the publisher as a criterion... essentially it would be the dépôt légal of a given publisher. The idea to build a comprehensive bibliography about Quebec City (topic or narrative location) seems more interesting. --- Jura 11:46, 1 September 2021 (UTC)[reply]
- Understood. At the same time, having local publishers is part (in my understanding) of the UNESCO criteria. We could begin with "important" books published by those publishers... for instance, Nikolski (Q3341721) is important because it was a game changer for Éditions Alto (Q16684371) during their launch. In that case our bot would add links between the book and the publisher (currently there is no reference at all), and would create version, edition or translation (Q3331189) items in order to make the ISBNs known in Wikidata. We could also choose to limit our project to books written by authors who already have an item on Wikidata (in which case we would just make sure to add notable work (P800) claims to the author). --ChristianBRoy (talk) 13:48, 1 September 2021 (UTC)[reply]
- I don't see how UNESCO is relevant to this bot request. There are other websites such as Worldcat, that aim to include all books. notable work (P800) is not to list every edition or work of an author. --- Jura 05:37, 12 September 2021 (UTC)[reply]
- Sorry about the UNESCO reference, this was a follow up to my answer to your previous question about the selection of books. I understand that notable work (P800) is not to list every edition or work of an author. However, it can be used to list works that are notable, and some are clearly missing in Wikidata. For instance, Jacques Lacoursière (Q674235) is a famous historian, who has received multiple awards in Canada and in France, but there is no entry about his most notable works (which we would add). ChristianBRoy (talk)
@Jura1: I think you see it the wrong way. This user, ChristianBRoy, has a project with partners that are publishers in Québec. He has access to valuable and verified data about the books these publishers make. Also, using AI and manual work, this project is looking to add to books information that is usually very hard to come by and that we lack on WD, like main subject (P921), award received (P166), based on (P144), inspired by (P941), characters (P674) and narrative location (P840). At least 50% of his books will already be on WD, so he will be enriching existing WD items for most of his work.
I'm the one who recommended that he creates a bot for his automated modifications. I'm not really sure I understand the reservation you are expressing here. Regards, Antoine2711 (talk) 20:08, 13 September 2021 (UTC)[reply]
- What guidance on notability did you provide? --- Jura 09:20, 14 September 2021 (UTC)[reply]
- First of all, all the publishers are ALREADY in Wikidata. There are 10 or 15, and the maximal scope in the next 10 years is the publishers in Quebec, so maybe 1000. But he is far from there. His current project, limited to 10 publishers now in Quebec City, is about enriching data for books published by these publishers. So the notability here is clear. I take for granted here that everybody knows that a publisher publishes books, so putting information about books for existing publishers is pertinent in Wikidata. He will add data to already notable Wikidata publishers. I think the fact that he uses AI to extract and identify property data not often defined in books is also in itself a good reason to welcome this bot and encourage ChristianBRoy to be part of the Wikidata community and be a good and reliable contributor. I know I'm doing my best to show him what I've learned myself here. Can you also support him? Antoine2711 (talk) 01:41, 22 September 2021 (UTC)[reply]
- So your guidance would be that all books and editions are notable if we have items about their publisher? --- Jura 22:23, 26 September 2021 (UTC)[reply]
- @Jura1: I'm not saying that any book these publishers put out is automatically notable. What I'm saying is that the publishers are already notable, because they exist in Wikidata. Publishers get their notability from the books they publish, and this project will add information about these, especially information that is hard to get but useful for the public. Antoine2711 (talk) 02:02, 29 September 2021 (UTC)[reply]
- @Jura1: Actually, my understanding of Antoine's comment is that he refers to the structural need notability criterion. By linking books to publishers (and to authors as well), we make the statements about them more useful. We will also very likely improve the overall information about books, by including ISBNs in editions and linking them to works. For instance, La Constellation du Lynx (Q3207769) is a work with an ISBN, which is not structurally correct. We would create the edition for the ISBN and correctly link it to the work. Furthermore, my understanding that books meet Wikidata's notability criteria is based on the fact that they are instances of "clearly identifiable conceptual or material entity". ChristianBRoy (talk)
- Yeah, I see your point of view, but I don't think that's the way WD:N is generally understood. Otherwise we would end up having every book and edition for larger publishing groups. The only books that you wouldn't consider notable would be the self-published ones. There are various databases for ISBNs, maybe try these instead? --- Jura 13:37, 25 October 2021 (UTC)[reply]
- The other ISBN databases do not have the same potential reach for the greater public, I believe. Also, as far as I know, none will offer the same possibility to easily link a work to a location or a character, and then do queries around those. And as I said, the idea is not to dump a huge list of ISBNs, but rather a human-curated list of a few hundred works. In addition, I am curious as to what makes "having every book and edition for larger publishing groups" not a suitable option. Is that mostly a Wikidata performance / limited resources concern, or are there editorial reasons for this not being interesting? (honest question, not a trap, I just want to have a better understanding of what makes contributions interesting or not). ChristianBRoy (talk) 16:04, 17 November 2021 (UTC)[reply]
@ChristianBRoy: This seems to be stale, is this still active? Perhaps @Ymblanter, Lymantria: could comment? Thanks. Mike Peel (talk) 22:16, 18 January 2022 (UTC)[reply]
- Difficult one. I have doubts about adding editions too easily. Lymantria (talk) 06:38, 19 January 2022 (UTC)[reply]
- @Lymantria I could change the bot behaviour and remove the editions feature, if that is a concern. However, this would be contrary to the model used by Wikidata:WikiProject_Books. Moreover, editions are useful for ISBNs (which in turn are very important identifiers for books). Hence the dilemma, I guess. ChristianBRoy (talk) 13:28, 19 January 2022 (UTC)[reply]
- Thanks for asking @Mike Peel! Yes it is still active from my point of view. There is still interest in proceeding. ChristianBRoy (talk) 13:21, 19 January 2022 (UTC)[reply]
- Staff had to plan for the deletion of items created too easily for scholarly articles.
- Given the somewhat alarming status of the Query Service at the end of 2022, I don't see how we would have space to mirror ISBN registries/OPACs here.
- If it's Wikibase's technology that interests Quebec's National Library, it's possible to set up separate instances of Wikibase on its server. --- Jura 14:39, 19 January 2022 (UTC)[reply]
- @Jura1 for the sake of clarity, this bot is not at all affiliated with or related to Quebec's National Library; sorry if my comments somehow created confusion about that. That being said, I understand from your comment regarding the query service that your concern is the number of new Wikidata items that would be created, is that correct? If so, what would be a reasonable number? I also understand from your comment that you do not see the interest of mirroring ISBN registries here... are you also saying that there is no interest at all in having books on Wikidata? Otherwise, would my suggestion above, to not create version, edition or translation (Q3331189) items, make sense? ChristianBRoy (talk) 21:10, 21 January 2022 (UTC)[reply]
- The same can be run by other organizations.
- On Wikidata, one needs to follow WD:N. This is not met by excluding self-published books. --- Jura 10:20, 22 January 2022 (UTC)[reply]
- From WD:N, I understand that any book from an author that has a Wikipedia page or Wikidata item is notable. I feel that may be a bit too much (notable authors may have written non-notable books), but at least it is an objective criterion that I can code in the bot. The human operator for the bot would pick books based on their overall interest, but the bot would block uploading any book that does not have a link to an existing Wikipedia or Wikidata author page/item. If that sounds good, we could proceed to a test run for a small sample of books that could be reviewed (as per the approval process on Wikidata:Bots). ChristianBRoy (talk) 14:35, 25 January 2022 (UTC)[reply]
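The author-based gate proposed in the comment of 25 January 2022 could look something like the sketch below. It is an assumption-laden illustration, not the operator's implementation: the record layout, the `item_id` field and the functions `has_known_author` and `filter_uploadable` are hypothetical, and the set of known author ids would in practice come from a lookup against Wikidata or Wikipedia.

```python
def has_known_author(record: dict, known_author_ids: set) -> bool:
    """True if at least one author of the record maps to an existing item id."""
    return any(
        author.get("item_id") in known_author_ids
        for author in record.get("authors", [])
    )


def filter_uploadable(records: list, known_author_ids: set):
    """Split records into those the bot may upload and those it must block."""
    allowed, blocked = [], []
    for record in records:
        if has_known_author(record, known_author_ids):
            allowed.append(record)
        else:
            blocked.append(record)
    return allowed, blocked


# Example data: Q674235 is the author item cited earlier in this discussion;
# the titles and the second author are made up for the illustration.
known = {"Q674235"}
books = [
    {"title": "Hypothetical title A", "authors": [{"name": "Jacques Lacoursière", "item_id": "Q674235"}]},
    {"title": "Hypothetical title B", "authors": [{"name": "Unknown Author", "item_id": None}]},
]
allowed, blocked = filter_uploadable(books, known)
```

The human operator would still choose which books to submit; the gate only guarantees that nothing without a linked author item gets uploaded.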
Given the long discussion here, I am trying to figure out ways to move forward... Is there anything I can do to help? My last suggestion (a test run) still holds, and I am open to other suggestions as well! --ChristianBRoy (talk) 18:46, 3 March 2022 (UTC)[reply]
@Mike Peel, Jura1, Lymantria: I have a problem here. We have a user who follows the recommendation and comes here with a reasonable request to be granted bot status. He says he's going to curate a thousand books here on Wikidata per year, saying half of them already exist, and his 10 publishers already exist. So it's mainly enriching, certainly not database dumping. Also, he says that ALL his data will be curated by a human, which also implies data of a high quality. Now, Jura1 starts pretending he wants to drop hundreds of thousands of books on Wikidata, and that this will create a technological problem, or rather put pressure on an already concerning issue of item creation. But nowhere did he, or I who came to explain his project, say such a thing. This project is perfect for Wikidata. It's an enhancement of a small dataset of very particular data, cleaned up by human hand. We should welcome him instead of tripping him up, as we have been doing for the last 8 months. Or maybe I missed something, but I tell you, I've been around Wikidata and the Wikimedia Foundation for the last 4 years, so I did my homework. So, what do you need for this request to go forward? Regards, Antoine2711 (talk) 03:33, 27 April 2022 (UTC)[reply]
@Mike Peel, Lymantria: So, this user said he will modify & create 1000 Wikidata items over a period of a year. Could we start a test like we did with my bot, and have ChristianBRoy modify & create a hundred books on Wikidata to test his bot? Me, I am an OpenRefine user, but his bot is really going to work without direct human intervention. What would be the next step? Regards, Antoine2711 (talk) 02:57, 9 June 2022 (UTC)[reply]
- Fine with me. I am ready to be convinced. Lymantria (talk) 06:29, 23 July 2022 (UTC)[reply]