Wikidata talk:Tools/OpenRefine

For OpenRefine support, you can use the resources below.

Make sure you check OpenRefine's official documentation too.

If you are familiar with OpenRefine and happy to help others, you can add yourself below.

Reconcile -> get item qid[edit]

„Once a column of your table is reconciled to Wikidata, you can pull data from Wikidata, creating other columns in your dataset. TODO: tutorial“ Nice feature ;-) Can anybody point me to a tutorial or documentation (or examples) with some more details? I did some reconciling with painting metadata and got good matches, but how can I convert the column to the real QID to use it for import with QuickStatements? Is that possible at all? If I could find it out, I would even try to write a tutorial … thx. --Elya (talk) 19:27, 18 November 2017 (UTC)[reply]

@Elya:, this feature was released yesterday in OpenRefine 2.8, which is why the tutorial hasn't been written yet. I recommend ArthurPSmith's tutorial at WikidataCon: the feature you are looking for is covered around 11:30 in this video: https://media.ccc.de/v/wikidatacon2017-10020-openrefine_demo − Pintoch (talk) 10:33, 19 November 2017 (UTC)[reply]
Pintoch – ahh, I did not realize that there was such a fresh update, thank you. I checked the feature and it's nice, but I don't see how to pull the QID from the reconciled data. I need them to import the list with QuickStatements. However, for the record: I found that the Wikidata/Wikipedia add-ons for Google Spreadsheets can do the job. So the workflow could be: reconcile the data with OpenRefine, export to Google Docs and pull the ID via =WIKIDATAQID(concat("en:",F2)) ("en" being the language code and "F2" the cell with the reconciled label). One could double-check with your own language label via =WIKIDATALABELS(G2,"de"), "G2" being the QID result and "de" your own language. --Elya (talk) 18:19, 19 November 2017 (UTC)[reply]
@Elya: no need to use Google Sheets for that! Just use the first recipe mentioned on Wikidata:OpenRefine#Recipes. − Pintoch (talk) 10:02, 20 November 2017 (UTC)[reply]
*blush* – there is a German saying like „you are at an advantage if you can read“. Thanks again. --Elya (talk) 17:14, 20 November 2017 (UTC)[reply]

Command line tool for the reconciliation endpoint[edit]

OpenRefine is a cool tool, but it requires a UI to work. I was wondering if anyone wrote a command line client for the reconciliation endpoint, so that it can be used with PyWikiBot or other similar frameworks?--Strainu (talk) 08:18, 23 December 2017 (UTC)[reply]

Not that I am aware of. It's quite straightforward to call the API directly with a library such as requests in Python, but yeah, it would be great to have a nice wrapper. − Pintoch (talk) 15:05, 23 December 2017 (UTC)[reply]
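For the record, a minimal sketch of such a direct call in Python with requests, assuming the newer reconciliation endpoint mentioned further down this page (the "queries" parameter and the result shape follow the standard reconciliation API):

import json
import requests

# Endpoint as given later on this page; replace "en" for other languages.
ENDPOINT = "https://wikidata.reconci.link/en/api"

# One batch may contain several queries; here a single lookup restricted to humans (Q5).
queries = {"q0": {"query": "Douglas Adams", "type": "Q5"}}

response = requests.post(ENDPOINT, data={"queries": json.dumps(queries)})
response.raise_for_status()

# Each candidate carries an id, a name, a score and a boolean "match" flag.
for candidate in response.json()["q0"]["result"]:
    print(candidate["id"], candidate["name"], candidate["score"], candidate["match"])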

First impressions from a newbie walk-through of the API tutorial[edit]

I posted some feedback on my experience with Wikidata:Tools/OpenRefine/Editing/Tutorials/Working with APIs at Wikidata:Edit groups/OR/8cb3004. --Daniel Mietchen (talk) 02:23, 30 June 2018 (UTC)[reply]

Error "join expects an array and a string"[edit]

I'm in Wikidata:Tools/OpenRefine/Editing/Tutorials/Working with APIs again (now using version 3.1) and currently stuck in the Extracting information from JSON responses section at the step

we first need to extract the list of employers by parsing the JSON payload, using the Add column based on this column operation (in the Edit column menu of the employments column) with this expression: value.parseJson()["employment-summary"].join('###')

When I paste that expression into the expression field in the pop up form, I am getting the error "join expects an array and a string" and don't know how to proceed. I also tried to post a message to the OpenRefine forum but do not see this up there yet. Any hints? --Daniel Mietchen (talk) 19:16, 7 December 2018 (UTC)[reply]

@Daniel Mietchen: Did you succeed in fetching the employments JSON from ORCID at the previous stage? You should see some JSON in this column. The expression will fail when the cells do not contain any JSON payload, but should succeed otherwise. Here is a screenshot explaining the difference: http://pintoch.ulminfo.fr/30d61ac289/grel_json_parsing.png . If you want to get more familiar with OpenRefine in general (which should help you understand this sort of behaviour better), the previous tutorials might be useful (as the workflow they present is simpler). Other generic OpenRefine tutorials could help too, for instance https://programminghistorian.org/en/lessons/fetch-and-parse-data-with-openrefine (many others are listed here: http://openrefine.org/documentation.html) − Pintoch (talk) 09:56, 8 December 2018 (UTC)[reply]
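For what it's worth, a rough Python equivalent of that GREL expression, assuming the cell holds the ORCID employments JSON fetched in the previous step (the "employment-summary" key is taken from the tutorial's expression), which shows why blank cells are the failure mode:

import json

def extract_employers(cell_value):
    # A blank cell is exactly what makes the GREL version fail with
    # "join expects an array and a string": there is nothing to parse.
    if not cell_value:
        return None
    payload = json.loads(cell_value)
    # Join the entries of the "employment-summary" list into one string,
    # so it can later be split again on "###".
    return "###".join(str(item) for item in payload["employment-summary"])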

No Wikidata hovercards[edit]

I cannot see the Wikidata hovercards. I use OpenRefine 3.1 and tested it with Firefox and Chrome. The browser doesn't make any requests when I hold my mouse over a reconciled entry. The preview URL (https://tools.wmflabs.org/openrefine-wikidata/en/preview?id=Q42) does work, though. Is there something you have to turn on in order to see the hovercards?--CENNOXX (talk) 14:36, 30 January 2019 (UTC)[reply]

Hi CennoxX, yes, the behaviour of the preview is a bit counter-intuitive: you need to click on one of the reconciliation options (the light blue links) shown when a cell is not matched. When a cell is matched, there is no way to trigger the preview. This should be changed, as you are not the first one to report it, so I am opening an issue about that: https://github.com/OpenRefine/OpenRefine/issues/1943 − Pintoch (talk) 22:14, 31 January 2019 (UTC)[reply]
Ahh, that's how it works :) Thank you!--CENNOXX (talk) 12:01, 1 February 2019 (UTC)[reply]

Coordinates formatting[edit]

Hello, what format does OpenRefine support for loading geographical coordinates and pushing them to Wikidata? Is there a procedure for updating geographical coordinates? For instance, I've got data like this:

ARPLatitude      ARPLatitudeS    ARPLongitude      ARPLongitudeS
63-23-28.3370N   228208.3370N    148-57-20.2190W   536240.2190W

Thanks Bouzinac (talk) 09:25, 14 February 2019 (UTC)[reply]

Hi Bouzinac, the format supported by OpenRefine is documented at Wikidata:Tools/OpenRefine/Editing/Schema_alignment#Globe_coordinates. I am not familiar with the coordinate formats that you have. Here is the procedure I would use to upload them:
  • first, understand what these coordinates mean! Maybe the format with dashes means "degrees-minutes-seconds"? Can you find any relevant documentation about these formats?
  • second, work out how to convert these coordinates to the decimal format that OpenRefine understands. You will need to create an expression (in GREL or Python for instance) to convert these.
  • finally, apply the transformation in your OpenRefine project and upload the coordinates.
I cannot do this research for you - it is up to you to understand what your data means. However, if you can find any documentation about any of these formats showing that they are sufficiently standardized, I would be happy to make OpenRefine understand them natively in the upcoming version. Cheers, − Pintoch (talk) 10:57, 14 February 2019 (UTC)[reply]
These coordinates come from this reference database: https://nfdc.faa.gov/nfdcApps/services/ajv5/airportDisplay.jsp?airportId=9TN0

I suppose they used - instead of ° and ' and ''. I also wonder whether I should concatenate lat,lon or lon,lat before putting it into Wikidata... Bouzinac (talk) 11:41, 14 February 2019 (UTC)[reply]

Yes, once you have converted longitude and latitude to degrees you will need to concatenate them into one column. Is the documentation not clear enough about that? Feel free to edit it to make that clearer. If these numbers are indeed degrees, minutes and seconds, then please have a look at https://fr.wikipedia.org/wiki/Coordonn%C3%A9es_g%C3%A9ographiques to understand how to convert them to degrees. If you have any difficulty carrying out the steps above, let me know - I just don't think it helps to give you a ready-made GREL expression: it is better if you explain to me exactly which step you are struggling with. − Pintoch (talk) 11:55, 14 February 2019 (UTC)[reply]
Hi again, well, OpenRefine refused my formatting. Here's the data file … https://docs.google.com/spreadsheets/d/10eWaYar8AtXmc0yQoXH8VgTbp0j-tFqt1VCGqMJrgfs/edit?usp=sharing with the formatted column 'RetravailCoord'... Let me know what you think?

NB: reconciliation via the FAA code did not work… I had to export the items with an FAA code, do a VLOOKUP, and repopulate the file with the Q numbers.... Bouzinac (talk) 15:11, 14 February 2019 (UTC)[reply]

Hi Bouzinac. As I tried to explain above, it is not enough to concatenate your fields. According to the documentation (which I have just updated to make this clearer), OpenRefine expects your latitudes and longitudes in decimal format, which is currently not the case. So you first need to do this conversion: turn your 63-23-28N into 63.3911. The formula is fairly simple: 63 + 23/60 + 28/3600, and you add a + or - sign depending on the letter at the end (+ for N and E, - for S and W). Do you want to try translating that into GREL or Python code? It is a good exercise, and I am sure you can find Python examples for inspiration if needed. As for your reconciliation, did you make sure to remove any filtering by type? − Pintoch (talk) 16:16, 14 February 2019 (UTC)[reply]
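A sketch of that conversion in Python, assuming values shaped like the FAA ones above ("63-23-28.3370N", i.e. degrees-minutes-seconds followed by a hemisphere letter); the same logic could be used as a Jython expression in OpenRefine:

def dms_to_decimal(value):
    # The last character is the hemisphere letter (N, S, E or W).
    hemisphere = value[-1]
    degrees, minutes, seconds = value[:-1].split("-")
    decimal = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # North and East are positive; South and West are negative.
    return decimal if hemisphere in ("N", "E") else -decimal

print(dms_to_decimal("63-23-28.3370N"))   # ~63.3912
print(dms_to_decimal("148-57-20.2190W"))  # ~-148.9556

The two decimal values then still need to be concatenated into the single column that the schema documentation mentioned above expects.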
Good evening @Pintoch:, so, I did the conversion. However, given how slow my PC is, maybe you could update much faster than I can with this file? (columns in blue)

https://docs.google.com/spreadsheets/d/10eWaYar8AtXmc0yQoXH8VgTbp0j-tFqt1VCGqMJrgfs/edit?usp=sharing With this as a reference: https://www.faa.gov/airports/airport_safety/airportdata_5010/... Thanks in advance (FYI, my OpenRefine import has been running since the beginning of this afternoon, for about 17,000 rows…). Bouzinac (talk) 19:10, 14 February 2019 (UTC)[reply]

A priori, when pushing the data to Wikidata, the slowness does not come from your PC but from the edit rate limits imposed by Wikidata (30 edits/minute => ~9.5 hours for 17,000 edits)… So unfortunately I have no way to update faster than that either. − Pintoch (talk) 20:51, 14 February 2019 (UTC)[reply]
I don't mean to criticize (it's so French!), but at work I use Teradata databases that swallow, almost without flinching, modifications (selects/updates) on the identifiers and characteristics of some ten million customers, with day-by-day snapshots…. So (from my point of view) that is the yardstick for the comparison… I admit, however, that my company pays a fortune for that service, and Wikidata is free, so far be it from me to throw stones. Bouzinac (talk) 21:31, 14 February 2019 (UTC)[reply]

Invalid credentials[edit]

I can't log in. Is this problem due to Cyrillic? Сидик из ПТУ (talk) 06:25, 3 August 2019 (UTC)[reply]

@Сидик из ПТУ: woops, thanks for reporting that! Which operating system and browser are you using? − Pintoch (talk) 08:00, 3 August 2019 (UTC)[reply]
Windows 10, Chrome 76.0.3809.87. Everything is exactly the same with Mozilla. Сидик из ПТУ (talk) 19:12, 3 August 2019 (UTC)[reply]
@Сидик из ПТУ: Could you show me the contents of the terminal when you try to log in? − Pintoch (talk) 08:45, 4 August 2019 (UTC)[reply]
[1]. Сидик из ПТУ (talk) 08:50, 4 August 2019 (UTC)[reply]
@Сидик из ПТУ: is it possible that you are using two factor authentication? That is sadly not supported by OpenRefine at the moment, since the underlying library Wikidata-Toolkit does not support it. − Pintoch (talk) 09:36, 4 August 2019 (UTC)[reply]
No, this is the first time I've heard that Wikimedia has two-factor authentication. Maybe there are restrictions on passwords? I have not changed my password since 2006; now the system suggests making it more complex. Сидик из ПТУ (talk) 09:44, 4 August 2019 (UTC)[reply]
Solved by updating my password, which was less than 10 characters. Сидик из ПТУ (talk) 09:53, 4 August 2019 (UTC)[reply]

Good datasets to use as OpenRefine exercises?[edit]

Hi all,

I will be running an OpenRefine workshop at WikiTechStorm 2019 and I am looking for any dataset whose import in Wikidata could constitute a good exercise with OpenRefine. If there are any data imports that you have been thinking about and could be doable in a few hours, let me know! You can also create a subtask on phabricator at phab:T236038.

Thanks! − Pintoch (talk) 16:34, 14 November 2019 (UTC)[reply]

I think importing certain award-recipient statements would cover several features; imagine a list:
  • 2000 - John Doe
  • 2001 - Joseph   Dow, Jane Doohes
  • 2002 - Jon Does

That would cover features like:

  • Importing from clipboard
  • Splitting to columns
  • Transposing (duplicate 2001 for each recipient)
  • Cleaning up whitespaces and similar
  • Reconciling (creating new items for red links)
  • Creating a schema with label, instance of human, references
  • Adding a statement about award received with qualifiers (point in time, award shared with(?))
  • Performing data upload

--Papuass (talk) 11:51, 15 November 2019 (UTC)[reply]

@Papuass: indeed! I agree this is a great use case. Do you have any particular awards in mind? − Pintoch (talk) 12:28, 15 November 2019 (UTC)[reply]
Don't forget a very "difficult" geographical dataset with coordinates to be "recoded" into Lat,Long ==> for instance look at [2]. Plus they have "fun" FAA identifier codes with strings like '0E2' which happen to become 20000 in Excel ;) (even if you tell Excel it is a string... :rolleyes:) Bouzinac (talk) 13:10, 15 November 2019 (UTC)[reply]
@Bouzinac: Haha nice, but I think I'm going to keep it as simple as I can! That being said, I'd be very interested to learn what your recipes are for translating these coordinates. Perhaps the Wikidata extension could support these conversions natively? − Pintoch (talk) 13:32, 16 November 2019 (UTC)[reply]
Hi, something like grel:toNumber(substring(value,0,3))*1.0+toNumber(substring(value,4,6))*1.0/60+toNumber(substring(value,8,12))/3600 Bouzinac (talk) 19:39, 16 November 2019 (UTC)[reply]
ask @Yarl: and @Cassandreces: - https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Tutorials/Video See also https://www.wikidata.org/wiki/Wikidata_talk:Events/Wikidata_Zurich_Training2019#open_refine_example LaMèreVeille (talk)

Slow performance[edit]

Hi, for weeks I have had an issue with my activity in OpenRefine. It does not import new projects, not even old datasheets that I know for sure were OK; the tool just gets stuck. I use version 3.3 beta on Windows 10, with Firefox. I also tried the new version with another editor on his computer, but the problem is still there. After a lot of re-saving and changing formats I managed to make it work, but I would like to stress that it was no longer importing datasheets that had already been imported weeks ago. Can I ask what format people here mostly use?

Also, when I performed Wikidata edits today, the Wikibase step was super slow: it was a simple insertion into 50 items and it took almost ten minutes to transfer circa 60% of the schema. Where is it possible to check whether other people here on Wikidata are experiencing the same problem?--Alexmar983 (talk) 16:07, 13 February 2020 (UTC)[reply]

If by "data sheet" you mean "Google Sheet" then you might want to update to version 3.3 where a bug in Google Sheets import was fixed. Concerning the slow edit upload step, this is not due to OpenRefine itself: Wikidata is overloaded. See this ticket: phab:T243701Pintoch (talk) 10:29, 14 February 2020 (UTC)[reply]
(forgot to @Alexmar983: − Pintoch (talk) 10:29, 14 February 2020 (UTC))[reply]
"data sheet" means User:Pintoch various formats. I don't use google sheet (never heard of it, is that what many users use with OpenRefine?). I use normally csv, xls, ods files and they looked all very difficult to upload recently, even if I am sure I have uploaded these versions weeks ago. As I said, we had the same problem on a computer with my same OS and browser where we used a newly installed 3.3 version and not my 3.3 beta (I try not to change the version since I have not understood perfectly where the projects are stored and I am in the middle of a massive import). Thank your for the Phab link--Alexmar983 (talk) 12:48, 14 February 2020 (UTC)[reply]

Bug with reconciliation and identical labels[edit]

Hi, I am working with User:Epìdosis on this import and, as you can see even if you do not understand Italian, it is about GLAM items and we are very precise.

After an entire day spent on a specific issue, I can state with no doubt that there is some bug in the reconciliation process. I assume it is probably a problem of internal backlog or cache (not sure if my jargon is correct).

I did not notice it with small datasets (50-200 items involved), where statistically the frequency was lower, but yesterday I imported one batch of 500 items and it was really clear (we learned slowly how to use the tool, and the size of our imports increased progressively).

So, the first "bug" we noticed was when we realized that the tool needs a specific title for every new item, and it does not understand that they are on different rows in the datasheet because they are different. It's not really a bug, but we did not import items with IDs as a first step and we could not notice rapidly. When we imported items with IDs the tool warned us of the duplicate of insertion on the same reconciled or new items, so we realized quickly the reason and that the tool does not warn you of that when other statements than IDs are involved. So because of this, Q86118587 was created few days ago but we fixed it.

The bug we discovered at this point is as follows. We knew this merging could happen and wanted to take care of it, but even knowing that and paying attention, some of these cells were still left after reconciliation; I did not notice all of them immediately, as it is a big file. We revised them in any case, and we also spotted which cells were reconciled to wrong existing items, or were all marked as reconciled to the same new item. We cleaned them up and fixed them one by one. We are accurate, we were checking in any case, no big deal. What happened at this point, however, is that the tool did not save these changes at all.

We ended up, for example, with this case, among others. Now please notice that that case proves the bug, because when preparing the list, full of "town library" or "library of the diocese" entries, we specifically changed the descriptions every time to include the name of the town, not only in my OpenRefine sheets but also in the existing items on Wikidata, so that people in the future will have less confusion and fewer wrong reconciliations (well, the tool probably uses the English label and I modified the Italian one, but you get the idea). We did not forget it by mistake; we clearly cleaned up.

In the import, however, the wrongly reconciled items remained as such, and the ones marked as new with a generic title ended up all cluttered together in a single item. All our effort was wasted. Some of them you can notice because of ID warnings, but some of them you don't. The only solution I found was to isolate them with a star, start the import without them (tabula rasa), and at that point it was finally possible to fix the reconciliation of these left-over items simply by doing it again.

Is there any way to fix that? I must confess I was upset that I had to undo some edits; we plan to revise them in any case, but I really cared about not introducing mistakes.

Of course the solution is to check ALL similar titles before the reconciliation (with the help of the text facet), but in a big datasheet there might statistically be two or more unseen items wrongly reconciled to the same existing one, and there is no solution for that even if you notice... except excluding them from the import, probably purging some cache, and doing a second import.--Alexmar983 (talk) 02:39, 28 February 2020 (UTC)[reply]

User:Pintoch, sorry to ask you directly, but... any clue about this? Maybe you know of a better protocol to handle it.--Alexmar983 (talk) 02:48, 28 February 2020 (UTC)[reply]
@Alexmar983: I am not sure I completely understand all of your report but here are a few things that could potentially help you:
  • Are you aware of the distinction between the two actions "Create a new item for each cell" and "Create one new item for similar cells"? Have you used the appropriate action for your dataset? This could potentially explain the issue of two new items being incorrectly merged into the same item, or one new item being duplicated incorrectly.
  • The problem with changes not being uploaded can be due to the fact that Wikidata has been overloaded frequently over the past weeks;
  • About undoing your edits, are you aware that you can use EditGroups for that?
Pintoch (talk) 15:59, 28 February 2020 (UTC)[reply]
Pintoch, yes, I know the difference; that was not the cause of the problem. Actually, if I had marked a wrongly reconciled item as a new item creation on all similar cells, I would have had new items. Instead, I got information merged into the same item.
I might try EditGroups, but the problem of these undos remains.
I can assure you that I marked dozens of items as "new items", or changed their titles to less generic ones, and such actions were not saved. No matter how long I waited or how many times I did it... it looked OK superficially, but not in practice. I had to remove these "corrupted" items manually, run the import, and only at that point finally import them by starting again. If you don't do so, the tool does not save such changes of information in the cell once a reconciliation has already been performed. It only looks like it does.
So if "biblioteca comunale" possible label remains in two reconcilied new items, I change their title to "biblioteca comunale of X" and "biblioteca comunale of Y", I mark again as new items... the tool just ignore all of that. I have to mark them with a star, remove with facet, run the dataset and than open those two items again at the end and run the process again. Don't ask me why, but it's a bug. If course as a every subtle bug if you don't experience yourself, it sounds confusing. I can assure it is when you spend hours try to diagnose it. That's why even if tired I wrote everything down...--Alexmar983 (talk) 16:15, 28 February 2020 (UTC)[reply]
I'll try again with one core example. I have two lines of information in my datasheet, both about an institution named "biblioteca comunale" (town library) in the cell, but all properties such as address, inception date, telephone and administrative entity are different... I mark both cells as new. The tool is "stupid" and merges them all into one item.
OK, I learned my lesson. I could also have learned it by first importing items with IDs, because if there is an ID, the tool warns me that it is creating one item with two IDs.
New import. This time I try to specify better titles, "town library of X/Y/Z...", and I reconcile. I missed some of them because there are a lot; it's normal to miss a few. I take my two cells with "biblioteca comunale", add the name of the town, and mark them again as new. I also take two wrongly reconciled cells for a "biblioteca diocesana" (diocesan library), where the one of a certain town was linked to a totally wrong town, and mark them all as new creations. I save them all, I save my schema, I check everything again. Not in a rush. I finally start my import. It's a mess again. I notice that it did not save anything, because when the IDs are different it actually warns me again that I am creating a unified item. The warning is in fact still there.
I try to fix all the cases with IDs, and also to spot all the cases without IDs that have the same titles I should have checked, hoping to catch them all. I keep asking the tool to save them as new items, but nothing. The warning remains right there.
So, I removed them one by one, imported the rest of the datasheet, and finally opened the page with those cells. All my corrections were now gone. They all looked wrongly matched, as they were after the first reconciliation. This time, however, the tool saved my corrections and I could import them properly, avoiding a cluttered insertion of wrong information into just one new or existing item.--Alexmar983 (talk) 17:08, 28 February 2020 (UTC)[reply]
@Alexmar983: I changed their titles to less generic ones - changing the cell values after reconciliation does not change the reconciliation data at all, indeed. So it is normal that this does not fix your problem. − Pintoch (talk) 21:18, 28 February 2020 (UTC)[reply]
Pintoch, OK, good to know... but since it visually looks like I am clicking and changing, how am I supposed to realize that it is not effective at all? Why allow me to change it if it does not change anything in reality? If it's not a bug, it's "poor design", right? So, if I discover that the reconciliation is wrong, I cannot do anything at that point? I can only isolate the cells and fix them later. I could restart the reconciliation, but I would lose all the manual work done until that point, am I right? Since poorly reconciled cells are something that can happen, what protocol would you suggest in that case?--Alexmar983 (talk) 21:34, 28 February 2020 (UTC)[reply]
@Alexmar983: you can change reconciliation data, just by clicking "Choose new match" on the cell. This will let you mark the cell as matched to any other item, or as a new item (make sure you use the single tick, not the double tick, in that case). If you want to re-run reconciliation on a subset of rows, you can also do that: just use facets to filter down to the rows you want to reconcile again. The rows which have been excluded by your facets will not be affected by this new reconciliation. − Pintoch (talk) 22:05, 28 February 2020 (UTC)[reply]
No, Pintoch, that is the bug: I can't mark the cell. You can also ask User:Epìdosis, because he was with me when I showed him more than once. That's what I did, very accurately; it looked done, but in reality it never registered the change... only apparently. BTW, yes, thank you for the tip. Instead of running the process again later, after a first import excluding them, I will reconcile them separately and then import everything to Wikidata in one step. That should work.--Alexmar983 (talk) 00:02, 29 February 2020 (UTC)[reply]
@Alexmar983: I cannot reproduce this bug on my side. If you ever come across it again, it would be useful to share an export of this project and let us know for which cell "Choose new match" does not work (and exactly what you mean by "cannot mark the cell"). − Pintoch (talk) 09:46, 29 February 2020 (UTC)[reply]
Pintoch, I have OpenRefine 3.3 from December, with Firefox and Windows 10. I have not installed the new version from a few weeks ago, but I will at the end of this import cycle (I simply try to stay on one version while working on a project). And pardon me if my jargon is not precise yet, but basically, when I mark a cell "as a new item", it looks like it does everything (I see the two squared boxes below, I click, etc...), but in reality it changes nothing. The same happens if a different reconciliation is needed: I open the info box, I type a precise item in the box on the right, I click... but in reality the originally wrongly matched item is still there. Only once I have imported can I really change such a reconciliation (so I don't restart a reconciliation from the column; the tool simply finally allows me to change it in each single cell, one by one, actually saving my changes). Thank you for your time.--Alexmar983 (talk) 11:20, 29 February 2020 (UTC)[reply]

Using OpenRefine to add multiple statements to a single property[edit]

I'm looking to do some batch edits on a dataset of artists. Many of the properties require multiple values. So, for occupation (P106), I might need to add the statements "artist," "university teacher," and "curator," along with the appropriate references. Is there a way to do this with a CSV file? Should the statements be in one tab (and another for all corresponding references), or should I create multiple rows for a single artist? I've been combing through some resources and YouTube videos to no avail. Thanks!  – The preceding unsigned comment was added by PMAlibcat (talk • contribs).

@PMAlibcat: yes this is possible. It all depends on how your data is structured in your source CSV. Have you had a look at Wikidata:Tools/OpenRefine/Editing/Advanced_schemas? The "split multi-valued cell" operation should help you obtain the sort of table structure mentioned there. − Pintoch (talk) 09:47, 3 June 2020 (UTC)[reply]
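For illustration, a sketch (with made-up names) of what a table could look like after "split multi-valued cells" on the occupation column, so that each value gets its own row within one record:

artist name     occupation           reference URL
Jane Example    artist               https://example.org/jane
                university teacher
                curator

The blank cells in the first column make the three occupation rows belong to the same record, so they can be mapped to three occupation (P106) statements on the same item in the Wikidata schema.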

Endless spinning while uploading (perform Wikibase edits)[edit]

Hi, I love using OpenRefine to create WD items, even organized an OpenRefine workshop at WikiCon, but at the moment I'm at my wits' end. Maybe one of you has an idea how to find out what's going on:

  • a list of historical forts in Cologne with 58 items/rows. Everything is reconciled, new items as new, schema OK and saved, the usual issues reduced to a minimum, preview looks nice
  • Started uploading yesterday: Special:Contributions/Elya. Everything fine with the first 10 items
  • after this, the yellow badge that shows the spinner keeps hanging.
  • I cancelled, filtered out the completed items and tried to upload the remaining 48 - no success. The spinner keeps hanging at 0% and nothing happens
  • no error message

steps to debug

  • filter down to 1 item - nothing
  • a different one - nothing
  • restart OpenRefine, browser, computer, used different browser … you name it
  • re-login
  • created new "list" with one simple (completely unrelated) item Nußbaumerstraße (Q96574785) which works!
  • then reduced the original schema to a bare "title, label, description" and P31, even without references – nothing!
  • I even checked if my account was blocked and it was not ;-)
  • ahh … and updated OpenRefine from 3.2 to 3.3 this morning.
  • and yes, I have a very good internet connection.
  • Firefox console – empty

Some of the items have identical titles (2 rings of numbered forts – e.g. "Fort IV" will be there twice), but the descriptions differ (inner/outer). However, the upload also stops for items with different titles.

Any idea how to go on debugging? Is there an error log I could check? Is there a known issue that looks similar to this problem?

Thanks for any hint! --Elya (talk) 15:27, 22 June 2020 (UTC)[reply]

@Elya: you could perhaps have a look at the server logs when you observe the problem and report these. On Windows they are in the black terminal window, on Mac you can get them too. One possibility is that you are being throttled by maxlag: in this case, upgrading to 3.4-beta would let you define a higher maxlag value which could solve the problem. Let me know how it goes. − Pintoch (talk) 16:26, 22 June 2020 (UTC)[reply]
@Pintoch: thank you for the ideas … hmmm … I followed the instructions for the logs, and that's what I get:
19:39:07.114 [            refine_server] Starting Server bound to '127.0.0.1:3333' (0ms)
19:39:07.125 [            refine_server] Initializing context: '/' from '/Applications/OpenRefine.app/Contents/Resources/webapp' (11ms)
19:39:07.444 [          org.mortbay.log] failed SocketConnector@127.0.0.1:3333: java.net.BindException: Address already in use (Bind failed) (319ms)
19:39:07.444 [          org.mortbay.log] failed RefineServer@7f13d6e: java.net.BindException: Address already in use (Bind failed) (0ms)
19:39:07.444 [            refine_server] Failed to start server - is there another copy running already on this port/address? (0ms)
Exception in thread "main" java.net.BindException: Address already in use (Bind failed)
… (+ some lines of stacktrace, I think) 
logout
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
I updated Java and downloaded the JDK, however with javac -version I get 1.6.0_65 – no idea if this is good ;-)
Can you see anything in this? Java is a complete black box for me :-| Thanks! --Elya (talk) 18:11, 22 June 2020 (UTC)[reply]
PS: I installed the beta, same problem. Which value should I use for wikibase:upload:maxLag?
This seems to indicate that you already have OpenRefine running on your computer and you are trying to launch it a second time, which is not possible. Perhaps the easiest way is to restart your computer, launch OpenRefine (only once), launch the Wikidata import and then look at the console to see if there are any errors there. Thanks a lot for your patience! − Pintoch (talk) 19:49, 22 June 2020 (UTC)[reply]
@Elya: oh, and for the max lag parameter, the default value is 5; if you put something higher, OpenRefine will be more aggressive in pushing its edits (even if the servers are a bit busy) − Pintoch (talk) 19:50, 22 June 2020 (UTC)[reply]
@Pintoch: thanks for your patience. I had restarted already, and did it again. There is only one instance running (also checked the activity monitor, only 1 process). No difference. I found the log message; it seems to be just a „cosmetic“ warning, see https://github.com/OpenRefine/OpenRefine/issues/1832. Maxlag = 10: no effect. And I'm trying with only 1 single item! I have the impression it must be the data itself…. Well, I'll take another night to sleep on it. --Elya (talk) 20:43, 22 June 2020 (UTC)[reply]
@Elya: ok, if you could paste the full logs it would help! − Pintoch (talk) 20:49, 22 June 2020 (UTC)[reply]
@Pintoch: OK, new day, new tests. I'm convinced that it's the data or the schema, not the software. I can reconcile, and could upload a minimal item reduced to almost nothing, Zwischenwerk XIIb (Q96607218), and, again, a totally unrelated item, Jessestraße (Q96607514), but with coordinates. For both I created new schemas and the data was reduced to one line. I'll try to recreate the schema and re-build the item, property by property (*sigh* just a small idea to upload on Sunday afternoon … 3 days later …) and report here. However, I don't understand what made it break in the first place after uploading 10 totally perfect items … --Elya (talk) 16:54, 23 June 2020 (UTC)[reply]
OK, I've got a hint: I further reduced the complexity and landed on a reconciled value Germany (Q183) that shows the following error when hovering over it:
{"status": "error", "message": "invalid query", "details": "'Germany'", "arguments": {"id": "Germany", "lang": "en"}}
the same for the next one in the row:
{"status": "error", "message": "invalid query", "details": "'Festungsring K\u00f6ln'", "arguments": {"id": "Festungsring K\u00f6ln", "lang": "en"}}
Any idea what could cause this? The error message shows up only irregularly; most of the time the little blue popup is just blank. --Elya (talk) 18:32, 23 June 2020 (UTC)[reply]
PS: I re-reconciled via QIDs (in fact, these were very simple data; I could have put them into the schema manually), and now it's done. And now look at this beauty :-) - still no idea what's been going on, but thanks for accompanying me in this mess …--Elya (talk) 20:03, 23 June 2020 (UTC)[reply]

I'm adding something to this old thread for the record, because it's relevant and might help others (and my future self, searching for this in half a year ;-) – it's really helpful to use Windows or Linux if you have problems, because they display a permanent, real log while uploading and give meaningful error messages. The way to obtain the logs on the Mac vanished from the OpenRefine documentation, so I am placing a Web Archive link here until it's back: Installation-Instructions#obtaining-server-logs-on-mac. However, it does not seem to give the same useful messages as Windows does. So if you experience strange upload problems without error messages on a Mac, switch to Windows (or Linux) with your project if you have the option, and check the terminal. If anybody using a Mac with OpenRefine has found a more useful log, I'd appreciate any hint. --Elya (talk) 14:36, 28 February 2021 (UTC)[reply]

Wikidata reconciliation stopped working[edit]

Dear all, I have been working with OpenRefine 3.1 for a year. For a couple of weeks I have no longer been able to reconcile. From the column pull-down menu, I select Reconcile -> Start reconciling. Wikidata is available in the list. I click on Wikidata (single click). A window pops up, notifying me that OpenRefine is “Working”, and then nothing happens anymore (OpenRefine keeps on “Working”). I upgraded OpenRefine (3.3) and installed the latest Java version (8.261), but the problem persists. With one difference: if I now choose Wikidata from the list of available reconciliation services, it is listed as Wikidata (old) (en). Any idea what I can do to make OpenRefine reconcile again? Thanks in advance and kind regards, Walkuraxx (talk) 13:06, 20 July 2020 (UTC)[reply]

@Walkuraxx: yes, sadly this comes from a recent migration on Toolforge. To fix this, use "Add standard service" to add the Wikidata service again, with this new URL: https://wikidata.reconci.link/en/api (you can replace "en" with any other language code to get reconciliation in your language). − Pintoch (talk) 14:42, 20 July 2020 (UTC)[reply]

@Pintoch:What a good start of the day :-) Problem solved, thank you very much :-) Walkuraxx (talk) 06:10, 21 July 2020 (UTC)[reply]

Edit Wikidata schema not working?[edit]

Hi, I am having trouble creating a Wikidata schema for uploading some edits from OpenRefine. I can open the screen and add a new item, but the option 'type item or drag reconciled column here' doesn't seem to work; the selected column is just not added to the screen and I can't move on to further steps. I'm on OpenRefine 3.3 in Firefox. Am I missing something? Best, --RKDdata (talk) 12:35, 12 August 2020 (UTC)[reply]

@RKDdata: that is a known bug that should be fixed in an upcoming version. In the meantime try restarting OpenRefine or switching to English as UI language in OpenRefine. − Pintoch (talk) 07:26, 14 August 2020 (UTC)[reply]

Reconciling with ID[edit]

Hello, I can't seem to figure out how to reconcile items in OpenRefine using the ID Yale University Art Gallery ID (P8583). I would have thought the process would be very simple: I have a CSV file with a number of items; I open it in OpenRefine; I then click on the Label description to reconcile with items already in Wikidata and ask it to use the ID column (with property YUAG ID) and the collection column (with property collection (P195), linking it to Yale University Art Gallery (Q1568434)). While it correctly matches some, it also reconciles with completely unrelated items in other museums that have similar titles/labels. Why is this the case? I would think that with a unique ID associated with a collection, it would be very easy to match these up. It shouldn't be looking for objects that do not have Yale University Art Gallery ID (P8583) or are not in the Yale Gallery collection. Is there an easy way to just automatically link IDs? Can I tell OpenRefine to query only for items that have Yale University Art Gallery ID (P8583), retrieve those IDs and then see if there is a match with any ID in the file? Valeriummaximum (talk) 21:06, 20 September 2020 (UTC)[reply]

@Valeriummaximum: sorry about the delay in replying to this! Yes this is a known problem. One way to reconcile only based on the unique ids and without taking labels into account is to create a new column containing some gibberish string which has no chance of matching any label on Wikidata (for instance "anrusecbjéldvecauirnsect" could do). Then, reconcile that gibberish column using the other id column. You should only get matches via the unique id. We have plans to make this more convenient and intuitive in the future. − Pintoch (talk) 14:00, 30 May 2021 (UTC)[reply]
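A sketch of the query this trick effectively produces, in the same Python style as the example earlier on this page and following the "queries" format from the reconciliation thread below (the id value and the type filter here are hypothetical):

import json
import requests

queries = {"q0": {
    "query": "anrusecbjéldvecauirnsect",             # the gibberish label column
    "type": "Q3305213",                               # painting, as an assumption
    "properties": [{"pid": "P8583", "v": "12345"}],   # hypothetical YUAG id value
}}
response = requests.post("https://wikidata.reconci.link/en/api",
                         data={"queries": json.dumps(queries)})
# The label part cannot match anything, so any candidate returned here
# can only have been found through the P8583 value.
print(response.json()["q0"]["result"])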

How to represent BCE dates[edit]

Hi, I am wondering how to represent BCE dates with OpenRefine. For example, when I have a date like (165 BCE, Julian calendar), I express this as -0165-00-00_Q1985786, but the preview in OpenRefine interprets this as "30 November 167". I am not sure if the issue here is the use of the Julian calendar or the use of the minus sign to represent BCE, but it seems to have defaulted to a CE interpretation of the year.  – The preceding unsigned comment was added by Bouzinac (talk • contribs).

For the record, support for this feature was added in OpenRefine 3.5. − Pintoch (talk) 14:01, 30 May 2021 (UTC)[reply]

BCE dates: other topic[edit]

Hello, I'd like to point out this problem:

  • Trying to convert a -1988-05-01 into a "true" date (the - meaning BCE...)
  • Edit column from value and type the formula toDate(-1988-05-01) ==> seems to work, but it yields "1988-05-01T00:00:00Z", not "-1988-05-01T00:00:00Z". How do I convert properly to BCE dates? Thanks  – The preceding unsigned comment was added by Bouzinac (talk • contribs).
BCE dates should be kept as strings (such as `-345`), support for this was added in 3.5. − Pintoch (talk) 14:02, 30 May 2021 (UTC)[reply]

Reconcile using several authors[edit]

Hi! I would like to reconcile scholarly articles (Q13442814) using not only their title (query), but also their authors (P50 and P2093). However, I'm getting some strange results, which I'd rather explain with an example.

I'll use Q27890423 as an example, with DOI (P356; which is a unique identifier) "10.1038/NATURE20118". This entity has several author (P50) statements (e.g., Q60056477 "David Xing", Q60161584 "Elodie Rey") and several author name string (P2093) statements (e.g., "David Borton", "Eduardo Martin Moraud"). I tried different queries (using https://wikidata.reconci.link/en/api) of the form:

queries={"q0":{"query":"","type":"Q732577","properties":[{"pid":"P356","v":"10.1038/NATURE20118"},{"pid":"P50|P2093",v:<AUTHORS>}],"type_strict":"should"}}

changing the <AUTHORS> part. In the table below I focus on the "value" field for the "P50|P2093" feature of the result:

<AUTHORS> "P50|P2093" feature value
["David Xing"] 100
["Elodie Rey"] 100
["David Xing","Elodie Rey"] 65
{"id":"Q60056477"} 100
{"id":"Q60161584"} 100
[{"id":"Q60056477"}] 33
[{"id":"Q60056477"},{"id":"Q60161584"}] 27

I'm confused how these scores (values) are calculated. Particularly, I don't understand why providing more than one matching author lowers the score. I couldn't find anything in the documentation. Could you help me understand this? Thanks! --Diegodlh (talk) 01:07, 26 March 2021 (UTC)[reply]

@Diegodlh: This is a fantastic bug report! Sorry that I don't check this page very often. I have migrated it to a GitHub issue and will investigate the problem. − Pintoch (talk) 14:09, 30 May 2021 (UTC)[reply]
@Pintoch: Thank you very much! I will now retry to use this in my code! --Diegodlh (talk) 21:01, 31 May 2021 (UTC)[reply]

OpenRefine now available on PAWS![edit]

PAWS is a Wikimedia cloud tool that provides hosted access to Jupyter notebooks and other tools without needing any local installation. I've just set up OpenRefine on it too! You can access it with this link: https://hub.paws.wmcloud.org/hub/user-redirect/openrefine. You'll have to log in with your wiki credentials, and then maybe refresh the page once (if you get an error). This lets you use OpenRefine without needing to install it locally. Hope this is useful! YuviPanda (talk) 20:32, 22 May 2021 (UTC)[reply]

@YuviPanda: I can certainly see some use (one person in a GLAM recently complained that they can't install anything on their work computer and wanted an online version; that's done now!).
I just did a quick first try and everything seems to work correctly.
What are the pros and cons of using this online version: is it faster/slower/the same? Is there any difference in the interface or in how it's used? (I didn't see one for now.) Could an OR project be shared, or could multiple users even collaborate on the same project? Will it be automatically updated for the next version of OpenRefine? (Throwing out some random questions; feel free not to answer them if they don't make sense.)
Cheers, VIGNERON (talk) 07:08, 23 May 2021 (UTC)[reply]
Glad it seems useful, @VIGNERON:! The primary 'pro' is that you don't have to install anything on your computer, and it will work on any machine with a web browser. The 'projects' are stored in the cloud, and so you can access them from different machines as well. The installation can be easily updated to a new version - and that'll update it for everyone. If required, we can easily add extensions there by default as well. I'm not sure how much memory is allocated for it though. Can you tell me more about what 'collaborate on multiple projects' together means? I've never used OpenRefine so am not sure. YuviPanda (talk) 15:41, 24 May 2021 (UTC)[reply]
@YuviPanda thank you very much, that looks amazing! I think it already solves T194767 / T223604 to a great extent; if there’s some way for different users to collaborate on it (as VIGNERON also asked), that would probably make it even more useful, but I don’t know if that would be easy or not. Lucas Werkmeister (talk) 11:53, 23 May 2021 (UTC)[reply]
Amazing! Looking forward to trying it out. Thanks for working on this! − Pintoch (talk) 12:47, 23 May 2021 (UTC)[reply]
@YuviPanda, Lucas Werkmeister, Pintoch: I just did a ~10k-item import (adding a description in Breton) and it worked well (at least as well as on a local installation; I didn't see any difference, except not seeing the terminal, which can be a bit annoying when you want to see where exactly a problem comes from).
For the collaborative aspect, I wrongly thought that files on PAWS were public, my bad (this is not frequently needed anyway).
Cheers, VIGNERON (talk) 14:16, 27 May 2021 (UTC)[reply]
@VIGNERON Files in PAWS are public: PAWS:User:VIGNERON Chico Venancio (talk) 14:38, 27 May 2021 (UTC)[reply]
@Chicocvenancio: so I was right the first time, thanks. How can I find these URLs? Cheers, VIGNERON (talk) 14:43, 27 May 2021 (UTC)[reply]
Tested too, nice. Two questions: if OpenRefine upgrades, does OR on PAWS upgrade too? And I had to enter my Wikidata credentials; I hope these credentials aren't stored by PAWS. Nice job anyway! Bouzinac💬✒️💛 15:29, 27 May 2021 (UTC)[reply]
I've been regularly working on a large OpenRefine project in PAWS for several weeks now. It works like a charm (apart from the occasional refresh that's needed at startup). @YuviPanda: would there be any objection that I add information on how to use the PAWS version at Wikidata:Tools/OpenRefine (draft here)? Or is it still too unstable for that? Spinster 💬 19:46, 3 August 2021 (UTC)[reply]
Great to hear it works well, @Spinster:! Feel free to add it to the documentation. YuviPanda (talk) 10:45, 19 August 2021 (UTC)[reply]
Phew, finally ✓ Done - I hope I didn't break the page translation. @YuviPanda: I took the freedom to mention your username, in case folks have questions. Spinster 💬 17:59, 6 September 2021 (UTC)[reply]
@YuviPanda, Spinster: I'm also using OpenRefine on PAWS; I just did a 2,500-item creation for documents where I'm doing my WiR. It's great (no need to worry about IT installing OpenRefine on my computer, which can be inexplicably difficult). I have a question, though: on desktop I sometimes use the console screen to check and understand what happens (for example when there is a non-trivial problem; for instance, here 20 out of 2,500 items were not created, but they are very similar and I don't see why and where the issue is... probably obvious, but the console screen sometimes helps me find, or at least narrow down, the origin of the said issue). Is there a way to see the console on PAWS? Cheers, VIGNERON en résidence (talk) 07:34, 5 October 2021 (UTC)[reply]

OpenRefine won't reconcile to human items[edit]

For a while, I've been using OpenRefine's reconciliation feature to mass-locate potentially existing items. But recently, whenever I configure the reconciliation to find human items, it doesn't show potential matches even though the normal Wikidata search does. Is there a way to fix this? Please ping me since I'm not watching this page. ミラP@Miraclepine 20:19, 9 October 2021 (UTC)[reply]

Hi Miraclepine, sorry for the delay in replying to your question! I would recommend you register the new reconciliation service for Wikidata, currently at https://wikidata.reconci.link/en/api. That should work better than the built-in one (which I am assuming you are using?). − Pintoch (talk) 13:15, 4 November 2021 (UTC)[reply]

New user's misadventures[edit]

Hi, after Sandra's tutorial about OpenRefine during WikidataCon, I decided to give it a try. I have a small spreadsheet with info on some new items I want to add. I successfully loaded it, picked some random column, and clicked the Reconcile/Actions/Create a new item for each cell option, which gave this column a green underline. Then I clicked Wikidata/Edit Wikidata schema and began working on the schema. It is my understanding that all reconciled columns should be marked with a green underline, but that did not happen. I can create a new item, but then the window type item or drag reconciled column here opens and I cannot drag anything there. Typing the column name does not work either. I can create the schema without it, but then I cannot save it or create any QuickStatements without this step. I am using OpenRefine 3.4.1, tried Firefox and Chrome, and also tried OpenRefine from PAWS. I always run into the same issue. Any recommendations? --Jarekt (talk) 22:59, 3 November 2021 (UTC)[reply]

Might be the same issue as reported in Wikidata_talk:Tools/OpenRefine#Edit_Wikidata_schema_not_working?. --Jarekt (talk) 23:06, 3 November 2021 (UTC)[reply]
Hi Jarekt, thanks for trying it out! You have run into issue #1608. We should fix this, but in the meantime you could work around the problem by first reconciling your column with Wikidata (in your column menu: Reconcile -> Start reconciling) and only after that, marking cells as new. − Pintoch (talk) 13:14, 4 November 2021 (UTC)[reply]
Pintoch, thanks for replying. That was exactly what I ended up doing. I did some experiments with other batches that required true reconciliation and they worked well, so I did a mock reconciliation on my first batch and then marked cells as new. I still have some questions about my second batch, where I started with the data from this query to get a bunch of Q-ids and painting names. I extracted people's names from the second column and tried to reconcile while restricting items to human (Q5). Reconciliation took a while but did not find any suggestions. Luckily I can click on each name and search, and that step returned a great many matches. Is there a way to guide the reconciliation to display a list of people with each name? For example, if I have the strings "Jan Kasprowicz" and "Irena Solska" in the reconciled column, how do I make it find Jan Kasprowicz (Q962432) and Irena Solska (Q4428561)? Also, is there an easy way to reconcile a column of Q-ids? I only got about 90% matches. --Jarekt (talk) 02:56, 5 November 2021 (UTC)[reply]

Upgrade OpenRefine on PAWS to v3.5.0 as well?[edit]

OpenRefine very recently released a fresh version, 3.5.0. It has some neat improvements and new features (including the ability to view more than 50 rows/records at once). I think it would be great if the OpenRefine version running on PAWS could be upgraded to v3.5.0 as well. @YuviPanda: would you perhaps be willing and able to do this? Maybe this is ridiculously easy and someone else is able to do it as well - in that case, it would be nice to document how it can be done. Cheers, Spinster 💬 14:12, 20 November 2021 (UTC)[reply]

It's probably not super easy but +1 with Spinster (I'm eager to try the "arbitrary Wikibase instances" functions). Cheers, VIGNERON en résidence (talk) 08:55, 24 November 2021 (UTC)[reply]

Search for match link is not displayed in the new version of OpenRefine[edit]

I recently decided to upgrade to the latest version of OpenRefine and ran into a very strange bug. I have several exported projects from an older version (3.4.1) that I am currently running in the updated one. For some reason, the Reconcile function now works incorrectly: the Search for match link has disappeared somewhere, although everything works perfectly fine in the old version (checked). The crux of the problem is that projects from the old version (3.4.1) do not display the Search for match link in the new version of the tool (3.5.1). In order for this link to appear again, you need to go back to the older version of the tool.

P.S. I also found a discussion on GitHub which describes the same problem. Regards Kirilloparma (talk) 00:57, 2 January 2022 (UTC)[reply]

UPD: Alright, looks like I know what's wrong here... It turns out that the newer version of OpenRefine uses a different API. The previous version, 3.4.1, uses one called Wikidata (en), while the newer one uses an API called Wikidata reconci.link (en), and that is exactly why the Search for match link is missing everywhere. To fix this problem, I just added a link to the API of the 3.4.1 version and it worked (see the result in my video). The only thing I still don't understand is why the changelog doesn't indicate that the API was changed. When I downloaded the new version of the tool, I immediately noticed that the API is called differently, but I did not know that this was a completely new API that is incompatible with the previous version of OpenRefine. Well, at least I was able to fix this problem and continue my unfinished projects from the old version :). Regards Kirilloparma (talk) 01:01, 5 January 2022 (UTC)[reply]

Propose to move article title to Wikidata:OpenRefine[edit]

Proposed move Wikidata:Tools/OpenRefine --> Wikidata:OpenRefine

Deciding whether to give a tool its own page or make it a subpage is an arbitrary decision. In general, new and small tools benefit from being a subpage, because they get attention from a broader project, and older, established tools should have their own pages so that users better interconnect them with other parts of Wikidata.

OpenRefine is one of the most popular and best-established Wikidata tools in terms of documentation and userbase. Because it is a subpage of Tools, I think it is under-catalogued and too narrowly framed as a technical tool. I propose to move it to its own page, set up its own category, and point to it from other projects, including as a focus for the social discussion of sorting data and not only as a technical tool for doing work. I feel that for as long as OpenRefine is a subpage of Tools, expanding on its community use would be out of place.

I propose to move it. Thoughts, objections?

OpenRefine on PAWS problem[edit]

@YuviPanda@Spinster Recently I have had a problem with OpenRefine on PAWS: the link does not work, and there is no OpenRefine option for new notebooks anymore. Jklamo (talk) 12:08, 16 July 2022 (UTC)[reply]

Hope they are upgrading it to a newer version. Infrastruktur (talk) 13:42, 16 July 2022 (UTC)[reply]
Me too Shizhao (talk) 13:05, 23 July 2022 (UTC)[reply]

Stuck job[edit]

This job is stuck. Should I do something to keep it working? Geagea (talk) 13:20, 22 July 2022 (UTC)[reply]

Just an update: the job was stuck because the Q-id was a redirect. Geagea (talk) 10:21, 19 January 2023 (UTC)[reply]

Adding a sequence of identical property/value pairs with different qualifiers[edit]

I'm working on this wikibase: [ottgaz.org]

I need to enter a succession of property/value pairs with different qualifier dates, like this:

status  eyalet
          start time   17. century
          end time     1822
        eyalet
          start time   1822
          end time     1840

The example is at the bottom of the entry here: [[3]]

I can't do this with QuickStatements, as @Pintoch explained when I posted a version of this question on Wikidata QuickStatements Help. If I use QuickStatements like this

Q43 P15 Q496 P7 +1601-01-01T00:00:00Z/7 P8 +1822-01-01T00:00:00Z/9
Q43 P15 Q496 P7 +1822-01-01T00:00:00Z/9 P8 +1840-01-01T00:00:00Z/9

I get a result like this

status  eyalet
          start time   17. century
          start time   1822
          end time     1822
          end time     1840

I can't seem to work out how to do it using an OpenRefine schema either. FWIW, I posted a screenshot of the schema on my Mastodon.

Any suggestions? Will Hanley (talk) 03:13, 18 December 2022 (UTC)[reply]

Numeric values with tolerance?[edit]

Hi. Is it possible to add tolerances / confidence intervals to numeric values using OpenRefine? Similar to QuickStatements, where I could use 30~5 to express 30 +/- 5? Yellowcard (talk) 20:54, 16 February 2023 (UTC)[reply]

@Yellowcard: yes it can, the syntax is documented here: https://openrefine.org/docs/manual/wikibase/schema-alignment#quantities (or on-wiki: Wikidata:Tools/OpenRefine/Editing/Schema_alignment#Quantities). − Pintoch (talk) 07:52, 17 February 2023 (UTC)[reply]
@Pintoch: Thank you, I had not found this passage. However, I have some trouble understanding this notation. How would you express something like 40.2 +/- 2.7? Again, thank you. Yellowcard (talk) 08:20, 17 February 2023 (UTC)[reply]
@Yellowcard: that is not supported yet, indeed. One could add support for more syntaxes though. − Pintoch (talk) 21:02, 1 March 2023 (UTC)[reply]
@Pintoch: That's a pity. What do you mean by "One could add support for more syntaxes though."? Yellowcard (talk) 08:22, 2 March 2023 (UTC)[reply]
@Yellowcard: I mean that OpenRefine could be changed so that it also accepts values such as 40.2±2.7. The first step for this would be to file a feature request on OpenRefine's forum or as a GitHub issue if you have a GitHub account. − Pintoch (talk) 08:43, 2 March 2023 (UTC)[reply]

Add column from reconciled values with qualifier[edit]

I'd like to download review scores (P444) for awards & prizes, but only those with "review score by" (P447) the "International Congress of Distinguished Awards" (Q58890465). See the Nobel Prize for a prize with two review scores. Is this possible? Richirikken (talk) 15:18, 11 July 2023 (UTC)[reply]

@Richirikken: that is sadly not supported yet. There is some discussion about this on GitLab and on the previous GitHub issue. − Pintoch (talk) 22:21, 29 January 2024 (UTC)[reply]

Add column from reconciled values with unit[edit]

I'd like to download prize money (P2121) for awards & prizes, but not only the number: also the unit (dollar, euro, etc.). See Q1543268 as an example. Is this possible?

best regards Richard Richirikken (talk) 08:15, 17 July 2023 (UTC)[reply]

@Richirikken: sorry for the delay! At the moment there is no simple way to do this I am afraid. You could open a request for this on GitLab. − Pintoch (talk) 22:19, 29 January 2024 (UTC)[reply]

Recover data from simple url for 2 wikidata id[edit]

Hi, I would like to know if, with OpenRefine, it is possible to recover information from these two URLs. The first one is related to authors (P12256) and the second one to scientific articles (Property:P12234). ORBI is a scientific repository of the University of Liège, and numerous authors already have their Wikidata page. The idea would be to recover the IDs and add them to Wikidata. The database is quite large (> 14,000 authors and 200,000 articles). Thank you in advance. Gabon100 (talk) 20:47, 29 January 2024 (UTC)[reply]

@Gabon100: it looks like you'd first need to get the information out of ORBI in a structured way. Scraping could be a solution, but I suspect there are easier ways if ORBI offers a way to export its data as a dump or through an API. It'd be worth checking those first. Then, OpenRefine should be more useful for the linking part, matching the authors to Wikidata. As you can see above, this page isn't very actively watched, so you might have more success on OpenRefine's forum or the OpenRefine-Wikimedia Telegram channel. − Pintoch (talk) 22:17, 29 January 2024 (UTC)[reply]
@Pintoch thank you! I will check with the ORBI team whether I can retrieve the full database. That will be easier, but I will also ask my questions on the forum. Regards. Gabon100 (talk) 09:49, 30 January 2024 (UTC)[reply]
Hi @Pintoch, I had a discussion with the ORBI team and they should give me access to the full database, which will be easier. Thanks. Gabon100 (talk) 10:19, 5 February 2024 (UTC)[reply]