Wikidata talk:WikiProject Sweden/Swedish Riksdag documents


Laws and acts

We have implementing regulation (Q2795484) and statute (Q820655). Are those currently modeled correctly, so that we can use them as our main types for laws? Ainali (talk) 15:49, 26 June 2020 (UTC)

Test extracting data

- Salgo60 (talk) 17:13, 26 June 2020 (UTC)[reply]

Clean-up tasks for motions and propositions

So, we've recently mass imported almost all motions and propositions (M&P) from Riksdagen. Before we continue with adding documents that in turn reference the M&P, I thought I'd share a list of issues that I've discovered so far.

1. Motions missing authors

As of today, about 20% of motions lack author (P50). This is due to a variety of reasons, among them misspelled names in the source data. A fuzzy matching script should be able to handle most of these cases.

2. Propositions missing signatures

Only propositions from after ~2004 have signatory (P1891).

3. Incorrect motion titles

I guesstimate about 1000 motions have incorrect titles due to errors in Riksdagen's OCR process. These fall into a number of subcategories, including spelling mistakes, line breaks causing only parts of the title to be included, the author's name being the title etc.

4. Motions with a 2 appended to the title

When mass importing motions, many labels + descriptions were duplicates, so I tried to temporarily append an ordinal number to the titles, to be manually fixed at a later point. It turns out quite a few duplicates existed, so we probably have to write a script removing the 2 and making the descriptions more detailed to solve this.

Query for motions ending with 2: https://w.wiki/Z7q

5. Missing motions

Some motions are simply missing. One reason is that the OCR process went so badly that a whole paragraph ended up as the document title, which Wikidata refuses to write to the database.

6. Missing propositions

These are just a handful, and I have to investigate what went wrong.
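
For issue 1 (motions missing authors), a fuzzy matching pass could look roughly like the sketch below. It is only an illustration, not the script actually used for the import: the `known_members` mapping, the placeholder QIDs and the `cutoff` value are assumptions, and the matching relies on Python's standard `difflib` rather than any Riksdagen-specific logic.

```python
import difflib


def match_author(raw_name, known_members, cutoff=0.85):
    """Return the QID of the closest known member name, or None.

    known_members maps a correctly spelled name to a QID (hypothetical data).
    cutoff is an assumed similarity threshold between 0 and 1.
    """
    # Compare case-insensitively and with collapsed whitespace.
    normalized = {name.casefold(): qid for name, qid in known_members.items()}
    query = " ".join(raw_name.split()).casefold()
    candidates = difflib.get_close_matches(query, list(normalized), n=1, cutoff=cutoff)
    return normalized[candidates[0]] if candidates else None


# Placeholder names and QIDs, for illustration only.
known_members = {"Anna Andersson": "Q_EXAMPLE_1", "Bengt Berg": "Q_EXAMPLE_2"}

print(match_author("Ana Andersson", known_members))
print(match_author("Completely Different", known_members))
```

Names that fall below the cutoff return `None` and would still need manual review; a lower cutoff catches more misspellings but risks matching the wrong person.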

This is a great write-up, thanks for that Popperipopp! To me it seems like only 5 and 6 are somewhat blocking going forward with report (Q10429085) whereas 1-4 can be fixed independently. Do you agree on that or do you see more troubles? Ainali (talk) 08:22, 9 August 2020 (UTC)[reply]
That depends on whether the ongoing work with linking motions and propositions depends on the title being correct or not. Maybe Belteshassar can chime in here. Popperipopp (talk) 10:20, 9 August 2020 (UTC)[reply]
Still only early thoughts, but I think titles are too similar and difficult to parse out to be useful. I was planning to primarily use legal citation of this text (P1031) to establish the citation links. Also thinking I should wait with the OCR motions until we have a view of the quality to avoid inserting OCR errors into Wikidata. Belteshassar (talk) 10:30, 9 August 2020 (UTC)[reply]
In that case, yes I agree with your assessment Ainali.
It looks like the reports are really good in their citations and even use Swedish Riksdag document ID (P8433). That should make it even easier than legal citation of this text (P1031). See this example from Boating under influence (Q96978927):
<pre>
<referens>
<referenstyp>behandlar</referenstyp>
<uppgift>motion 2017/18:3575 Utvärdering av bestämmelsen om sjöfylleri</uppgift>
<ref_dok_id>H5023575</ref_dok_id>
<ref_dok_typ>mot</ref_dok_typ>
<ref_dok_rm>2017/18</ref_dok_rm>
<ref_dok_bet>3575</ref_dok_bet>
<ref_dok_titel>Utvärdering av bestämmelsen om sjöfylleri</ref_dok_titel>
<ref_dok_subtitel/>
<ref_dok_subtyp>mot</ref_dok_subtyp>
<ref_dok_dokumentnamn>Motion</ref_dok_dokumentnamn>
</referens>
</pre>
Ainali (talk) 11:43, 9 August 2020 (UTC)[reply]
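
As a minimal sketch of how such a reference block could be read, the snippet below parses the example above with Python's standard `xml.etree.ElementTree` and pulls out the document ID that would be looked up via Swedish Riksdag document ID (P8433). The function name and the keys of the returned dictionary are assumptions made for illustration; only the XML element names come from the example.

```python
import xml.etree.ElementTree as ET

# The <referens> element from the example above.
SAMPLE = """<referens>
<referenstyp>behandlar</referenstyp>
<uppgift>motion 2017/18:3575 Utvärdering av bestämmelsen om sjöfylleri</uppgift>
<ref_dok_id>H5023575</ref_dok_id>
<ref_dok_typ>mot</ref_dok_typ>
<ref_dok_rm>2017/18</ref_dok_rm>
<ref_dok_bet>3575</ref_dok_bet>
<ref_dok_titel>Utvärdering av bestämmelsen om sjöfylleri</ref_dok_titel>
<ref_dok_subtitel/>
<ref_dok_subtyp>mot</ref_dok_subtyp>
<ref_dok_dokumentnamn>Motion</ref_dok_dokumentnamn>
</referens>"""


def extract_reference(xml_text):
    """Return the fields needed to link a report to the cited document."""
    ref = ET.fromstring(xml_text)
    return {
        "relation": ref.findtext("referenstyp"),  # e.g. "behandlar"
        "dok_id": ref.findtext("ref_dok_id"),     # value to match against P8433
        "doc_type": ref.findtext("ref_dok_typ"),
        "session": ref.findtext("ref_dok_rm"),
        "number": ref.findtext("ref_dok_bet"),
    }


print(extract_reference(SAMPLE))
```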


8. Wrong file format for XML documents

I noticed in Q98753204 that the file format (P2701) qualifier on full work available at URL (P953), for documents using Riksdagen's own XML format, states Office Open XML Wordprocessing Document, ECMA-376 1st Edition (Q3033641), where just Extensible Markup Language (Q2115) would be expected (at least until Riksdagen's format is added as its own format, if ever).

--Moonhouse (talk) 13:53, 13 September 2020 (UTC)[reply]

Update: All motions now have author statements. Popperipopp (talk) 14:54, 2 July 2021 (UTC)[reply]

SOU cleaning tasks

There are lots of SOU reports that don't have proper titles in Swedish Parliament database (Q21592569), e.g. Q98462109.

Here is a query that gives almost 1,000 objects right now. Is there a way we can correct these in bulk, e.g. using the site http://www.sou.gov.se/? -- Belteshassar (talk) 14:58, 29 August 2020 (UTC)[reply]

I tried finding the correct titles by parsing the title from the XML documents, but it's rather messy. I think I can clean up ~180 documents at this point. Popperipopp (talk) 16:34, 29 August 2020 (UTC)[reply]
Just a note, in case it was not on the radar. Wikimedia Sverige did an import of ~6000 SOU reports which were digitised by the Royal Library. See more info on phab:T241458. These should all have proper titles but not legal citation of this text (P1031). @Alicia Fagerving (WMSE) who handled the upload. /André Costa (WMSE) (talk) 09:58, 17 December 2020 (UTC)
To clarify. While they don't have legal citation of this text (P1031) the same info is available in the Swedish alias field. https://w.wiki/qvi.
Note that the modelling differs significantly with the WMSE imports all being part of Swedish Goverment Official report (Q7604015) (which is recognised with an ISSN) and the latter imports being linked to per year series such as Swedish government official reports 2003 (Q98456743).
There seems to be lots of duplication between the two efforts. See e.g. Q80298906 vs. Q98462462 /André Costa (WMSE) (talk) 11:04, 17 December 2020 (UTC)[reply]
Thanks for clarifying. Are they really duplicates though? The first import seems to describe the editions, while the latter describes the "works in themselves" (I'm lacking the proper terms here). Popperipopp (talk) 18:13, 18 December 2020 (UTC)[reply]
(Switching to my volunteer hat) I can buy that the two are separate, i.e. Q80298906 talking about the printed edition and Q98462462 being the more elusive work, of which a translation would also be an edition. This separation has two direct effects though.

The first is that any citation referencing a particular page should point to the edition since the work itself has no pagination (alternatively there might be a qualifier for "page number refers to edition"). I believe all such citations point to the work item today.
The second is that the full work available at URL (P953) statements should probably all be moved to an edition item since the work item by itself has no corresponding text. Unclear if the current values should be considered digitalisations of the printed edition or separate editions in their own right. /Lokal_Profil 12:45, 25 December 2020 (UTC)[reply]
I agree with the description of the effects this has. But after having thought about the whole issue again, I'm not so sure the distinction is that useful in the first place. Perhaps we should just merge the duplicates and have all the reports be instances of both inquiry report (Q98457443) and version, edition or translation (Q3331189)? Popperipopp (talk) 21:08, 10 June 2021 (UTC)
I would be inclined to agree with @Popperipopp: here. Don't think the work/edition split is useful here. If we choose to keep them separate, we should tie them with edition or translation of (P629) / has edition or translation (P747). That will simplify moving the citations to the edition items. Belteshassar (talk) 16:23, 1 July 2021 (UTC)[reply]
I think the main issue with merging would be a muddling of sources. With them separate, it is clear whether a citation is for the Swedish text or an English translation. With the work and (Swedish) edition combined, it is less clear whether the citation is for the Swedish text or whether someone actually pointed it to the work by mistake (which is easy to spot in the separated case). I'm assuming these are rarely reprinted/republished; otherwise those editions of course also start to cause issues. /Lokal_Profil 08:40, 2 July 2021 (UTC)
We're now down to ~560 SOU reports with missing titles. Popperipopp (talk) 08:07, 8 June 2022 (UTC)[reply]
The only ones remaining now are partial reports, related to Wikidata talk:WikiProject Sweden/Swedish Riksdag documents#Pdf files in property P953 often only contain part of the corresponding HTML document for the SOU. Popperipopp (talk) 09:05, 10 November 2022 (UTC)[reply]

Remissinstans

Is there a good way of modelling the involved consultation body (Q10650842) for bill (Q686822) (and possibly other documents)? Having this data would be an interesting way of seeing who has influenced the decisions./Lokal_Profil 12:42, 27 October 2020 (UTC)[reply]

Correct me if I'm wrong, but isn't it mainly inquiry reports that are sent out for consultation? Anyways, I wonder if it would make sense to do significant event (P793) consultation (Q10650843) with start and end time as qualifiers. One could then add an addressee (P1817) for each consultation body (Q10650842). Alternatively, are the responses to the consultation notable in their own right? It would be nice if we could link the responses published on the government website. Belteshassar (talk) 17:36, 27 October 2020 (UTC)
So your suggestion would be something like the following?
significant event: consultation
    start time
    addressee: The Union for Professionals
    addressee: Swedish Public Employment Service
    ...
I think addressee (P1817) would suggest a list of everyone who was requested to answer the consultation, rather than a list of everyone who actually did submit one. Unsure which of these should be captured. If the consultation has its own item then both could of course be captured.
It's unclear to me if each consultation warrants its own item (if the same consultation can be referenced from multiple documents then probably yes) and if so what the referencing property would be. /Lokal_Profil 09:10, 5 November 2020 (UTC)
Yes, that looks like a good model. And I agree only those requested to answer should be listed. Ainali (talk) 20:44, 5 November 2020 (UTC)[reply]
Ainali Those requested to answer, or those requested to answer, who also answered?
As an aside I spotted https://github.com/DinRiksdag/OpenRemiss which could probably be used for 2015-2019 data and to spot relevant consultees missing in Wikidata. /Lokal_Profil 20:28, 12 November 2020 (UTC)[reply]
Lokal Profil: The first, everyone who was requested to answer. That's important enough in itself (it's even a good way to notability on svwiki). Ainali (talk) 21:39, 12 November 2020 (UTC)[reply]
I made a first attempt here Q98458551#P793. Good news is we have very good coverage of "remissinstanser" in Wikidata. I only had to create one new item. It's however extremely tedious to do this through the normal web interface and it slows down my browser considerably with so many qualifiers. I'm considering trying to build a custom tool for scraping the pdfs and mapping names to Wikidata-items. Belteshassar (talk) 09:32, 13 June 2021 (UTC)[reply]

Mapping to Zotero types

I am contemplating how we can add support for these types in Zotero's Wikidata translator. This would have at least two benefits:

  1. Make it possible for scholars to browse these documents on Wikidata and add them to Zotero with a single click using the browser plugin. Given that legal scholars mostly use footnotes or inline references for official documents, I'm not sure if there will be any interest, though.
  2. Make it possible for wikipedians to cite documents in Visual Editor by just entering the Qid. At least I would use that function every time I want to reference an official document.

Currently editions of SOU and bills (propositioner) are already compatible. To make the remaining types work, I think all that is needed is to add the right type mapping here from the Wikidata instance of (P31) we use to the most suitable Zotero item type. Presenting my initial thoughts in the table below. Please weigh in.

Wikidata type | Zotero item type
statute (Q820655) | statute
constitution (Q7755) | statute
implementing regulation (Q2795484) | statute
individual motion (Q96739634) | bill
multi-party motion (Q97695021) | bill
follow-up motion (Q97695043) | bill
committee group motion (Q97695005) | bill
party motion (Q97695011) | bill
report (Q10429085) | report
interpellation (Q1505023) | letter?
written question (Q99045339) | interview?
record of meeting of the Riksdag (Q98467717) | ?
committee directives (Q98491862) | ?

Belteshassar (talk) 10:57, 9 January 2021 (UTC)[reply]
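
To make the proposal concrete, the settled rows of the table can be expressed as a simple lookup, sketched here in Python. This is not the Zotero translator's actual code (the translator is written in JavaScript). The function name and the `document` fallback are assumptions, and the rows still marked with a question mark are deliberately left out.

```python
# Instance-of (P31) QID -> proposed Zotero item type, taken from the table above.
ZOTERO_TYPE_BY_P31 = {
    "Q820655": "statute",    # statute
    "Q7755": "statute",      # constitution
    "Q2795484": "statute",   # implementing regulation
    "Q96739634": "bill",     # individual motion
    "Q97695021": "bill",     # multi-party motion
    "Q97695043": "bill",     # follow-up motion
    "Q97695005": "bill",     # committee group motion
    "Q97695011": "bill",     # party motion
    "Q10429085": "report",   # report
}


def zotero_type(p31_values, default="document"):
    """Return the first mapped Zotero type for an item's P31 values.

    The fallback type is an assumption, not part of the proposal.
    """
    for qid in p31_values:
        if qid in ZOTERO_TYPE_BY_P31:
            return ZOTERO_TYPE_BY_P31[qid]
    return default


print(zotero_type(["Q96739634"]))
print(zotero_type(["Q98467717"]))
```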

@Belteshassar: This looks really good. I think you have good suggestions and for your two question marks, I believe letter could work for both of them. Ainali (talk) 21:21, 16 January 2021 (UTC)[reply]

I have started adding references to the public inquiries in NAD. Unfortunately I don't see any easy way of doing this programmatically, so help is welcome. An important note is that we want to link to the authority post for the inquiry organization and not to the archive they left behind. These are normally of the form SE/RA/1XXXX. E.g. for Q98591462, avoid (struck through: do)

Q98591462 Swedish National Archive reference code (P5324) SE/RA/101010385 [1]

and do (struck through: avoid)

Q98591462 Swedish National Archive reference code (P5324) SE/RA/325617 [2]

Belteshassar (talk) 09:47, 21 February 2021 (UTC)[reply]

After discussion off-wiki with Salgo60 I realize that this is not a good solution. The NAD-codes are not guaranteed to be unique between agents and archives, and the archive seems to take precedence as it works now. I think we may want to wait for a property for the new GUID style identifiers. See also older discussion on svwiki. Belteshassar (talk) 10:53, 21 February 2021 (UTC)
I have now gone ahead and created the property proposal. Belteshassar (talk) 20:46, 12 June 2021 (UTC)[reply]
Here's how I do things now: I use Swedish National Archive reference code (P5324) to link to the actual archive as indicated above SE/RA/325617 and Swedish National Archive agent ID (P9713) to link the authority post for the organization zMkOg4RGYIBQKgwrgd2lA0 Belteshassar (talk) 08:42, 18 July 2021 (UTC)[reply]

Commission of inquiry cleaning tasks

We have imported a lot of commissions of inquiry. As I have been going through and adding members to the commissions, I have noticed a few irregularities that we may be able to correct.

1. Missing reports

There are quite a few missing reports that could be added to the commissions. Sometimes the reports may not have been imported to Wikidata for some reason, which complicates things.

2. Clarify the role of the commission in the report

Most SOU reports are written by the commissions collectively, but there are other types of reports where the commissions are more of a publisher or editor.

3. Specific errors

Here I just list errors or things that seem odd for later fixing or reporting to Riksdagen.

  • Q98592523 only has a special inquiry officer (Q98464946) according to Skr. 1995/96:103 so the committee members probably belong somewhere else.
  • Fixed: Q98592381; likely the titles have been parsed wrong.
  • Q98591495 was ended and then called in again as Q98591350. Same members and name second time. Authorities treat it as one entity and so would a Wikipedia article. They do however have two different numbers: Fi 2005:01 and Fi 2007:01, respectively. Should they be merged or should a new item be created to combine the two?

4. Add VIAF ID (P214) where available

Many have VIAF, but watch out for similar names and conflated identifiers. This query is helpful and gives a link to search VIAF with a click. I started from the top and added a filter to hide those checked.

Belteshassar (talk) 08:55, 18 July 2021 (UTC)[reply]

General maintenance queries

I have made two queries that might be useful to discover errors and anomalies.

I have also put them on subpages so that changes can be noticed on the watchlist: Wikidata:WikiProject Sweden/Swedish Riksdag documents/Most common properties on items linking to a certain document type, Wikidata:WikiProject Sweden/Swedish Riksdag documents/Number of documents with P8433 by P31

However, the first one can have all the types, so checking it manually with the first query and changing the type (using the values in the latter one) is also useful. I didn't manage to put all types in that query without it timing out. If someone can fix that, it would be great, as the watchlist could then be the primary tool and no manual checks would be required. Ainali (talk) 07:46, 29 August 2021 (UTC)

Just a note that I found the first query to be a bit deceptive, as it does not show the Most common properties on a certain document type, but rather Most common properties on items linking to a certain document type. I moved the page with the list (and fixed the link above), but then also realized that the first kind of query might also be useful, so I recreated it as it should be on Wikidata:WikiProject Sweden/Swedish Riksdag documents/Most common properties on a certain document type. Ainali (talk) 08:20, 2 April 2023 (UTC)

WikidataCon award

Last weekend during WikidataCon, this project got the community award "Sustainable institutions". Congratulations all!

Ainali (talk) 21:11, 4 November 2021 (UTC)[reply]

Move statements for authors

I just discovered moved by (P6939), and I think that fits better than author (P50) for our different kinds of motion (Q452237). I propose that a mass move of the statements is justified. @Popperipopp, what do you think? Ainali (talk) 07:27, 26 August 2022 (UTC)

Agreed. It's more specific and is a better property for this use case. Popperipopp (talk) 09:01, 26 August 2022 (UTC)[reply]
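
One way such a mass move could be prepared is as a batch of QuickStatements commands, sketched below. This assumes the QuickStatements V1 tab-separated syntax, where a leading `-` removes a statement. The function name and the QIDs in the example are placeholders. Note that this naive approach does not carry over references or qualifiers from the old author (P50) statements, so those would need separate handling.

```python
def move_author_commands(motion_qid, author_qids):
    """Build QuickStatements V1 lines that replace P50 with P6939 on one motion.

    For each author, the new moved by (P6939) statement is added first and
    the old author (P50) statement is removed afterwards.
    """
    lines = []
    for author in author_qids:
        lines.append(f"{motion_qid}\tP6939\t{author}")  # add moved by
        lines.append(f"-{motion_qid}\tP50\t{author}")   # remove author
    return lines


# Placeholder QIDs, for illustration only.
for command in move_author_commands("Q_MOTION", ["Q_PERSON_A", "Q_PERSON_B"]):
    print(command)
```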

Info from Riksdagsinformation

There are no plans to add legislative committee (P7727) to motions from the period 1971-1984/85 at this moment, although it seems to be a long term vision on their part. QubeCube (talk) 09:29, 30 August 2022 (UTC)[reply]

Pdf files in property P953 often only contain part of the corresponding HTML document for the SOU

Maybe this is already known, but I just noticed that most, if not all, of the PDFs added as full work available at URL (P953) for inquiry report (Q98457443) with a title (P1476) starting with "sou " contain only part of the document. The missing part(s) seem to be in other PDFs with name(s) ending with "d2.pdf", "d3.pdf" and so on. The query below lists these items:

  • Link to query that lists (right now 314) items with this problem.

I made an effort to add a value for full work available at URL (P953) in one of these objects, see d:Q98462162#P953, but the qualifier chapter (P792) should probably be replaced by something more appropriate that does not give rise to a property constraint violation.

@Popperipopp: who seems to have added most of these links. -- Larske (talk) 20:29, 21 October 2022 (UTC)[reply]

I'm aware of this problem, but I haven't figured out any possible solutions to it. As I see it, the problem lies with full work available at URL (P953) explicitly referring to complete works. I suppose we could either 1) create objects for both the partial documents (d1, d2 etc) and the SOU as a whole and link them together with e.g. part of (P361), or 2) continue the effort you made above with some other appropriate qualifiers. Maybe issue (P433) or applies to part (P518)? Popperipopp (talk) 09:31, 22 October 2022 (UTC)
Hm. This sounds like a side effect of terrible data handling at the source. Would it not be more feasible to ask them to fix it upstream? Who wants half a pdf? No one? So9q (talk) 13:17, 9 November 2022 (UTC)[reply]
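
Whichever modelling is chosen, the part number first has to be read from the file name. A minimal sketch follows, assuming the parts are identified only by a trailing "d2.pdf", "d3.pdf" and so on, as described above. The function name and the example URLs are hypothetical, and a name that happens to end in "d" plus digits for another reason would be misread as a part.

```python
import re

# Matches a trailing "d<number>.pdf", e.g. "...d2.pdf" (assumed naming pattern).
PART_SUFFIX = re.compile(r"d(\d+)\.pdf$", re.IGNORECASE)


def pdf_part_number(url):
    """Return the part number encoded in a PDF URL, or None if there is none."""
    match = PART_SUFFIX.search(url)
    return int(match.group(1)) if match else None


# Hypothetical URLs, for illustration only.
print(pdf_part_number("https://example.org/sou-2003-1-d2.pdf"))
print(pdf_part_number("https://example.org/sou-2003-1.pdf"))
```

The resulting number could then feed either option, for example as the ordering of part items linked with part of (P361) or as the value of a qualifier on full work available at URL (P953).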

I've run my citation script again over the last couple of days and there are quite a few documents not found via legal citation of this text (P1031) for one reason or another. I've pasted the log here if anyone wants to take a look. Belteshassar (talk) 14:13, 6 November 2022 (UTC)

Was this resolved @Belteshassar:?

Uploading new documents

New parliament year and new motions, interpellations and written questions. Since this project began in 2020, the main uploads have faithfully been made by @Popperipopp. @Ainali and I have been wondering how the mass uploads are done, and whether this information could be shared, so that the task doesn't fall single-handedly on Popperipopp. Would it be possible to show or discuss this in some manner? QubeCube (talk) 19:30, 4 October 2023 (UTC)

Yes, that has been my plan all along. But I have yet to clean up the code so it's decent enough for public release. I'll tackle that as soon as possible. Popperipopp (talk) 11:53, 28 November 2023 (UTC)[reply]
Just a note for the archive that the code is now published at GitHub. Ainali (talk) 10:32, 12 January 2024 (UTC)[reply]

I started a new project to analyze all sentences from all 600k Riksdagen documents

See https://github.com/dpriskorn/riksdagen_sentences for details. So9q (talk) 20:24, 23 December 2023 (UTC)[reply]

I asked the Swedish government for data about their "expense categories"

Unfortunately they seem rather disinclined to answer so far. See https://handlingar.se/request/oppna_data_om_statens_utgiftsomr So9q (talk) 09:13, 24 December 2023 (UTC)[reply]

How were you planning to add it to Wikidata? Ainali (talk) 10:31, 12 January 2024 (UTC)[reply]
They are mentioned a lot in labels of "motioner". So an item for each might be suitable. I haven't thought a lot about it to be honest. So9q (talk) 12:28, 13 January 2024 (UTC)[reply]