Wikidata:Property proposal/UKÄ standard classification of Swedish science topics 2016
UKÄ standard classification of Swedish science topics 2016
Originally proposed at Wikidata:Property proposal/Natural science
Description | Swedish Higher Education Authority standard for classification of scientific articles from scientists in the higher education system of Sweden |
---|---|
Data type | External identifier |
Example 1 | biology (Q420) → 106 |
Example 2 | structural biology (Q908902) → 10601 |
Example 3 | ethology (Q7155) → 10613 |
Example 4 | chemistry (Q2329) → 104 |
Example 5 | physics (Q413) → 103 (named as Physical Sciences), 10399 (named as Other Physics Topics) |
Number of IDs in source | ~250 |
Expected completeness | eventually complete (Q21873974) |
Formatter URL | https://bibliometri.swepub.kb.se/bibliometrics?subject=$1 |
See also | Wikidata:Property proposal/UKÄ standard classification of Swedish science topics 2011, ANZSRC 2020 FoR ID (P8529), ANZSRC 2008 FoR ID (P5922), All-Science Journal Classification Codes (P10203) |
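A minimal sketch of how the formatter URL above would be expanded for the example codes, assuming the usual convention that `$1` is replaced by the identifier value. The helper name `expand_formatter_url` is hypothetical and only illustrates the substitution.

```python
from urllib.parse import quote

# Formatter URL taken from the proposal table; "$1" marks where the code goes.
FORMATTER_URL = "https://bibliometri.swepub.kb.se/bibliometrics?subject=$1"


def expand_formatter_url(code, formatter=FORMATTER_URL):
    """Replace the $1 placeholder with the URL-quoted identifier (hypothetical helper)."""
    return formatter.replace("$1", quote(code, safe=""))


if __name__ == "__main__":
    # Codes from Examples 1 to 3 of the proposal.
    for label, code in [("biology", "106"),
                        ("structural biology", "10601"),
                        ("ethology", "10613")]:
        print(label, expand_formatter_url(code))
```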
Motivation
This is valuable because it is used as an ID in, e.g., Swepub data from the National Library of Sweden, and perhaps in other places.--So9q (talk) 23:43, 2 January 2022 (UTC)
- A full list of items is found at https://www.scb.se/contentassets/10054f2ef27c437884e8cde0d38b9cc4/standard-for-svensk-indelning--av-forskningsamnen-2011-uppdaterad-aug-2016.pdf --Egon Willighagen (talk) 09:40, 3 January 2022 (UTC)
- An old endpoint that now returns 404 is at https://id.kb.se/term/uka/$1 --So9q (talk) 10:08, 3 January 2022 (UTC)
Discussion
- Comment I tried to find out how to locate material that uses this scheme, in order to work out a meaningful full URL pattern, but no luck :( I saw that SLU had a post about it, but kb.se seems devoid of it (except for broken links, which are abundant :( ). Is there anything in the direction of "linked data" for this property? --Egon Willighagen (talk) 08:41, 3 January 2022 (UTC)
- Yes, an example of how this is used on articles would be great to determine how valuable it is. Ainali (talk) 08:43, 3 January 2022 (UTC)
- I found a working formatter URL; while it's just SLU here, it serves the point. My comment is now a Support. --Egon Willighagen (talk) 09:41, 3 January 2022 (UTC)
- Okay, https://pub.epsilon.slu.se/view/subjects/SSIF-$1.html was for 2011 not 2016 :( --Egon Willighagen (talk) 09:56, 3 January 2022 (UTC)
- But how different is 2016 from 2011? You replaced the URLs in "Examples" with this one PDF URL, but it's much more valuable to have a page per item: a global PDF cannot serve as formatterURL Vladimir Alexiev (talk) 10:06, 3 January 2022 (UTC)
- @Vladimir Alexiev: Fixed with newly found formatter URL that lists articles with that classifier :)--So9q (talk) 10:10, 3 January 2022 (UTC)
- Yes and no. I have only seen it used here: https://bibliometri.swepub.kb.se/. They have a 1.5 GB dump file that I just fetched into PAWS, and I'm about to extract all the topics and UKÄ codes into pandas and do some statistics on it :), see https://public.paws.wmcloud.org/User:So9q/WikidataMLSuggester/extract-swepub-topics.ipynb where I'm working as we speak. So there is no formatter URL, but the data seems well formatted. Example:

      {
        "@id": "https://id.kb.se/term/uka/30205",
        "@type": "Topic",
        "code": "30205",
        "prefLabel": "Endocrinology and Diabetes",
        "language": { "@type": "Language", "@id": "https://id.kb.se/language/eng", "code": "eng" },
        "inScheme": { "@id": "https://id.kb.se/term/uka/", "@type": "ConceptScheme", "code": "uka.se" },
        "broader": {
          "prefLabel": "Clinical Medicine",
          "broader": { "prefLabel": "Medical and Health Sciences" }
        }
      },

- --So9q (talk) 08:48, 3 January 2022 (UTC)
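A rough sketch of how the code, label and chain of broader labels could be pulled out of one such topic record with the Python standard library. It assumes only the field layout shown in the example record; the helper names `broader_labels` and `summarize` are made up for illustration and are not taken from the linked notebook.

```python
import json

# The example topic record quoted in the comment above.
SAMPLE = """
{
  "@id": "https://id.kb.se/term/uka/30205",
  "@type": "Topic",
  "code": "30205",
  "prefLabel": "Endocrinology and Diabetes",
  "language": {"@type": "Language", "@id": "https://id.kb.se/language/eng", "code": "eng"},
  "inScheme": {"@id": "https://id.kb.se/term/uka/", "@type": "ConceptScheme", "code": "uka.se"},
  "broader": {"prefLabel": "Clinical Medicine",
              "broader": {"prefLabel": "Medical and Health Sciences"}}
}
"""


def broader_labels(topic):
    """Walk the nested "broader" objects and collect their labels, nearest parent first."""
    labels = []
    node = topic.get("broader")
    while isinstance(node, dict):
        label = node.get("prefLabel")
        if label:
            labels.append(label)
        node = node.get("broader")
    return labels


def summarize(topic):
    """Flatten one topic record into a small dict (hypothetical helper)."""
    return {
        "code": topic["code"],
        "label": topic["prefLabel"],
        "language": topic.get("language", {}).get("code"),
        "broader": broader_labels(topic),
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```

Rows of this shape could then be loaded into a table for the kind of statistics mentioned in the comment.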
- Isn't it better to ask kb.se to publish "uka" as they do for some other classifications, eg https://id.kb.se/term/sao and one of its terms https://libris.kb.se/rp354vn9510f7x9? That way we can have a proper formatterURL with both HTML and RDF representation Vladimir Alexiev (talk) 10:13, 3 January 2022 (UTC)
- Wrote to libris@kb.se: Hi! We want to connect UKA to Wikidata, see https://www.wikidata.org/wiki/Wikidata:Property_proposal/UKÄ_standard_classification_of_Swedish_science_topics_2016 .
- For that it would be best if you publish it as LOD with individual pages for the concept scheme and for each concept,
- just like you're doing for some other classifications, eg https://id.kb.se/term/sao and one of its terms https://libris.kb.se/rp354vn9510f7x9 .
- A Wikidata user is currently processing https://bibliometri.swepub.kb.se/ and extracting UKA terms and we see that you have it represented in RDF (JSONLD),
- so we're just asking whether you can make per-entity HTML pages and JSONLD files? Vladimir Alexiev (talk) 12:42, 3 January 2022 (UTC)
Further to libris@kb.se:
I noticed a JSONLD mistake in eg https://bibliometri.swepub.kb.se/api/v1/bibliometrics/publications/oai:DiVA.org:umu-187284 (JSON keys are shown in bold instead of surrounded by quotes):
    {
      @id: "https://id.kb.se/term/uka/10302",
      @type: "Topic",
      broader: { prefLabel: "Physical Sciences" },
      code: "10302",
      inScheme: { @id: "https://id.kb.se/term/uka/", @type: "ConceptScheme", code: "uka.se" },
      language: { @id: "https://id.kb.se/language/eng", @type: "Language", code: "eng" },
      prefLabel: "Atom and Molecular Physics and Optics"
    },
    {
      @id: "https://id.kb.se/term/uka/10302",
      @type: "Topic",
      broader: { prefLabel: "Fysik" },
      code: "10302",
      inScheme: { @id: "https://id.kb.se/term/uka/", @type: "ConceptScheme", code: "uka.se" },
      language: { @id: "https://id.kb.se/language/swe", @type: "Language", code: "swe" },
      prefLabel: "Atom- och molekylfysik och optik"
    },
You make two groups of statements about https://id.kb.se/term/uka/10302: most of them are duplicates, except these which will be accumulated
- "language", which means this concept has two languages??
- "prefLabel" , but the language of those labels is not indicated
In turtle, this looks like:
    <https://id.kb.se/term/uka/10302>
        rdf:type bf2:Topic ;
        bf2:code "10302" ;
        bf2:language <https://id.kb.se/language/swe> , <https://id.kb.se/language/eng> ;
        madsrdf:authoritativeLabel "Atom- och molekylfysik och optik" , "Atom and Molecular Physics and Optics" ;
        madsrdf:hasBroaderAuthority [ madsrdf:authoritativeLabel "Fysik" ] ;
        madsrdf:hasBroaderAuthority [ madsrdf:authoritativeLabel "Physical Sciences" ] ;
        madsrdf:isMemberOfMADSScheme <https://id.kb.se/term/uka/> .
You can use this to convert JSONLD to Turtle:
curl -s https://bibliometri.swepub.kb.se/api/v1/bibliometrics/publications/oai:DiVA.org:umu-187284|riot -syntax jsonld -formatted ttl
Another problem is that rather than linking to the parent concept, you point to its label:
madsrdf:hasBroaderAuthority [ madsrdf:authoritativeLabel "Fysik" ]
You should replace this with
madsrdf:hasBroaderAuthority <https://id.kb.se/term/uka/103>
and optionally include a brief description of that concept.
IMHO, you should fix the JSONLD to the following:
    {
      @id: "https://id.kb.se/term/uka/10302",
      @type: "Topic",
      broader: { @id: "https://id.kb.se/term/uka/103" },
      code: "10302",
      inScheme: { @id: "https://id.kb.se/term/uka/", @type: "ConceptScheme", code: "uka.se" },
      prefLabel: [
        { @value: "Atom and Molecular Physics and Optics", @language: "en" },
        { @value: "Atom- och molekylfysik och optik", @language: "sv" }
      ]
    },
--Vladimir Alexiev (talk) 15:48, 3 January 2022 (UTC)
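The restructuring suggested above can be sketched as follows: the per-language records for one concept are merged into a single record with language-tagged labels and a `broader` link to the parent concept. This is only an illustration under stated assumptions, not how Swepub produces its data. The assumptions are that a parent code is the leading digits of its child (10302 → 103, as in the comment above), that `eng`/`swe` map to the BCP 47 tags `en`/`sv`, and that the helper names `parent_code` and `merge_topic_records` are hypothetical.

```python
BASE = "https://id.kb.se/term/uka/"

# Assumed mapping from the three-letter codes in the data to BCP 47 tags.
LANG_TAGS = {"eng": "en", "swe": "sv"}

# The two records for code 10302 from the comment above, with quoted keys.
RECORDS = [
    {"@id": BASE + "10302", "@type": "Topic", "code": "10302",
     "broader": {"prefLabel": "Physical Sciences"},
     "inScheme": {"@id": BASE, "@type": "ConceptScheme", "code": "uka.se"},
     "language": {"@id": "https://id.kb.se/language/eng", "@type": "Language", "code": "eng"},
     "prefLabel": "Atom and Molecular Physics and Optics"},
    {"@id": BASE + "10302", "@type": "Topic", "code": "10302",
     "broader": {"prefLabel": "Fysik"},
     "inScheme": {"@id": BASE, "@type": "ConceptScheme", "code": "uka.se"},
     "language": {"@id": "https://id.kb.se/language/swe", "@type": "Language", "code": "swe"},
     "prefLabel": "Atom- och molekylfysik och optik"},
]


def parent_code(code):
    """Assumed hierarchy: 5-digit codes sit under 3-digit ones, 3-digit under 1-digit."""
    if len(code) == 5:
        return code[:3]
    if len(code) == 3:
        return code[:1]
    return None


def merge_topic_records(records):
    """Merge records sharing an @id into one record with language-tagged labels."""
    merged = {}
    for rec in records:
        entry = merged.setdefault(rec["@id"], {
            "@id": rec["@id"],
            "@type": rec["@type"],
            "code": rec["code"],
            "inScheme": {"@id": rec["inScheme"]["@id"]},
            "prefLabel": [],
        })
        parent = parent_code(rec["code"])
        if parent:
            # Link to the parent concept by IRI instead of repeating its label.
            entry["broader"] = {"@id": BASE + parent}
        label = {"@value": rec["prefLabel"]}
        tag = LANG_TAGS.get(rec["language"]["code"])
        if tag:
            label["@language"] = tag
        if label not in entry["prefLabel"]:
            entry["prefLabel"].append(label)
    return list(merged.values())


if __name__ == "__main__":
    import json
    print(json.dumps(merge_topic_records(RECORDS), indent=2, ensure_ascii=False))
```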
- Comment I commented on the 2011 property but the same applies here - do we need a property for OECD's FORD then? Does this actually differ from FORD (other than having Swedish labels)? ArthurPSmith (talk) 18:56, 3 January 2022 (UTC)
- @ArthurPSmith and @So9q: If UKA is the same as OECD, then I'd be in favor of creating an OECD FORD prop, and treating UKA as its Swedish translation. International is better than national Vladimir Alexiev (talk) 09:21, 6 January 2022 (UTC)
- @Vladimir Alexiev: I just found this Portuguese version https://concytec-pe.github.io/vocabularios/ocde_ford.html. I have not found a similar English version anywhere yet, but this short one is published by OECD themselves. Presumably an English version exists, and other versions are translations of that one.--So9q (talk) 16:48, 6 January 2022 (UTC)
- https://bartoc.org/en/node/1042: OECD Revised Field of Science and Technology (FOS) Classification in the Frascati Manual.
- Languages: zh en fr lt pl pt Vladimir Alexiev (talk) 09:21, 7 January 2022 (UTC)
- https://www.oecd.org/sti/inno/38235147.pdf: 26-Feb-2007: Revised Field of Science and Technology for the Frascati manual
- https://www.oecd-ilibrary.org/science-and-technology/frascati-manual-2015_9789264239012-en: 08 Oct 2015: Frascati Manual 2015
- https://www.oecd-ilibrary.org/science-and-technology/oslo-manual-2018_9789264304604-en: Oslo Manual 2018, not sure whether it includes FOS Vladimir Alexiev (talk) 09:28, 7 January 2022 (UTC)
- @So9q Found the latest version: https://read.oecd-ilibrary.org/science-and-technology/frascati-manual-2015_9789264239012-en#page61. But it's much smaller, under 60 items. So UKA is a big extension of OECD FOS/FORD. Vladimir Alexiev (talk) 09:34, 7 January 2022 (UTC)
- @Vladimir Alexiev: It is unfortunately not clear to me whether FORD is as big as the Portuguese version linked above or whether the Swedes extended the original 60-term FORD into a more detailed 250-term classification. I think we should contact OECD and/or UKÄ and ask for the full FORD classification. It would be much easier if UKÄ could just link to the OECD document in question so we can see precisely who extended and who just translated.--So9q (talk) 12:48, 7 January 2022 (UTC)
- @ArthurPSmith Do you have evidence FORD is the same size as UKA? I haven't seen such. KB.SE said they updated UKA based on edits in Frascati/FORD, but that doesn't mean they are similar in size.
- I vote that the proposal should stay.
- @So9q No response from libris@kb.se. You were right not to wait for them. Vladimir Alexiev (talk) 19:34, 7 January 2022 (UTC)
- At least they seem to have fixed the JSON formatting that you pointed out above. The address there now returns valid JSON :) Let's hope that they improve over time.
- The Libris team has built Libris XL at a cost of ~1000 MSEK, but that system seems broken, at least from a UX perspective, and @Salgo60 has found a lot of bad data that the librarians put into it and that no one seems to care about cleaning up.
- The Libris team delivers low-value and half-broken services but sits on a potential goldmine of data (another team at KB labs recently succeeded in making a new BERT model based on all the good data they have in-house, and they are also working on speech recognition ML models for automatic subtitling of spoken Swedish).
- Swepub is also put together in a rather crazy way. I would really have preferred that they had asked the consumers (WMDE) what we want and made changes based on iterative improvements.
- Instead they chose to put out the dumps without much specification and then go into hiding and pretty much ignore the rest of the world. The data could be of such high quality if they told DiVA/the universities to improve the subjects, for example. They could make a graph database with all science topics currently in use in the dataset, and then we could match against that. Instead they chose to accept text string topics from the scientists, which is bad for everyone IMO. So9q (talk) 23:21, 7 January 2022 (UTC)
- No, sorry I don't have any deeper knowledge of FORD - the only links seem to go to that Frascati manual page; the Frascati manual itself mentions an online appendix with further information, but the only link in the online supplement regarding FORD goes right back to the Frascati manual. So there *may* be more to it, but I'm not finding it right now. So I guess I'm fine with going ahead on this one then. ArthurPSmith (talk) 14:19, 10 January 2022 (UTC)
- I asked UKÄ about the FORD source they used and got this URL: https://read.oecd-ilibrary.org/science-and-technology/frascati-manual-2015_9789264239012-en#page61
- It is the short list we already found earlier. So9q (talk) 11:03, 11 February 2022 (UTC)
- Support — MasterRus21thCentury (talk) 20:43, 10 January 2022 (UTC)