User talk:Ahankins


Welcome to Wikidata, Ahankins!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here and become an active editor for Wikidata.

Best regards!

RISM URIs in MARC

Hi,

You added a row for RISM to the URIs in MARC Wikidata page, but this seems to only apply to works in RISM. Aren't there other kinds of RISM IDs (for composers, for example)? I don't understand why "sources" is part of the ID rather than just the numeric identifier. "sources" could be part of the URL that you append the unique number to. We would not put an identifier like sources/993104505 into an 024 field of a MARC authority record; it would just be 024 7 993104505, I think. Also, in order to even do this in MARC, a Standard Identifier Source Code (https://www.loc.gov/standards/sourcelist/standard-identifier.html) for RISM would need to be established by the Library of Congress. Otherwise, only a URI could be recorded, with no subfield $a or subfield $2. AdamSeattle (talk) 03:24, 30 April 2024 (UTC)

Hi @AdamSeattle. Thanks for your feedback. It is very helpful.
The "sources/" is part of the IDs, since we also have "people/" and "institutions/" as well. It's there to differentiate between the different types of records. If you look at Wikidata RISM ID (P5504) you'll see the structure.
You say "We would not put an identifier like sources/993104505 into an 024 field of a MARC authority record, it would just be 024 7 993104505 I think.", but I'm unclear why not. This is a genuine question -- I haven't found any guidance on the structure of identifiers in MARC, or whether anything is accepted / restricted. I see "sources/993104505" as "just a (unique) string" in this context. If you have some guidance, I would be very interested to know!
The number by itself doesn't mean anything -- 30003869 can be https://rism.online/institutions/30003869 or it can be https://rism.online/people/30003869. Likewise 100020 can be https://rism.online/sources/100020 or https://rism.online/people/100020.
We are currently undertaking a fairly major transformation of RISM identities "in the wild." We're looking to transition it to a Linked Data-friendly system, and part of that is building consistent identities across different platforms, and this is not particularly easy. What we don't want is different identifier standards that require some heuristic knowledge to transition between "old" systems and "new" systems. So making the identifier value contain part of the URI seemed like a good, easy solution.
We are also going to be approaching the LoC (through the IAML and/or MLA Cataloguing Committees) to add RISM to the list of standard identifiers. But first we need to sort it out ourselves, so feedback like yours is very useful. Ahankins (talk) 06:41, 30 April 2024 (UTC)
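
To illustrate the ambiguity described above, here is a minimal Python sketch. It assumes that the only record types are the three named in this thread ("sources", "people", "institutions"), that the numeric part is all digits, and that the base URL is https://rism.online/. The function names are hypothetical and are not part of any RISM software.

```python
RISM_BASE = "https://rism.online/"
# Assumption: only the three record types mentioned in this discussion.
RECORD_TYPES = ("sources", "people", "institutions")


def rism_uri(identifier: str) -> str:
    """Build a rism.online URI from a prefixed ID such as 'people/30003869'."""
    record_type, sep, number = identifier.partition("/")
    # Assumption: the part after the slash is purely numeric.
    if not sep or record_type not in RECORD_TYPES or not number.isdigit():
        raise ValueError(f"not a prefixed RISM ID: {identifier!r}")
    return f"{RISM_BASE}{record_type}/{number}"


def candidate_uris(number: str) -> list[str]:
    """A bare number does not say which record type it belongs to,
    so the best we can do is list one candidate URI per type."""
    return [f"{RISM_BASE}{record_type}/{number}" for record_type in RECORD_TYPES]


print(rism_uri("people/30003869"))
print(candidate_uris("30003869"))
```

With the prefix, the identifier maps to exactly one URI; without it, a consumer has to guess among the candidates.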
I think what LC prefers to do in this situation is to establish separate identifier codes rather than one. Then the base URL will be unique for each and include the word before the numeric ID. Have a look at https://www.loc.gov/standards/sourcelist/standard-identifier.html and you'll see separate codes for identifiers from archINFORM, BookBrainz, ISFDB, and others:
archinl - archINFORM index of locations
archinpe - archINFORM index of persons
archinpr - archINFORM projects
bbrainza - BookBrainz Author
bbrainzp - BookBrainz Publisher
bbrainzw - BookBrainz Work
isfdbau - ISFDB author directory (Internet Speculative Fiction Database)
isfdbaw - ISFDB award directory (Internet Speculative Fiction Database)
isfdbma - ISFDB magazine directory (Internet Speculative Fiction Database)
isfdbpu - ISFDB publisher directory (Internet Speculative Fiction Database)
pcadbu - Pacific Coast Architecture Database - buildings list
pcadpe - Pacific Coast Architecture Database - persons list
pcadpf - Pacific Coast Architecture Database - practices and firms
tmdbm - TMDB movies (The Movie Database)
tmdbp - TMDB people (The Movie Database)
tmdtv - TMDB TV shows (The Movie Database)
Having separate codes for RISM institution, RISM people, RISM sources would solve this problem. Only the numeric identifier part of the URI would need to be recorded in 024 $a and the full URI could be in $1. AdamSeattle (talk) 00:54, 1 May 2024 (UTC)
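
As a rough sketch of the separate-codes approach, the following Python snippet formats an 024 field with the numeric part in $a, a per-type source code in $2, and the full URI in $1. The source codes "rismso", "rismpe", and "rismin" are invented placeholders: no RISM codes exist in the LC list at the time of this discussion. The function name and the use of "#" to show a blank second indicator are likewise illustrative choices.

```python
# HYPOTHETICAL source codes, made up for illustration only.
# None of these has been assigned by the Library of Congress.
HYPOTHETICAL_SOURCE_CODES = {
    "sources": "rismso",
    "people": "rismpe",
    "institutions": "rismin",
}


def field_024(prefixed_id: str) -> str:
    """Render a textual 024 field from a prefixed ID like 'sources/993104505'."""
    record_type, _, number = prefixed_id.partition("/")
    # A KeyError here means the prefix is not one of the three known types.
    code = HYPOTHETICAL_SOURCE_CODES[record_type]
    uri = f"https://rism.online/{record_type}/{number}"
    # First indicator 7: the source of the number is given in $2.
    # '#' stands for a blank second indicator.
    return f"024 7# $a {number} $2 {code} $1 {uri}"


print(field_024("sources/993104505"))
```

In this layout the record type is carried by $2 rather than by $a, so the same number under two different codes still identifies two different entities.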
Thank you for taking the time to respond -- I really appreciate it!
"I think what LC prefers to do in this situation is to establish separate identifier codes rather than one. Then the base URL will be unique for each and include the word before the numeric ID."
That's a good solution, and one I think we could go with. (It seems like more work for the people maintaining that list, but I guess if that's what they prefer, then it's OK.)
However, I'm still interested in the reasons for not putting the full identifier in $a with just one value for $2. As far as I can tell, "prefixes" are fairly standard for identifiers in the 024. As a simple string, if you only consider the characters and not the semantics, "sources/123" or "people/123" are both unique identifiers for a given entity. The fact that it "looks" like part of a URI shouldn't really matter to the system, should it?
Is it the slash that is the problem? I couldn't find any indication of forbidden characters for $a. Would it be a problem if it was $a sources-123 or $a people_123 or $a institutions123?
I'm asking because we really don't want to cause problems with data interchange, but we're also interested in simplifying our referencing system to be consistent across all platforms where we publish. We are working on OCLC and VIAF exports, so we don't really want to mess this up. Ahankins (talk) 09:59, 1 May 2024 (UTC)