Wikidata talk:WikiProject Archival Description/Data structure


ISAD(G) or RiC?


Hi, I would suggest going with the approach taken by Records in Contexts (RiC), which aims to make the data models for archival description ready for the linked data world.

Concretely, the only classes above the item level would be:

  • collection
  • record set

They would be related among each other through <part of>/<has part> relationships.

Furthermore, a new property could be created to indicate the <level of description> (according to ISAD(G)). Before doing that, however, I would ask to what extent the ISAD(G) levels of description given in archival finding aids share common semantics across institutions (or conversely: to what extent does each archive have its own idiosyncratic approach to applying levels of description?). If there are no shared semantics with clear definitions for the different levels of description, it may not make much sense to ingest that data into Wikidata.

Cheers, Beat Estermann (talk) 06:24, 24 April 2018 (UTC)[reply]
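As an illustration of the structure suggested above, here is a minimal Python sketch, not an actual Wikidata data model. The class name `RecordSet`, the field names, and the example labels are hypothetical; `level_of_description` stands in for the not-yet-existing property discussed above and is optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class RecordSet:
    """A collection or record set, linked upwards through <part of>."""
    label: str
    kind: str  # "collection" or "record set"
    part_of: Optional["RecordSet"] = None
    # Hypothetical: an ISAD(G) level, only filled in if its semantics are shared.
    level_of_description: Optional[str] = None
    has_part: List["RecordSet"] = field(default_factory=list, repr=False)

    def __post_init__(self):
        # <has part> is kept as the inverse of <part of>.
        if self.part_of is not None:
            self.part_of.has_part.append(self)


def ancestors(node: RecordSet) -> List[str]:
    """Labels of all enclosing record sets, nearest first."""
    chain = []
    while node.part_of is not None:
        node = node.part_of
        chain.append(node.label)
    return chain


# Hypothetical example data.
holdings = RecordSet("Example holdings", "collection")
fonds = RecordSet("Example fonds", "record set", part_of=holdings,
                  level_of_description="fonds")
series = RecordSet("Example series", "record set", part_of=fonds)
```

With this shape, any number of nesting levels can be expressed with just two classes, and the level of description remains an optional annotation.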

Current usage of archives at (P485)

# List pairs of items / archives linked by the property P485 (archives at)
SELECT ?item ?itemLabel ?itemtypeLabel ?archives ?archivesLabel ?archivestypeLabel
WHERE
{
  ?item wdt:P485 ?archives.
  ?item wdt:P31 ?itemtype.
  ?archives wdt:P31 ?archivestype.
     
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Studies regarding the inventory building in the area of audiovisual heritage


The following two studies should be considered when discussing how to describe record sets in Wikidata:

The second study contains a series of metadata sets that are described from the perspective of their functional requirements. I'll see whether I can get it published.

--Beat Estermann (talk) 07:02, 31 May 2018 (UTC)[reply]


(Archival) Collection


Is it deliberate to use 'collection' (Q2668072) rather than 'archival collection' (Q9388534), which is admittedly used very little (10 times, it seems to me) but does also exist? It seems to me that this could cause confusion and work against uniform usage. --Anchardo (talk) 19:00, 15 February 2019 (UTC)[reply]

And what would be the value added in using the more specific concept? - If you have a look at the statements about archival collection (Q9388534) and the related Wikipedia articles, you will notice that the definition of the concept is rather shaky. --Beat Estermann (talk) 07:40, 18 February 2019 (UTC)[reply]

XML-EAC conversion to Wikidata


There would be great value in developing scripts (or at least a procedure) to automatically transform data encoded in XML-EAC into data that can be imported into Wikidata (see the discussion on this subject on Twitter).

To think this through, here is an example XML-EAC file:

<?xml version="1.0" encoding="UTF-8"?>
<eac-cpf xsi:schemaLocation="urn:isbn:1-931666-33-4 http://eac.staatsbibliothek-berlin.de/schema/cpf.xsd" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:isbn:1-931666-33-4">
  <control>
    <recordId>FRAC45234_NA_00013</recordId>
    <maintenanceStatus>new</maintenanceStatus>
    <maintenanceAgency>
      <agencyCode>0000 0001 2161 689X</agencyCode>
      <agencyName>Orléans. Archives municipales</agencyName>
    </maintenanceAgency>
    <languageDeclaration>
      <language languageCode="fra">fra</language>
      <script scriptCode="Lati">Lati</script>
    </languageDeclaration>
    <conventionDeclaration>
      <citation>Norme ISAAR/CPF (Norme internationale sur les notices d’autorité utilisées pour les Archives relatives aux collectivités, aux personnes ou aux familles), 2e édition, 2004. AFNOR NF Z 44-060, décembre 1996, Catalogue d'auteurs et d'anonymes - Forme et structure des vedettes de collectivités-auteurs AFNOR NF Z 44-081, septembre 1993, Catalogage des documents cartographiques : forme et structure des vedettes noms géographiques. </citation>
    </conventionDeclaration>
    <maintenanceHistory/>
    <sources>
      <source xlink:href="">
        <sourceEntry>Archives personnelles et témoignage de la famille Nicoulaud consultées à l'occasion du don du fonds Robert Nicoulaud</sourceEntry>
      </source>
    </sources>
  </control>
  <cpfDescription>
    <identity>
      <entityType>person</entityType>
      <nameEntry>
        <part>Nicoulaud, Robert (1913-1996)</part>
        <authorizedForm>AFNOR</authorizedForm>
      </nameEntry>
    </identity>
    <description>
      <existDates>
        <dateRange>
          <fromDate standardDate="1913-11-13T00:00:00+00:00">1913-11-13</fromDate>
          <toDate standardDate="1996-02-09T00:00:00+01:00">1996-02-09</toDate>
        </dateRange>
      </existDates>
      <functions>
        <function>
          <descriptiveNote>
            <p>Politique</p>
          </descriptiveNote>
        </function>
        <function>
          <descriptiveNote>
            <p>Association</p>
          </descriptiveNote>
        </function>
        <function>
          <descriptiveNote>
            <p>Action sociale</p>
          </descriptiveNote>
        </function>
      </functions>
      <legalStatus>
        <descriptiveNote>
          <p>Personne physique</p>
        </descriptiveNote>
      </legalStatus>
      <biogHist>
        <p>Louis Robert Nicoulaud est né le 13 novembre 1913 à Orléans. Diplômé de l’Ecole nationale des Arts et Métiers de Châlons-sur-Marne, il devient ingénieur pour la Compagnie de chemin de Fer du Paris-Orléans devenue par la suite la Société Nationale des Chemins de Fer français (SNCF). Il épouse Paulette Cosson avec laquelle il aura un fils prénommé Claude. Très impliqué dans le tissu associatif local et dans la vie de son quartier, Robert Nicoulaud décède le 9 février 1996. Durant la Seconde Guerre mondiale, il participe à la campagne de France. Démobilisé en juillet 1940, il rejoint d’abord la région parisienne où il habitait avant guerre. Mais dès 1943-1944, il s’installe provisoirement à Bordeaux puis à Mérignac en Gironde. Dès 1945 et pour 5 ans, il devient chef de dépôt SNCF à Worth en Allemagne où il participe au rétablissement du réseau ferroviaire. De retour à Orléans, il participe à la fondation, en 1951, du Centre Culturel d’Orléans - Maison de jeunes et de la culture d’Orléans (CCO). Il s’agit sans doute de la première du genre en région Centre. Pour recevoir les jeunes, Robert Nicoulaud achète au 213bis, rue des Murlins un baraquement servant initialement à loger les sinistrés de la Seconde Guerre mondiale. Ultérieurement, et après avoir fait déplacer l’avant du baraquement vers l’arrière, il fait construire sa maison devant le CCO. Jusqu’à la dissolution du CCO en 1968, il se bat pour l’accès de la jeunesse à la culture populaire et laïque. Il sera ainsi actif au sein de la Fédération départementale des Maisons de jeunes et de la culture du Loiret ainsi que de la Fédération régionale. Lorsque la gestion des MJC locales est reprise par la municipalité d’Orléans, il semble s’éloigner de ce mouvement. Malgré de nombreux déménagements dus à son activité professionnelle, Robert Nicoulaud reste fondamentalement attaché à la ville qui l’a vu naître. Avec Paulette, son épouse, il multiplie les engagements associatifs.
Entre autres, il est sociétaire de l’Automobile-club de France de la région parisienne en 1964. Il est membre de la Société des anciens élèves de l’école nationale des Arts-et-Métiers, du Cercle Péguy, de l’Association pour la protection du site du Loiret, de l’Association pour la protection de l’environnement et de la qualité de la vie (APEAO), de l’Association populaire d’art et de culture (APAC). Sur le plan politique, il participe en 1977 aux élections municipales sur la liste d’Union pour la démocratie municipale et le renouveau d’Orléans, menée par Michel de La Fournière. Au début des années 1980, il adhère au parti politique naissant « Les Verts ». La fibre sociale de Robert Nicoulaud est l’un des traits les plus marquants de sa personnalité. Dans un premier temps, c’est son intérêt pour la jeunesse qui en témoigne. Durant les années 1970, c’est la naissance d’une entreprise familiale qui en apporte une nouvelle preuve. Les Tricots Nicou ont pour but de fournir du travail à des femmes au foyer et ainsi leur apporter un complément de revenus. L’entreprise se porte d’ailleurs très bien puisqu’elle expédie jusque sur la Côte d’Azur. Sur le plan intellectuel, on notera que Robert Nicoulaud adhérait à la revue "Esprit" à laquelle il a peut-être d’ailleurs collaboré via des groupes de travail. En 2010, 213bis rue des Murlins, la plaque signalant l’entrée du CCO existe toujours à l’entrée de la propriété. Dans le jardin, l’ancienne Maison de jeunes et de la culture est désormais utilisée comme garage.</p>
      </biogHist>
      <places>
        <place>
          <placeEntry>Orléans (Loiret, France)</placeEntry>
        </place>
        <place>
          <placeEntry>Orléans (Loiret, France) -- Rue des Murlins (1935-2099)</placeEntry>
        </place>
      </places>
    </description>
    <relations>
      <resourceRelation xlink:href="http://archives.orleans-agglo.fr/arkotheque/inventaires/ead_ir_consult.php?a=4&amp;ref=FR%2045234_23S_Fonds_Robert_Nicoulaud" resourceRelationType="creatorOf">
        <relationEntry>Fonds Robert Nicoulaud (AMO, 23S)</relationEntry>
        <dateRange>
          <fromDate standardDate="2019-03-07T19:31:00+01:00">1931</fromDate>
          <toDate standardDate="1984-03-07T10:43:13+01:00">1984</toDate>
        </dateRange>
      </resourceRelation>
    </relations>
  </cpfDescription>
</eac-cpf>

 – The preceding unsigned comment was added by O'Zarchives (talk • contribs).
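As a starting point for such a script, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not an agreed mapping: the function name `eac_to_statements` is hypothetical, the output is a plain dictionary rather than a ready-to-import format, and the choice of target properties (instance of (P31) with human (Q5), date of birth (P569), date of death (P570)) is one possible reading of the `entityType` and `existDates` elements. The embedded sample is a shortened copy of the record above.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <eac-cpf> root element of the sample record.
EAC_NS = {"eac": "urn:isbn:1-931666-33-4"}

# Shortened copy of the example record, kept inline so the sketch is self-contained.
SAMPLE = """<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <cpfDescription>
    <identity>
      <entityType>person</entityType>
      <nameEntry><part>Nicoulaud, Robert (1913-1996)</part></nameEntry>
    </identity>
    <description>
      <existDates>
        <dateRange>
          <fromDate standardDate="1913-11-13T00:00:00+00:00">1913-11-13</fromDate>
          <toDate standardDate="1996-02-09T00:00:00+01:00">1996-02-09</toDate>
        </dateRange>
      </existDates>
    </description>
  </cpfDescription>
</eac-cpf>"""


def eac_to_statements(xml_text):
    """Extract a few fields from an EAC-CPF record as a property -> value dict."""
    root = ET.fromstring(xml_text)

    def text(path):
        node = root.find(path, EAC_NS)
        return node.text.strip() if node is not None and node.text else None

    statements = {"label": text(".//eac:nameEntry/eac:part")}
    # Assumed mapping: entityType "person" -> instance of (P31) human (Q5).
    if text(".//eac:identity/eac:entityType") == "person":
        statements["P31"] = "Q5"
    born = text(".//eac:existDates/eac:dateRange/eac:fromDate")
    died = text(".//eac:existDates/eac:dateRange/eac:toDate")
    if born:
        statements["P569"] = born  # date of birth
    if died:
        statements["P570"] = died  # date of death
    return statements


statements = eac_to_statements(SAMPLE)
```

Free-text elements such as `biogHist`, `function` and `placeEntry` are deliberately left out: mapping them to items would require reconciliation against existing Wikidata entries, which a simple script cannot decide on its own.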

Collection or archival institution


There is confusion between languages regarding the notion of archives in Wikidata. The entity archive (Q166118) is not understood in the same way across languages. In French and German it refers to the documents, whereas in Italian and English it refers to the institution... How do we get out of this impasse? Here is a first list of the available items:

One proposal would be to change the description of archive (Q166118) in French (and in German?), since archives (Q56648173) exists to refer to the documents. What do you think? --2le2im-bdc (talk) 18:51, 6 April 2022 (UTC)[reply]

I agree with your opinion. Just for your information, the French description of archive (Q166118) has at least been updated in that direction (oldid-1835730047). --KAMEDA, Akihiro (talk) 07:05, 19 October 2023 (UTC)[reply]

The property ‎fonds (P12095) has been created


Through the discussion, the property fonds (P12095) has been created. I added the property to the table on this page, but there are probably more places where it should be added. I can't read French, so I hope someone, maybe @2le2im-bdc?, can update the page. If this page should be translated into English to welcome collaborators, I have a (small) budget to do so. Is it worth translating? This namespace has to be under the CC BY-SA 4.0 license, is that right? --KAMEDA, Akihiro (talk) 07:14, 19 October 2023 (UTC)[reply]

Thanks a lot for the news, @KAMEDA, Akihiro! I have updated the page a little bit, but you are right, we have to translate it into English. I can't do it myself, but I will also speak with @Amandine WMCH about it. Best regards, 2le2im-bdc (talk) 20:09, 24 October 2023 (UTC)[reply]
@2le2im-bdc Thank you for the page update! I look forward to your additional information about translation :) --KAMEDA, Akihiro (talk) 01:06, 25 October 2023 (UTC)[reply]

Disentangling collection and inventory number, P195 and P217


At the moment (and apparently by design of their creators), the properties collection (P195) and inventory number (P217) are entangled: the latter has a constraint requiring P195 as a qualifier.

From one point of view this actually makes sense: inventory numbers are not unique and need to be qualified in some way in order to make sense out of context. In historical sciences (if you forgive this term, I know Anglophone readers love to distinguish between sciences and humanities) it is customary to use inventory numbers appended to an abbreviated version of their holding institution, sometimes further qualified with the fonds (Q3052382) and the town or geographical location of the institution.

However, our current way of modelling inventory numbers in our structured database leads to data duplication. It makes creating and curating these values more difficult than it needs to be, and it does not make sense from an ontological point of view.

To find a solution, we should consider the nature of inventory numbers. They are meant to make an object identifiable within its conservation context. In a world without digital media and the internet, Manuscript A in Library X need not have an inventory number different from Manuscript B in Library Y. But we live in an age where you can view digital reproductions of manuscripts from any place you like simultaneously (the advantages of which I've described in my 2020 paper on Paeanius). This is why databases for manuscripts do not use inventory numbers but unique, non-descriptive identifiers. This becomes especially relevant for databases that cover a multitude of collections, like Manuscripta Mediaevalia and Pinakes. In Greek scholarship, use of the Pinakes IDs (called Diktyon numbers) in conjunction with the familiar designation (city, institution, fonds, inventory number) is not only encouraged but even required by most publishers and editors.

I use manuscripts as an example (from a domain familiar to me), but my argument can be extended to any kind of object held in GLAM institutions or private collections.

Of course, descriptive terms still have their place in the digital world, which is why we choose descriptive labels for manuscript items like Iviron 812 (Q123196004). But inventory numbers are not the only (and certainly not the most straightforward) way of identifying manuscripts in a digital context. If we can agree on this, there is really no need to qualify the inventory numbers in any way. What we should do instead is make sure that both P195 and P217 are used in items for cultural heritage objects.

In my judgement, the best course of action would be to:

  1. Remove the required qualifier constraint from P217
  2. Add a mandatory constraint to P195 and P217 stating that both statements need to be used together, not nested as they are now, but as separate statements next to one another.
  3. Think about ways of connecting interdependent statements like this. Wikidata does not have a hierarchy of statements to my knowledge, but other databases (like Pinakes) locate items with ordered lists of increasingly granular data: country -> city -> institution -> fonds -> inventory number.
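
The ordered list mentioned in point 3 can be pictured with a small Python sketch. This is only an illustration of the idea of increasingly granular levels; the class name `Locator`, its methods, and the example values are hypothetical and do not reflect how Pinakes or Wikidata store this data.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Locator:
    """Locates an object through increasingly granular levels."""
    country: str
    city: str
    institution: str
    fonds: str
    inventory_number: str

    def citation(self, sep=", "):
        # Field order is the hierarchy: country -> ... -> inventory number.
        return sep.join(astuple(self))

    def context(self):
        # Everything except the inventory number, i.e. what qualifies it.
        return astuple(self)[:-1]


# Hypothetical example values.
ms_a = Locator("Country X", "City Y", "Library X", "Fonds A", "12")
ms_b = Locator("Country X", "City Z", "Library Y", "Fonds B", "12")
```

Here `ms_a` and `ms_b` share the inventory number "12" but remain distinguishable, because the number is only ever read together with its context.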

Thanks for reading and considering this. I eagerly await your opinions on the matter.

Best, Jonathan Groß (talk) 08:24, 1 November 2023 (UTC)[reply]

@Jonathan Groß: First of all, thanks for raising the problem. The following data show that the situation is really messy. Presently Wikidata has 108957 manuscripts (query), out of which 7618 have neither P195 nor P217 (query; type 0) and so won't be considered in the following discussion. Among the others, the following types can be outlined:
1) manuscripts in Wikidata with both P195 (main statement) and P217 (main statement) (query) (presently 98767)
2) manuscripts in Wikidata with P195 (main statement) and no P217 (main statement) (query) (presently 2481)
2.1) manuscripts in Wikidata with P195 (main statement), qualified with P217, and no P217 (main statement) (query) (presently 348)
2.2) manuscripts in Wikidata with P195 (main statement), not qualified with P217, and no P217 (main statement) (query) (presently 2170)
3.1) manuscripts in Wikidata with P217 (main statement), qualified with P195, and no P195 (main statement) (query) (presently 82)
I omit 3.2 (parallel to 2.2) because it has 0 occurrences. I will now consider only the first group, which clearly prevails on the others:
1.1) P195 qualified with P217 value and P217 qualified with P195 value (query) (presently 4486)
1.2) P195 qualified with P217 value but P217 not qualified with P195 value (query) (presently 132)
1.3) P195 not qualified with P217 value but P217 qualified with P195 value (query) (presently timeout in WDQS, but 92869 results in QLever)
1.4) P195 not qualified with P217 value and P217 not qualified with P195 value (query) (presently 2249)
From the above data, the most frequent solution for describing manuscripts (about 85% of all 108957 manuscripts) is type 1.3, that is: collection (P195) not qualified + inventory number (P217) qualified with the same value as P195 (example: Commentary on the Epistles of St Paul by Gilbert de La Porée (Q100276111)). This description has the obvious issue of duplicating the information of collection (P195).
An issue we have to take into account is the following: a manuscript can change its collection and its inventory number many times in its history (see e.g. https://pinakes.irht.cnrs.fr/notices/cote/64420/). For manuscripts having 2(+) collection (P195) and 2(+) inventory number (P217) statements, understanding which P195 is associated with which P217 would be crucial ... and the only way we can do that is through qualifiers. But this would again run into the issue of duplicating information.
For this reason, I think the only solution which could both associate univocally P195 and P217 and avoid duplicating information is having one only as main statement and the other only as its qualifier (for this reason, I disagree with the points 1 and 2 of Jonathan's proposal, which is substantially type 1.4). Considering the hierarchy of institution -> fonds -> inventory number, since fonds is obviously above inventory number (i.e. an inventory number makes sense only because it is assigned in a specific fonds), my proposal is the following (it can be applied not only to manuscripts, but also to other types of archival material):
  1. prohibit completely the use of inventory number (P217) as main statement
  2. require collection (P195) to be always qualified with inventory number (P217) (if something is in a collection, surely that collection has assigned it some inventory number)
This corresponds to type 2.1 above; my example item is Christ Church Wake 5 (Q112074931). Of course adopting this data model would require a lot of adjustments on existing items, but most can be done through bots (QuickStatements would not be enough, because of qualifiers).
I add a final analysis of the types above: types 1.1, 1.2 and 1.3 are redundant; 1.4 doesn't ensure the association between P195 and P217; 0 lacks P195 and P217; 2.2 lacks P217 (as 3.2 would lack P195); 2.1 and 3.1 are not redundant and don't lack information, but 2.1 is preferable because P195 takes precedence over P217 and so P195 deserves to be the main statement with P217 as qualifier. Everything IMHO, of course :) --Epìdosis 10:43, 4 November 2023 (UTC)[reply]
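The typology above can be restated as a short Python sketch. It is a simplification, not the logic of the linked queries: items are represented as plain dictionaries (a hypothetical shape, not the Wikidata JSON format), and a property counts as "qualified" if any of its statements carries the qualifier, so an item with several statements gets exactly one type here, whereas the queries may count it under more than one.

```python
def _qualified(statements, qualifier_pid):
    """True if any statement in the list carries the given qualifier."""
    return any(st.get("qualifiers", {}).get(qualifier_pid) for st in statements)


def classify(item):
    """Return the type label (0, 1.1-1.4, 2.1, 2.2, 3.1, 3.2) for one item."""
    p195 = item.get("P195", [])  # collection, as main statements
    p217 = item.get("P217", [])  # inventory number, as main statements
    if not p195 and not p217:
        return "0"
    if p195 and p217:
        p195_has_p217 = _qualified(p195, "P217")
        p217_has_p195 = _qualified(p217, "P195")
        if p195_has_p217 and p217_has_p195:
            return "1.1"
        if p195_has_p217:
            return "1.2"
        if p217_has_p195:
            return "1.3"
        return "1.4"
    if p195:
        return "2.1" if _qualified(p195, "P217") else "2.2"
    return "3.1" if _qualified(p217, "P195") else "3.2"


# Hypothetical item in the most frequent shape (type 1.3): the collection
# appears twice, once as main statement and once as qualifier of P217.
example = {
    "P195": [{"value": "Q_COLLECTION"}],
    "P217": [{"value": "MS 5", "qualifiers": {"P195": ["Q_COLLECTION"]}}],
}
```

In the example, `"Q_COLLECTION"` is a placeholder rather than a real item ID; the duplication of that value is the redundancy pointed out for types 1.1 to 1.3.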
I forgot a crucial point: collection (P195) is not immediately above inventory number (P217); between them we have to add the recently created, and presently not much used, fonds (P12095). The inventory numbers (should) never refer directly to the collection, but to the fonds, which is the entity to which they effectively belong. I have fixed my example item Christ Church Wake 5 (Q112074931) accordingly. So a more precise outline of my proposal would be:
  1. prohibit completely the use of inventory number (P217) as main statement
  2. require ‎fonds (P12095) (not P195) to be always qualified with inventory number (P217) (if something is in a fonds, surely that fonds has assigned it some inventory number)
Probably a significant number of the values presently stored in collection (P195) will need to be converted into fonds (P12095); the criterion for doing this conversion probably needs a separate discussion. --Epìdosis 10:57, 4 November 2023 (UTC)[reply]
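The refined proposal can be sketched as a small Python check. Again, this is an illustration under assumptions: items are plain dictionaries in a hypothetical shape, the function names are invented for this example, and the fonds IDs and inventory numbers are placeholders. The sketch is not an implementation of Wikidata property constraints.

```python
def check_proposed_model(item):
    """List violations of the two rules of the refined proposal."""
    problems = []
    # Rule 1: inventory number (P217) is never a main statement.
    if item.get("P217"):
        problems.append("P217 used as main statement")
    # Rule 2: every fonds (P12095) statement carries a P217 qualifier.
    for st in item.get("P12095", []):
        if not st.get("qualifiers", {}).get("P217"):
            problems.append(f"fonds {st['value']} lacks a P217 qualifier")
    return problems


def fonds_inventory_pairs(item):
    """Unambiguous (fonds, inventory number) pairs, one per qualifier value."""
    return [
        (st["value"], number)
        for st in item.get("P12095", [])
        for number in st.get("qualifiers", {}).get("P217", [])
    ]


# Hypothetical manuscript that moved from one fonds to another.
moved = {
    "P12095": [
        {"value": "Q_FONDS_OLD", "qualifiers": {"P217": ["A 17"]}},
        {"value": "Q_FONDS_NEW", "qualifiers": {"P217": ["Wake 5"]}},
    ]
}
```

Because each inventory number lives inside the statement for its fonds, the pairing stays unambiguous even after several changes of holding context, and no value has to be stored twice.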
Thanks for surveying and reporting on the status quo as you did! So we both feel that inventory numbers should be tied to their context in some way, and I concede that two main statements P195 and P217 next to one another (as I suggested) are insufficient for that. However, I am not sure that using P217 exclusively as a qualifier for P12095 is the best solution. As Beat Estermann pointed out here (bullet point 2), we currently have a mix-up between core concepts within Wikidata. We should clear this up first before deciding how to model inventory numbers.
From my non-archivist point of view, I get a feeling that P217 is a very fuzzy property: is it meant to record the inventory number by itself without any qualification, or should it be used like a "signature" with a full string of holding institution, record set, designation, number? I am looking forward to hearing Beat's input. Jonathan Groß (talk) 12:42, 4 November 2023 (UTC)[reply]
@Jonathan Groß: sure, it emerges clearly from this and other discussions that a lot of concepts are mixed up and that we first need to clear them up. In fact, which string should go into P217 is one of the issues that need to be clarified: I tend to agree with the examples of the property, which show only the inventory number by itself without any qualification, but also the option of a full string of holding institution, records set, designation, number has its pros. Surely we need the opinions of one or more archivists (in fact, I have never studied archival science), let's wait for them before starting massive fixes. --Epìdosis 13:16, 4 November 2023 (UTC)[reply]