Wikidata:WikiProject Heritage institutions/Data structure/Data modelling issues
This page supports the tackling of data modelling issues related to the description of heritage institutions on Wikidata.
This Google Doc contains a tentative list of data modelling issues related to heritage institutions. Please help complement it and/or contribute to the description of issues on this wiki page!
Cases of Problematic Data Modelling
"Organization", "building", or "collection"?
Many Wikidata items currently conflate organizations, buildings, and/or collections. In most cases, this mix-up stems from automatic data imports from Wikipedia, where it is common practice to cover these different concepts in a single article. On Wikidata, this causes problems as soon as statements that could apply to any of these concepts, such as inception (P571), are added to such items. Most museums, for example, have different dates for the construction of the building, the establishment of the collection, and the founding of the organization. If such statements are added indiscriminately, it remains unclear which entity they apply to, and the information becomes difficult to interpret, if not useless altogether.
Proposed Solution
Keep items for "organization", "building", and "collection" strictly apart by assigning each item to the appropriate class (see table below).
Type | Network Organization | Organization | Building | Part of a Building, Room | Collection |
---|---|---|---|---|---|
museum | n/a | museum (Q33506) | museum building (Q24699794) | n/a | museum collection (Q27699276) |
archive(s) | n/a | archive (Q166118) | archive building (Q635719) | n/a | archival collection (Q9388534) |
library | library network (Q28324850) | library (Q7075) | library building (Q856584) | library (Q29843656) | library collection (Q856592) |
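The rule can be made concrete by encoding the table as a lookup and checking an item's direct instance of (P31) values against it. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the item data would have to be retrieved from Wikidata separately, and only direct P31 values are examined, so mix-ups introduced higher up in the class tree are not caught.

```python
# Class QIDs per institution type and concept, copied from the table above.
# Cells marked n/a are left out.
CONCEPT_CLASSES = {
    "museum": {"organization": "Q33506", "building": "Q24699794",
               "collection": "Q27699276"},
    "archive": {"organization": "Q166118", "building": "Q635719",
                "collection": "Q9388534"},
    "library": {"network": "Q28324850", "organization": "Q7075",
                "building": "Q856584", "room": "Q29843656",
                "collection": "Q856592"},
}


def concepts_of(instance_of):
    """Return the concepts that an item's direct instance of (P31) values
    point to. Only direct P31 values are checked; subclasses are ignored."""
    return {
        concept
        for classes in CONCEPT_CLASSES.values()
        for concept, qid in classes.items()
        if qid in instance_of
    }


def is_mixed(instance_of):
    """True if the item is assigned to more than one concept at once."""
    return len(concepts_of(instance_of)) > 1


# Hypothetical item typed both as museum and as museum building.
print(is_mixed({"Q33506", "Q24699794"}))
```

An item that passes this check may still be mis-modelled if one of its classes is itself placed under the wrong concept in the class tree.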
Complications
Some complications arise from the fact that some classes within the class tree confound these different concepts (see the section "Inconsistencies in Hierarchical Class Tree" below). Thus, some of the data modelling issues can be addressed at the level of individual items, while others need to be resolved at the level of the class tree.
In an effort to facilitate the correct assignment of items to their respective classes, mismatches between the various language versions of the class definitions need to be addressed (see the section "Translation & Internationalization Issues" below).
Problematic Items
The table below links to dashboards with country statistics for problematic items and to maintenance queries that help pinpoint them. Note that the lists do not distinguish between cases where the item itself has been assigned to conflicting classes and cases where the confusion stems from a badly constructed class tree.
Type of Issue | Dashboard | Maintenance Queries |
---|---|---|
Museums (= organizations) at the same time defined as architectural structures | Country statistics | List of problematic items (example query: Switzerland) |
Archives (= organizations) at the same time defined as architectural structures | Country statistics | List of problematic items (example query: Switzerland) |
Libraries (= organizations) at the same time defined as architectural structures | Country statistics | List of problematic items (example query: Switzerland) |
Museums (= organizations) at the same time defined as collections | Country statistics | List of problematic items (example query: Switzerland) |
Archives (= organizations) at the same time defined as collections | Country statistics | List of problematic items (example query: Switzerland) |
Libraries (= organizations) at the same time defined as collections | Country statistics | List of problematic items (example query: Switzerland) |
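The linked maintenance queries are not reproduced here. As a minimal sketch of the kind of query involved, the Python snippet below builds a SPARQL query for the Wikidata Query Service that lists items of one class that are at the same time instances of another class, restricted to one country (P17). The function names are hypothetical, and the QID used for "architectural structure" is an assumption that should be verified before use.

```python
from urllib.parse import urlencode

MUSEUM = "Q33506"
COLLECTION = "Q2668072"               # collection, as cited further down this page
ARCHITECTURAL_STRUCTURE = "Q811979"   # assumed QID for "architectural structure"
SWITZERLAND = "Q39"

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def overlap_query(org_class, other_class, country):
    """SPARQL listing items in `country` that are instances of (a subclass of)
    org_class and, at the same time, of (a subclass of) other_class."""
    return f"""SELECT DISTINCT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:{org_class} ;
        wdt:P31/wdt:P279* wd:{other_class} ;
        wdt:P17 wd:{country} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def query_url(query):
    """GET URL for the Wikidata Query Service, asking for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


print(overlap_query(MUSEUM, ARCHITECTURAL_STRUCTURE, SWITZERLAND))
```

Because the property path `wdt:P31/wdt:P279*` follows the class tree, this query also returns items whose classes are themselves mixed up, which matches the caveat above. Dropping the country filter for large classes such as museum may make the query time out.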
Further problematic items may be found by searching specifically for "organization" items with statements that are typical of architectural structures (e.g. instances of museum (Q33506) with architect (P84) statements). Likewise, certain identifiers typically refer either to an "organization" or to a "building", so their presence on an item of the other kind can also point to a mix-up.
Type of issue | Dashboard | architect (P84) | architectural style (P149) | occupant (P466) |
---|---|---|---|---|
Museum (= organization) items with statements that are typical for architectural structures | Country statistics | Query | Query | Query |
Archive (= organization) items with statements that are typical for architectural structures | Country statistics | Query | Query | Query |
Library (= organization) items with statements that are typical for architectural structures | Country statistics | Query | Query | Query |
Note that the percentages shown by the property dashboard for the different properties do not match the figures obtained from the maintenance queries.
Type of issue | Dashboard | maintained by (P126) | copyright status (P6216) | level of description (P6224) |
---|---|---|---|---|
Museum (= organization) items with statements that are typical for collections | | | | |
Archive (= organization) items with statements that are typical for collections | | | | |
Library (= organization) items with statements that are typical for collections | | | | |
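Both property tables above can be backed by one parameterized query that counts, per property, the organization items using it. The Python sketch below builds such a SPARQL query; the function name is hypothetical, and it only counts truthy (`wdt:`) statements, so deprecated statements are ignored.

```python
# Properties listed in the two tables above.
BUILDING_PROPERTIES = ["P84", "P149", "P466"]       # architect, architectural style, occupant
COLLECTION_PROPERTIES = ["P126", "P6216", "P6224"]  # maintained by, copyright status, level of description

ORGANIZATION_CLASSES = {"museum": "Q33506", "archive": "Q166118", "library": "Q7075"}


def property_count_query(org_class, properties):
    """SPARQL counting, per property, the items of org_class (or its
    subclasses) that carry at least one truthy statement with that property."""
    values = " ".join(f"wdt:{p}" for p in properties)
    return (
        "SELECT ?prop (COUNT(DISTINCT ?item) AS ?n) WHERE {\n"
        f"  VALUES ?prop {{ {values} }}\n"
        f"  ?item wdt:P31/wdt:P279* wd:{org_class} ;\n"
        "        ?prop ?value .\n"
        "}\n"
        "GROUP BY ?prop"
    )


print(property_count_query(ORGANIZATION_CLASSES["museum"], COLLECTION_PROPERTIES))
```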
[TO DO: complement the tables and dashboards with further properties and with further links to maintenance queries that pinpoint mismatching properties]
Progress Statistics
The tables below track the progress made in cleaning up the affected items and the hierarchical class tree.
As the percentage indications provided by the property dashboard for the different properties do not match the figures obtained from the maintenance queries, the absolute numbers obtained from the maintenance queries are used.
Date | museum (N) | museum & architectural structure (N) | museum & collection (N) | architect (P84) (N) | architectural style (P149) (N) | occupant (P466) (N) |
---|---|---|---|---|---|---|
2019-09-29 | 45808 | (server error) | 43794 | 1109 | 860 | 137 |
2019-10-06 | 45919 | (server error) | 43903 | 1109 | 863 | 139 |
2019-10-14 | 45989 | (server error) | 453 | 1109 | 864 | 138 |
2019-10-30 | 46304 | (server error) | 630 | 1117 | 911 | 141 |
Date | archive (N) | archive & architectural structure (N) | archive & collection (N) | architect (P84) (N) | architectural style (P149) (N) | occupant (P466) (N) |
---|---|---|---|---|---|---|
2019-09-29 | 5583 | 181 | 5348 | 32 | 17 | 3 |
2019-10-06 | 5590 | 183 | 5354 | 32 | 17 | 3 |
2019-10-14 | 5618 | 184 | 107 | 33 | 17 | 3 |
2019-10-30 | 5618 | 186 | 400 | 33 | 17 | 3 |
Date | library (N) | library & architectural structure (N) | library & collection (N) | architect (P84) (N) | architectural style (P149) (N) | occupant (P466) (N) |
---|---|---|---|---|---|---|
2019-09-29 | 37477 | 3632 | 34352 | 360 | 252 | 26 |
2019-10-06 | 56716 | 10969 | 51089 | 360 | 252 | 26 |
2019-10-14 | 72047 | 1785 | 191 | 359 | 253 | 27 |
2019-10-30 | 73565 | 1612 | 57949 | 365 | 256 | 27 |
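Because the tables record absolute numbers, relative figures have to be derived from them. A minimal Python sketch, using the museum counts from the table above, that expresses the "museum & collection" column as a share of all museums and reports the change between consecutive snapshots; the function names are hypothetical.

```python
# Counts copied from the museum table above:
# (date, museums, museums that are at the same time collections).
MUSEUM_SNAPSHOTS = [
    ("2019-09-29", 45808, 43794),
    ("2019-10-06", 45919, 43903),
    ("2019-10-14", 45989, 453),
    ("2019-10-30", 46304, 630),
]


def overlap_share(total, overlap):
    """Percentage of items that also fall into the overlap."""
    return 100.0 * overlap / total


def deltas(snapshots):
    """Change in the overlap count between consecutive snapshots."""
    return [
        (later[0], later[2] - earlier[2])
        for earlier, later in zip(snapshots, snapshots[1:])
    ]


for date, total, overlap in MUSEUM_SNAPSHOTS:
    print(f"{date}: {overlap_share(total, overlap):.1f} %")
print(deltas(MUSEUM_SNAPSHOTS))
```

With these counts, the share of museums also classed as collections drops from about 96 % on 2019-09-29 to about 1 % on 2019-10-14. The same functions apply to the archive and library tables.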
[TO DO: complement the statistics tables]
Property Constraints
Eventually, property constraints should be used to flag problematic statements. Furthermore, we should look into using Shape Expressions for data validation (cf. EntitySchema:E125; EntitySchema:E90).
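Until such constraints or schemas are in place, the intended rule can be prototyped offline. The Python sketch below is a hypothetical check, not Wikidata's constraint system and not a Shape Expression: it flags organization items (museum, archive, library) that use properties typical of buildings. The `Item` structure and the example data are illustrative assumptions; in practice the data would come from a query or a dump.

```python
from dataclasses import dataclass, field

# Hypothetical rule: items typed directly as one of these organization
# classes should not carry building-typical properties.
ORGANIZATION_CLASSES = {"Q33506", "Q166118", "Q7075"}  # museum, archive, library
BUILDING_PROPERTIES = {"P84", "P149", "P466"}          # architect, architectural style, occupant


@dataclass
class Item:
    qid: str
    instance_of: set = field(default_factory=set)  # direct P31 values
    properties: set = field(default_factory=set)   # property IDs used on the item


def conflicting_statements(item):
    """Return the building-typical properties used on an organization item,
    sorted; an empty list means the rule is satisfied."""
    if item.instance_of & ORGANIZATION_CLASSES:
        return sorted(item.properties & BUILDING_PROPERTIES)
    return []


# Hypothetical example item (not a real QID).
example = Item("Q_EXAMPLE", {"Q33506"}, {"P17", "P84", "P571"})
print(conflicting_statements(example))
```

A museum building item (Q24699794) with an architect is not flagged, since the rule only targets organization classes.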
[TO DO: insert table with proposed property constraints and a column to track their state of implementation.]
[TO DO: review/complement the instructions on the "Data structure" page to reflect the rules defined here.]
Translation & Internationalization Issues
"Organization", "building", or "collection"?
There are various translation issues related to the distinction between organizations, buildings, and collections. For example, as of summer 2019, library (Q7075) was described as a "collection" in English, as an "institution" in Spanish, and as a "place" in French.
Item Concerned | Description of Issue | Proposed Solution | Implementation Status |
---|---|---|---|
library (Q7075) | Varying definitions across languages: "collection of resources" (en), "facility" (de), "institution" (es, it, nl), "place" (fr), "book depository" (ru) | Align all language versions with the following definition: "institution charged with the care of a collection of literary, musical, artistic, or reference materials, such as books, manuscripts, recordings, or films". | Corrected en, fr, and de; removed some obviously wrong definitions in other language versions |
institution (Q178706) vs. institution (Q3917196) | In many languages, the term "institution" is polysemous, referring to 1. an established law, practice, or custom; 2. an organization. | Define institution (Q178706) as sense 1 (an established law, practice, or custom) and use institution (Q3917196) for sense 2 (an organization). | |
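Checking the language versions of a description by hand does not scale well. As a minimal sketch, assuming the MediaWiki Action API module `wbgetentities` is used to retrieve descriptions, the Python snippet below builds the request URL, extracts the descriptions from a response, and lists languages whose description differs from an agreed text. Only the English target definition comes from this page; the function names and the approved-text mapping are hypothetical, and the snippet does not send the request itself.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"
TARGET_EN = ("institution charged with the care of a collection of literary, "
             "musical, artistic, or reference materials, such as books, "
             "manuscripts, recordings, or films")


def descriptions_url(qid, languages):
    """wbgetentities request for the descriptions of one item."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "props": "descriptions",
        "languages": "|".join(languages),
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def descriptions(response, qid):
    """Map language code -> description text from a wbgetentities response."""
    entity = response["entities"][qid]
    return {lang: d["value"] for lang, d in entity.get("descriptions", {}).items()}


def mismatches(current, approved):
    """Languages whose description is missing or differs from the approved text."""
    return sorted(lang for lang, text in approved.items() if current.get(lang) != text)


# Illustrative response shaped like wbgetentities output; the values are the
# English glosses quoted in the table, not the strings stored on Wikidata.
sample = {"entities": {"Q7075": {"descriptions": {
    "en": {"language": "en", "value": "collection of resources"},
    "fr": {"language": "fr", "value": "place"},
}}}}
print(mismatches(descriptions(sample, "Q7075"), {"en": TARGET_EN}))
```

Approved texts for other languages would have to be agreed on first; the function simply reports every language listed in the mapping whose current description differs.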
[TO DO: List further translation issues and proposed solutions.]
Inconsistencies in Hierarchical Class Trees
"Organization", "building", or "collection"?
The mix-up between organizations, buildings, and collections can also be found within the hierarchical class tree and needs to be sorted out there as well.
Item Concerned | Description of Issue | Proposed Solution | Implementation Status | Comments |
---|---|---|---|---|
library (Q7075) | subclass of "storage" | remove statement | Done | Some mismatched identifiers were corrected as well, but they have not been verified systematically. Added different from (P1889) statements to distinguish the item from items it is easily confused with. Added maintained by WikiProject (P6104) → WikiProject Heritage institutions (Q69901156). |
library (Q7075) | subclass of "collection" | remove statement | Done | |
library (Q7075) | subclass of "facility" (= place) | remove statement | Done | |
main library (Q12317349) | subclass of "library building" - library building (Q856584) | remove statement | Done | |
library branch (Q11396180) | subclass of "library building" - library building (Q856584) | remove statement | Done | |
GLAM (Q1030034) | defined both as subclass of cultural institution (Q5193377) (= organization) and of collection (Q2668072) | remove statement "subclass of collection" | Done | Added maintained by WikiProject (P6104) → WikiProject Heritage institutions (Q69901156) |
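Mix-ups of this kind can be detected by following subclass of (P279) statements upwards and checking which concepts a class ends up in. A minimal Python sketch, using an excerpt of the class tree as it stood before the removals recorded in the table above. The edge from main library to library and the choice of marker classes are assumptions made for illustration; a real check would load the graph from the Query Service.

```python
from collections import deque

# Excerpt of the subclass of (P279) graph before the clean-up, taken from the
# table above. Parents without a QID on this page ("storage", "facility") are omitted.
SUBCLASS_OF = {
    "Q1030034": {"Q5193377", "Q2668072"},  # GLAM -> cultural institution, collection
    "Q7075": {"Q2668072"},                 # library -> collection
    "Q12317349": {"Q856584", "Q7075"},     # main library -> library building, library (assumed edge)
    "Q11396180": {"Q856584"},              # library branch -> library building
}

# Classes that mark each concept; a hypothetical choice for illustration.
CONCEPT_MARKERS = {
    "organization": {"Q5193377", "Q7075"},  # cultural institution, library
    "building": {"Q856584"},                # library building
    "collection": {"Q2668072"},             # collection
}


def ancestors(graph, cls):
    """All classes reachable from cls via subclass of (P279), cls included."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        for parent in graph.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def concepts_reached(graph, cls):
    """Concepts whose marker classes are among the ancestors of cls."""
    reached = ancestors(graph, cls)
    return {name for name, markers in CONCEPT_MARKERS.items() if reached & markers}


# Print every class in the excerpt that reaches more than one concept.
for cls in SUBCLASS_OF:
    found = concepts_reached(SUBCLASS_OF, cls)
    if len(found) > 1:
        print(cls, sorted(found))
```

Removing the offending edges from the graph (as recorded in the "Done" rows) leaves each of these classes under a single concept.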
[TO DO: List further inconsistencies and proposed solutions.]