Developing practices for FAIR and linked data in Heritage Science

Good data practices

Good practices when creating and recording data are the basic prerequisite for the result to meet the FAIR principles in the end. Practices that were acceptable (although not optimal) for analogue data have become the biggest obstacle to reaching the full potential of digital data. Machines require a higher degree of rigour to understand data than humans do. Those working in laboratory operations are skilled at learning new instruments and methods, but they have not always been given the opportunity to learn how to ensure that their output is machine-readable and interoperable. It is not certain that the instruments and software used are optimised for those objectives, especially when it comes to commercial products where manufacturers lack incentives to make formats open source.

Basically, good data practices involve four main components:

1. Data content (interoperability)
2. Data description (findability)
3. Data file formats (accessibility)
4. Data licensing (reusability)

Good data content is about using standardised terms whenever possible (from acknowledged vocabularies—see below), especially for the column headings. Do not use abbreviations (unless internationally recognised), variations of spelling (e.g. singular and plural) or special signs in the cells (e.g. “?”). Each cell should also contain only one single observation value (e.g. “silver”, not “silver, lead”). Clarifications and qualifications should be restricted to free text fields.
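The content rules above can be checked automatically before a table is shared. The following is a minimal sketch in Python, not an established tool: the separator characters, the column names and the sample rows are assumptions chosen for illustration, and a real project would adapt them to its own data.

```python
import re

# Hypothetical rules derived from the guidance above; adjust to your own data.
MULTI_VALUE = re.compile(r"[,;/]")  # separators suggesting more than one value
SPECIAL_SIGN = re.compile(r"[?]")   # special signs such as "?"


def check_cell(value: str) -> list[str]:
    """Return a list of problems found in a single cell value."""
    problems = []
    if MULTI_VALUE.search(value):
        problems.append("multiple values in one cell")
    if SPECIAL_SIGN.search(value):
        problems.append("special sign in cell")
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    return problems


def check_table(rows, free_text_columns):
    """Check every cell except those in columns reserved for free text."""
    report = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if column in free_text_columns:
                continue
            for problem in check_cell(value):
                report.append((index, column, problem))
    return report


# Invented sample rows: the "notes" column is the free text field.
rows = [
    {"material": "silver", "notes": "surface corrosion, possibly recent"},
    {"material": "silver, lead", "notes": ""},
    {"material": "copper?", "notes": ""},
]
report = check_table(rows, free_text_columns={"notes"})
for row_index, column, problem in report:
    print(f"row {row_index}, column {column!r}: {problem}")
```

Free text columns are skipped on purpose, since that is where clarifications and qualifications belong.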

Data description is about making sure others can identify and understand the relevant source material you are producing. A simple step is naming the files in ways that are descriptive and logical (e.g. not just “File1” or “Report”). Ensure that there is good documentation about the output created within a project, e.g. tables or datasets that list photo descriptions or sample provenances. For instance, a project may have included various types of analysis made on several paintings, resulting in multiple images and datasets. Is it possible for an outsider to understand which files are the results of what type of analysis on which painting, once they are uploaded to a repository and removed from the project folder on your computer? Is there a file that contains the necessary contextual information about your data output, which you can include or link to? Descriptive metadata for the files stored in a repository is also a very important step in ensuring findability and efficiently communicating what the data are about. We will go deeper into how to do this in the section below, on vocabularies for metadata to be used both for data content and description.
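One way to provide that contextual information is a small manifest file listing which file belongs to which object and analysis. The sketch below assumes a hypothetical naming convention (object, method, date) and invented file names; it only illustrates the idea and is not a prescribed format.

```python
import csv
import io

# Hypothetical file names and fields, for illustration only.
files = [
    {"file": "painting-A_xrf-map_2023-05-02.csv",
     "object": "painting A", "analysis": "X-ray fluorescence mapping"},
    {"file": "painting-A_radiograph_2023-05-03.tif",
     "object": "painting A", "analysis": "radiography"},
    {"file": "painting-B_raman_2023-05-10.csv",
     "object": "painting B", "analysis": "Raman spectroscopy"},
]


def write_manifest(entries, stream):
    """Write one row per data file so outsiders can see what each file is."""
    writer = csv.DictWriter(stream, fieldnames=["file", "object", "analysis"])
    writer.writeheader()
    writer.writerows(entries)


# An in-memory buffer is used here; in practice this would be a file
# such as "manifest.csv" uploaded together with the data.
buffer = io.StringIO()
write_manifest(files, buffer)
manifest_text = buffer.getvalue()
print(manifest_text)
```

A plain CSV manifest has the advantage of being an open format that both people and software can read.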

As far as possible, data should be preserved in open file formats to ensure that the content remains accessible. Most national archives publish recommendations on file formats for archiving purposes, and the US Library of Congress maintains an online “Recommended Formats Statement”, continuously updated with information about viable file formats for text, images, 3D, audio, datasets, table data, GIS, etc., which is well worth consulting. However, laboratory instruments often deliver results in specialised proprietary file formats that cannot be converted into open versions without losing many valuable properties. A solution in those cases can be to publish both the original data file and an export of the most important data in an open file format, thereby ensuring both immediate and long-term usability. Scientific communities are also taking an increasingly active role in developing good practices for preserving and sharing data within specific fields, including those relevant to Heritage Science (e.g. refs. ^25,26,27). The goal is not to be perfect in all your data practices but to at least take some basic steps ensuring the results can be found, accessed and used by your peers.

When research data is published online, it should be complemented with information that outlines the terms for reuse, preferably in a short, easy-to-understand format. There are a number of different licenses available, the most common being those maintained by Creative Commons, an international nonprofit organisation. The one most commonly used in scientific publications is CC BY, which enables others to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator, i.e. a citation. This is very much in accordance with long-established practices around the reuse of published research, and it is the default license for data uploaded to EU platforms like Zenodo. While it can be tempting to add extra restrictions, this is generally not a good idea if you want your work to be properly used and cited. Many publishers of research in journals and books are commercial ventures, meaning that a BY-NC (non-commercial) or even a BY-SA (share alike) suffix can cause problems for reuse in such publications. Non-derivative (ND) means no changes can be made to the digital object, such as cropping part of an image or combining it with another. Using licenses other than CC BY or CC0 (public domain) may also go against funders’ requirements, as these are the only two CC licenses considered truly open.

The CC BY license ensures that you maintain the right to be cited when someone else uses your data in their own research. As for misuse of research results, that can, of course, always happen. There are additional rules and regulations that govern acceptable practices in research. By having a public record of all your work, and clear rules for its reuse, you are better protected in such cases than if you only stored it on your own computer.

Vocabularies and metadata for Heritage Science

As mentioned above, descriptive metadata is essential for making your data discoverable on the internet and an essential part of the FAIR principles. However, as became very clear during the workshops, this is an area where the general recommendations available often leave people struggling with where to start and what to focus on. Disciplines have different requirements and priorities, and there is a growing awareness that guidelines must be tailored with that in mind^26,28,29. Heritage Science faces an additional challenge due to its inherent interdisciplinarity. Recommendations aimed at laboratory disciplines tend to forget that cultural, spatial, and temporal contexts are a vital part of heritage objects. At the same time, researchers within the cultural heritage field can miss that chemists and physicists will be looking for research based on instruments used and types of material components being analysed. It became quite clear during the workshops and afterward during follow-up meetings that many felt overwhelmed when faced with all the possible metadata that could theoretically be used to describe a set of research data. However, the point of descriptive metadata is not to cover every possible description but to create a good foundation for discoverability.

Using finished and ongoing projects selected by the participants as case studies, we identified categories of metadata that are relevant for many Heritage Science projects. These cases covered everything from modern art to historic and prehistoric materials from different parts of the world (weapons, coins, textiles, manuscripts, etc.), as well as modern materials used in museum exhibits. Instruments and methods available at the Heritage Laboratory include radiography, Raman, infrared and UV–Vis spectroscopy, X-ray fluorescence mapping, electron and optical microscopy and various advanced imaging capabilities, as well as instruments for climate measurement, accelerated ageing tests and tensile and compression testing. Based on this combination of materials and methods, we were able to identify a number of available vocabularies and test their usability with the participants, both during group workshops and individually as the data was uploaded to Zenodo. The process allowed us to identify some main categories of metadata that are useful as a starting point for descriptive metadata in most heritage science projects, though not all of them will be relevant in every case.

By adding descriptive metadata for at least some of the categories listed below, researchers will have made a good effort to make their data discoverable. We will start with the categories themselves, and then we will present the controlled vocabularies that we recommend you use for specific categories. However, as discussed below there are areas where these vocabularies fall short. Finding georeferenced authorities is especially difficult, as geographical entities vary over time and disappear (e.g. Roman Empire) or are culturally or ethnically defined rather than politically (e.g. Sápmi).

Recommended categories of metadata for Heritage Science

Use standardised terms in vocabularies within these categories (if relevant) when describing your research data in a repository.

Subjects (e.g. Heritage Science, Archaeology, Art History, Literature, etc.)
Materials (e.g. copper, bone, vellum, pigment, silk, paper)
Method/Instrument (e.g. microscopy, photogrammetry, SEM-EDS, X-ray)
Geographical context (e.g. Country, Region and/or Site name)
Time or style period (e.g. European Bronze Age, Tang dynasty, Olmec, Expressionism)
Object type (e.g. brooch, coin, flute, oil painting, missal, temple)
Person (e.g. name of Artist or the name of a person featured in a work of art or literature)
Object identifier (collection identifiers for the artefacts, artworks, samples, or unique identifiers for sites or buildings)
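The categories above can be thought of as slots in a dataset description, each filled with a term and its persistent identifier. The following Python sketch shows one possible way to model this; the class names, the category keys and the `example.org` URIs are all assumptions made for illustration and do not correspond to any repository's actual schema.

```python
from dataclasses import dataclass, field

# Category keys mirroring the list above (the names are our own choice).
CATEGORIES = (
    "subjects", "materials", "methods", "geographical_context",
    "period", "object_type", "person", "object_identifier",
)


@dataclass
class Term:
    label: str  # human-readable term
    uri: str    # persistent identifier from a controlled vocabulary


@dataclass
class DatasetDescription:
    title: str
    terms: dict[str, list[Term]] = field(default_factory=dict)

    def add(self, category: str, label: str, uri: str) -> None:
        """Attach a term to one of the recommended categories."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.terms.setdefault(category, []).append(Term(label, uri))

    def missing_categories(self) -> list[str]:
        """List categories that have no term yet (not all need to be filled)."""
        return [c for c in CATEGORIES if c not in self.terms]


# Placeholder URIs under example.org stand in for real authorities.
description = DatasetDescription(title="Pigment analysis of an oil painting")
description.add("materials", "pigment", "https://example.org/vocab/pigment")
description.add("methods", "X-ray fluorescence", "https://example.org/vocab/xrf")
missing = description.missing_categories()
print(missing)
```

The list of missing categories is meant as a reminder rather than a requirement, since not every category is relevant to every project.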

Object identifiers in the form of an authority for a specific artefact, artwork or archival object are becoming more common as digitalised collections are made available online (Fig. 4). Collection databases contain rich metadata about an item, meaning a lot more information can be communicated in a machine-actionable manner just by including the object identifier as linked data in your dataset. If the items analysed in a project do not have globally unique identifiers (as opposed to inventory numbers, which are not unique), consult with the collection caretaker to see if it is possible to generate them. Using these will make it easier to pinpoint the exact item that was analysed, helping curators and researchers to keep track of what types of analyses have been done on items and where the results have been published.

**Fig. 4: The search platform Europeana aggregates digitalised objects in museum and archive collections from European countries.**

The forthcoming European Research Infrastructure for Heritage Science, E-RIHS, and its precursor project IPERION HS, have initiated a pilot collaboration with OpenAIRE to implement Open Science in the field of heritage science³⁰. The IPERION HS data management plan summarises the types of relevant data for heritage science projects as well as how such data shall be preserved and published adhering to FAIR principles^31,32. While the implementation of open access is still in its infancy in the field of Heritage Science, natural science museums in Europe have progressed further and are developing an international research infrastructure called DiSSCo, where scientific data and images are linked to specific museum specimens to create so-called “FAIR Digital Objects” (FDO)³³.

Such infrastructures could decrease the need for costly and potentially destructive transportation of museum objects and specimens, and will potentially make it easier to build upon previous results. FAIR data practices can reduce the gap between researchers and collection managers, and highlight the ways in which collections in museums, archives and libraries continue to generate new knowledge.

Recommended controlled vocabularies for Heritage Science

Below is our selection of controlled vocabularies that cover categories relevant to Heritage Science. The common criteria are that they are maintained by stable organisations and have a fairly user-friendly web search. In Table 1, they are presented with the categories of metadata each is primarily useful for, with links to the web search and an example of an “authority”, i.e. a persistent URL (web link), for both human and machine-readable definitions. Unfortunately, the persistent URL is rarely the weblink one sees in the internet browser’s address field. To make matters worse, almost every vocabulary has a different way of referring to their authorities (e.g. “permalink”, “concept URI”, “RDF Unique Identifier”), the meanings of which are far from obvious to the average user. We have therefore made a notation of that as well in Table 1. It would certainly be beneficial to everyone if there was a more standardised terminology in this field as well.
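The difference between a browser address and a persistent URL can be illustrated with a small check that recognises which vocabulary an identifier belongs to. This is a sketch under stated assumptions: the prefixes below reflect our understanding of how these vocabularies form their identifiers and should be verified against each vocabulary's own documentation, and the function name is hypothetical.

```python
from typing import Optional

# Assumed identifier prefixes (without the http:// or https:// scheme).
# Verify each against the vocabulary's own documentation before relying on it.
KNOWN_PREFIXES = {
    "vocab.getty.edu/aat/": "Getty AAT",
    "vocab.getty.edu/tgn/": "Getty TGN",
    "vocab.getty.edu/ulan/": "Getty ULAN",
    "www.wikidata.org/entity/": "Wikidata",
    "viaf.org/viaf/": "VIAF",
    "sws.geonames.org/": "GeoNames",
}


def identify_vocabulary(uri: str) -> Optional[str]:
    """Return the vocabulary name if the URI looks like a persistent identifier."""
    without_scheme = uri.split("://", 1)[-1]
    for prefix, name in KNOWN_PREFIXES.items():
        if without_scheme.startswith(prefix):
            return name
    return None


# The first address is what a browser typically shows for a Wikidata item;
# the second follows the assumed identifier pattern for the same item.
print(identify_vocabulary("https://www.wikidata.org/wiki/Q42"))
print(identify_vocabulary("http://www.wikidata.org/entity/Q42"))
```

A check like this can catch the common mistake of pasting the address-bar link into a metadata field instead of the authority.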

Table 1 Recommended vocabularies for Heritage Science and Cultural Heritage studies

The list of vocabularies is not meant to be exhaustive or definitive. As we will address in the final discussion, there is a problematic lack of vocabularies necessary for the Humanities and Cultural Heritage disciplines. However, we believe these are a good starting point that will help make people more comfortable with using vocabularies and persistent identifiers. Some vocabularies have overlapping content, i.e. the same term or name appears in them, but there is no need to use multiple authorities for a term. There are digital services that specifically map authorities for terms to each other (e.g. Wikidata and VIAF, see below). This mapping of authorities works as an online thesaurus for web applications, meaning they can understand terms regardless of which controlled vocabulary is used. Just choose the authority (persistent unique identifier) for a term whose description matches what you are referring to closely enough. There is no problem mixing authorities from different controlled vocabularies when describing your data. In fact, it will most likely be necessary, as there is no single controlled vocabulary that encompasses all terminologies (not even Wikidata, yet).

Getty vocabularies

Developed and published by the Getty Research Institute, these are the most commonly used vocabularies in heritage studies. Many collection management systems are mapped to the Getty vocabularies, as are many other web and software applications in various languages, meaning they are linguistically interoperable to a very high degree. The vocabularies are hierarchically structured, meaning a term is linked to a more general and often a more specific term. This is a highly useful feature, as it means that a machine will understand that research data described with the metadata “earrings” is relevant when someone is looking for analyses made on jewellery (or “jewelry”).
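The benefit of a hierarchical structure can be sketched with a toy example in Python. The hierarchy below is invented for illustration and does not reproduce the actual structure of the AAT; the function names are likewise hypothetical.

```python
# Toy hierarchy: each term points to its broader term (None at the top).
# Invented for illustration; it does not reproduce the actual AAT hierarchy.
BROADER = {
    "earrings": "jewelry",
    "brooches": "jewelry",
    "jewelry": "costume accessories",
    "costume accessories": None,
    "coins": None,
}


def ancestors(term):
    """Yield the term itself and every broader term above it."""
    seen = set()
    while term is not None and term not in seen:
        seen.add(term)
        yield term
        term = BROADER.get(term)


def matches(dataset_term, search_term):
    """True if a dataset tagged with dataset_term is relevant to search_term."""
    return search_term in ancestors(dataset_term)


print(matches("earrings", "jewelry"))   # a search for jewelry finds earrings
print(matches("coins", "jewelry"))      # but not coins
```

The match only works upwards: data tagged with the broad term "jewelry" is not returned for a search on "earrings", which is one reason to describe data with the most specific term that applies.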

Getty publishes a number of vocabularies, of which the Art & Architecture Thesaurus (AAT) is the most well-known and covers anything from archaeology to modern art. The Thesaurus of Geographic Names (TGN) has georeferenced places, including historical placenames. The Union List of Artist Names (ULAN) also includes names of organisations, repositories, etc. The Cultural Objects Name Authority (CONA) has names of artworks and architecturally significant places. The Iconography Authority (IA) lists religious terms and has a special focus on non-Western topics.

It should be noted that the Getty vocabularies mainly contain terms relevant to the Institute’s own extensive collections. Other museums, archives, etc., can contribute terms to expand them further, but this is only done occasionally. Consequently, the Getty vocabularies are dominated by terminologies for objects that have found their way into Western collections or are part of the Western cultural canon, something that will be discussed further below.

MeSH

Medical Subject Headings (MeSH) is a controlled vocabulary produced by the US National Library of Medicine. Getty AAT contains a lot of terms for scientific instruments and methods that are relevant to Heritage Science, but MeSH contains practically all of them and is updated more regularly. It is a useful vocabulary if a scientific instrument or method is missing from Getty AAT, or if the definition there does not match what the instrument was used for (e.g. analyses of modern material properties).

MeSH also contains authorities for anatomical terms for bones, which can be useful when one wants to differentiate between an analysis made on a fibula (dress pin) or a fibula (shin bone) (see Fig. 3).
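The fibula example shows why identifiers matter more than labels. In the Python sketch below, three invented dataset records share the same label but point to two different authorities; the `example.org` URIs are placeholders, not real identifiers from Getty AAT or MeSH.

```python
from collections import defaultdict

# Placeholder URIs under example.org stand in for real authorities.
DRESS_PIN = "https://example.org/vocab/fibula-dress-pin"
SHIN_BONE = "https://example.org/vocab/fibula-bone"

# Invented dataset records sharing one ambiguous label.
datasets = [
    {"title": "XRF of a fibula", "label": "fibula", "uri": DRESS_PIN},
    {"title": "Isotope analysis of a fibula", "label": "fibula", "uri": SHIN_BONE},
    {"title": "Radiography of a fibula", "label": "fibula", "uri": DRESS_PIN},
]

by_label = defaultdict(list)
by_uri = defaultdict(list)
for record in datasets:
    by_label[record["label"]].append(record["title"])
    by_uri[record["uri"]].append(record["title"])

# Grouping by label mixes dress pins and bones; grouping by URI separates them.
print(len(by_label["fibula"]))
print(by_uri[DRESS_PIN])
print(by_uri[SHIN_BONE])
```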

Gold Book (Compendium of Chemical Terminology)

The International Union of Pure and Applied Chemistry (IUPAC) publishes the Compendium of Chemical Terminology, popularly referred to as the Gold Book, as a controlled vocabulary online. It is a good complement to Getty AAT and MeSH when you want to make sure you are using standardised chemical terminology and units of measurement, or need the authorities for these.

GeoNames

GeoNames is an open database containing georeferenced information about almost every type of geographical entity. There are often multilingual versions of place names, as well as links to Wikipedia, increasing the interoperability of its authorities. There can be issues with spatial accuracy in some cases³⁴, especially for historical sites³⁵. Pinpoint accuracy is mainly a problem when using the data for GIS analysis, not for someone who simply wants a unique identifier for a location. However, it is recommended to check on the map that the placement is fairly correct. It is possible for users to log in and correct errors or add additional information, such as alternate spellings, just as on Wikidata. This was done for one of the case studies at the Heritage Laboratory, which involved radiographic analysis on arrow points found at Låktatjåkko in northern Sweden³⁶. As is the case with many Sami place names, it has several alternative spellings (Låktatjåkka, Loktačohkka, Laktatjakko), one of which was added to GeoNames as a result of the workshop (Fig. 5).
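Alternate spellings become useful once they all resolve to the same identifier. The Python sketch below uses the spellings mentioned above with a placeholder identifier; it is not the actual GeoNames identifier for the site, and the lookup is a simplified stand-in for what a gazetteer does.

```python
import unicodedata

# Placeholder identifier; not the actual GeoNames identifier for this place.
PLACE_ID = "https://example.org/place/laktatjakko"

# Spellings taken from the text above, all pointing to one identifier.
ALIASES = {
    "Låktatjåkko": PLACE_ID,
    "Låktatjåkka": PLACE_ID,
    "Loktačohkka": PLACE_ID,
    "Laktatjakko": PLACE_ID,
}


def normalise(name: str) -> str:
    """Normalise Unicode composition and letter case before comparing names."""
    return unicodedata.normalize("NFC", name).casefold()


INDEX = {normalise(name): identifier for name, identifier in ALIASES.items()}


def lookup(name: str):
    """Return the identifier for a known spelling, or None if it is unknown."""
    return INDEX.get(normalise(name))


print(lookup("Loktačohkka"))
print(lookup("laktatjakko"))
print(lookup("Unknown place"))
```

A spelling that has not been registered as an alias simply returns nothing, which is why contributing missing variants back to the gazetteer helps later searches.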

VIAF

The Virtual International Authority File (VIAF) is a joint project for libraries and is maintained by the Online Computer Library Center (OCLC). This means that VIAF aggregates vocabularies from a large number of libraries across the world, and the authorities are mapped and linked back to these resources. The content is therefore highly trustworthy, as well as very multilingual and interoperable. VIAF contains names of people and organisations associated with published works, as well as titles of said works.

Wikidata

The most extensive and ubiquitous publisher of vocabularies on the internet is the Wikimedia Foundation. The service Wikidata contains all the items (terms, topics, concepts, objects) that have their own page in Wikipedia, as well as a huge number of aliases of said terms. As mentioned above, this is what allows users to find the correct page for terms with identical spelling, and to find the information online regardless of language and alphabet used during a search. Wikidata is readable by both machines and humans.

While using a Wikipedia article as a reference is problematic due to the anonymous and always-changing nature of the contributions, the same is not necessarily true for Wikidata identifiers. The former contains interpretations and opinions, whereas the latter contains mainly factual data in the form of structured metadata (which, of course, can be incorrect). As mentioned above, Wikidata authorities always have links to identifiers in other controlled vocabularies, as a way of reference. This means that Wikidata works as a global hub, mapping and connecting identifiers for terms across the internet, much like VIAF but on a far greater scale. This is extremely useful for computer applications, which can use these connections to find, filter and visualise information from a combination of sources. It is also useful for humans. If you have been unable to find a term in any of the above vocabularies, looking it up in Wikidata will often yield a result, which in turn can help you discover additional useful controlled vocabularies if you are reluctant to use Wikidata as a reference.
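The hub role described above can be sketched as a simple crosswalk: given an identifier in one vocabulary, find the equivalent identifier in another by going through the hub record. Everything in this Python example is invented for illustration, including the vocabulary names and identifiers; it shows the principle and is not a call to the Wikidata service.

```python
# Invented hub records: each maps one hub item to identifiers in other
# vocabularies. Names and identifiers are placeholders, not real data.
HUB = {
    "hub:1": {"vocabA": "A-100", "vocabB": "B-7"},
    "hub:2": {"vocabA": "A-200", "vocabC": "C-42"},
}


def translate(source_vocab, source_id, target_vocab):
    """Find the target vocabulary's identifier for the same concept, if any."""
    for links in HUB.values():
        if links.get(source_vocab) == source_id:
            return links.get(target_vocab)
    return None


print(translate("vocabA", "A-100", "vocabB"))  # an equivalent exists
print(translate("vocabA", "A-200", "vocabB"))  # no mapping recorded
```

This is why it does not matter much which authority a dataset uses, as long as the hub knows about it.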

Finally, Wikidata allows for user-generated authorities. This can be a solution if there is a term, person, or object that you can define and describe, but which does not yet have a globally unique identifier. One of the projects at the Heritage Laboratory involved an oil painting by Picasso at the Swedish Museum of Modern Art, La Source, whose identifier was neither globally unique nor persistent. The name of the painting is identical to several other artworks, including a drawing by the same artist held by another museum. A Wikidata identifier was therefore created as part of the project. This is not a viable solution for every collection item, but a painting by such a renowned artist met the criteria for relevance.

Additional vocabularies

The vocabularies listed above cover broad areas and have fairly user-friendly interfaces. There are three additional ones that we would like to highlight. Two of them are a bit more challenging in terms of interface and content, and one is geared towards a disciplinary niche.

PeriodO

One type of metadata that is very important to heritage studies is chronology. This is also one of the most complex to get right. Common concepts like “Neolithic” and “Middle Ages” have different start and end dates depending on the region, just as there is a huge variety of names for prehistoric and historic periods and cultures in different countries.

PeriodO is a gazetteer, i.e. a geographical directory, with definitions of historical, art-historical, and archaeological periods. It is developed through both individual and organisational contributions, funded by the US National Endowment for the Humanities and the Institute of Museum and Library Services. It works similarly to VIAF in that it aggregates controlled vocabularies from various sources and creates authorities for them. The PeriodO web client allows users to search for periods broadly in all the controlled vocabularies, or to select a preferred vocabulary and search in that. The authorities do not just have a chronological definition but are also georeferenced, though sometimes only broadly through linking them to places as defined by Wikipedia (e.g. “France”).
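The idea of a period definition that combines a label, a region and a date range can be sketched as follows. The regions, years and sources in this Python example are invented placeholders and must not be read as actual chronologies; the class and function names are hypothetical and do not reflect the PeriodO data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodDefinition:
    label: str
    region: str
    start_year: int  # negative values stand for years BCE
    end_year: int
    source: str


# Invented placeholder values, NOT real chronologies.
DEFINITIONS = [
    PeriodDefinition("Neolithic", "Region A", -6000, -3000, "hypothetical source 1"),
    PeriodDefinition("Neolithic", "Region B", -4000, -1700, "hypothetical source 2"),
]


def find(label, region):
    """Return all definitions of a period label for a given region."""
    return [d for d in DEFINITIONS if d.label == label and d.region == region]


def overlaps(a, b):
    """True if two period definitions share at least one year."""
    return a.start_year <= b.end_year and b.start_year <= a.end_year


region_a = find("Neolithic", "Region A")[0]
region_b = find("Neolithic", "Region B")[0]
print(region_a.start_year, region_b.start_year)
print(overlaps(region_a, region_b))
```

Because the same label carries different dates in each region, a dataset is only unambiguous when it cites the identifier of a specific definition and not the label alone.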

The web client takes some getting used to, and the choices can feel a bit overwhelming as the vocabularies are a mix of project and organisational contributions. Getty AAT and Wikidata are often good enough for authorities on general chronological periods. However, we feel it is worth highlighting this service for those not finding what they need elsewhere, or who are looking for more advanced options.

FISH—Heritage Data

Forum on Information Standards in Heritage (FISH) is a British non-profit, whose membership includes many of the national and regional agencies for archaeology, history, and heritage in the UK. Despite what the acronym seems to suggest, the FISH vocabularies are not aimed at aquatic wildlife. Instead, they cover everything from archaeological sciences and objects to building materials to historical maritime and aircraft terms. These are published in several different formats, including CSV and PDF, which are very helpful for finding standardised heritage terminologies in English in general.

The vocabulary terms are also published as authorities in a web search. However, it requires the user to first choose which one of the numerous vocabularies they want to search in. It is explained in the Schemes table what subjects each vocabulary covers, and choosing one of the FISH vocabularies by Historic England is generally a good starting point. There are other vocabularies specialised in Scottish, Welsh, and Irish heritage terminologies as well, which can be very useful.

Nomisma

Nomisma is a controlled vocabulary for concepts associated with numismatics, particularly coinage from the ancient Mediterranean cultures and historical European countries. It was started through individual initiatives but has since received funding from the International Numismatic Council and is overseen by a scientific steering committee. Apart from being helpful in finding standardised terminologies, there are geographical, multilingual, and linked data associated with many of the authorities, making them highly interoperable (Fig. 6).

While the terms, people and places in Nomisma can often be found in the vocabularies mentioned above, it could be a good idea to use these authorities when describing research data that was generated analysing coins and medals. It will make the results more discoverable by a highly active research community.
