Guide for Archaeological Data Management Planning

Version 1.0, 25 March 2022, CC BY
Authors: Peter Doorn (DANS) and Paola Ronzino (PIN)

Introduction

An ever-growing number of research funding and research performing organisations require archaeologists to demonstrate that they manage their data responsibly: when you submit a proposal for funding, a data management plan (DMP) is often required. The rationale for this is the idea that scientific research should be transparent and replicable, and that its results, including research data, should be shared whenever possible. In order to accomplish this, data should comply with the FAIR data principles, meaning that data should be Findable, Accessible, Interoperable and Reusable, see: https://www.go-fair.org/fair-principles/

A complication is that the requirements, though similar in intention, tend to vary across funders and universities. Yet, more and more research councils and other institutions work with the core requirements and guidance developed by Science Europe, an organisation uniting many research funding and performing organisations across Europe: https://scienceeurope.org/our-priorities/research-data/research-data-management.

The Directorate General for Research and Innovation (DG-RTD) of the European Commission also issued influential requirements for the Horizon 2020 Framework Programme (H2020), which were updated in May 2021 for the new Horizon Europe programme (HE).

The templates for Horizon 2020 and Horizon Europe actually consist of lists of topics and questions that need to be addressed, without prescribing a particular format or order. Hence, it is also permissible to follow the Science Europe requirements.

In the ARIADNEplus project, we developed online tools to assist archaeologists who want to (or are obliged to) make a data management plan. We are offering the following assistance:

  1. A Protocol for Archaeological Data Management, based on the Science Europe guide for research data management (including directions for evaluating DMPs). Many research councils and universities in Europe, as well as the European Commission for the Horizon Europe programme, accept the Science Europe core requirements for making a data management plan (DMP). This protocol can be considered a standard DMP for archaeological research based on the principle of “comply or explain”. If you deviate from the standard compliance or need to give further details, the protocol provides the opportunity to make supplementary comments.
  2. A DMP template for archaeology based on the Horizon Europe requirements (version 1.0, 5 May 2021).
  3. A DMP Researcher Template for Archaeological Datasets, compliant with the Horizon 2020 requirements, originally developed in the Parthenos project: https://www.parthenos-project.eu/portal/dmp.
  4. This Guidance document for Archaeological Data Management Planning that can be consulted for both the protocol and the templates mentioned above.

A note on numbering of this guidance:

The order of this guidance follows the Science Europe core requirements, but is also applicable to the ARIADNEplus DMP Researcher Template, and the Horizon 2020 and Horizon Europe requirements.  

Every question in the three templates provides links to the relevant guidance.

Further reading on Research Data Management and tools for Data Management Plans:


1. Data description and collection or reuse of existing data

1.a. How will new data be collected or produced and/or how will existing data be reused?

1a.1 Reuse of existing data

For more information on the reuse of existing research data see: OpenAIRE Guides for Researchers, “Can I reuse someone else’s research data? Learn more on how to reuse research data”: https://www.openaire.eu/can-i-reuse-someone-else-research-data

If any additions or corrections to existing data are made, it is good practice to make these available under the same conditions as apply to the original data. A license for this is called a “share alike” license. For further information, see: https://creativecommons.org/licenses/by-sa/4.0/

1a.2 Data provenance

Provenance provides insight into where existing data come from and by whom, when and how they were created. It provides a historical record of the data and their origins. A distinction is sometimes made between primary and secondary data:

  • Primary data are data that have been collected for the first time and that have not undergone thorough data processing and/or analysis yet.
  • Secondary data are data that have been cleaned up, analysed and shared by others (published or unpublished); these are typically the data available for reuse.

On data citation see: https://datacite.org/cite-your-data.html

1a.3 Good practices in data collection

For good practices of data collection in archaeology, see: Archaeology Data Service – Digital Antiquity Guides to Good Practice: https://guides.archaeologydataservice.ac.uk/g2gpwiki/


1.b. What data (for example the kind, formats, and volumes) will be collected or produced?

1b.1 Data volume

The estimated volume of the data to be collected (in approximate numbers) is preferably given in bytes (MB/GB/TB). In case other units are used, please specify them.

1b.2 File formats

Standard file types and formats facilitate the comparison, linking, and merging of newly collected data with other data sources over time. This element of data management aims to make data interoperable, reflecting the “I” of the FAIR data principles. Using standardised file formats that are widely adopted in the archaeological community increases the potential for reuse. In addition to community standards, most file formats that are industry standards can easily be converted to software-independent formats, e.g. Excel (XLS and XLSX) to CSV; ESRI Shapefiles to MID/MIF files.
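As an illustration, the following minimal Python sketch (assuming the pandas library, with openpyxl installed for reading XLSX files; the file names are hypothetical) converts an Excel finds register to software-independent CSV:

    import pandas as pd

    # Read an Excel worksheet (hypothetical file name) and write it out as
    # software-independent, plain-text CSV with UTF-8 encoding.
    df = pd.read_excel("finds_register.xlsx", sheet_name=0)
    df.to_csv("finds_register.csv", index=False, encoding="utf-8")

Keeping such conversions scripted, rather than manual, makes them repeatable and documentable.

Further information: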

1b.3 Preferred formats

“Preferred formats”, as recommended by data repositories, can be used independently of specific software, developers or vendors. For overviews of preferred formats, see:


2. Documentation and data quality

2.a. What metadata and documentation will accompany the data?

2a.1 Recommended metadata elements

Metadata, or “data about data”, are essential in making data findable, accessible and reusable (the F, A, and R of FAIR), especially the metadata used for citing and describing data. Making a dataset understandable for other researchers therefore implies that the data description and documentation include elements such as the following (a minimal example record is given after the list):

  • The title of the data.
  • A unique and permanent identifier (e.g. a DOI or a URN) through which the data and metadata can be accessed and cited.
  • The date of publication of the data.
  • The rationale for conducting the research and the specific research objectives, including a description of the problem definition, research design, and data collection and processing methods.
  • A description of the organisation and structure of the data, including consistent file naming conventions, with unambiguous titles and descriptions.
  • A description of the content of the data, such as record types, variable descriptions, and units of measurement, including codebooks for coded information.
  • A description of the geographic coverage and sampling/selection of sub-areas.
  • The dates/period of data collection.
  • If applicable, (references to) documents relating to the official approval and permits to carry out the research (further specified in section 4c).
  • The person(s)/team (names and affiliations; preferably also their unique identifiers such as ORCIDs) responsible for the data collection and processing (further specified in section 6a).
  • The availability of the data (i.e. detailed information on when, how and by whom the data can be accessed and used). The accessibility and conditions for reuse of the data can best be specified in a data use license.
  • The intended user community for the data, the reuse potential, and limitations and pitfalls for reuse.
  • Computer code, including routines and procedures in standard software packages (e.g. SPSS syntax, Atlas.ti queries, MATLAB analysis scripts, R code) for data processing and analysis (further specified in section 5c).
  • (References to) publications related to the data, such as articles, monographs, book chapters, MA/PhD theses, preprints, internal reports.
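To make the elements above concrete, here is a minimal, hypothetical descriptive record expressed as a Python dictionary; the field names loosely follow DataCite/Dublin Core conventions, and all values are invented for illustration:

    # A minimal, hypothetical metadata record for an archaeological dataset.
    metadata = {
        "title": "Excavation database, Hilltop Site 2021",
        "identifier": "https://doi.org/10.5555/example-doi",  # invented DOI
        "publication_date": "2022-03-25",
        "creators": [
            {"name": "A. Archaeologist",
             "orcid": "https://orcid.org/0000-0000-0000-0000"},  # placeholder
        ],
        "description": "Finds register and context sheets, 2021 field season.",
        "spatial_coverage": "Hilltop Site, municipality X",
        "temporal_coverage": "2021-06/2021-08",  # period of data collection
        "license": "https://creativecommons.org/licenses/by-sa/4.0/",
        "related_publications": ["Preliminary excavation report, 2022"],
    }

In practice, such a record is expressed in the schema of the chosen repository (see section 2a.2) rather than in code.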

2a.2 Metadata standards and schemas

A list of standardised metadata elements to describe a resource is called a metadata schema. Using an existing metadata schema will ensure that international standards for data exchange are met and is therefore recommended. Further information:

Following consistent, intuitively clear conventions for naming data objects, files and folders, as well as for coding variables, makes it significantly easier for future researchers to understand what the data are about, how they are organised and hence how to reuse them. For example, a file name such as projectcode_trench02_photo_20220325_v01.jpg encodes project, context, content type, date and version.


2.b. What data quality control measures will be used?

2b.1 Quality control

In order to have confidence in the data of an archaeological survey or excavation, whether for heritage management or research objectives, assurance is needed that the project was carried out to professional standards. 

In general, Quality Assurance (QA) involves policies, procedures, manuals, standards, and systems, arranged with the goal of ensuring and continually improving the quality of products or services and consumers’ satisfaction with them. Translated to the field of archaeology, possible QA guarantees are:

  • Compliance with archaeological quality standards
  • Research carried out by or under auspices of professionally trained archaeologists
  • Application of general quality standards, such as ISO 9000 and ISO 9001
  • Application of metadata standards for data description
  • Good practices, guidelines and approved methods of archaeological research to be followed
  • Use of open, well-specified and widely used archaeological glossaries, vocabularies and gazetteers instead of ad-hoc and proprietary vocabularies. Such standard vocabularies describe the exact meaning of the concepts and qualities that the data represent.

Further reading on quality control in archaeological research:

  • Wilshusen, R., et al. (2016). “Archaeological Survey Data Quality, Durability, and Use in the United States: Findings and Recommendations”. Advances in Archaeological Practice, 4(2), 106-117. DOI: https://doi.org/10.7183/2326-3768.4.2.106 
  • Kansa, E., & Kansa, S. (2021). “Digital Data and Data Literacy in Archaeology Now and in the New Decade”. Advances in Archaeological Practice, 9(1), 81-85. DOI: https://doi.org/10.1017/aap.2020.55
  • Banning, E., et al. (2017). “Quality Assurance in Archaeological Survey”. Journal of Archaeological Method and Theory. 24, 466-488. DOI: https://doi.org/10.1007/s10816-016-9274-2

Example of a Quality Standard: Dutch Archaeology Quality Standard (KNA): https://www.sikb.nl/over-sikb/about-sikb/kna-2-1-arcaelogy

Vocabularies

Vocabularies that may be useful for archaeological research:

Qualified references

FAIR Principle I3 on interoperability states that “(Meta)data include qualified references to other (meta)data”. The idea is that linking data and/or metadata to other information sources adds to their value. A qualified reference is a cross-reference that explains why or how datasets are linked: it describes what a link means or how it is intended. The goal of this FAIR criterion is to create meaningful links between (meta)data resources to enrich the contextual knowledge about the data. If possible and applicable, the scientific links between datasets need to be specified. To be more concrete, you should specify if one dataset builds on another data set, if additional information is needed to complete the data, or if complementary information is stored elsewhere (for example, a published article, or the depot where archaeological finds are stored). See also: https://www.go-fair.org/fair-principles/i3-metadata-include-qualified-references-metadata/ 

2b.2 Quality Assurance elements

Quality Assurance elements that may be specified in the DMP are:

  • (Reference to) quality standards that will be adhered to during the data collection.
  • (Reference to) quality assurance procedures that identify the explicit actions to be taken for monitoring the data collection.
  • How to evaluate the impact of the quality assurance standards on the data collection procedures and results.
  • Quality targets in terms of accuracy, reliability, precision and validity of measurements (for example, see: Wright, D.K., “Accuracy vs. Precision: Understanding Potential Errors from Radiocarbon Dating on African Landscapes”, African Archaeological Review 34, 303–319 (2017): https://doi.org/10.1007/s10437-017-9257-z).

2b.3 Sampling

On sampling in archaeological research, see: Orton, C., Sampling in Archaeology. Cambridge Manuals in Archaeology. Cambridge University Press, 2000 (reprinted 2009): https://www.cambridge.org/core/books/sampling-in-archaeology/19EFCC337099150189D34408E56939D2


3. Storage and backup during the research process

3.a. How will data and metadata be stored and backed up during the research process?

3a.1 Good practices in data storage

For good practices in storing archaeological data, see:

3a.2 Version control and data integrity

To ensure that data are not altered unintentionally after they have been collected or generated, it should be possible to check their integrity. Version control is a method used to track changes in the data over time, so that older versions can be recalled later. By using a version control mechanism, you can document every change in revised versions of a dataset, which helps to guarantee the authenticity and integrity of the data. For further information, see: https://datamanagement.hms.harvard.edu/collect/version-control

3a.3 Recommendations with respect to backup and recovery:

  • It is recommended to explicitly assign the responsibilities for backup administration to a member of the research team. The responsibilities of this “backup administrator” include at least supervising the backup and recovery plan (see below) and verifying that the backup plan is carried out.
  • If the home institution has a data backup strategy in place, the research should comply with it and make use of the institutional backup system.
  • Backup administration can be outsourced to the technical support staff of the home institution or to an external service provider.
  • If data volumes are substantial (say: above 1TB), an estimate of how much storage capacity will be needed for backups should be made in advance, and it should be verified that the backup system has the storage space available for the research.
  • A disaster recovery mechanism should describe the steps to take if data loss occurs, so that data can be restored as completely and quickly as possible.
  • The backup and recovery mechanism should be tested at the start of the data collection phase and repeated at least yearly.
  • Data and backups should be stored in at least two geographically separate locations.
  • Backups of personal and sensitive data should be protected against unauthorized access in the same manner as the original files (see section 3b on data security).

Backup planning

A backup plan should cover the following elements:

  • The scope of the backups: determining what data to back up.
  • The backup schedule: how often and in how many copies (using incremental, differential and/or full backups as necessary); how long backups will be kept, and how backups that are no longer needed will be destroyed; which backup processes can take place automatically and which need to be carried out manually.
  • How the integrity of backed-up files will be checked (e.g. with a checksum tool; see the sketch after this list).
  • What backup media will be used.
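As an illustration of checksum-based integrity checking, the following Python sketch (standard library only; the folder name is hypothetical) writes a SHA-256 manifest that can be re-verified after each backup:

    import hashlib
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        """Return the SHA-256 hex digest of a file, read in chunks."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    # Write a checksum manifest for every file under the data folder.
    data_dir = Path("project_data")
    with open("checksums.sha256", "w", encoding="utf-8") as manifest:
        for p in sorted(data_dir.rglob("*")):
            if p.is_file():
                manifest.write(f"{sha256_of(p)}  {p}\n")

Re-running the script after a backup and comparing the two manifests reveals any file that was altered or lost in transit.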

Recommended backup media:

  • Managed storage system at the home institution, e.g. faculty or institutional Network Shares, NAS (Network Attached Storage), SAN (Storage Area Network)
  • Data repository of the home institution
  • Storage at national data facility (e.g. archaeological data archive in your country) or (international) disciplinary repository
  • (International) multidisciplinary repository (e.g. Zenodo, B2SHARE, Dataverse, figshare)
  • Cloud storage (e.g. Dropbox, ownCloud, Amazon S3, Google Cloud, Microsoft Azure)
  • Local computer disks (e.g. of the person responsible for data management)
  • External USB Disks at a locked off-site location (compliant with section 3b)

Warning: USB sticks are generally not recommended for backups!

Further reading on backups: CESSDA ERIC (2017): Data Management Expert Guide: Chapter 4. Store: https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/4.-Store/Backup


3.b. How will data security and protection of sensitive data be taken care of during the research?

3b.1 Data security and protection

Data security is relevant to protect intellectual property rights and business/research interests, or to keep sensitive information, including personal data, safe. Information security is the practice of preventing unauthorized access, use, disclosure, disruption, modification, inspection, recording or destruction of information. Data security’s primary focus is the balanced protection of the confidentiality, integrity and availability of data, without hampering organization/research productivity unnecessarily. For basic definitions and an overview of information security, see:  https://en.wikipedia.org/wiki/Information_security

Note: The protection of personal data and compliance with privacy legislation, especially the General Data Protection Regulation (GDPR), is dealt with in section 4a. The protection of Intellectual property is covered in section 4b.

Sensitive data refers to high-risk information such as files that contain sensitive personal information, information that is covered by law or that has a high intellectual property value, politically sensitive information or trade secrets. This may include detailed information on vulnerable archaeological heritage or sites that need to be protected against unauthorized disturbances.

Sensitive personal data is a specially defined category in Art. 9 of the European General Data Protection Regulation (GDPR), see: https://gdpr-info.eu/art-9-gdpr. The protection of sensitive data may require additional security measures (in terms of physical security, network security, and security of computer systems and files) to ensure that such data is stored and transferred safely.

Data Protection/Security Officer:

It is recommended to assign the responsibilities for data security to a member of the research team who is or will be adequately trained for the task, or to the technical support staff of the home institution, or to an external service provider. Such a function is called “information security officer/administrator” or “data protection officer”. The responsibilities of the security administrator include at least:

  • Making a risk analysis and/or security plan that identifies assets, threat sources, vulnerabilities, potential impacts, and possible controls, particularly for sensitive or confidential data (for example containing personal data, politically sensitive information, or trade secrets).
  • Supervising the security plan and verifying that its measures are effectively carried out.

Further information:

CESSDA ERIC (2017), Data Management Expert Guide:

3b.2 Access control

Access to computer systems can be controlled and data can be protected in a variety of ways, such as:

  • with an authentication and authorisation control system for data access
  • with two-factor authentication
  • with passwords only (less secure)

For tips and best practices covering a variety of information security-related topics, see: https://www.uhcl.edu/information-security/tips-best-practices/

Scientific communities increasingly make use of “federated” authentication and authorisation methods, in which a collection of research organisations (such as universities) use the same procedures and credentials for providing access to computer systems and information.

Access control is a security technique that regulates who or what can view or use resources in a computing environment. There are two types of access control: physical and logical. Physical access control limits access to campuses, buildings, rooms and physical IT assets. Logical access control limits connections to computer networks, system files and data.

To secure a facility, organizations use electronic access control systems that rely on user credentials, access card readers, auditing and reports to track employee access to restricted locations and proprietary areas, such as data centers. Some of these systems incorporate access control panels to restrict entry to rooms and buildings, as well as alarms and lockdown capabilities, to prevent unauthorized access or operations.

Access control systems perform identification, authentication and authorization of users and entities by evaluating required login credentials, which can include passwords, personal identification numbers (PINs), biometric scans, security tokens or other authentication factors. Multifactor authentication (MFA), which requires two or more authentication factors, is often an important part of a layered defense to protect access control systems.

Recommendations with respect to physical security of sensitive data:

  • Access to buildings, rooms, and cabinets where computers, (backup) media or hard copy materials with sensitive data are held, should be controlled in accordance with institutional policies.
  • Access to and removal of media or hard copy materials with sensitive data in store rooms should be logged.

Recommendations with respect to network and computer systems security:

  • Avoid storage in a public cloud and storage on servers or computers connected to an external network, particularly servers that host internet services.
  • Exchange and sharing of sensitive data among the members of the research team should be protected via secure channels (e.g. VPN).
  • Firewall protection, up-to-date security-related upgrades and patches to operating systems and application software, and virus protection to avoid malicious codes should be ensured.
  • Personal or sensitive data files should not be sent via email or other file transfer means, including uploading to the cloud, or transporting on portable devices, without first encrypting them.

Encryption:

Data can be encrypted as an additional method to restrict unauthorized access to sensitive data files, folders or entire hard drives. Recommendations with respect to the use of encryption (a minimal sketch follows the list):

  • Store encryption keys securely and separately from the data files that are encrypted.
  • Make sure that encryption keys will only be accessible to authorized users of the data.
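A minimal sketch of symmetric file encryption, assuming the widely used Python cryptography package (any well-maintained encryption tool serves the same purpose); note that the key is written to a separate, hypothetical location, in line with the recommendations above:

    from cryptography.fernet import Fernet

    # Generate a key and store it separately from the data files
    # (a hypothetical path on a different, access-controlled volume).
    key = Fernet.generate_key()
    with open("/secure/keys/project.key", "wb") as f:
        f.write(key)

    # Encrypt a sensitive data file before transfer or cloud upload.
    fernet = Fernet(key)
    with open("participants.csv", "rb") as f:
        token = fernet.encrypt(f.read())
    with open("participants.csv.enc", "wb") as f:
        f.write(token)

    # Decryption by an authorized key holder: fernet.decrypt(token)
    # returns the original bytes.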

For additional information see:


4. Legal and ethical requirements, codes of conduct

4.a. If personal data are processed, how will compliance with legislation on personal data and on data security be ensured?

4a.1 Privacy and protection of personal data

This section only applies if data on living persons are collected or processed, which can be the case in, e.g., experimental and ethnoarchaeology. Note, however, that the names, functions and email addresses of members of the research team are also personal information, as are photographs and videos of excavations in which identifiable persons are depicted. General aspects of data security are dealt with in section 3b.

The European General Data Protection Regulation (GDPR) came into force on May 25th, 2018 in all EU Member States. It harmonizes data privacy laws across Europe. The complete text of the GDPR (EU) 2016/679 linked with suitable recitals can be found here: https://gdpr-info.eu

Recommendations concerning the protection of personal data:

  • If available, implement the policies with respect to the handling of personal data of the home institution.
  • During or immediately after data collection on individuals, the research data should be pseudonymised according to security needs, e.g. participant names and addresses should be stored separately from files containing the substantive information (a minimal sketch follows this list).
  • Sensitive (personal) information should be destroyed in a consistent and secure manner when it is no longer needed and is not required for review or replication, nor for deposit for long-term preservation and reuse.
  • Non-disclosure agreements for managers or users of sensitive data should be imposed.
  • Data deposit for long-term preservation and reuse should be at a specialised and trustworthy repository (see section 5a) with due respect to confidentiality of sensitive information.
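As an illustration of the pseudonymisation recommended above, this Python sketch (standard library only; file and column names are hypothetical) replaces participant names with random IDs and writes the re-identification key to a separate file, to be stored under stricter access control:

    import csv
    import secrets

    # Read records containing direct identifiers (hypothetical file/column).
    with open("interviews_raw.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    key_table = {}  # pseudonym -> real name; store separately and securely
    for row in rows:
        pseudonym = "P" + secrets.token_hex(4)
        key_table[pseudonym] = row.pop("name")  # remove the direct identifier
        row["participant_id"] = pseudonym

    # Write the pseudonymised data and the key table to separate files.
    with open("interviews_pseudonymised.csv", "w", newline="",
              encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    with open("key_table.csv", "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(key_table.items())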

Note that pseudonymised data (where anonymisation is reversible) still count as personal data; see section 4a.3 on anonymization.

Further reading: CESSDA ERIC (2017): Data Management Expert Guide: Chapter 5. Protect – Processing personal data: https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/5.-Protect/Processing-personal-data

4a.2 Informed consent

Informed consent implies that research subjects are informed about the research, and are explicitly asked for permission to collect and process information about them.

Recommendations on acquiring informed consent from research subjects (or their guardians/legal representatives):

  • State by whom and for which purposes the personal data may be used during/within the project.
  • State with whom and for which purposes the personal data may be archived and shared after/outside of the project.
  • For deposit and preservation or sharing of personal data in a repository external to the home institution, a Data Processing Agreement (DPA) with the repository should be adopted. The agreement should be in line with the informed consent statements.

Data Processing Agreement

In a DPA, the accountability and compliance with the GDPR are described by specifying the roles of data controller and data processor. According to Article 4 of the GDPR, a data controller is the entity (person, organization, etc.) that determines the why and the how of processing personal data. A data processor is the entity that actually performs the data processing on the controller’s behalf. For further information see: https://www.ironmountain.com/resources/general-articles/d/data-processor-vs-data-controller

Further information on informed consent, including examples and forms:

4a.3 Anonymization

Anonymized data is no longer considered personal data in the sense of the GDPR. Pseudonymised data (where the anonymization is reversible) is still personal data. If the (re)identifying information is securely kept separate from the pseudonymised data, outsiders cannot establish whether the data is anonymized or pseudonymised.

De-anonymization is a technique used in data mining that attempts to re-identify encrypted or obscured information. It can be very hard to prove that data cannot be re-identified or de-anonymized when additional information (from other data sources) is available. Still, the research team should make sure (i.e. make a reasonable effort to guarantee) that anonymized research subjects cannot be re-identified through a combination of variables in the dataset, or, without excessive effort, by combining the research data with external information. One simple check is illustrated below.
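One simple, illustrative check is to verify that no combination of quasi-identifiers singles out an individual (the idea behind k-anonymity). A minimal sketch, assuming the pandas library and hypothetical file and column names:

    import pandas as pd

    df = pd.read_csv("survey_anonymised.csv")  # hypothetical file

    # Quasi-identifiers: attributes that, combined, might re-identify someone.
    quasi_identifiers = ["age_group", "region", "profession"]

    # Size of the smallest group sharing the same combination of values.
    k = df.groupby(quasi_identifiers).size().min()
    print(f"Smallest group size (k): {k}")
    if k < 5:  # a common, though context-dependent, threshold
        print("Warning: some value combinations may identify individuals.")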

Further information on anonymisation:


4.b. How will other legal issues, such as intellectual property rights and ownership, be managed? What legislation is applicable?

4b.1 Legal framework: Malta Treaty

The Convention for the Protection of the Archaeological Heritage of Europe (briefly: Valletta Convention or Malta Treaty) establishes a framework of basic legal standards for Europe, to be met by national policies for the protection of archaeological assets as sources of scientific and documentary evidence, in line with the principles of integrated conservation. It is concerned in particular with arrangements to be made for co-operation among archaeologists and town and regional planners in order to ensure optimum conservation of archaeological heritage. The Convention sets guidelines for the funding of excavation and research work and publication of research findings. Although it does not mention access to digital data, the Convention constitutes an institutional framework for pan-European co-operation on archaeological heritage, entailing a systematic exchange of experience and experts among the various States. For ethical implications concerning the Valletta Convention see section 4c.

For further reading on the Malta Treaty (Council of Europe Treaty Series no. 143), see: https://www.coe.int/en/web/culture-and-heritage/valletta-convention

Data protection and the European Database Directive

Data as such are not protected under European law. Copyright or sui generis protection can be claimed under certain conditions specified in the European Database Directive 96/9/EC. The latest consolidated version of the Directive dates from 6 June 2019, see: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31996L0009

  • Copyright protection can be claimed for “the intellectual creation involved in the selection and arrangement of materials” in the database. For instance, it is possible to claim that a particular database schema meets the directive’s requirement for “creativity” of the work.
  • Sui generis protection can be claimed for “the investment (in human and technical resources, effort and energy) in the obtaining, verification or presentation of the contents of the databases”.

4b.2 Open Data Directive

The Directive on open data and the reuse of public sector information provides a common legal framework for a European market for government-held data (or “public sector information”). It is built around two key pillars of the internal European market: transparency and fair competition. This directive, also known as the “Open Data Directive” (Directive (EU) 2019/1024) entered into force on 16 July 2019. The Open Data Directive replaces the Public Sector Information Directive, also known as the “PSI Directive” (Directive 2003/98/EC) which dated from 2003 and was subsequently amended by Directive 2013/37/EU. In the revision of 2013, cultural information was explicitly included as public sector information, and in the latest version of the directive, scientific data is also explicitly included. For the current version of the text (PE/28/2019/REV/1) see: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32019L1024

4b.3 Data ownership and copyright

Legal ownership of research data is difficult to establish. Data ownership refers to both the possession of and responsibility for the research data. Ownership implies power as well as control. The control of information includes not just the ability to access, create, modify, package, derive benefit from, sell or remove data, but also the right to assign these access privileges to others (Loshin, 2002). The principles of data ownership are nevertheless debated; in some views, ownership does not apply to data at all. Implicit in having control over access to data is the ability to share data with colleagues or other researchers (the notable exception to the unqualified sharing of data being research involving human subjects, whose personal information is protected by the GDPR; see section 4a). Scofield (1998) suggests replacing the term data “ownership” with “stewardship”, which implies a broader responsibility; see: https://ori.hhs.gov/education/products/n_illinois_u/datamanagement/dotopic.html

In the context of data management, it is enough to establish who controls access to the data in practice. Most frequently, the owner of research data is:

  • The person or entity (research team, represented by the principal investigator or project leader) that has collected or created the data.
  • The employer (home institution of the project) of the person or entity that has collected or created the data.
  • The organisation funding the data collection or creation.

In case the data are collected or created in a multi-partner project, the data ownership and the rights to control access to the data can be specified in the consortium agreement.

Further reading:

4b.4 Access rights and restrictions

Examples of access rules and restrictions:

  • access restricted to the members of the research team
  • access open to academic researchers
  • open to everyone (no restrictions, public domain)

Learn more about Creative Commons CC0 licensing: https://creativecommons.org/publicdomain/zero/1.0/ 

4.c. What ethical issues and codes of conduct are there, and how will they be taken into account?

4c.1 Ethical issues

Ethics in archaeological research refers to the moral issues raised by the study of the material past, which are hence also reflected in the data collected about this past. It is of particular relevance when archaeologists work with human remains, and in the preservation of archaeological heritage (sites, remains and cultural items). Typically, the ethical aspects of an archaeological project are dealt with in the research proposal, and for data management planning a reference to the relevant section will usually suffice.

Research involving the study of living persons, as may be the case in experimental and ethnoarchaeology, also has ethical implications, but these are mostly dealt with in section 4a. If data abuse resulting from the data collection could harm people, the persons from whom data are collected should be made aware of such risks, which should also be covered in the declaration of informed consent.

Further reading on ethics in archaeology:

4c.2 Ethics review and self-assessment

Although more common in medical and social research, an ethics self-assessment may be part of the review procedure of an archaeological research project. Among subjects to be described in such a self-assessment are:

  • Questions related to the risk of damaging archaeological heritage during an archaeological excavation.
  • Any legally or institutionally required ethical questions (for example formulated by an ethics committee or institutional review board) on data collection by the research team.
  • Ethical questions about the risks concerning the aims and methods of data collection, including the potential misuse of the data.
  • Ethical questions involving the participation of human subjects and the processing of personal data (see also sections 3b and 4a).

Further reading on ethics review principles, requirements, procedures and checklists:

4c.3 Codes of conduct and related guidelines

Codes of conduct and ethical guidelines are closely related to professional quality norms and are usually defined in the same context (see section 2b). Some of these are formulated internationally, but there are also national codes of conduct, both for scientific research in general and for archaeological research in particular. In Europe, the Valletta Convention (or Malta Treaty, see also section 4b) provides a general framework for such codes of conduct. National codes of conduct in archaeological research exist in several countries and have different statuses. Codes of conduct can also take the form of quality norms or official requirements and protocols.

International:

Italy:

UK:

The Netherlands:


5. Data sharing and long-term preservation

5.a. How and when will data be shared? Are there possible restrictions to data sharing or embargo?

5a.1 Data curation, preservation and sharing

Consider how data will be curated, preserved and shared beyond the lifetime of the research or grant. Data sharing via a Trustworthy Digital Repository (TDR) is recommended and is further specified in section 5b, which also contains overviews of suitable repositories. If the data are not shared via a TDR, further details are required in the DMP on how it is guaranteed that the data will remain Findable, Accessible, Interoperable and Reusable. This includes information on how and where the data will be managed in a sustainable way for the long run.

One of the FAIR principles on access (A1) states: “(meta)data are retrievable by their identifier using a standardised communication protocol”. The rationale and meaning of this rather technical principle are explained here: https://www.go-fair.org/fair-principles/metadata-retrievable-identifier-standardised-communication-protocol/. The idea is that it should be possible to retrieve the data via a standard Internet protocol such as http(s) or ftp, “without specialised or proprietary tools or communication methods”, which might pose a barrier to access: “Barriers to access that should be avoided include protocols that have limited implementations, poor documentation, and components involving manual human intervention”. Nevertheless, when data are sensitive or need to be protected for privacy reasons, such barriers are allowed, but they need to be clearly and explicitly described in the metadata.

5a.2 Data reuse

Data collected in archaeological research will be useful for various purposes and people: for future research, for comparison with other finds, for conservation of cultural heritage, etc. Think about the target or primary audience of your research, but also about possible reuse by others after the project ends. If there are reasons to restrict access to certain persons or groups, or if special conditions for data reuse apply, this should be motivated and specified for third parties seeking access to the data.

  • Detailed access conditions to the research data are to be specified in a data-sharing agreement and/or access and reuse license. If access restrictions apply, the license should clearly state by whom and for which purposes the data may be reused.
  • For the reuse of personal data, a data processing agreement compliant with GDPR needs to be in place to guarantee that the data will be treated confidentially and in accordance with the informed consent declaration (see sections 3b and 4a).
  • Other sensitive information will be made available for reuse in agreement with applicable codes of conduct for archaeological research (see section 4c).
  • A period of exclusive use (embargo) of the data after completion of the data collection is usually permitted. A valid motivation is to grant the members of the research team time to publish about the research (e.g. for a PhD dissertation). An embargo period of 24 months is often seen as reasonable, and under certain conditions the embargo can be prolonged.

References and further reading:

5a.3 Confidentiality

A nondisclosure or confidentiality agreement can be a condition for granting access to reviewers. Such an agreement specifies the purpose and conditions of the access granted. For further information, see: https://en.wikipedia.org/wiki/Non-disclosure_agreement

5a.4 Open access to metadata

If access to data is restricted, or if data is not or no longer available at all, the data description or metadata should be openly accessible, in compliance with FAIR principle A2, see: https://www.go-fair.org/fair-principles/a2-metadata-accessible-even-data-no-longer-available/. See also section 5b.

The Creative Commons network has formulated a standard public domain dedication known as CC0, see: https://creativecommons.org/publicdomain/zero/1.0/

5.b. How will data for preservation be selected, and where will data be preserved long-term?

5b.1 Selecting data for reuse

For the selection of assets to be preserved, consider how the data may be reused, e.g. to validate your research findings, to conduct new studies, for teaching purposes, or for the protection of cultural heritage. Decide which data to keep and for how long. This could be based on any obligations to retain certain data, the potential reuse value, what is economically viable to keep, and additional efforts required to prepare the data for preservation and sharing, such as the conversion of file formats (see section 1b).

Digital assets to consider for preservation:

  • The uncleaned or “raw” (unedited) data, as collected within the framework of the research project, provide the most direct registration of the finds, field observations, copies of archival sources, photographs, measurements, etc.
  • The processed or final version(s) of cleaned and labeled data, including re-coded and added information (such as selections, weights, annotations, corrections, transformations or calculations) as analysed when preparing a publication.
  • All documentation required to use the data or to replicate the (published) findings of the project (as described in section 2a).
  • Computer code and routines in standard software packages used to process and analyse the data for publications, sufficiently documented to review, reuse and/or replicate the data (as specified in section 5c).

5b.2 Tombstone records for deleted data

The destruction or deletion of data must always be motivated. However, the metadata and data documentation should continue to exist as a “tombstone” for the original data, specifying when and why the data was destroyed. This is compliant with FAIR principle A2, see: https://www.go-fair.org/fair-principles/a2-metadata-accessible-even-data-no-longer-available/. See also section 5a.4.

5b.3 Data sharing in repositories

It is good practice to deposit archaeological research data in a repository for long-term preservation and sharing. Information on sharing data in Trustworthy Digital Repositories (TDR):

If data will not be deposited in a TDR, this is to be specified in section 5a on data sharing.

5b.4 Data preservation and sharing policies

Trustworthy Digital Repositories are required to publish their data preservation and sharing policy on their website.

5.c. What methods or software tools will be needed to access and use the data?

5c.1 Software for data handling

The software tools to collect, process or analyse the data, which potential users may need to access, interpret and (re-)use the data, should be described in the data documentation.

Any source code of software especially written for the collection, processing or analysis of the data (e.g. specific scripts, codes or algorithms developed during the project) should be documented and stored so that it is findable, accessible and reusable.

The software routines in standard software packages used to process “raw” (unedited) data into analysis data, and further into research results for publication, should also be documented sufficiently for review, reuse and/or replication of the steps taken. Examples of such routines are SPSS syntax, Atlas.ti queries, MATLAB analysis scripts, R code, etc. Research data repositories and archives usually offer the possibility to store such routines together with the data to which they were applied. A sketch of a self-documenting routine follows below.
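For instance, a processing routine can be preserved as a short, self-documenting script that records each step from raw to analysis data. The following Python sketch is hypothetical in its file names, column names and cleaning steps:

    """Process the raw finds register into the analysis dataset.

    Input:  finds_raw.csv   (as collected in the field, unedited)
    Output: finds_clean.csv (used for the analyses in the publication)
    """
    import pandas as pd

    raw = pd.read_csv("finds_raw.csv")

    # Step 1: drop records without an inventory number (cannot be verified).
    clean = raw.dropna(subset=["inventory_no"])

    # Step 2: normalise material terms towards the project vocabulary.
    clean["material"] = clean["material"].str.strip().str.lower()

    # Step 3: recode period labels to standard codes (hypothetical mapping).
    clean["period"] = clean["period"].replace({"late med.": "LMED",
                                               "roman": "ROM"})

    clean.to_csv("finds_clean.csv", index=False)

Depositing such a script alongside the raw and clean data lets a reviewer replicate every processing step.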

Further reading on good practices for developing and sharing research software:

  • The website “FAIR software route” gives five recommendations for making research software FAIR:

#1. Use a publicly accessible repository with version control

#2. Add a licence (preferably open source)

#3. Register your code in a community registry

#4. Enable citation of the software

#5. Use a software quality checklist

For further information and references on FAIR software see: https://fair-software.nl/

Suggestions for storing and preserving software code:

5.d. How will the application of a unique and persistent identifier to each data set be ensured?

5d.1 Persistent Identifiers

Without the possibility to locate data, any access or reuse is impossible. Over time, web resources tend to be moved to other locations or storage media, whereby the web address may change. In order to remain findable, any data object or dataset should be uniquely and persistently identifiable over time. Persistent Identifiers (PIDs) are designed to guarantee this: a PID continues to refer to the new location of a resource after its old web address has changed. PIDs can take different forms, such as a Handle, DOI, PURL or URN. In short, a PID is a long-lasting reference to a digital object. In compliance with FAIR principles F1 and A1, a PID may be connected to a metadata record describing an item and/or to the data itself, see: https://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/
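To illustrate that a PID is retrievable over a standard protocol, the sketch below resolves a DOI over plain HTTPS using only the Python standard library. The DOI shown is that of the Wright (2017) article cited in section 2b.2, and the Accept header requests machine-readable metadata via the content negotiation offered by the major DOI registration agencies:

    import urllib.request

    doi = "10.1007/s10437-017-9257-z"  # Wright (2017), cited in section 2b.2

    # https://doi.org/<doi> redirects to the current landing page, wherever
    # the resource now lives; content negotiation returns citation metadata.
    req = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.url)        # final URL after redirects
        print(resp.read(300))  # first bytes of the JSON metadata record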

PIDs can be assigned to all kinds of research outputs, including publications, data and software/code. Note that there are also persistent identifiers for other types of information than digital objects, such as ISBN for books or ORCID for researchers.

PIDs are usually provided by data repositories and other deposit platforms. The registry of repositories Re3data (www.re3data.org) includes tags to show which platforms assign PIDs to their content. Repositories that are certified to be trustworthy or that are recommended by Science Europe routinely assign a PID to deposited datasets.

Further information on PIDs:


6. Data management responsibilities and resources

6.a. Who (for example role, position, and institution) will be responsible for data management (i.e. the data steward)?

6a.1 Data management responsibilities and tasks

Whether data management supervision requires a full-time function, or is a part-time role or responsibility assigned to a member of the research team, depends on the scale of the project. In bigger projects it may make sense to distribute responsibilities over several members of the team. The following roles and responsibilities for data management, mentioned in various sections of this protocol, can be distinguished:

The tasks of the data management supervisor include at least:

  • To implement and supervise the execution of the obligations and commitments made in this protocol and, if applicable, in the accompanying DMP.
  • To periodically review or evaluate the implementation of the protocol/DMP and to make revisions if the practice deviates from the protocol.

For multi-partner projects, the coordination of data management responsibilities across partners is best described in the consortium agreement.

6.b. What resources (for example financial and time) will be dedicated to data management and ensuring that data will be FAIR (Findable, Accessible, Interoperable, Re-usable)?

6b.1 Data management tasks covered in the Protocol

All data management tasks throughout the project lifespan are covered in the articles of the data protocol and related DMP templates.

6b.2 Costs of data management

The data management costs may include, for example, data storage and other hardware costs, staff time, costs for documenting and other data curation tasks for deposit, and repository charges.

Further information on data management costs: