DAG Essentials

Data Archive Geo (DAG) is a platform for archiving research data that is considered static and will no longer be changed. DAG is intended for research data that is not already held in another storage system, such as a repository, through which it can be made (publicly) accessible. DAG is only accessible to staff (including PhD candidates and postdoctoral researchers) of the Faculty of Geosciences of Utrecht University (UU). It is not available to students of the faculty, nor to staff and students of other UU faculties or departments.

Depositing Data

DAG has 2 main functions: depositing data in the archive and searching for data in the archive. A deposit requires 3 components: the data itself, data documentation (such as a readme.txt), and metadata. To properly prepare for this, find more information here. Once all data has been uploaded and the required metadata and documentation are complete, the user can bring everything into DAG via the deposit workflow.
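
As a rough illustration, the three required components of a deposit can be checked locally before starting the deposit workflow. The sketch below is not part of DAG. Only readme.txt is named in this guide; the other file names (readme.md, metadata.json) and the function name are assumptions made for the example, and metadata may in practice be entered in DAG itself rather than supplied as a file.

```python
from pathlib import Path

# Assumed file names: only readme.txt is mentioned in this guide.
README_NAMES = {"readme.txt", "readme.md"}
METADATA_NAMES = {"metadata.json"}  # hypothetical, not a DAG convention
NON_DATA_NAMES = README_NAMES | METADATA_NAMES


def missing_deposit_parts(package_dir):
    """Return which of the three deposit components are missing from a folder."""
    names = [p.name.lower() for p in Path(package_dir).rglob("*") if p.is_file()]

    missing = []
    # Any file that is not documentation or metadata counts as data here.
    if not any(name not in NON_DATA_NAMES for name in names):
        missing.append("data")
    if not any(name in README_NAMES for name in names):
        missing.append("documentation")
    if not any(name in METADATA_NAMES for name in names):
        missing.append("metadata")
    return missing
```

An empty list means that all three components were found in the folder. It says nothing about the quality of the documentation or metadata.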

Searching for Data

In addition to depositing, a user can search DAG for all data packages it contains. This is possible because the associated metadata is readable and searchable by all users of DAG. This does not mean that the data itself is available to everyone: that depends on the data access type specified in the metadata. Datasets can be found by entering a search term in one of the search bars, or by refining the search results (or the total archive content) using the sorting options on the left side of the ‘find and retrieve’ screen.

Metadata

In the metadata field data access type, you can choose between 2 options for opening up the data belonging to an archive package: open (readable and downloadable for all faculty members) and restricted (accessible on request). When you choose open, all DAG users can download the data without the depositor or owner of the data being notified. When you choose restricted, interested faculty members have to contact the data owner, the depositor, or the support team to request access. When making such a request, the following information is required:

What dataset is it about? Please provide the full name of the dataset.
Why do you want access to this data?
What is your intended purpose in reusing the data?
What will you do with the result of the processing?

If you want to adjust the data or metadata after you have deposited it in DAG, please contact the support team. They can adjust metadata directly in DAG itself. For adjustments or additions to the data, you are requested to make a new deposit and to discuss with the support team what to do with the ‘old’ data package: remove it, or keep both packages and add references between them.

Succession of Data Ownership

When a depositor or data owner intends to leave the faculty, the metadata of the associated datasets in DAG must be updated. Please notify the support team of such an intention, with a suggestion for a successor. If no such notification is made and the contact person or data owner is no longer available, another person listed for the dataset who is still at the faculty, or a contact person from the registered research group or project, will be approached to designate a successor.

DAG Timeline

DAG has recently gone into production and is being introduced to the faculty via a so-called soft launch. This means that faculty members who contributed to the establishment of DAG through pilot groups are asked to deposit their own data and to promote DAG in their own research group and/or department. In addition, faculty data stewards, the Faculty Research & IT group, and other stakeholders will be asked to submit data or make people aware of DAG.

The overall planning for the introduction of DAG is as follows:

DAG put in production: July 2022
Informed direct stakeholders personally: August 2022
Initial filling and improving guiding documentation: August – September 2022
Release news items and planning introduction workshops: October 2022

FAQ:

[collapse title=”What is considered research data?”]

Data which is directly or indirectly related to a research project or program. DAG is intended for research data that is no longer active.

[/collapse]

[collapse title=”What is considered (in)active data?”]

Archived data is read-only. DAG is therefore suitable for data that you no longer (need to) change, because your project is finished or because the data has reached a final stage (e.g. raw, cleaned or processed).

If the data is still active, and you expect to make changes to it, then you can use Yoda for Geosciences, the faculty network drive or another approved storage location.

Related Links:
UU Geosciences Yoda
UU ICT Storage Finder Tool

[/collapse]

[collapse title=”How does DAG relate to other archives and repositories?”]

DAG is an internal archive for the Faculty of Geosciences. It is intended to safeguard research data that cannot or may not be published as FAIR data in public repositories such as Pangaea or Yoda.

[/collapse]

[collapse title=”How does DAG relate to Yoda?”]

Although DAG and Yoda run on the same infrastructure and both use iRODS technology, there are some fundamental differences. DAG is an internal data archive that is only accessible to members of the Faculty of Geosciences, while Yoda is an institutional repository that is accessible to users from Utrecht University and beyond. DAG focuses on the long-term preservation of data that can be considered static. In Yoda, besides long-term preservation in the vault, data can also be stored in the workspace for sharing and collaboration. In addition, data in the Yoda vault can be made publicly available, so that a DOI can be assigned and the data can be found online through the registered metadata. Data in DAG is also retrievable, but only within DAG itself. It is, however, possible to make data from DAG publicly available by using a publication workflow.

Related Links:
What is Yoda?

[/collapse]

[collapse title=”Is data in DAG FAIR?”]

The data in DAG are only findable and accessible for researchers of the faculty and this makes the data less FAIR than data that is in most public repositories.

However, there is a lot of data in the faculty that remains on external disks or isolated network folders, in many cases because the data is not suitable or allowed to be published, even with restricted access.

DAG provides a solution for these data, to ensure that these data remain available. By providing metadata, access controls and guidance we maximize the findability, accessibility, interoperability and reusability of the data.

[/collapse]

[collapse title=”Who may use DAG?”]

These guidelines are created in cooperation with researchers and supporters from different divisions of the faculty.

Data Manager and Stewards: Vincent Brunst, Garrett Speed, Ilja Kocken
Pilot Group sensitive data
Pilot Group lab data
Pilot Group big data
Yoda Team: Maarten Hoogerwerf, Erik Hakvoort, Monic Hodes

We welcome any feedback, suggestions, etc. Users can give feedback on questions and pages using the feedback link on each page.

[/collapse]

[collapse title=”Can I archive sensitive data?”]

Yes, you may archive data that is sensitive. However, you should restrict access to your data via the metadata.

If your data contains personal data, then you must indicate that in the metadata, so that we can manage it in compliance with GDPR.

[/collapse]

[collapse title=”What is personal data?”]

Personal data is data that can be related to an identified or identifiable natural person. Such data must be handled in compliance with the GDPR / AVG. Other sensitive data is data that has been labelled as sensitive for ethical, commercial, or valorization reasons and should be treated as such. Please consult your data steward or the faculty privacy officer if you are unsure whether your data may be considered personal data.

Related Links:
UU RDM Handling Personal Data

[/collapse]

[collapse title=”How does DAG safeguard my data?”]

The DAG is built upon the Yoda platform, which complies with Utrecht University’s Information Security policy for data classified as public, internal use or sensitive.

As a depositor you can restrict data access to either the whole faculty, or to yourself (and data managers).

Access restrictions are set in the metadata through the personal data field and the data sensitivity field. Note that metadata can be searched by the whole faculty. 

[/collapse]

[collapse title=”Who can find my data?”]

Once your data has been submitted and stored in the archive, it can be found by all members of the faculty after they log in to DAG. The metadata is not shared with other systems, so the data cannot be found outside DAG. If you need your data to be findable outside DAG, consider publishing it in a public repository such as Yoda or Pangaea. You can find public repositories with the UU Repository Finder tool, or you can contact your data steward for help.

Related Links:
UU Repository Decision Tool

Contact Information:
UU Geosciences Data Team (Data Stewards, Data Manager, Privacy Officer)

[/collapse]

[collapse title=”Is there a maximum size to the data that I can archive?”]

In principle there is no maximum size to the files or dataset that you need to archive. However, there are some things that you need to consider:

Is it worth archiving this specific (large) amount of data?
How are you going to transfer the data efficiently?
Does DAG have enough capacity to store the data?

You should have no trouble archiving data up to a few GB with the file explorer, or up to hundreds of GB if you use iCommands. If you plan to upload more than 1 TB of data, you need to contact the DAG management so that we can reserve sufficient storage capacity and help you optimize the data package.
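
The rule of thumb above can be written down as a small helper. This is only a sketch: the 1 TB threshold comes from this answer, but the 5 GB cut-off standing in for "a few GB" is an assumption, as are the function and constant names. This answer does not say what to do between hundreds of GB and 1 TB, so the sketch simply suggests iCommands for that range.

```python
GB = 1000 ** 3
TB = 1000 ** 4

FILE_EXPLORER_LIMIT = 5 * GB  # assumption: stands in for "a few GB"
CONTACT_THRESHOLD = 1 * TB    # from the text: contact DAG management above 1 TB


def suggest_transfer(size_bytes):
    """Suggest how to upload a dataset of the given size to DAG."""
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if size_bytes > CONTACT_THRESHOLD:
        return "contact DAG management first"
    if size_bytes > FILE_EXPLORER_LIMIT:
        return "iCommands"
    return "file explorer"
```

For example, `suggest_transfer(300 * GB)` suggests iCommands, while `suggest_transfer(2 * TB)` points to contacting the DAG management before uploading.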

[/collapse]

[collapse title=”How to determine the owner or main responsible person for my data?”]

Primary responsibility for complying with these guidelines lies with the researcher who is also responsible for the generation of the research data. This also applies to PhD candidates and postdoctoral researchers. For research master’s students, their promotor or daily supervisor is responsible.

[/collapse]

[collapse title=”Who should deposit the data?”]

Data creators and data owners have the primary responsibility to deposit data, because they have inside knowledge of the content of the data and how it originated. They should write the data documentation, decide in what structure the data should be recorded, and create the metadata.

Data managers, data stewards, data custodians and/or administrative/support staff are not responsible for depositing data, but they are available to help. They do not have a full understanding of the structure and content of the data, so their role is to support the data depositor in delivering high-quality data, metadata and data documentation to DAG.

[/collapse]

[collapse title=”What sensitive data can I archive?”]

Selecting data means making choices about what to keep for the long term, and what data to archive securely. This means that you have to decide whether your dataset contains data that needs to be deleted or separated. Reasons to exclude data from publishing include (but are not limited to): 

The data is redundant 
The data concerns temporary byproducts that are irrelevant for future use
The data is sensitive for privacy reasons under the GDPR/AVG, such as consent forms, voice recordings, transcripts, DNA data, or any other data that contains information on specific people
The data contains state secrets
The data is commercially sensitive with respect to competition, or preserving it for the long term would breach contractual arrangements with your consortium partners or other parties involved

[/collapse]

[collapse title=”What restrictions on the inclusion of data in DAG might apply?”]

Not all research data can be included in DAG. This is because the faculty (owner of DAG) takes over from the data owner the responsibility for making the data available. The data owner still has the rights and the responsibility to determine what happens to the data stored in DAG. Data for which the UU (faculty or data owner) does not hold the ownership or intellectual property rights cannot be included in DAG. This could also apply to data to which third parties, such as a funder, publisher, data supplier or consortium partner, have (shared) claims or rights. In that case it must be agreed with the relevant party whether the data can be included in DAG. In addition, commercial considerations or valorisation purposes could be reasons not to include data in DAG. If any of these reasons might apply, check this before depositing the data.

[/collapse]

[collapse title=”What are the costs for archiving, and how are these covered?”]

Archiving your data involves costs, the most important being the cost of data storage.

The storage costs are currently covered by the faculty and will not be charged to the individual researcher or research group.

Storage usage will be monitored. When inefficient usage is detected, we will contact the data depositor and data owners to discuss how the usage of DAG can be optimized.

Related Links:
UU Guide – Cost of Data Management

[/collapse]
