
Prepare your data

Before you deposit your data in the TRR170-DB repository, please make sure that your dataset(s) meet the guidelines below. TRR170-DB accepts only digital research data that:

  • use comprehensible file names (=> file naming)
  • are saved in a preferred file format (=> preferred file formats)
  • are described with metadata (=> describe your data)

Details are given in the sections below.

1 File naming

Good practice in naming and organizing files makes it much easier for you, and for other researchers who want to reuse your data, to find the right files. Some basic recommendations for file naming:

  • Name files in a consistent way.
  • Use descriptive file names (< 25 characters).
  • No spaces; use underscores (e.g. first_study), hyphens (e.g. first-study) or camel case (e.g. FirstStudy).
  • No characters such as \ / ? : * " > < | # % { } ^ [ ] ` ~ æÆ øØ åÅ äÄ öÖ …
  • Use the international date convention ISO 8601 (e.g., 2021-04-22).
  • Give the original file and its counterpart in a preferred file format (see below) the same name (e.g., text.docx => text.pdf).
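The naming rules above can be checked automatically before upload. The following Python function is a minimal sketch, not a TRR170-DB tool: `check_file_name` is a hypothetical helper, the 25-character limit is assumed to apply to the name without its extension, every non-ASCII character is treated as forbidden (our reading of the "…" in the list), and the date check is only a rough heuristic for day-first dates such as 22.04.2021.

```python
import re

# Characters listed as forbidden in the guidelines above.
FORBIDDEN_CHARS = set('\\/?:*"<>|#%{}^[]`~')
# Assumption: the "< 25 characters" rule applies to the name without its extension.
MAX_STEM_LENGTH = 25
# Rough heuristic (assumption): catches dates such as 22.04.2021 or 22-04-2021
# that should be written in ISO 8601 form, e.g. 2021-04-22.
NON_ISO_DATE = re.compile(r"(?<!\d)\d{1,2}[._-]\d{1,2}[._-](?:\d{4}|\d{2})(?!\d)")


def check_file_name(name: str) -> list[str]:
    """Hypothetical helper: return the problems found in `name` (empty list = OK)."""
    problems = []
    stem = name.rsplit(".", 1)[0] if "." in name else name
    if len(stem) >= MAX_STEM_LENGTH:
        problems.append(f"name is {len(stem)} characters long (should be < {MAX_STEM_LENGTH})")
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace; use _, - or CamelCase instead")
    bad = sorted({ch for ch in name if ch in FORBIDDEN_CHARS})
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    # Assumption: every non-ASCII character (æ, ø, å, ä, ö, ...) counts as forbidden.
    non_ascii = sorted({ch for ch in name if not ch.isascii()})
    if non_ascii:
        problems.append("contains non-ASCII characters: " + " ".join(non_ascii))
    if NON_ISO_DATE.search(name):
        problems.append("date does not look like ISO 8601 (YYYY-MM-DD)")
    return problems


if __name__ == "__main__":
    for candidate in ["2021-04-22_first_study.csv", "First Study 22.04.2021.xlsx", "møte#1.txt"]:
        print(candidate, "->", check_file_name(candidate) or "OK")
```

Running the script over a folder listing before upload gives a quick overview of names to fix; it does not rename anything itself.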

 

2 Preferred file formats


A preferred file format ensures that your data remain readable in the future, when the software used to create the files may no longer be available. Preferred file formats are:

  • non-proprietary and based on documented international standards
  • using a standard character encoding, preferably Unicode (e.g. UTF-8)
  • uncompressed (if space permits)
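Whether a text file already uses UTF-8 can be tested with Python's standard library. This sketch uses the hypothetical helper names `is_valid_utf8` and `convert_to_utf8`: the first checks whether a file decodes cleanly as UTF-8, reading it in chunks so that large files need not fit in memory; the second re-encodes a file to UTF-8. The conversion assumes you know the file's original encoding; `cp1252` (Windows Western European) in the usage line is only an example.

```python
import codecs
import shutil
import sys


def is_valid_utf8(path, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    with open(path, "rb") as f:
        try:
            while chunk := f.read(chunk_size):
                decoder.decode(chunk)
            decoder.decode(b"", final=True)  # fails on a truncated final character
        except UnicodeDecodeError:
            return False
    return True


def convert_to_utf8(src, dst, source_encoding: str) -> None:
    """Re-encode a text file from `source_encoding` to UTF-8 (line endings kept as-is).

    Assumption: `source_encoding` is the file's real original encoding.
    """
    with open(src, encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        shutil.copyfileobj(fin, fout)


if __name__ == "__main__":
    # Usage: python check_utf8.py file1.txt file2.csv ...
    for name in sys.argv[1:]:
        print(name, "UTF-8" if is_valid_utf8(name) else "not UTF-8")
```

For a file reported as "not UTF-8", something like `convert_to_utf8("table.txt", "table_utf8.txt", "cp1252")` would write a UTF-8 copy, provided the guessed source encoding is correct.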

 

The table below lists commonly used preferred vs. non-preferred file formats. If your dataset(s) contain a file format not listed here, please contact support.

Table: Preferred file formats

File type

Preferred file formats

Non-preferred file formats

Audio

Uncompressed and lossless WAV or AIFF (.wav/.aiff)

Compressed and lossless FLAC (.flac)

Compressed and lossy MP3 (.mp3)

AAC (.m4a)

Monkey’s Audio (.ape)

Ogg Vorbis (.ogg)

Windows Media Audio (.wma)

Container file

Container files are automatically unpacked on upload and should only be used to preserve the folder structure of your dataset; see the section Upload data files for more.

If container files need to be archived as container files, use .zip. Note: in this case, the files must be packed twice, so that the inner container is preserved when the outer one is unpacked on upload.

Image

Uncompressed TIFF (.tif or .tiff)

Compressed and lossless PNG (.png)

Compressed and lossy JPEG (.jpg)

Adobe Photoshop (.psd)

Apple Picture File (.pct)

Graphics Interchange Format (.gif)

Raw Image Data File (.raw)

Windows Bitmap (.bmp)

Text (slides, illustrations)

PDF/A (.pdf) combined with original file

PowerPoint (.pptx)

Text (tables)

Tab separated Unicode plain text (.txt)

Excel (.xlsx)

Text (text)

Plain text (.txt)

If formatting is needed:

XML, PDF/A (.pdf) combined with original file

Word (.docx)

HTML

Markup language

XML (.xml)

HTML (.html)

Related files: .css, .xslt, .js, .es

SGML (.sgml)

Markdown (.md)

Transcription

File format:

PDF/A (.pdf) combined with original file

PDF/A (.pdf) combined with Comma/Tab Separated Values (.csv/.txt)

Font:

Unicode IPA (e.g. Charis SIL, Doulos SIL, Gentium Plus, Andika), ASCII SAMPA

File format:

Word (.docx)

Excel (.xlsx)

Font:

Transcription legacy fonts (SIL IPA(93))

Video

MPEG-4 (.mp4)

AVI (.avi)

Flash Video (FLV)

Quicktime (.mov)

Windows Media Video (WMV)

Array data

netCDF (.nc)

 

Statistical analysis

R (.R, .RData)

SPSS (.dat/.sps)

STATA (.dat/.DO)

SPSS Portable (.por)

SPSS (.sav)

STATA (.dta)

SAS (.7dat, .sd2, .tpt)

Qualitative data analysis

Basic data in preferred file format, e.g. PDF/A, plain text in Unicode (.txt)

Analysis dump/package as REFI-QDA Project (.qdpx)[1]

The different workspace dump formats, e.g. .nvp, .hpr

Workspace dump formats for mass spectrometry

mzML (.mzML)[2]

Agilent D (.D)

Bruker BAF (.BAF)

Bruker FID (.FID)

Chromtech DAT (.DAT)

[1] Read more about this format here.
[2] Read more about this format here.
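The double packing described in the Container file row of the table can be sketched with Python's `zipfile` module. This is a minimal, hypothetical example: `pack_twice` and the file names are illustrative, and it assumes, as the table states, that TRR170-DB unpacks only the outer archive on upload. Write the archives outside the source folder, otherwise they would be packed into themselves.

```python
import zipfile
from pathlib import Path


def pack_twice(src_dir, inner_zip, outer_zip) -> None:
    """Hypothetical helper: zip `src_dir` into `inner_zip`, then wrap that in `outer_zip`.

    Assumes (as the table states) that only the outer archive is unpacked on upload.
    `inner_zip` and `outer_zip` must lie outside `src_dir`.
    """
    src_dir, inner_zip, outer_zip = Path(src_dir), Path(inner_zip), Path(outer_zip)
    # First packing: the container that should be archived as-is, with relative
    # paths so the folder structure inside the archive is kept.
    with zipfile.ZipFile(inner_zip, "w", compression=zipfile.ZIP_DEFLATED) as inner:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                inner.write(path, arcname=path.relative_to(src_dir).as_posix())
    # Second packing: the outer layer that the repository unpacks on upload,
    # leaving the inner .zip intact as a single file in the dataset.
    with zipfile.ZipFile(outer_zip, "w") as outer:
        outer.write(inner_zip, arcname=inner_zip.name)
```

For example, `pack_twice("interviews", "interviews.zip", "interviews_upload.zip")` would produce `interviews_upload.zip` for upload; once the outer layer is unpacked, `interviews.zip` remains in the dataset with the original folder structure inside.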

3 Describe your data
TRR170-DB offers two ways to describe your data:

  • using the metadata fields provided by a metadata schema
  • creating and uploading a README file, if your data are not described in a referenced publication

For software code, you may use this template.

The more information you add to your data, the easier it is for other researchers to find, trust, and reuse them.

4 File size

Each individual upload must not exceed 5 GB. To upload files that total more than 5 GB, upload them in separate batches and save the dataset after each upload. If a single file is larger than 5 GB, use this script. If you wish to add a dataset larger than 50 GB, please contact support.
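Splitting a large dataset into uploads of at most 5 GB can be planned in advance. The sketch below is hypothetical and does not replace the script mentioned above: `plan_uploads` and `sizes_on_disk` are illustrative names, 1 GB is assumed to be 10^9 bytes (use `1024**3` if the repository counts in GiB), and files are grouped greedily from largest to smallest (next-fit decreasing, a simple heuristic rather than an optimal packing). Single files over 5 GB are listed separately, and datasets over 50 GB are flagged.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path

GB = 10**9                  # assumption: decimal gigabytes
UPLOAD_LIMIT = 5 * GB       # maximum size of one upload
DATASET_LIMIT = 50 * GB     # above this, contact support


@dataclass
class UploadPlan:
    batches: list[list[str]] = field(default_factory=list)  # each batch totals <= upload limit
    too_large: list[str] = field(default_factory=list)      # single files > upload limit
    contact_support: bool = False                            # whole dataset > dataset limit


def sizes_on_disk(directory: str | Path) -> dict[str, int]:
    """Map each file below `directory` (relative POSIX path) to its size in bytes."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def plan_uploads(sizes: dict[str, int],
                 upload_limit: int = UPLOAD_LIMIT,
                 dataset_limit: int = DATASET_LIMIT) -> UploadPlan:
    """Group files into upload batches with a greedy next-fit-decreasing heuristic."""
    plan = UploadPlan(contact_support=sum(sizes.values()) > dataset_limit)
    current: list[str] = []
    current_size = 0
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > upload_limit:
            plan.too_large.append(name)   # needs special handling (see the script above)
            continue
        if current and current_size + size > upload_limit:
            plan.batches.append(current)  # close the current batch, start a new one
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        plan.batches.append(current)
    return plan
```

Each batch in `plan_uploads(sizes_on_disk("my_dataset")).batches` would then be uploaded on its own, saving the dataset after each upload.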

5 References

These guidelines are adapted from https://site.uit.no/dataverseno/deposit/ and the references therein.

For any questions, comments or suggestions, see our support page.