A mirror of our working group’s guides for reproducible coding projects in science.
 
Reproducible Science Projects

Repository Content

  • Guide.md: Tutorial on turning your project into a Git repository on our Git server, plus some reproducibility tips.
  • Checklists.md: Points to consider when working on reproducibility and preparing for publication.
  • Copyright.md: Some thoughts on copyright and licensing.
  • README_TEMPLATE.md: A template for the README.md file for your repositories.
  • check_external_files: A handy little Bash script to check that all your external files are present. See Guide.md for details on external files.

Working Group Policy

  • Goal: Make all our research, whether published or not, method-reproducible sensu Goodman et al. 2016:
    “to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results.” (Goodman et al., 2016)

  • What we want to archive:

    • Master's, Bachelor's, and PhD projects.
    • Any publications.
    • Any other research projects from our group, even if not completed.
    • For collaboration projects, check whether your partners can/want to follow our guidelines.
  • Encouragements:

    • Comment and document your project lavishly!
    • Keep a chronological lab notebook that records your work progress, like a diary.
    • Write down why you chose to do things the way you did. (This is for internal use, so you can reveal all your dirty hacks.)
    • If you have modified an existing code base, mark your changes with comments in a consistent way. Then anybody can find all changes by simply searching for a term.
    • Write your texts in plain text (e.g. RMarkdown, LaTeX) and create a script to automatically convert them to PDF or HTML.
    • Consider mentioning your reproducibility policy in your research proposals.
  • Each experiment/project is archived in a self-contained Git repository on our in-house GitLab server. Follow the tutorial in Guide.md.

    • If you want to archive an existing version control repository, try to preserve the history.
    • Datasets too big for the GitLab server (with Git-LFS) may be stored outside of the repository on separate storage. (TODO: What’s the maximum repository size?)
    • For such external files, provide an MD5.txt file and specify exactly where to download them from. See Guide.md for instructions.
  • The README must contain (see README_TEMPLATE.md):

    • A statement that this repository is for internal use only.
    • General overview.
    • If applicable: abstract, DOI, and URL of the connected publication.
    • All authors and contributors with affiliations.
    • How directories and files are structured.
    • Usage instructions for re-running the experiment.
      • Advanced: Write a script to execute your project completely automatically. Consider using Popper.
    • A license or copyright statement (see Copyright.md).
    • Known bugs.
    • All software dependencies must be listed with exact versions.
      • Advanced: If possible, re-create the execution environment in an automated way (e.g. Packrat, Docker, Slock, Anaconda, …).
    • Specify the hardware that was used to run the experiment.
  • At least one other person must have reproduced the experiment, ideally based solely on the instructions in the README.

    • This person should not have been part of the original development team.
    • Resource-intensive computations don’t need to be executed completely. Just make sure that they start as expected (but be careful not to overwrite existing output files).
    • On success, this person creates a Git tag on the reviewed commit (=revision). See Guide.md for instructions.
    • Also create a tag for the version published in a journal.
  • Use free/libre software whenever possible.
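
The checksum requirement for external files in the policy above can be sketched as follows. This is a minimal sketch, not the repository's check_external_files script: it assumes that MD5.txt uses the format written by GNU coreutils `md5sum` (one `<checksum>  <path>` pair per line), which may differ from the convention in Guide.md. The function names `record_checksums` and `verify_checksums` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: MD5.txt is in GNU `md5sum` output format.
# The function names are hypothetical and not part of this repository.

# Write the checksums of the given external files to MD5.txt.
record_checksums() {
    md5sum -- "$@" > MD5.txt
}

# Check every file listed in MD5.txt.
# Returns non-zero if a file is missing or its content has changed.
verify_checksums() {
    md5sum --check --quiet MD5.txt
}
```

A typical use would be to call `record_checksums` once when the external files are first stored, commit MD5.txt, and call `verify_checksums` before each re-run of the experiment.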

Reproducibility concept overview.
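
The tagging step after a successful review, described in the policy above, might look like the sketch below. It assumes an annotated tag is wanted so that the reviewer and date are recorded with it. The function name `tag_reviewed_commit` and the tag name in the example are hypothetical; Guide.md defines the actual procedure and naming.

```shell
#!/usr/bin/env bash
# Sketch only. The function name and tag naming scheme are hypothetical;
# follow Guide.md for the convention actually used by the group.

# Create an annotated tag on the commit that was successfully reproduced.
#   $1 = tag name
#   $2 = reviewed commit (hash, branch name, or HEAD)
#   $3 = tag message, e.g. who reproduced the experiment
tag_reviewed_commit() {
    git tag -a "$1" -m "$3" "$2"
}

# Example (not executed here):
#   tag_reviewed_commit reproduced-by-jdoe HEAD "Reproduced from the README by Jane Doe"
#   git push origin reproduced-by-jdoe
```

Tags are not pushed by default, so the explicit `git push origin <tag>` in the example is what makes the tag visible on the server.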

Reasoning

Why we use Git

  • Git guarantees that a recorded state of the project is complete and unaltered: every commit is identified by a hash of its content.
  • We can continue a project easily while retaining all version history.
  • Git is the industry standard. There are plenty of resources and integrations. Git skills are an asset in many fields.
  • All projects are in one place (our GitLab server).

Why we host our own GitLab server instead of using third-party solutions

  • As of November 2019, there is no communal account for Senckenberg or our BIMODAL working group on any third-party Git host.
  • If every employee were to use their personal GitHub account, data would become inaccessible after they leave.
  • We are not dependent on pricing policies.
  • Storage and computational power for continuous integration (CI) are limited only by our own hardware, not by a provider's quotas.
  • Large external files and the Git repos are on the same machine.
  • Customization.
  • Our data is not exposed to potential large-scale data leaks at third-party hosts.

The two major disadvantages of our in-house server are that it is not directly accessible from outside the Senckenberg LAN and that we have to maintain it ourselves.

Further Reading

Authors

License

This repository is compliant with the REUSE standard: Licensing information is in a comment at the top of each file or in a .license file; all license texts are in the LICENSES/ folder. After you have contributed something, please add your name and email to the Authors section and to the list of copyright holders in the respective license information on a new line, like SPDX-FileCopyrightText: 2020 Jane Doe <jane@doe.tld>.