Reproducible Science Projects
Guide.md: Tutorial on how to turn your project into a Git repository on our Git server, plus some reproducibility tips.
Checklists.md: Points to consider when working on reproducibility and preparing for publication.
Copyright.md: Some thoughts on copyright and licensing.
README_TEMPLATE.md: A template for the README.md file for your repositories.
check_external_files: A handy little Bash script to check that all your external files are present. See Guide.md for details on external files.
Working Group Policy
Goal: Make all our research, whether published or not, method-reproducible sensu Goodman et al. 2016:
“to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results.” (Goodman et al., 2016)
What we want to archive:
- Bachelor, Master, and PhD projects.
- Any publications.
- Any other research projects from our group, even if not completed.
- For collaboration projects, check whether your partners can/want to follow our guidelines.
- Comment and document your project lavishly!
- Keep a chronological lab notebook, journaling your work progress: like a diary.
- Write down why you chose to do things the way you did. (This is for internal use, so you can reveal all your dirty hacks.)
- If you have modified an existing code base, mark your changes with comments in a consistent way. Then anybody can find all changes by simply searching for a term.
- Write your texts in plaintext (e.g. RMarkdown, LaTeX) and create a script to automatically convert them to PDF or HTML.
- Consider mentioning your reproducibility policy in your research proposals.
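The guideline about marking changes in an existing code base only pays off if the marker is easy to search for. The following is a minimal sketch of such a search, not a tool that ships with this repository. The marker term `MODIFIED-BY-OUR-GROUP` and the function name `find_marked_changes` are hypothetical; use whatever term your project has agreed on.

```python
from pathlib import Path

# Hypothetical marker term; pick one and use it consistently in every comment.
MARKER = "MODIFIED-BY-OUR-GROUP"


def find_marked_changes(root: Path, marker: str = MARKER) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line text) for each line containing marker."""
    hits = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and Git's internal files.
        if not path.is_file() or ".git" in relative.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((relative.as_posix(), number, line.strip()))
    return hits


if __name__ == "__main__":
    for file_name, line_number, line_text in find_marked_changes(Path(".")):
        print(f"{file_name}:{line_number}: {line_text}")
```

Run from the repository root, this prints one line per marked change, which makes it easy to review all modifications to the upstream code in one pass.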
Each experiment/project is archived in a self-contained Git repository on our in-house GitLab server. Follow the tutorial in Guide.md.
- If you want to archive an existing version control repository, try to preserve the history.
- Datasets too big for the GitLab server (with Git-LFS) may be stored outside the repository on separate storage.
(TODO: What’s the maximum repository size?)
- For such external files, provide an MD5.txt file and specify exactly where to download them from. See Guide.md for instructions.
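To illustrate what a check against MD5.txt might involve, here is a minimal sketch. It is not the check_external_files script from this repository, and it assumes that MD5.txt uses the line format written by the `md5sum` tool (`<checksum>  <relative path>`); Guide.md remains the authority on the actual format. The function names `md5_of_file` and `verify_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path, base_dir: Path) -> dict[str, str]:
    """Compare the files under base_dir with the checksums listed in manifest.

    Assumption: md5sum-style lines, "<32 hex digits>  <relative path>".
    Returns a mapping from each listed path to "ok", "missing", or "mismatch".
    """
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, name = line.strip().split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading "*"
        target = base_dir / name
        if not target.is_file():
            results[name] = "missing"
        elif md5_of_file(target) == expected.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for file_name, status in verify_manifest(Path("MD5.txt"), Path(".")).items():
        print(f"{status:9} {file_name}")
```

Reading in chunks keeps memory use flat even for very large datasets, which are exactly the files that end up outside the repository.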
The README must contain (see README_TEMPLATE.md):
- A statement that this repository is for internal use only.
- General overview.
- If applicable: abstract, DOI, and URL of the connected publication.
- All authors and contributors with affiliations.
- How directories and files are structured.
- Usage instructions for re-running the experiment.
- Advanced: Write a script to execute your project completely automatically. Consider using Popper.
- A license or copyright statement (see Copyright.md).
- Known bugs.
- All software dependencies must be listed with exact versions.
- Advanced: If possible, re-create the execution environment in an automated way (→ Packrat, Docker, Slock, Anaconda, …).
- Specify the hardware that was used to run the experiment.
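The dependency and hardware items in the list above are easier to keep accurate when they are generated rather than typed by hand. The sketch below assumes a Python-based project and uses only the standard library; the function name `environment_report` is hypothetical, and projects in other languages would need that language's own equivalent.

```python
import platform
import sys
from importlib import metadata


def environment_report() -> str:
    """Build a plain-text summary of the interpreter, the machine, and installed packages."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"Operating system: {platform.platform()}",
        f"Machine: {platform.machine()}",
        f"Processor: {platform.processor() or 'unknown'}",
        "Packages:",
    ]
    # Collect "name==version" pairs for every installed distribution.
    packages = {
        (dist.metadata.get("Name"), dist.version)
        for dist in metadata.distributions()
        if dist.metadata.get("Name")
    }
    for name, version in sorted(packages, key=lambda item: item[0].lower()):
        lines.append(f"  {name}=={version}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

The output can be pasted into the README or committed as a separate file. It only describes the machine it runs on, so run it where the experiment was actually executed; details such as RAM or GPU model still have to be added manually.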
At least one other person must have reproduced the experiment, ideally based solely on the instructions in the README.
- This person should not have been part of the original development team.
- Resource-intensive computations don’t need to be executed completely. Just make sure that they start as expected (but be careful not to overwrite existing output files).
- On success, this person creates a Git tag on the reviewed commit (=revision). See Guide.md for instructions.
- Also create a tag for the version published in a journal.
Use free/libre software whenever possible.
Why we use Git
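- Git guarantees the integrity of every recorded project state: each commit is identified by a cryptographic hash of its content, so a given revision is always complete and unaltered.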
- We can continue a project easily while retaining all version history.
- Git is the industry standard. There are plenty of resources and integrations, and Git skills are an asset in many fields.
- All projects are in one place (our GitLab server).
Why we host our own GitLab server instead of using third-party solutions
- As of November 2019 there is no communal account for Senckenberg or our BIMODAL working group on any third-party Git host.
- If every employee were to use their personal GitHub account, data would become inaccessible after they leave.
- We are not dependent on pricing policies.
- Unlimited storage and unlimited computational power for continuous integration (CI).
- Large external files and the Git repos are on the same machine.
- Our data is secure from potential large-scale data leaks.
The two main disadvantages of hosting our own server are that it is not directly accessible from outside the Senckenberg LAN and that we have to maintain it ourselves.
- Cooper, Natalie, Pen-Yuan Hsing, Mike Croucher, Laura Graham, Tamora James, Anna Krystalli, and Francois Michonneau. 2017. “A Guide to Reproducible Code in Ecology and Evolution.” British Ecological Society.
- Feng, Xiao, Daniel S. Park, Cassondra Walker, A. Townsend Peterson, Cory Merow, and Monica Papeş. 2019. “A Checklist for Maximizing Reproducibility of Ecological Niche Models.” Nature Ecology & Evolution, September. https://doi.org/10.1038/s41559-019-0972-5.
- Goodman, Steven N., Daniele Fanelli, and John P. A. Ioannidis. 2016. “What Does Research Reproducibility Mean?” Science Translational Medicine 8 (341): 341ps12. https://doi.org/10.1126/scitranslmed.aaf5027.
- Jimenez, I., M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. 2017. “The Popper Convention: Making Reproducible Systems Evaluation Practical.” In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 1561–70. https://doi.org/10.1109/IPDPSW.2017.157.
- Noble, William Stafford. 2009. “A Quick Guide to Organizing Computational Biology Projects.” PLOS Computational Biology 5 (7): 1–5. https://doi.org/10.1371/journal.pcbi.1000424.
- Wilson et al. 2014. “Best Practices for Scientific Computing.” PLOS Biology 12 (1): 1–7. https://doi.org/10.1371/journal.pbio.1001745.
- Wilson et al. 2017. “Good Enough Practices in Scientific Computing.” PLOS Computational Biology 13 (6): 1–20. https://doi.org/10.1371/journal.pcbi.1005510.
This repository is compliant with the REUSE standard: Licensing information is in a comment at the top of each file or in a .license file; all license texts are in the
After you have contributed something, please add your name and email to the [Authors](#authors) section and, on a new line, to the list of copyright holders in the respective license information, like
SPDX-FileCopyrightText: 2020 Jane Doe <email@example.com>.