Checklist



Reproducibility problems can arise at every step in the history of your project. How easy will it be for others, or your future self, to answer these questions?

Documentation

❏ Is it clear where to begin? (e.g., can someone picking a project up see where to start running it)
❏ Can you determine which file(s) were used as input to the process that produced a derived file?
❏ Who do I cite? (code, data, etc.)
❏ Is there documentation about every result?
❏ Have you noted the exact version of every external application used in the process?
❏ For analyses that include randomness, have you noted the underlying random seed(s)?
❏ Have you specified the license under which you’re distributing your content, data, and code?
❏ Have you noted the license(s) for other people’s content, data, and code used in your analysis?
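
One way to cover the version and random-seed items above is to write a small run record next to each result. The sketch below is only an illustration: the function name build_run_record, the seed value, and the package list are assumptions made for the example, not part of the original checklist.

```python
import json
import platform
import random
from importlib import metadata


def build_run_record(seed, packages=()):
    """Seed the random generator and collect details needed to rerun an analysis."""
    random.seed(seed)  # seed before any random draws so the run can be repeated
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so the record is still written.
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "packages": versions,
    }


# Hypothetical usage: the seed and package name are placeholders.
record = build_run_record(seed=20140101, packages=["pip"])
print(json.dumps(record, indent=2))
```

Saving this record as a JSON file alongside the output keeps the seed and versions attached to the result they describe. It only covers Python packages; external applications would need their versions noted separately.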

Organization

❏ Which is the most recent data file/code?
❏ Which folders can I safely delete?
❏ Do you keep older files/code or delete them?
❏ Can you find a file for a particular replicate of your research project?
❏ Have you stored the raw data behind each plot?
❏ Is your analysis output organized hierarchically? (allowing others to find more detailed output underneath a summary)
❏ Do you run backups on all files associated with your analysis?
❏ How many times has a particular file been generated in the past?
❏ Why was the same file generated multiple times?
❏ Where did a file that I didn’t generate come from?
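
Several of the questions above (which inputs produced a file, how often it was generated, where it came from) can be answered from a provenance manifest that is appended to each time a derived file is written. The following is a minimal sketch under stated assumptions: the manifest layout, the function names, and the file names in the demo are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_entry(output, inputs, command):
    """Describe one derived file: its checksum, its inputs, and how it was made."""
    return {
        "output": str(output),
        "output_sha256": sha256_of(output),
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "command": command,
    }


def append_to_manifest(manifest_path, entry):
    """Append an entry to a JSON manifest and return the full list of entries."""
    manifest_path = Path(manifest_path)
    entries = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    entries.append(entry)
    manifest_path.write_text(json.dumps(entries, indent=2))
    return entries


if __name__ == "__main__":
    # Demo in a temporary directory; file names and command are placeholders.
    with tempfile.TemporaryDirectory() as workdir:
        raw = Path(workdir) / "raw.csv"
        raw.write_bytes(b"a,b\n1,2\n")
        summary = Path(workdir) / "summary.txt"
        summary.write_bytes(b"rows=1\n")
        entry = provenance_entry(summary, [raw], "summarize raw.csv")
        entries = append_to_manifest(Path(workdir) / "manifest.json", entry)
        print(json.dumps(entries, indent=2))
```

Counting the entries that share an "output" path shows how many times a file was regenerated, and comparing checksums shows whether its inputs changed between runs.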

Automation

❏ How many manual data manipulation steps are there?
❏ Are all custom scripts under version control?
❏ Is your writing (content) under version control?

Publication

❏ Have you archived the exact version of every external application used in your process(es)?
❏ Did you include a reproducibility statement or declaration at the end of your paper(s)?
❏ Are textual statements connected/linked to the supporting results or data?
❏ Did you archive preprints of resulting papers in a public repository?
❏ Did you release the underlying code at the time of publishing a paper?
❏ Are you providing public access to your scripts, runs, and results?

Created at the Reproducibility Hackathon 2014


