Using tools like git for version control and knitr for dynamic figure generation is a great step towards better research transparency and reproducibility.
But there are also steps to be made in improving code and preparing data for re-use. These steps don’t necessarily involve the use of new tools, but instead would be the result of applying “best practice” guidelines.
We aim to begin compiling resources here. This is only a start, and we’d encourage anyone to suggest further guidelines, particularly those that are specific to various statistical software packages.
Very useful manual:
We’d recommend reading the full guide, but we briefly outline the sections of “RA Manual: Notes on Writing Code” here to give a preview. We also include one example from each section; there are many more examples in the original paper.
Example: use the right data structures
Instead of writing a function like:
myfunction <- function(name, surname, age, height) {
  if (age < 18) {}
}
We can group these values in a single structure, such as a list or an object, and pass that to the function instead:
myfunction <- function(person) {
  if (person$age < 18) {}
}
This would make the code easier to read and maintain.
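As a rough sketch of this idea in R, a named list can hold one person's attributes and a data frame can hold many. The names, values and the `is_minor` helper below are made up for illustration.

```r
# Hypothetical data: one person stored as a named list.
person <- list(name = "Ana", surname = "Silva", age = 17, height = 1.65)

# The function receives one argument and reads only the field it needs.
is_minor <- function(person) {
  person$age < 18
}

is_minor(person)

# For many people, a data frame keeps one row per person,
# and the same comparison works on the whole column at once.
people <- data.frame(
  name = c("Ana", "Ben"),
  age = c(17, 45)
)
people$minor <- people$age < 18
```

If a new attribute is needed later, it can be added to the list or data frame without changing the signature of every function that receives it.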
Example: choose descriptive names
It might be easier for other users to read and adapt existing code if the variables, folders, classes and other elements have simple and intuitive names. So prefer writing code like:
calculate_status <- function(person) {}
Rather than:
cs <- function(p) {}
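To show how descriptive names carry through a whole function, here is one possible body for `calculate_status`. The threshold name and the returned labels are assumptions for this sketch, not something taken from the manual.

```r
# Hypothetical threshold, named so the number 18 explains itself.
adult_age_years <- 18

# Descriptive names: the call site reads almost like a sentence.
calculate_status <- function(person) {
  if (person$age < adult_age_years) "minor" else "adult"
}

participant <- list(name = "Ana", age = 17)
participant_status <- calculate_status(participant)
```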
Example: check for errors
Other people will run your code with different parameters and in different environments. It is good practice to include code that checks for erroneous values and provides clear feedback.
So instead of:
myfunction <- function(x, y) {
  return(x / y)
}
Consider:
myfunction <- function(x, y) {
  # Stop with a clear message instead of silently returning Inf or NaN
  if (y == 0) {
    stop("y must be different from zero!")
  }
  return(x / y)
}
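In R, one way to give that feedback is to raise an error with `stop()`, which the caller can then catch with `tryCatch()`. The following is a minimal sketch; `safe_divide` is a hypothetical name, and the extra type check is an assumption about what "erroneous values" might include.

```r
# Hypothetical helper: divide x by y, validating the inputs first.
safe_divide <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must be numeric")
  }
  if (any(y == 0, na.rm = TRUE)) {
    stop("y must be different from zero")
  }
  x / y
}

# The caller can catch the error and read its message.
result <- tryCatch(
  safe_divide(1, 0),
  error = function(e) conditionMessage(e)
)
print(result)
```

Raising an error rather than returning a message string means the problem cannot be mistaken for a valid result further down the analysis.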
Of course you can also write tests to make sure your verifications are working correctly ;-)
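A few such tests can be sketched in base R with `stopifnot()`, which halts as soon as a condition is not true. The `divide` function and the `raises_error` helper are hypothetical names used only here; packages such as testthat offer a richer way to do the same thing.

```r
# Hypothetical function under test, mirroring the division example in the text.
divide <- function(x, y) {
  if (y == 0) stop("y must be different from zero")
  x / y
}

# Returns TRUE if evaluating `expr` raises an error, FALSE otherwise.
raises_error <- function(expr) {
  tryCatch({
    force(expr)
    FALSE
  }, error = function(e) TRUE)
}

# Normal case: the result is what we expect.
stopifnot(divide(10, 2) == 5)
# Error case: dividing by zero must be rejected.
stopifnot(raises_error(divide(1, 0)))
# The check must not fire for valid input.
stopifnot(!raises_error(divide(1, 4)))
```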