Monthly Archives: July 2015

Starting a new research project

tl;dr – I created a research project template for my own ease of use; feel free to use and adjust it. Description below.


After starting many research projects and trying to figure out a good workflow, I noticed I could use a template for when I start a new one. In this contribution to my Open Science Notebook, I outline my template, which is publicly available for others to base theirs on (see GitHub).

Each project consists of specific contributions by specific people. Not every project has the same type of contributions, and most research projects are collaborative. Therefore, it is important to know who has done what and, more importantly, who is going to do what. To help clarify the division of labor, I added the contribution taxonomy proposed in Nature last year. This taxonomy was tested in a sample of 230 corresponding authors, and 85% of them found it easy to use. It is included in the file contributions.csv, where each contribution is most easily signed with initials.
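To make the idea of signing contributions concrete, here is a minimal Python sketch that reads such a file and lists what each person has signed up for. The column names (contribution, initials), the semicolon separator for multiple signers, and the role labels in the example are assumptions for illustration only; they are not copied from the template's contributions.csv.

```python
import csv
import io
from collections import defaultdict


def roles_by_person(csv_text):
    """Map each set of initials to the contributions signed with them."""
    roles = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed layout: several people can sign one contribution,
        # with their initials separated by semicolons.
        for initials in row["initials"].split(";"):
            initials = initials.strip()
            if initials:
                roles[initials].append(row["contribution"])
    return dict(roles)


# Hypothetical file contents; role labels and initials are placeholders.
example = """contribution,initials
Data curation,AB
Formal analysis,AB; CD
Writing,
"""

print(roles_by_person(example))
```

Rows that nobody has signed yet (like the last one in the example) are simply skipped, which makes it easy to spot who still has open tasks and which contributions are unclaimed.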

Moreover, I always create a lot of files that I do not need later on but do not feel I can confidently remove (e.g., random code snippets). I therefore always create an archive folder for all those small files that are lying around but that I do not readily need. This folder can also be used to store temporary files when running code, the raw data in READ-ONLY format, revisions of a manuscript, etc.

Furthermore, folders such as data, figures, and bibliography are included for easy organization. Archiving the original raw data in the archive folder might seem redundant if there is a data folder, but I consider it best to keep it in a separate folder to prevent accidental deletion while working with the data files. A folder for figures is handy during the manuscript submission process, and can also be extended to include the code that creates each figure (e.g., Fig1.tiff and Fig1.R, the image and its underlying code, respectively). In the bibliography folder I typically just dump my BibTeX file, but it could be used to save all the PDFs as well for easy reference.
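As a rough illustration of keeping a READ-ONLY copy of the raw data in the archive folder, the following Python sketch copies a file and strips its write permissions. The function name archive_raw_data and the choice to refuse overwriting an existing archived copy are my assumptions here, not part of the template.

```python
import shutil
import stat
from pathlib import Path


def archive_raw_data(source, archive_dir):
    """Copy a raw data file into the archive folder and mark the copy read-only."""
    source = Path(source)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    # Assumed policy: never silently replace an already archived file.
    if target.exists():
        raise FileExistsError(f"{target} is already archived")
    shutil.copy2(source, target)
    # Read permission for owner, group and others; no write bits.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

A call such as archive_raw_data("data/raw.csv", "archive") would leave the working file in data untouched and editable. Note that on Windows, chmod only toggles the read-only flag, so the protection is coarser there.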

Many people have specific needs in their data analyses and create custom functions or adjust existing ones. To be able to run the data analysis scripts included in the project, these functions need to be included as well, and there is no better way than to create one spot to put and find them. The scripts folder is similar, but is the place for the data analysis scripts themselves.
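If you would rather generate the folders than copy them by hand, the structure described so far can be sketched in a few lines of Python. This is only an illustration, not the template itself: the folder name "functions" is an assumed name for the spot where custom functions live, and scaffold_project is a hypothetical helper.

```python
from pathlib import Path

# Folder names follow the description above; "functions" is an assumed
# name for the place where custom functions are kept.
FOLDERS = ["archive", "bibliography", "data", "figures", "functions", "scripts"]


def scaffold_project(root):
    """Create the project folders under root and return their paths."""
    root = Path(root)
    created = []
    for name in FOLDERS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    # Start an empty contributions file unless one is already there.
    contributions = root / "contributions.csv"
    if not contributions.exists():
        contributions.touch()
    return created
```

Running scaffold_project("my-new-project") a second time does no harm, since existing folders and an existing contributions.csv are left alone.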

What I have found to be very handy in running data analyses is to use RMarkdown to create an annotated file. If you knit (or spin) it into a .md file, GitHub will render it in the browser, making it easy to write your report after doing all the analyses. This also allows you to convey a larger picture than you can in the research paper itself (e.g., the results of assumption checking, which we know everybody does, right?).

At this point I feel like I am just creating folder after folder and there must be more. The bad news is that there is not, except for one file that can help others navigate your folder if you wish to share it on GitHub. Use this file to highlight the most important files (like I do here). Oftentimes a project folder will become a haphazard collection of files regardless of the structure you try to give it. Highlighting the files that are most important to understanding your results and reproducing your findings will serve as waymarkers for others.

Providing a stable project structure will help you organize your new research projects and, as time goes by, better understand your previous ones. The template I made is hardly finished, so if anyone has suggestions, feel free to share them (or fork and issue a pull request if you're on GitHub).