Reproducible manuscripts are the future?

This week, a paper that was almost three years in the making finally got published. I feel confident about the paper and its results, not because it took three years to write, but because I used a dynamic document (e.g., Rmarkdown) to produce it.

Dynamic document? Yes! I no longer had to manually enter results from the data into tables or the text: the computer did it for me. All I had to do was point it in the right direction. Figures? The same! After the initial investment of learning how to use it, it saved me tons of time (git version control saved me time as well, but that's a topic for another post).

Why is this important? We are all human, and we make mistakes. And that's okay! What matters is how we remedy those mistakes when they do occur. Even more important, changing the way we work to prevent some of them in the first place can help us tremendously. I think dynamic documents like Rmarkdown help us do exactly that.

Markdown is a simple markup language that you can write in any text editor (even Notepad). All it does is standardize how headers are defined (with hash signs: # Header level 1, ## Header level 2, etc.) and how text style is defined (e.g., *text* is italic text). The text file can subsequently be converted to pretty much anything (e.g., HTML, PDF, and even a Word file for those relentless co-authors who love track changes so much).
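To make this concrete, here is what a minimal Markdown file might look like (the content is purely illustrative):

```markdown
# Introduction

Some plain text, and some *italic text*.

## Hypotheses

More text, this time under a second-level header.
```

When converted to HTML or Word, the line starting with # becomes a top-level heading, the line starting with ## a second-level heading, and *italic text* is rendered in italics.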

Rmarkdown takes markdown and allows you to put R code between blocks of text (and it actually runs that code!) or even WITHIN the text. Yes, you read that correctly. As such, you can run analyses, make figures, and format results AUTOMATICALLY (no more manually typed p-values! statcheck won't find any inconsistencies if you use Rmarkdown).
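A minimal sketch of an Rmarkdown file, using R's built-in mtcars data set purely for illustration. The YAML header at the top sets the output format, the block opening with {r} is a code chunk, and the expression between `r and the closing backtick is inline code whose result is pasted into the sentence when the document is knitted (e.g., with the Knit button in Rstudio or with rmarkdown::render()):

````markdown
---
title: "A minimal example"
output: html_document
---

# Results

```{r mpg-plot, echo = FALSE}
# A regular code chunk: knitr runs it and inserts the resulting figure;
# echo = FALSE hides the code itself in the output document
plot(mpg ~ wt, data = mtcars)
```

The cars in the *mtcars* data set drive on average
`r round(mean(mtcars$mpg), 1)` miles per gallon.
````

Note that the Markdown conventions (# headers, *italics*) keep working as usual around the R code.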

I will show just one simple but exciting aspect here; more detailed step-by-step guides are available (if you want to follow along, install R and Rstudio).

Usually, we type results into the running text ourselves, like so:

Using Rmarkdown to just write a document

As we see, Rmarkdown creates a formatted document from a very simple plain-text file (this is just markdown doing what it's supposed to). However, the p-value in that sentence is calculated from the t-value and degrees of freedom, yet we typed it in by hand. So let's make it dynamic to ensure the rounding is correct.
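In the Rmarkdown source, making the result dynamic means replacing the hand-typed p-value with inline R code. A sketch of such a sentence, where the surrounding wording is illustrative and the R expression computes the one-tailed p-value for t = 1.95 with 69 degrees of freedom:

```markdown
The effect was significant, t(69) = 1.95, p = `r round(pt(q = 1.95, df = 69, lower.tail = FALSE), 3)`.
```

When the document is knitted, the inline expression is evaluated and replaced by its value, so the rendered sentence reads p = 0.028.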

Using RMarkdown to generate a document with dynamic results, to ease result presentation

As we see, the original contained a mistake: we had typed p = .027, but the correct value is .028. Rmarkdown allowed us to catch that simply by putting in the R code that computes the p-value and rounds it (i.e., round(pt(q = 1.95, df = 69, lower.tail = FALSE), 3)). No more mistake, and we can be confident in the reported result. Disclaimer: of course you can still write wrong code. Garbage in, garbage out!
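One refinement: round() yields 0.028, with a leading zero, whereas APA style reports p-values as .028. The formatting can be automated as well. A small sketch, where format_p is a hypothetical helper written for this example (not a function from any package):

```r
# Hypothetical helper (not from any package): report a p-value APA-style,
# with a fixed number of decimals and without the leading zero.
format_p <- function(p, digits = 3) {
  threshold <- 10^-digits
  if (p < threshold) {
    # Very small p-values are reported as a bound, e.g. "< .001"
    bound <- formatC(threshold, format = "f", digits = digits)
    return(paste0("< ", sub("^0", "", bound)))
  }
  # formatC() keeps trailing zeros, e.g. "0.500" instead of "0.5"
  rounded <- formatC(round(p, digits), format = "f", digits = digits)
  sub("^0", "", rounded)  # "0.028" becomes ".028"
}

# One-tailed p-value for t = 1.95 with 69 degrees of freedom, as in the text
p <- pt(q = 1.95, df = 69, lower.tail = FALSE)
format_p(p)
```

Here format_p(p) returns ".028". In an Rmarkdown document, you would define the function in a code chunk and then call it inline, e.g. p = `r format_p(pt(q = 1.95, df = 69, lower.tail = FALSE))`.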

But this is just a simple example. You can write entire manuscripts this way; that's what I did for our Collabra manuscript (see here [1]). You can even use citations and switch the citation style without any problem; in my experience, it's even easier with Rmarkdown than with EndNote or Mendeley. All it takes is some initial time investment to learn how to work with it (Markdown can be learned in five minutes) and a change in your workflow to accommodate this modern approach to writing manuscripts.
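A sketch of how citations work in an Rmarkdown file (Rmarkdown hands them to pandoc): the YAML header points to a bibliography file and, optionally, a CSL style file, and citation keys go in square brackets in the text. The file names references.bib and apa.csl and the key smith2015 are hypothetical placeholders:

````markdown
---
title: "My manuscript"
output: word_document
bibliography: references.bib  # hypothetical BibTeX file with your references
csl: apa.csl                  # hypothetical CSL style file; swap it to change the style
---

This effect has been reported before [@smith2015].
````

Changing the citation style then amounts to pointing csl to a different style file; the in-text citations and the reference list at the end of the document are regenerated when you knit.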

The only downside to working this way is that journals don't accept a raw Rmarkdown file as a submission, which is too bad: they could then link each result directly to the code that produces it. As it stands, we still end up with a document (e.g., a Word file) that hard-codes all results, as has traditionally been the case. I hope dynamic documents will become more and more widespread in the future, both in how often authors use them and in how well publishers support them, to truly innovate how scholarly information is communicated and consumed. Imagine getting a highlight when you hover over a result and seeing the underlying code. It would allow you to evaluate the methods in a paper more directly and empower you as a reader to be critical of what you are presented with.

[1] I preferred LaTeX for that project and used Sweave, which does for LaTeX what Rmarkdown does for markdown.

UPDATE: This blog post has been cross-posted on both the eLife innovation blog and R-bloggers.

One thought on “Reproducible manuscripts are the future?”

  1. Corinne Riddell

    Hi Chris,

    I came across your blog post through eLife’s tweet about potentially accepted manuscripts in Rmd. This would be a dream!

    Would you be able to comment on using git for version control? I recently started using Rmd + github for my manuscripts. One issue I’m having is trying to determine what exactly has been changed when a co-author makes changes to the free text in the Rmd document. For example, if I have an Introduction paragraph and a co-author makes several edits to many rows of text then *all the rows* will be highlighted green in the “compare diffs” document and I can’t tell easily what *specific words* were modified. Have you encountered this issue in your workflow?

    Kindly,

    Corinne
