Copyright and licenses in Open Access publishing

tl;dr copyright restricts knowledge sharing and, by extension, knowledge production; the author can decide to empower knowledge sharing and production with specific Creative Commons licenses (i.e., only CC 0 or CC BY)

If you came here purely for a clarification of Creative Commons licenses in open access publishing, click here.

Before we start this blog, I want to ask you a question: Do you want knowledge to be shared freely and widely or do you want knowledge to be controlled and potentially censored?

If you answered that you want it to be controlled and potentially censored, you can stop reading. Feel free to read on anyway, but my intended audience is those who want knowledge to be shared and built upon by others. We do not learn or solve problems in isolation, we do it together.

Copyright is a way of controlling how knowledge is shared. The rightsholder has this power, but can also relinquish it. Scholars, as the creators of scholarly papers, own copyright immediately when they write their manuscripts and therefore have a decision to make: Do they want knowledge to be controlled or shared freely?

If you are a scholar, I ask you to take this question to heart because what you decide matters. When you publish in a closed access journal, you relinquish that power to a publisher and make your work their exclusive product. Publishers’ business model then leads them to restrict sharing and production of knowledge by restricting access. They demand egregious amounts of money (per page, you pay more than for most books) and they can make decisions about how your work is available now and in the future (potentially resulting in unavailability due to censorship or extended server outages). If knowledge flows freely and widely, it is much harder to make it go away. Read more about why you should care about copyright here.

When publishing open access, you do not relinquish your copyright to a publisher; you retain it [1]. You, the author, are and remain the rightsholder. You get to make the decisions on how others may share. With open access publishing, you are typically limited to Creative Commons licenses, which are permissive instead of restrictive. This is progress!

Creative Commons

There still is much confusion about Creative Commons licenses in open access publishing. CC 0, CC BY, CC BY-NC, CC BY-SA, CC BY-ND, CC BY-NC-ND, or CC BY-NC-SA. To capture the understandable confusion: CC WTF [2].

To simplify the idea behind these licenses, let’s talk about them differently: they grant the rightsholder (you) a limited set of legal tools to control knowledge sharing and reuse. The above licenses range from no legal tools (CC 0) to three legal tools (e.g., CC BY-NC-SA).

So before asking what the licenses are, you should ask yourself: Am I willing to wield any legal tools in court if push comes to shove? If not, CC 0 should be your preferred choice because it grants no legal tools (this is why it is my preferred option). If you do choose another option, you might scare off reuse by others because you are equipping yourself with these tools. CC 0 is often not provided as a default option (but asking often helps! [3]); in that case, CC BY is the next best option.

However, if you are willing to wield these legal tools in court, ask yourself the following: which of these tools restrict who might use your content?

CC BY legally requires the person [4] who redistributes or reuses (parts of) your work to refer back to you (not to be confused with professional citation standards, which are something else altogether). This does not restrict the potential users of your content because everyone is able to refer back to the original (which is why this is the next best option after CC 0). As such, we call this a free culture license!

CC BY-NC requires the person to refer back to you and not use the content in a commercial manner (i.e., NC = non-commercial). This excludes a set of potential users and therefore restricts knowledge flow in commercial settings. This is not a free culture license because it is only free in a restricted manner (which seems contradictory, doesn’t it?). Arguing that free in non-commercial settings means free for everyone is like arguing that treating all white people equally is equality for all: it simply makes no sense, because it only covers a subgroup.

CC BY-SA requires the person to refer back to you and to license any works that reuse the content under the same license. This forces the content to stay open (which I encourage), but restricts reuse by discouraging it in sectors that might not be open for good reasons. Technically this is a free culture license, but it might still discourage reuse by people who do not want to work openly. This is a tough decision, but it is up to you.

CC BY-ND requires the person to refer back to you and only make direct copies, instead of remixing it into a new creative work. This is not a free culture license, because you’re prohibiting reuse! Nobody can reuse your work and put it in a compilation, use one of your figures in a collage for their show-and-tell, etc.

All other licenses are combinations of these four different legal tools [5]. If you choose to publish open access, you’re already making a good decision. But the license matters — most of the time, using a restrictive license will result in your publication being considered not open, despite you wanting to make it open. So take some time to decide on a license; the truly open licenses are only CC 0, CC BY, and CC BY-SA. No other option, no matter what publishers tell you, is a truly open license for scholarly publishing. When in doubt, feel free to reach out to me and I will try to help you out.

[1] Green open access is a form of restrictive licensing, where the author gets to redistribute their original work (in a limited manner) only by the grace of the publisher, who still owns the copyright.

[2] Thanks Nicole Allen (@txtbks). Disclaimer: this is not a real license :-)

[3] I’ve requested CC 0 several times and have never received a “no, we don’t do this”; only once did I get a “we will have to change our system, are you willing to wait?”

[4] Where I say person this can of course also be taken as organization or other collective.

[5] Note that Non-Derivatives and Share-Alike cannot go together, because Share-Alike requires a derivative to be made.

 

Validly closed data must be secured

tl;dr if you cannot share your data due to valid reasons, simply not sharing is insufficient if you don’t also create a way to keep those data secure.

I am a big proponent of publicly sharing data (skip the advocacy/propaganda by skipping to paragraph 4), because “interpretations may change, but raw data will remain unchanged forever”. You might disagree with the interpretation and want to reanalyze the data, or in the future, when analysis methods have improved, an interpretation can be strengthened by better analyses. Maybe the data will be used to find answers for other questions (because “the best thing to do with your data will be thought of by someone else”). Maybe the data will be combined with another dataset to research something. The reuse potential of research data is massive — and legitimate.

I find sharing data convincing from a moral perspective because of the aforementioned reasons (amongst others), but there is also a utilitarian aspect to it. If you play the citation game, there is some evidence you get more citations if you share your data. If you share well documented data, it will also save you time when you receive questions about your results. Additionally, anticipating such questions also serves as a reflection exercise, strengthening your own research process. These aren’t all the reasons, but just some examples.

However, data sharing means nothing when data are not properly documented or prepared for (re)use. Dumping a dataset online with the variables V1, V2, through V99 will mean nothing without a clear codebook. Moreover, if data are shared in proprietary formats (such as .xlsx), it might be non-trivial to open the file readily (now and in the future). PDFs with tables that contain data undermine the value of sharing as well, because those data are not machine readable (have you ever tried importing a PDF of data into any analysis program? Exactly). Machine readable also means that a spreadsheet shouldn’t just be visually appealing and easy to understand for humans (argh, nested column headers!) but should be readily importable into an analysis program, both by yourself and by anyone else, without having to manually adjust or convert those files. From experience, reusing data without documentation can be trivial and rather quick (half an hour; best case scenario), non-trivial and excruciatingly slow (several days; worst case scenario), or simply infeasible. The problem is that you never know beforehand whether you or others can reuse your data, unless you set up your data to be shared (even with just your future self). Lastly, sharing a Dropbox link is not sustainable (when you move the file, the link breaks). Just go ahead and deposit it in Zenodo (for example).
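To make this concrete, here is a minimal sketch in R of what a machine-readable deposit could look like (the variable and file names are made up): a plain CSV plus a codebook, instead of a formatted spreadsheet or a PDF table.

```r
# Hypothetical example: descriptive variable names plus a codebook,
# exported as plain CSV so any analysis program can read them.
dat <- data.frame(
  participant_id = 1:3,
  age_years      = c(24, 31, 28),
  rt_mean_ms     = c(512.3, 498.7, 530.1)
)

codebook <- data.frame(
  variable    = names(dat),
  description = c("Anonymized participant identifier",
                  "Age at time of testing, in years",
                  "Mean reaction time across trials, in milliseconds")
)

write.csv(dat,      "data.csv",     row.names = FALSE)
write.csv(codebook, "codebook.csv", row.names = FALSE)
```

Depositing these two files (for example on Zenodo) already removes most of the guesswork for a future reuser, including your future self.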

I know that there are valid reasons to (temporarily) not share data. Transparency cannot and should not be absolute,* otherwise it becomes problematic. For example, if a dataset contains medical data that can (easily) be used to identify a person and his/her medical conditions, sharing is unethical for both moral reasons (e.g., the autonomy of the individual is harmed by indirectly disclosing personal information that is not the researcher’s to [indirectly] disclose) and utilitarian reasons (e.g., the individual might experience harmful consequences due to disclosure). A harmful consequence in this specific example of identifiable medical data might be that insurance providers save the medical conditions of their clients as a way of (partly) refusing a future claim, or, even simpler, the discomfort of people realizing that their personal information is now public. There are many possible scenarios, too many to fully grasp.

But let us assume a researcher has data that are identifiable, with specific medical conditions, which that researcher does not share. Has this researcher taken sufficient measures to prevent leaks of personal data by simply not sharing them? At face value, no; securing something requires active behaviors that prevent access, whereas not sharing is a passive behavior that merely does not promote access. Digging deeper, whether the data have been properly secured would require answering several operational questions about data handling, such as (in no particular order and not exhaustive):

  1. Who is allowed to have (access to) the data?
  2. Who handled (parts of) the data at any stage during the collection process?
  3. Who could have copies of the data, but are not addressed in 1?
  4. Who should not have (access to) the data?
  5. Who has access to the data?
    • Does everyone in 1?
    • Does anyone from 4?
    • Are those from 3 that are not also in 4 a threat to further disclosure of the data?
  6. How are the data stored?
    • Which security measures are taken? (e.g., encryption, physical security layers)
    • Are security measures clearly agreed for everyone in 1?
  7. What measures are taken if (part of) the data are leaked in some (unforeseeable) way?
    • Can damage to the individuals be mitigated?
    • Will individuals be compensated in any way if mitigating is impossible?

When data cannot be shared for valid reasons, securing those data is essential to prevent unwarranted and unintended access. Here is the kicker: I think most researchers who don’t share data and say the data cannot or should not be shared (e.g., for privacy reasons) have not thought about the problem deeply enough to really call what they’re doing (i.e., not sharing the data) a way of securing the data.

Securing something requires more active behavior than simply refraining from sharing. If such researchers do not secure their data, they fall victim to their own reasoning because their own rhetoric stresses the importance that data should not become available to others in any way. In such a situation, it seems to me that complacency with respect to data management is more at play in deciding not to share data than true concern about data security.

For example, if you cannot share data due to valid privacy concerns, but subsequently store those data in your Dropbox account or have an unencrypted USB drive with the data lying around, you’re not securing the data properly. If there is no clear threat model for the data, then there would also be no initial reason to keep the data private**. If there is a clear threat model but the researcher does not set up security accordingly, that researcher is exposing the data to an unnecessary risk of leaking despite the threat. Either way, data require attention: either by documenting the data for sharing if there is no threat, or by creating secure protocols to prevent leaks in the case of valid ethical threats.
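To illustrate what an active measure can look like, here is a rough sketch of encrypting a data file at rest in R, assuming the openssl package; the file names and passphrase handling are illustrative, not a complete security protocol.

```r
library(openssl)

# Derive a 256-bit key from a passphrase; keep the passphrase out of the
# script (e.g., in an environment variable or a password manager).
key <- sha256(charToRaw(Sys.getenv("DATA_PASSPHRASE")))

# Read the sensitive file as raw bytes and encrypt it with AES-256-CBC.
plain  <- readBin("patients.csv", what = "raw", n = file.size("patients.csv"))
iv     <- rand_bytes(16)
cipher <- aes_cbc_encrypt(plain, key = key, iv = iv)

# Store the initialization vector alongside the ciphertext; delete the
# unencrypted original afterwards.
writeBin(c(iv, cipher), "patients.csv.enc")

# Later, to decrypt:
enc    <- readBin("patients.csv.enc", what = "raw", n = file.size("patients.csv.enc"))
plain2 <- aes_cbc_decrypt(enc[-(1:16)], key = key, iv = enc[1:16])
```

Encryption alone does not answer the questions above (who holds the key, who may access the file), but it is the kind of active measure that not sharing, by itself, is not.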

All researchers need to think about how to manage their data, but many currently don’t. Some proffer valid reasons not to share data, but fail to accept the consequences of their own rhetoric, calling into question how seriously researchers take ethical considerations with respect to data integrity.

Notes:

[*] Some researchers hide too easily behind this, so if anyone refers to this blog as a reason why they won’t share, ask them for their exact arguments.

[**] However, there might be unknown unknowns.

Reproducible manuscripts are the future?

This week, a paper that was almost three years in the making finally got published. I feel confident about the paper and the results in it, not because it took three years to write, but because I used a dynamic document to produce it (e.g., Rmarkdown).

Dynamic document? Yes! I no longer had to manually enter all results from the data into tables or the text — the computer did it for me. All I had to do was point it in the right direction. Figures? The same! It saved me tons of time after I made the initial investment to learn how to use it (something else that saved me time was git version control, but that’s for another time).

Why is this important? We are all human, and we make mistakes. And that’s okay! What matters is how we try to remedy those mistakes if they do occur, but even more important, if we can change the way we work in order to prevent some of them, that can help us tremendously. I think dynamic documents like Rmarkdown help us do so.

Markdown is a simple document language, which you can write in any text editor (Notepad as well). All it does is standardize how headers are defined (with hash signs: # Header level 1, ## Header level 2, etc.) and how text styling is defined (e.g., *text* is italic text). Subsequently, the text file can be converted to pretty much anything (e.g., html, pdf, and even a Word file for those relentless co-authors who love track changes so much).
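For example, a minimal markdown source file could look like this:

```
# Header level 1

## Header level 2

Some body text with *italic text* and **bold text**,
plus a list:

- first point
- second point
```

Running this through a converter such as pandoc (or clicking “Knit” in RStudio) turns the same file into HTML, PDF, or Word.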

RMarkdown takes markdown and allows you to put R code in chunks between the text (which it actually runs!) or even WITHIN the text. Yes, you read that correctly. As such, you can do analyses, make figures, and format results (no more manual p-values! statcheck won’t find any errors if you use RMarkdown) AUTOMATICALLY.
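As a small sketch (the data file and variable names are made up), an RMarkdown source can mix a code chunk with inline results like this:

````
```{r}
dat    <- read.csv("data.csv")
t_test <- t.test(dat$rt_mean_ms, mu = 500)
```

We collected `r nrow(dat)` responses. The mean reaction time differed from
500 ms, *t*(`r t_test$parameter`) = `r round(t_test$statistic, 2)`,
*p* = `r round(t_test$p.value, 3)`.
````

When the document is knit, every backticked `r ...` expression is replaced by its computed value, so the numbers in the text cannot drift away from the data.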

I will just show one exciting and simple aspect, but more step-by-step guides are available (if you want to follow along, install R and Rstudio).

Usually, we tend to type results into the running text ourselves, like so.

Using Rmarkdown to just write a document

As we see, RMarkdown creates a document from a very simple plain text document (this is just markdown doing what it’s supposed to). However, we have a p-value that is calculated based on that t-value and degrees of freedom. So let’s make it dynamic to ensure we have the rounding correct.

Using RMarkdown to generate a document with dynamic results, to ease result presentation

As we see, the original contained a mistake (the reported p = .027 should have been .028) — but RMarkdown allowed us to catch that by just putting in the R code that generates that p-value and rounds it (i.e., round(pt(q = 1.95, df = 69, lower.tail = FALSE), 3)). No more mistake, and we can be confident. Disclaimer: of course you can still input wrong code — garbage in, garbage out!
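In the RMarkdown source, that sentence then reads something like this (a sketch; the exact wording in the screenshots above may differ):

```
The effect was significant, *t*(69) = 1.95,
*p* = `r round(pt(q = 1.95, df = 69, lower.tail = FALSE), 3)`.
```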

But this is just a simple example. You can write entire manuscripts this way. That’s what I did for our Collabra manuscript (see here [1]). You can even use citations and alter the citation style without any problem; in my experience it’s even easier with RMarkdown than with EndNote or Mendeley. All it takes is some initial time investment to learn how to work with it (Markdown can be learned in five minutes) and to change your workflow to accommodate this modern approach to writing manuscripts.
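Citations work through a references file and a citation style file declared in the document header; a minimal sketch (the file names and citation key are made up):

```
---
title: "My manuscript"
bibliography: references.bib   # your references as BibTeX entries
csl: apa.csl                   # swap this file to change the citation style
output: word_document
---

Reporting inconsistencies are common in psychology [@smith2015].
```

Changing from APA style to, say, a journal's house style is then a one-line change in the header rather than an afternoon of reformatting.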

The only downside to working this way is that journals don’t accept a raw RMarkdown file as a submission, which is too bad — they could link the results directly to the code that produces them. Now we still end up with a document (e.g., a Word file) that hard-codes all results, as has traditionally been the case. I hope dynamic documents will become more and more widespread in the future, both in how often they’re used by authors and in how publishers support this type of document, to truly innovate how scholarly information is communicated and consumed. Imagine just getting a highlight when you hover over a result and seeing the underlying code — it would allow you to more directly evaluate the methods in a paper and empower you as a reader to be critical of what you are presented with.

[1] I preferred LaTeX for that project and used Sweave, the LaTeX equivalent of RMarkdown.

UPDATE: This blog post has been cross-posted on both the eLife innovation blog and R-bloggers. For more R news and tutorials, please visit https://www.r-bloggers.com/.

Open mind, open science [Speech for Tilburg uni board]

This is a speech I gave for the board of Tilburg University on why open science is important for the future of Tilburg University, or any knowledge institute, honestly. The speech was given on March 9, 2017.

We produce loads of knowledge at this university, and yet we throw most of that knowledge away. We are throwing away taxpayer’s money; stifling scientific discovery; hampering the curiosity of our students and society.

Research data; peer reviews; research materials; developed software; research articles — they are literally and figuratively thrown away. But these are all building blocks for new knowledge, and the more building blocks available, the more knowledge we can build. Science is a bit like Legos in that sense: more pieces allow you to build greater structures.

Students can use these building blocks to learn better — learn about the real, messy process of discovery for example. Businesses can create innovative tools for information consumption by taking the hitherto unavailable information and reshaping it into something valuable. Citizens can more readily participate in the scientific process. And last but not least, science can become more accurate and more reliable.

Researchers from different faculties and staff members from different support services see the great impact of open science, and today I call on you to make it part of the new strategic plan.

Let’s as a university work towards making building blocks readily available instead of throwing or locking them away.

Open Access and Open Data, two of these building blocks, have already been on the board’s agenda. All researchers at Tilburg are mandated to make their publications freely available since 2016 and free to re-use by 2024. For Open Data, the plans are already in motion to make all data open per 2018, across all faculties, as I was happy to read in a recent memorandum. So, why am I here?

Open Access and Open Data are part of the larger idea of Open Science; they are only two of many building blocks. Open science means that all research pieces are available for anyone to use, for any purpose.

Advancing society is only possible if we radically include society in what we do. The association of Dutch universities, state secretary Sander Dekker, and 17 other organizations have underscored the importance of this when they signed the National Plan Open Science just a few weeks ago.

So I am happy to see the landscape is shifting from closed to open. However, it is happening slowly and incompletely if we focus only on data and access. Today, I want to focus on one of the biggest problems facing science, one that open science can solve: selective publication.

Researchers produce beautifully written articles that read like the script of a good movie: they set the scene with an introduction and methods, have beautiful results, and provide a happy ending in the discussion that makes us think we actually understand the world. And just like movies, not all are available in the public theatre.

But research isn’t about good and successful stories that sell; it is about accuracy. We need to see not just the successful stories, we need to see all stories. We need to see the entire story, not just the exciting parts. Only then can we efficiently produce new knowledge and understand society.

Because good movies pretend to find effects even when there is truly nothing to find. Here, researchers investigate the relation between jelly beans and pimples.

So they start researching. Nothing.

xkcd comic “Significant”; https://www.xkcd.com/882/

More studies; nothing.

More studies; an effect and a good story!

More studies; nothing.

And what is shared? The good story. While there is utterly nothing to find. And this happens daily.

Researchers fool each other, including themselves, and it has been shown time and time again that we researchers wish to see what we want to see. This human bias greatly harms science.

The results are disconcerting. Psychology; cancer research; life sciences; economics — all fields have issues with providing a valid understanding of the world, to a worrying extent. This is due to researchers fooling themselves and confirming prior beliefs to produce good movies instead of being skeptical and producing accurate, good science.

So I and other members across the faculties and services say: Out with the good movies, in with the good science we can actually build on — OPEN science.

Sharing all research that is properly conducted is feasible and will increase the validity of results. Moreover, it will lead to less research waste. We as a university could become the first university to share all our research output. All based on a realistic notion of a researcher: do not evaluate results based on whether they are easy to process, whether they confirm your expectations, or whether they provide a good story — evaluate them on their methods.

But please, please, members of our university, do not expect this change to open science to come easily or to happen by magically installing a few policies!

It requires a cultural shift that branches out into the departments and even the individual researchers’ offices. Policies don’t necessarily result in behavior change.

And as a researcher, I want to empirically demonstrate that policy doesn’t necessarily result in behavioral change.

Here is my Open Access audit for this university. Even though policies have been instated by the university board, progress is absent and we are actually doing worse at making our knowledge available to society than in 2012. This way we will not reach ANY of the Open Access goals we have set out.

Open Access audit Tilburg University; data and code: https://github.com/libscie/access-audit

In sum, let us advance science by making it open, which in turn will help us advance society. I will keep fighting for more open science. Anyone present, student or staff, I encourage you to do so as well. I am here to help.

Open science is out of the box, and it won’t go back in. The question is, what are we as a university going to do with that knowledge?

A glimpse in the mind of a fabricator

After the Guardian article came out portraying the Meta-Research team’s efforts to improve detection of data fabrication, I received a bunch of e-mails offering support, asking questions, or coming from people who wanted to change science (which we need to do, for issues larger than scientific misconduct).

The following are excerpts of an e-mail conversation and show how someone might go about fabricating data, or why they would do such a thing. It is not always the researcher; it can also be one of the assistants, or anyone else involved in the research process. I found this interesting, but most of all very blatant. Maybe people who fabricate are overconfident in their capability to do so.

A woman was telling her friend about the method she uses to produce the results for medical surveys related to drug trials. She stated that she normally has to get around 150-250 patient responses to surveys (I assume by phone) and described using a mobile phone app to take the recordings from only a few responses and manipulate it in order to sound like a different person. She also described making audio recordings of herself putting on different accents in order to generate the responses. As far as I could tell, the motivation was to reduce the amount of work; possibly combined with being able to claim any voucher associated with completing the questionnaire.

It sounded like it was at a low level of worker, presumably not someone who was involved in using the data. It sounded as if the recordings were audited in some way – that was why she was using the voice modulator in order to generate the samples but it wasn’t clear whether the audit was someone doing spot checks listening to them; or something more automated. I got the impression she had learnt the trick off a colleague but I’m not sure. As far as I could tell there was no intention to push the results one way or the other, but presumably a very uncertain result is almost as dangerous if not more.

If anyone has anecdotes of people boasting about fabrication that they have overheard and would like to share, please send them to me. We hardly know how people go about fabricating data, so anecdotes are more than welcome to improve our understanding and provide food for thought.

Interview Danish Psychology Association responses

Below, I copy my responses to an interview for the Danish Psychology Association. I don’t know when the article will be shared, but I am posting my responses here, licensed CC0. This is also my way of sharing the full responses, which won’t be copied verbatim into an article because they are simply too lengthy.
What do you envision that this kind of technology could do in a foreseeable future?

What do you mean by “this” kind of technology? If you mean computerized tools assisting scholars, I think there is massive potential both in the development of new tools to extract information (for example, what ContentMine is doing) and in their application. Some formidable means are already here. For example, how much time do you spend as a scholar to produce your manuscript when you want to submit it? This does not need to cost half a day when there are highly advanced, modern submission managers. The same goes when submitting revisions. Additionally, annotating documents collaboratively on the Internet with hypothes.is is great fun, highly educational, and productive. I could go on and on about the potential of computerized tools for scholars.

Why do you think this kind of computerized statistical policing is necessary in the field of psychology and in science in general?

Again, what is “this kind of computerized statistical policing”? I assume you’re talking about statcheck only for the rest of my answer. Moreover, it is not policing — a spell-checker does not police your grammar, it helps you improve your grammar. statcheck does not police your reporting, it helps you improve your reporting. Additionally, I would like to reverse the question: should science not care about the precision of scientific results? With all the rhetoric going on in the USA about ‘alternative facts’, I think it highlights how dangerous it is to let go of our desire to be precise in what we do. Science’s imprecision has trickle-down effects in the policies that are subsequently put in place, for example. We put in all kinds of creative and financial effort to progress our society; why should we let it be diminished by simple mistakes that can be prevented so easily? If we agree that science has to be precise in the evidence it presents, we need to take steps to make sure it is. Making a mistake is not a problem, it is all about how you deal with it.

So far the Statcheck tool only checks whether the math behind the statistical calculations in published articles is wrong when null-hypothesis significance testing has been used; these are what you refer to as reporting errors in your article from December last year, published in Behavior Research Methods. But these findings aren’t problematic as long as the conclusions in the articles aren’t affected by the reporting errors?

They aren’t problematic? Who is the judge of whether errors aren’t problematic? If you consider just statistical significance, one in eight papers still contains such a problem. Moreover, all errors in reported results affect meta-analyses — is that not also problematic down the line? I find it a show of hubris for any individual to say that they can determine whether something is problematic or not, when there can be many affected things that that person doesn’t even realize. It should be open to discussion, so information about problems needs to be shared and discussed. This is exactly what I aimed to do with the statcheck reports on PubPeer, for one very specific problem.

In the article in Behavior Research Methods you find that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom, and that one in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. What does this mean? I’m not a mathematician.

You don’t need to be a mathematician to understand this. Say we have a set of eight research articles presenting statistical results with certain conclusions. Four of those eight will contain a reported result that does not match up with its test statistic and degrees of freedom (i.e., an inconsistency), but where the broad strokes of the conclusion are unaffected. One of those eight contains a result where the inconsistency potentially nullifies the conclusion. For example, suppose a study concluded that a new behavioral therapy is effective at treating depression, but the reported result does not actually support that conclusion. That means the evidence for the therapy’s effectiveness is undermined, affecting direct clinical benefits as a result.

Why are these findings important?

Science is vital to our society. Science is based on empirical evidence. Hence, it is vital to our society that empirical evidence is precise and not distorted by preventable or remediable mistakes. Researchers make mistakes; no big deal. People like to believe scientists are more objective and more precise than other humans — but we’re not. The way we build checks and balances to prevent mistakes from proliferating and propagating into (for example) policy is crucial. statcheck contributes to understanding and correcting one specific aspect of such mistakes we can all make.

Why did you decide to run the statcheck on psychology papers specifically?

statcheck was designed to extract statistical results reported as prescribed by the American Psychological Association (APA), one of the most standardized ways of reporting statistical results. It makes sense to apply software developed on standards in psychology to psychology.
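For example, a minimal sketch of running the statcheck R package on a snippet of APA-formatted text (the sentence itself is made up):

```r
# install.packages("statcheck")
library(statcheck)

# statcheck() extracts APA-formatted NHST results from text and recomputes
# the p-value from the reported test statistic and degrees of freedom.
result <- statcheck("The effect was significant, t(28) = 2.20, p = .005.")
result
# The returned data frame lists the reported and recomputed p-values and
# flags whether they are (grossly) inconsistent.
```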

Why do you find so many statistical errors in psychology papers specifically?

I don’t think this is a problem to psychology specifically, but more a problem of how empirical evidence is reported and how manuscripts are written.

Are psychologists not as skilled at doing statistical calculations as other scholars?

I don’t think psychologists are worse at doing statistical calculations. I think point-and-click software has made it easy for scholars to compute statistical results, but not to insert them into manuscripts reliably. Typing in those results is error prone. I make mistakes when I’m doing my finances at home, because I have to copy the numbers. I wish I had something like statcheck for my finances. But I don’t. For scientific results, I promote writing manuscripts dynamically. This means that you no longer type in the results manually, but inject the code that contains the result. This is already possible with tools such as Rmarkdown and can greatly increase the productivity of the researcher. It has saved my skin multiple times, although you still have to be vigilant for mistakes (wrong code produces wrong results).

Have you run the Statcheck tool on your own statistical NHST-testing in the mentioned article?

Yes! This was the first thing I did, way before I was running it on other papers. Moreover, I was non-selective when I started scanning other people’s papers — I apparently even made a statcheck report that got posted on PubPeer for my supervisor (see here). He laughed, because the paper was on reporting inconsistencies and the gross inconsistency was simply an example of one in the running text. A false positive, highlighting that statcheck‘s results always need to be checked by a human before concluding anything definitive.

Critics call Statcheck “a new form of harassment” and accuse you of being “a self appointed data police”. Can you understand these reactions?

Proponents of statcheck praise it as a good service. Researchers who study how researchers conduct research are called methodological terrorists. Any change comes with proponents and critics. Am I a self-appointed data policer? To some, maybe. To others, I am simply providing a service. I don’t chase individuals and I am not interested in that at all — I do not see myself as part of a “data police”. That people experience these reports as a reprimand highlights to me that there still rests a taboo on skepticism within science. Skepticism is one of the ideals of science, so let’s aim for that.

Why do you find it necessary to send out thousands of emails to scholars around the world informing them that their work has been reviewed and point out to them if they have miscalculated?

It was not necessary — I thought it was worthwhile. Why do some scholars find it necessary to e-mail a colleague about their thoughts on a paper? Because they think it is worthwhile and can help them or the original authors. Exactly my intentions by teaming up with PubPeer and posting those 50,000 statcheck reports.

Isn’t it necessary and important for ethical reasons to be able to make a distinction between deliberate miscalculations and miscalculations by mistake when you do this kind of statcheck?

If I was making accusations about gross incompetence towards the original authors, such a distinction would clearly be needed. But I did not make accusations at all. I simply stated the information available, without any normative or judging statements. Mass-scale post-publication peer review of course brings with it ethical problems, which I carefully weighed before I started posting statcheck reports with the PubPeer team. The formulation of these reports was discussed within our group and we all agreed this was worthwhile to do.

As a journalist I can write and publish an article with one or two factual errors. This doesn’t mean the article isn’t of a generally high journalistic standard or that the content of the article isn’t of great relevance for the public. Couldn’t you make the same argument about a scientific article? And when you catalogue these errors online, aren’t you at risk of blowing up a storm in a teacup and turning everybody’s eyes away from the actual scientific findings?

Journalists and scholars are playing different games. An offside in football is not a problem in tennis, and the comparison between journalists and scholars strikes me the same way. I am not saying that an article is worthless if it contains an inconsistency; I just say that it is worth looking at before building new research lines on it. Psychology has wasted millions and millions of euros/dollars/pounds/etc. on chasing ephemeral effects that are totally unreasonable, as several replication projects have highlighted in the last years. Moreover, I think the general opinion of science will only improve if we are more skeptical and critical of each other instead of trusting findings based on reputation, historical precedent, or the ease with which we can assimilate the findings.

False claims of copyright and STM

Recently, I have become interested in the issue of false claims of copyright (i.e., copyfraud) in publishing. I just wrote to the publishers’ association (STM) to ask them what their perspective on copyfraud is and whether they condone such behavior by their members. Read my letter here. I will update this blog when I get a response.

An example of copyfraud is this index page from the Lancet, published in 1823. Let’s assume copyright for this index page was actively registered and that it received protection under copyright legislation (copyright was not automatic before the 1886 Berne Convention). That would mean the duration of copyright would have to be at least 192 years for this claim to be valid! Even under the current situation, copyright does not last that long for organizations (if I am correct, it is around ~120 years).

Regretfully, it is easy for a rightsholder to legally pursue someone who violates their copyright, but when someone falsely claims to be a rightsholder, the public cannot fight back in the same way. This is an inherent asymmetric power relation in copyright. The World Intellectual Property Organization (WIPO) does not seem to provide a way to easily report potential copyfraud, and I would like to call on them to make this possible. Opening up a way to reliably report it would at least allow everyone to get a better view of how often copyfraud occurs. Even better: create legislation that empowers the public to fight back against copyfraud.

Copyfraud is a widespread problem that occurs not only with old works, but also with, for example, works by U.S. federal employees, which are uncopyrightable under United States federal law (17 U.S. Code § 105). Recent articles by the 44th President, Barack Obama, have had copyright falsely claimed over them, and yet all we can do is ask nicely that the copyright notice be removed.

Fighting for “non-negotiable” copyright

Recently, the American Psychological Association (APA) decided not to allow me to retain my copyright for a book chapter I wrote — after weeks of back-and-forth in which they said they are “very flexible” in their current license agreement while neglecting my counter-offers. How requesting all copyright and making the agreement non-negotiable is flexible, I do not know. I am not the only one who has stumbled upon such problems (see Rajiv Jhanghiani’s blog post).

In February I asked the editors of the book for a copy of the license agreement, suspecting that the APA would want a full copyright transfer. On March 29 2016 I received the agreement (available here) and it indeed stated the suspected copyright transfer.

I asked the editors to inquire about an alteration to the agreement, such that the APA could print the chapter and make money, but the copyright would remain with us (and I could publish a copy online under a CC-BY or CC-0 license). Note that I not only feel morally obliged to do this, but also practically have to: if I sign away all rights, I cannot reuse my own book chapter in my dissertation without getting prior approval from the rightsholder and hoping I get that exception.

Alas, the APA stated that the license agreement is “non-negotiable” and that their “policies on use of the material with proper citation are not at all stringent.” Note that they here refer to academic citations, which are not concerned with copyright (i.e., reuse) but with attribution of ideas and professional standards.

Moreover, acquiring all copyright was essential according to them, due to the “financial risk assessment” involved in publishing a book. They neglected to respond to my argument that the free (!) chapter does not, in itself, pose a risk and therefore any risk is incurred by their own doing and I will not pay for it with my copyright.

So I made a counter-offer: a non-exclusive reproduction right, where I simply put the chapter in the public domain and the APA can print the chapter, make money, and I cannot make any claims on their revenue/profit, all while others are able to reuse the chapter freely and without restrictions. They did not agree after I offered this three times, and just now my deadline for them to accept this offer passed.

So now I will publish this chapter elsewhere. The beauty is that I just put it online with a non-restrictive license, so they can technically still print it if they’d like despite their claim that I do not allow publication of the book chapter. But now you can read it and freely reuse it. They are simply blocking publication because they cannot publish on their terms and I want to renegotiate the apparently one-sided agreement.

Note: This matter is wholly unrelated to the recent post about a flaw in EBSCOhost that inadvertently made all of APA Open Access if you had a direct link.

Did I just ‘make’ all of APA Open Access?

The American Psychological Association (APA) is one massive, (primarily) closed-access publisher in psychology, which Tilburg University accesses through EBSCOhost. This has accidentally made all of the APA’s published journals free to access. I assume both the APA and EBSCOhost are unaware of this.

During my mining endeavors I also wanted to mine the APA (for research purposes, as described in earlier posts here and here). After collecting links to access these articles via EBSCOhost with my spider, I accidentally tried to access one of those links outside of the university network — to my surprise, I could!

I tried a VPN to access it from several other countries in the world, and it still worked. Other computers, the same. Open access to closed articles — a seeming paradox, but apparently possible.

Direct links to EBSCOhost simply bypass all technical walls implemented by EBSCO, which the APA will not be all too happy with. A stable session ID works fine, even when the collected links are accessed more than six months later. I figure this generalizes to non-APA articles in EBSCOhost, but I have not tried that.

For example, this link (try it!) provides access to the paper on “Arab Youth Involvement in Delinquency” (no specific reason why I chose this one, just the first random pick). You can even navigate to the PDF that is attached to it. If you follow the link based on the DOI, you hit a paywall. You can play around with one of these 1000 links to see that this actually works (see this spreadsheet). I collected more than 70,000 (!) of these, which are all free to access with these direct links, even though the APA presumably wants them paywalled outside of Tilburg’s network.

An example of accessing a closed article freely through EBSCOhost.

And of course, if you have these links, it is relatively easy to systematically download these and identify which link is which paper. I am not dumping an entire database of 70,000 links with article DOIs and article titles simply because I figure this is a flaw in the system and I do not want to encourage the APA and their lawyers, considering I am already busy enough with Elsevier. However, if you need these links for mining purposes, send me an email or tweet.

If closed access publishers worry so much about the widespread use of Sci-Hub and how to maintain revenue in an increasingly Open Access world, these kinds of technological flaws undermine even their closed model. I did not actively try to hack their system (although I might be accused of hacking for this); I just stumbled upon it by chance. They might as well dump all their articles in the Open if it is this easy (please do).

UPDATE: The example link now requires a login. Here are some additional examples, from the spreadsheet — example, example, example.