Open mind, open science [Speech for Tilburg uni board]

This is a speech I gave for the board of Tilburg University on why open science is important for the future of Tilburg University, or honestly any knowledge institute. The speech was given on March 9, 2017.

We produce loads of knowledge at this university, and yet we throw most of that knowledge away. We are throwing away taxpayers’ money; stifling scientific discovery; hampering the curiosity of our students and society.

Research data; peer reviews; research materials; developed software; research articles — they are literally and figuratively thrown away. But these are all building blocks for new knowledge, and the more building blocks available, the more knowledge we can build. Science is a bit like Legos in that sense: more pieces allow you to build greater structures.

Students can use these building blocks to learn better — learn about the real, messy process of discovery for example. Businesses can create innovative tools for information consumption by taking the hitherto unavailable information and reshaping it into something valuable. Citizens can more readily participate in the scientific process. And last but not least, science can become more accurate and more reliable.

Researchers from different faculties and staff members from different support services see the great impact of open science, and today I call on you to make it part of the new strategic plan.

Let us, as a university, work towards making these building blocks readily available instead of throwing or locking them away.

Open Access and Open Data, two of these building blocks, have already been on the board’s agenda. Since 2016, all researchers at Tilburg have been mandated to make their publications freely available, and by 2024 those publications must also be free to re-use. For Open Data, plans are already in motion to make all data open as of 2018, across all faculties, as I was happy to read in a recent memorandum. So, why am I here?

Open Access and Open Data are part of the larger idea of Open Science; they are only two of many building blocks. Open science means that all pieces of research are available for anyone to use, for any purpose.

Advancing society is only possible if we radically include society in what we do. The association of Dutch universities, state secretary Sander Dekker, and 17 other organizations have underscored the importance of this when they signed the National Plan Open Science just a few weeks ago.

So I am happy to see the landscape is shifting from closed to open. However, it is happening slowly, and incompletely if we focus only on data and access. Today, I want to focus on one of the biggest problems facing science that open science can solve: selective publication.

Researchers produce beautifully written articles that read like the script of a good movie: they set the scene with an introduction and methods, have beautiful results, and provide a happy ending in the discussion that makes us think we actually understand the world. And just like movies, not all are available in the public theatre.

But research isn’t about good and successful stories that sell; it is about accuracy. We need to see not just the successful stories, we need to see all stories. We need to see the entire story, not just the exciting parts. Only then can we efficiently produce new knowledge and understand society.

Because good movies pretend to find effects even if there is truly nothing to find. In this comic, researchers investigate the relation between jelly beans and pimples.

So they start researching. Nothing.

xkcd comic "Significant"; https://www.xkcd.com/882/

More studies; nothing.

More studies; an effect and a good story!

More studies; nothing.

And what is shared? The good story. While there is utterly nothing to find. And this happens daily.

Researchers fool each other, and themselves; it has been shown time and time again that we researchers see what we wish to see. This human bias greatly harms science.

The results are disconcerting. Psychology; cancer research; life sciences; economics — all fields have issues with providing a valid understanding of the world, to a worrying extent. This is due to researchers fooling themselves and confirming prior beliefs to produce good movies instead of being skeptical and producing accurate, good science.

So I and other members across the faculties and services say: Out with the good movies, in with the good science we can actually build on — OPEN science.

Sharing all research that is properly conducted is feasible and will increase the validity of results. Moreover, it will lead to less research waste. We as a university could become the first university to share all our research output. All based on a realistic notion of the researcher: do not evaluate results on whether they are easy to process, confirm your expectations, or provide a good story; evaluate them on their methods.

But please, please, members of our university, do not expect this change towards open science to come easily or through magically installing a few policies!

It requires a cultural shift that branches out into the departments and even the individual researchers’ offices. Policies don’t necessarily result in behavior change.

And as a researcher, I want to empirically demonstrate that policy doesn’t necessarily result in behavioral change.

Here is my Open Access audit for this university. Even though policies have been put in place by the university board, progress is absent and we are actually doing worse at making our knowledge available to society than in 2012. This way, we will not reach ANY of the Open Access goals we have set.

Open Access audit Tilburg University; data and code: https://github.com/libscie/access-audit

In sum, let us advance science by making it open, which in turn will help us advance society. I will keep fighting for more open science. Anyone present, student or staff, I encourage you to do so as well. I am here to help.

Open science is out of the box, and it won’t go back in. The question is, what are we as a university going to do with that knowledge?

A glimpse into the mind of a fabricator

After the Guardian article came out portraying the Meta-Research team’s efforts to improve detection of data fabrication, I received a bunch of e-mails: messages of support, questions, and notes from people who wanted to change science (which we need to do, for larger issues than scientific misconduct alone).

The following are excerpts of an e-mail conversation and show how someone might go about fabricating data, or why they would do such a thing. It is not always the researcher; it can also be one of the assistants, or anyone else involved in the research process. I found this interesting, but most of all very blatant. Maybe people who fabricate are overconfident in their capability to do so.

A woman was telling her friend about the method she uses to produce the results for medical surveys related to drug trials. She stated that she normally has to get around 150-250 patient responses to surveys (I assume by phone) and described using a mobile phone app to take the recordings from only a few responses and manipulate it in order to sound like a different person. She also described making audio recordings of herself putting on different accents in order to generate the responses. As far as I could tell, the motivation was to reduce the amount of work; possibly combined with being able to claim any voucher associated with completing the questionnaire.

It sounded like it was at a low level of worker, presumably not someone who was involved in using the data. It sounded as if the recordings were audited in some way – that was why she was using the voice modulator in order to generate the samples but it wasn’t clear whether the audit was someone doing spot checks listening to them; or something more automated. I got the impression she had learnt the trick off a colleague but I’m not sure. As far as I could tell there was no intention to push the results one way or the other, but presumably a very uncertain result is almost as dangerous if not more.

If anyone has anecdotes of people boasting about fabrication that they have overheard and would like to share, please send them to me. We hardly know how people go about fabricating data, so anecdotes are more than welcome to improve understanding and provide food for thought.

Interview Danish Psychology Association responses

Below, I copy my responses to an interview for the Danish Psychology Association. My responses are in italic. I don’t know when the article will be shared, but I am posting my responses here, licensed CC0. This is also my way of sharing the full responses, which won’t be copied verbatim into the article because they are simply too lengthy.
What do you envision that this kind of technology could do in a foreseeable future?

What do you mean by “this” kind of technology? If you mean computerized tools assisting scholars, I think there is massive potential both in the development of new tools to extract information (for example what ContentMine is doing) and in their application. Some formidable means are already here. For example, how much time do you spend as a scholar to produce your manuscript when you want to submit it? This does not need to cost half a day when there are highly advanced, modern submission managers. The same goes for submitting revisions. Additionally, annotating documents collaboratively on the Internet with hypothes.is is great fun, highly educational, and productive. I could go on and on about the potential of computerized tools for scholars.

Why do you think this kind of computerized statistical policing is necessary in the field of psychology and in science in general?

Again, what is “this kind of computerized statistical policing”? I assume you’re talking about statcheck only for the rest of my answer. Moreover, it is not policing: a spell-checker does not police your grammar, it helps you improve your grammar, and statcheck does not police your reporting, it helps you improve your reporting. Additionally, I would like to reverse the question: should science not care about the precision of scientific results? With all the rhetoric going on in the USA about ‘alternative facts’, I think it highlights how dangerous it is to let go of our desire to be precise in what we do. Science’s imprecision has trickle-down effects in the policies that are subsequently put in place, for example. We put in all kinds of creative and financial effort to progress our society; why should we let it be diminished by simple mistakes that can be prevented so easily? If we agree that science has to be precise in the evidence it presents, we need to take steps to make sure it is. Making a mistake is not a problem; it is all about how you deal with it.

So far the Statcheck tool only checks whether the math behind the statistical calculations in published articles is wrong when null-hypothesis significance testing has been used, what you refer to as reporting errors in your article from December last year, published in Behavior Research Methods. But these findings aren’t problematic as long as the conclusions in the articles aren’t affected by the reporting errors?

They aren’t problematic? Who is the judge of whether errors are problematic? If you consider just statistical significance, one in eight papers still contains such a problem. Moreover, all errors in reported results affect meta-analyses; is that not also problematic down the line? I find it hubristic for any individual to say that they can determine whether something is problematic or not, when there can be many things that person doesn’t even realize can be affected. It should be open to discussion, so information about problems needs to be shared and discussed. This is exactly what I aimed to do with the statcheck reports on PubPeer, for a very specific problem.

In the article in Behavior Research Methods you find that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom, and that one in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. What does this mean? I’m not a mathematician.

You don’t need to be a mathematician to understand this. Say we have a set of eight research articles presenting statistical results with certain conclusions. Four of those eight will contain at least one reported result where the p-value does not match the test statistic and degrees of freedom (i.e., an inconsistency), but where this does not affect the broad strokes of the conclusion. One of those eight contains an inconsistency severe enough that it potentially nullifies the conclusion. For example, a study might conclude that a new behavioral therapy is effective at treating depression, while the reported result does not actually support that conclusion. That means the evidence for the therapy’s effectiveness is undermined, affecting direct clinical benefits as a result.

Why are these findings important?

Science is vital to our society. Science is based on empirical evidence. Hence, it is vital to our society that empirical evidence is precise and not distorted by preventable or remediable mistakes. Researchers make mistakes; no big deal. People like to believe scientists are more objective and more precise than other humans, but we’re not. The way we build checks and balances to prevent mistakes from proliferating and propagating into (for example) policy is crucial. statcheck contributes to understanding and correcting one specific kind of mistake we can all make.

Why did you decide to run the statcheck on psychology papers specifically?

statcheck was designed to extract statistical results reported as prescribed by the American Psychological Association. It is one of the most standardized ways of reporting statistical results. It makes sense to apply software developed on standards in psychology to psychology.
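
For readers curious what such a check looks like in practice, here is a minimal sketch using the statcheck R package; treat it as an illustration rather than a recipe, since the exact function and column names may differ between package versions.

# install.packages("statcheck")  # the R package behind the tool
library(statcheck)

# A single APA-style result, typed as it would appear in a paper
result <- statcheck("t(28) = 2.20, p = .03")

# The returned data frame lists the reported p-value, the p-value recomputed
# from the test statistic and degrees of freedom, and flags whether the two
# are inconsistent (and whether the inconsistency changes the significance
# decision); column names vary across package versions
print(result)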

Why do you find so many statistical errors in psychology papers specifically?

I don’t think this is a problem for psychology specifically, but more a problem of how empirical evidence is reported and how manuscripts are written.

Are psychologists not as skilled at doing statistical calculations as other scholars?

I don’t think psychologists are worse at doing statistical calculations. I think point-and-click software has made it easy for scholars to compute statistical results, but not to insert them into manuscripts reliably. Typing in those results is error-prone. I make mistakes when I’m doing my finances at home, because I have to copy the numbers. I wish I had something like statcheck for my finances. But I don’t. For scientific results, I promote writing manuscripts dynamically. This means that you no longer type in the results manually, but inject the code that contains the result. This is already possible with tools such as Rmarkdown and can greatly increase the productivity of the researcher. It has saved my skin multiple times, although you still have to be vigilant for mistakes (wrong code produces wrong results).
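
To illustrate what I mean by injecting results rather than retyping them, here is a minimal R Markdown sketch; the data and numbers are made up for this example:

---
output: pdf_document
---

```{r analysis, include = FALSE}
# Toy data for the example; in a real manuscript this would be the study data
scores <- c(104, 98, 110, 95, 102, 99, 107, 101)
test <- t.test(scores, mu = 100)
```

The mean score was `r round(mean(scores), 1)`,
t(`r unname(test$parameter)`) = `r round(unname(test$statistic), 2)`,
p = `r round(test$p.value, 3)`.

Because the numbers are injected from the code, they cannot be mistyped, and rerunning the analysis automatically updates the manuscript.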

Have you run the Statcheck tool on your own statistical NHST-testing in the mentioned article?

Yes! This was the first thing I did, way before I was running it on other papers. Moreover, I was non-selective when I started scanning other people’s papers — I apparently even made a statcheck report that got posted on PubPeer for my supervisor (see here). He laughed, because the paper was on reporting inconsistencies and the gross inconsistency was simply an example of one in the running text. A false positive, highlighting that statcheck’s results always need to be checked by a human before concluding anything definitive.

Critics call Statcheck “a new form of harassment” and accuse you of being “a self appointed data police”. Can you understand these reactions?

Proponents of statcheck praise it as a good service. Researchers who study how researchers conduct research are called methodological terrorists. Any change comes with proponents and critics. Am I a self-appointed data policer? To some, maybe. To others, I am simply providing a service. I don’t chase individuals and I am not interested in that at all; I do not see myself as part of a “data police”. That people experience these reports as a reprimand highlights to me that a taboo still rests on skepticism within science. Skepticism is one of the ideals of science, so let’s aim for that.

Why do you find it necessary to send out thousands of emails to scholars around the world informing them that their work has been reviewed and point out to them if they have miscalculated?

It was not necessary; I thought it was worthwhile. Why do some scholars find it necessary to e-mail a colleague about their thoughts on a paper? Because they think it is worthwhile and can help them or the original authors. Those were exactly my intentions when I teamed up with PubPeer and posted those 50,000 statcheck reports.

Isn’t it necessary and important for ethical reasons to be able to make a distinction between deliberate miscalculations and miscalculations by mistake when you do this kind of statcheck?

If I was making accusations about gross incompetence towards the original authors, such a distinction would clearly be needed. But I did not make accusations at all. I simply stated the information available, without any normative or judging statements. Mass-scale post-publication peer review of course brings with it ethical problems, which I carefully weighed before I started posting statcheck reports with the PubPeer team. The formulation of these reports was discussed within our group and we all agreed this was worthwhile to do.

As a journalist I can write and publish an article with one or two factual errors. This doesn’t mean the article isn’t of a generally high journalistic standard or that the content of the article isn’t of great relevance for the public. Couldn’t you make the same argument about a scientific article? And when you catalogue these errors online, aren’t you at risk of blowing up a storm in a teacup and turning everybody’s eyes away from the actual scientific findings?

Journalists and scholars are playing different games. An offside in football is not a problem in tennis, and the comparison between journalists and scholars seems similar to me. I am not saying that an article is worthless if it contains an inconsistency; I just say that it is worth looking at before building new research lines on it. Psychology has wasted millions and millions of euros/dollars/pounds/etc. on chasing ephemeral effects that are totally unreasonable, as several replication projects have highlighted in recent years. Moreover, I think the general opinion of science will only improve if we are more skeptical and critical of each other instead of trusting findings based on reputation, historical precedent, or the ease with which we can assimilate the findings.

False claims of copyright and STM

Recently, I have become interested in the issue of false claims of copyright (i.e., copyfraud) in publishing. I just wrote to the publishers’ association (STM) to ask them what their perspective on copyfraud is and whether they condone such behavior by their members. Read my letter here. I will update this blog when I get a response.

An example of copyfraud is this index page from the Lancet, published in 1823. Let’s assume copyright for this index page was actively registered and that it received protection under copyright legislation (copyright was not automatic before the 1886 Berne Convention). That would mean the duration of copyright would have to be at least 192 years for this claim to be valid! Even under current rules, copyright does not last that long for organizations (if I am correct, it is around 120 years).

Regrettably, it is easy for a rightsholder to legally pursue someone who violates their copyright, but when someone falsely claims to be a rightsholder, the public cannot fight back in the same way. This is an inherent asymmetry of power in copyright. The World Intellectual Property Organization (WIPO) does not seem to provide a way to easily report potential copyfraud, and I would like to call on them to make this possible. Opening up a way to reliably report it would at least allow everyone to get a better view of how often copyfraud might occur. Even better would be legislation that empowers the public to fight back against copyfraud.

Copyfraud is a widespread problem that does not only occur with old works, but also with, for example, works by U.S. federal employees, which are uncopyrightable under United States federal law (17 U.S. Code § 105). Recent articles by the 44th President, Barack Obama, have been illegally copyrighted, and yet all we can do is ask nicely that they remove the copyright notice.

Fighting for “non-negotiable” copyright

Recently, the American Psychological Association (APA) has decided not to allow me to retain my copyright for a book chapter I wrote, after weeks of back-and-forth in which they said they are “very flexible” in their current license agreement while neglecting my counter-offers. How requesting all copyright and making the agreement non-negotiable is flexible, I do not know. I am not the only one who has stumbled upon such problems (see Rajiv Jhanghiani’s blog post).

In February I asked the editors of the book for a copy of the license agreement, suspecting that the APA would want a full copyright transfer. On March 29 2016 I received the agreement (available here) and it indeed stated the suspected copyright transfer.

I asked the editors to inquire about an alteration to the agreement, such that the APA could print the chapter and make money, but the copyright would remain with us (and I could publish a copy online under a CC-BY or CC0 license). Note that I not only feel morally obliged to do this, but also practically have to: if I sign away all rights, I cannot reuse my own book chapter in my dissertation without getting prior approval from the rightsholder and hoping I am granted that exception.

Alas, the APA stated that the license agreement is “non-negotiable” and that their “policies on use of the material with proper citation are not at all stringent.” Note that they refer here to academic citation, which is not concerned with copyright (i.e., reuse) but with attribution of ideas and professional standards.

Moreover, acquiring all copyright was essential according to them, due to the “financial risk assessment” involved in publishing a book. They neglected to respond to my argument that the free (!) chapter does not, in itself, pose a risk; any risk is therefore incurred by their own doing, and I will not pay for it with my copyright.

So I made a counter-offer: a non-exclusive reproduction right, where I simply put the chapter in the public domain and the APA can print the chapter and make money, while I cannot make any claims on their revenue or profit, and others are able to reuse the chapter freely and without restrictions. I offered this three times and they did not agree; just now, my deadline for them to accept the offer passed.

So now I will publish this chapter elsewhere. The beauty is that I just put it online with a non-restrictive license, so they can technically still print it if they’d like despite their claim that I do not allow publication of the book chapter. But now you can read it and freely reuse it. They are simply blocking publication because they cannot publish on their terms and I want to renegotiate the apparently one-sided agreement.

Note: This matter is wholly unrelated to the recent post about a flaw in EBSCOhost that inadvertently made all of the APA’s content Open Access if you had a direct link.

Did I just ‘make’ all of APA Open Access?

The American Psychological Association (APA) is one massive, (primarily) closed-access publisher in psychology, which Tilburg University accesses through EBSCOhost. This has accidentally made all of the APA’s published journals free to access. I assume both the APA and EBSCOhost are unaware of this.

During my mining endeavors I also wanted to mine the APA (for research purposes, as described in earlier posts here and here). After collecting links to access these articles via EBSCOhost with my spider, I accidentally tried to access one of those links outside of the university network and, to my surprise, I could!

I tried a VPN to access it from several other countries in the world, and it still worked. Other computers: the same. Open access to closed articles, a seeming paradox, but apparently possible.

Direct links to EBSCOhost simply bypass all technical walls implemented by EBSCO, which the APA will not be all too happy with. A stable session ID works fine, even when the collected links are accessed more than six months later. I figure this generalizes to non-APA articles in EBSCOhost, but I have not tried that.

For example, this link (try it!) provides access to the paper on “Arab Youth Involvement in Delinquency” (no specific reason why I chose this one, it was just the first random pick). You can even navigate to the PDF that is attached to it. If you follow the link based on the DOI, you hit a paywall. You can play around with one of these 1000 links to see that this actually works (see this spreadsheet). I collected more than 70,000 (!) of these, which are all free to access with these direct links, even though the APA probably wants them paywalled outside of Tilburg’s network.

An example of accessing a closed article freely through EBSCOhost.
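
If you want to see for yourself that this works, a small R sketch along these lines is enough; the two URLs below are placeholders for direct links you copy from the spreadsheet:

# install.packages("httr")
library(httr)

# Replace these placeholders with direct EBSCOhost links from the spreadsheet
links <- c(
  "PASTE-A-LINK-FROM-THE-SPREADSHEET-HERE",
  "PASTE-ANOTHER-LINK-HERE"
)

# Request each link from outside the university network (no VPN, no proxy).
# A 200 status is only a first indication; open the link in a browser to
# confirm you actually see the article rather than a login page.
for (link in links) {
  cat(status_code(GET(link)), link, "\n")
}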

And of course, if you have these links, it is relatively easy to systematically download them and identify which link is which paper. I am not dumping an entire database of 70,000 links with article DOIs and article titles, simply because I figure this is a flaw in the system and I do not want to provoke the APA and their lawyers, considering I am already busy enough with Elsevier. However, if you need these links for mining purposes, send me an email or tweet.

If closed-access publishers worry so much about the widespread use of Sci-Hub and how to maintain revenue in an increasingly Open Access world, these kinds of technological flaws undermine even their closed model. I did not actively try to hack their system (although I might be accused of hacking for this); I just stumbled upon it by chance. They can just as well dump all their articles in the Open if this is so easy (please do).

UPDATE: The example link now requires a login. Here are some additional examples, from the spreadsheet — example, example, example.

Awarded Shuttleworth Flash Grant

I am proud to announce that I have been awarded a Shuttleworth Flash Grant. This $5000 grant is empowering, considering that there are simply no strings attached except communicating openly about what you do with it (YES: no budgets, proposals, record keeping, or any of the other tedious aspects of grants that detract from actually doing things with the grant).

Not only is it empowering because of the lack of bureaucracy, it is also a badge of honor considering how it is described: “we award a number of small grants to a collection of social change agents, no strings attached, in support of their work.” Being called a change agent sounds like a humongous compliment to me! Additionally, the Shuttleworth Foundation just oozes openness (see video below), which adds to the weight I assign to the Foundation.

I am proud to have been chosen as a Flash Grantee and I look forward to finding effective ways to utilize it for change (e.g., for copyright reform). I will keep you posted on what I do with it here!

 

How a professional webpage can harm your privacy

I have a professional webpage, which serves as the homepage when people search for me on Google. Great — right?

Yes, but many academics do not realize that when they register their domain, they are potentially making personal information public: their personal email address, their personal phone number, and their home address. Several domain registrars treat keeping this information private as an extra service and charge additional fees, so be sure to check (send me an email if you want me to check your domain). I suggest you switch registrars if they charge you for privacy.

So, the problem is that not all domain registrars keep your personal information private, and if they do not, anyone can simply query the domain’s registration record and find it (e.g., with the whois terminal command). I have tested this on several professional webpages and, shamefully, learned information about my colleagues I did not know before (e.g., a home address). Depending on the registrar, you can see information like this (of course this is a fictitious example):

Registrant Name: John Doe
Registrant Organization:
Registrant Street: Blastreet 12-A
Registrant City: Blacity
Registrant State/Province:
Registrant Postal Code: 42176
Registrant Country: US
Registrant Phone: 001555666777
Registrant Phone Ext:
Registrant Fax:
Registrant Fax Ext:
Registrant Email: john.doe@gmail.com

but when the domain registrar does keep it private (as for my webpage), it might just refer to the registrar:

[chjh@pandorica ~]$ whois chjh.nl
[Querying whois.domain-registry.nl]
[whois.domain-registry.nl]
Domain name: chjh.nl
Status: active

Registrar:
Hostnet bv
De Ruijterkade 6
1013AA Amsterdam
Netherlands

Please be sure to check your professional page, or let me know if you would like me to check for you (I promise I won’t save any of the information). Despite Open Data in research, I think privacy for researchers is still warranted, especially if you want to prevent being harassed when you do research that some might find controversial.

EDIT: I was notified that in some countries, this information is mandatory. For example, in Germany the Impressumspflicht mandates this information.

European Open Science Policy candidacy

In line with the high priority the European Commission has put on Open Science, the Directorate-General (DG) is currently working on forming an Open Science Policy Platform (OSPP). Its members will help build European policies on Open Science, ranging from inducing cultural change towards Open Science to regulations in European funding. I would like to put myself forward as a candidate to represent early career researchers in this platform. I am drafting my candidacy letter in the open (comments welcome). If you are willing to endorse me, please comment on this post or tweet about this post mentioning my handle, @chartgerink, saying you endorse my candidacy for the Open Science Policy Platform. Each endorsement counts (honestly: only those prior to Tuesday, March 22 count, because the application is due then).

The call for candidates explicitly states that they are looking for “high-level” experts with policy experience, indicating that this platform runs the risk of ignoring the interests of early career researchers such as PhD students or post-docs. We as early career researchers are by definition not “high-level” and lack experience at the policy level, yet we are the ones who will be greatly affected by the renewed policies (and potentially for the longest, because our careers hopefully still lie ahead of us).

For example, if European funding becomes subject to Open Science regulations, we as early career researchers will be the ones who have to figure out how to conduct research in an open fashion. Principal Investigators (PIs) receive the European grants with Open Science policies attached, but will have PhD students and post-docs conduct a large part of the research. As a consequence, we are saddled with the responsibility of putting Open Science into practice. PIs have little idea how to do this, because they were never educated in this manner, nor are they in a position to properly conduct Open Science themselves. Thorough Open Science requires knowledge of all procedures and steps in a research process, which is difficult when you are supervising (PI) instead of conducting the research (PhD students, post-docs).

I therefore think that the OSPP requires an early career researcher as a representative (whoever it may be), someone who knows the intricacies of putting Open Science into practice. I have been conducting my research in an open fashion since before my PhD and have found that Open Science can be easy, but requires proper training. Currently, that training is missing. If the Commission thinks a successful cultural change to Open Science is possible without the input of early career researchers, leaving the policy-making to these “high-level”, experienced researchers only, then I believe cultural change will be immensely difficult. I would like to help it succeed and partake in this platform as an early career researcher.