Elsevier stopped me doing my research

I am a statistician interested in detecting potentially problematic research, such as data fabrication, which produces unreliable findings and can harm policy-making, confound funding decisions, and hamper research progress.

To this end, I am content mining results reported in the psychology literature. Content mining the literature is a valuable avenue for investigating research questions with innovative methods. For example, our research group has written an automated program to mine research papers for errors in the reported results and found that one in eight papers (of 30,000) contains at least one erroneously reported result that could directly influence the substantive conclusion [1].
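
To give a rough idea of what automatically checking reported results involves, here is a minimal sketch. It is not the program used in [1]: it only handles APA-style z-test reports such as "z = 2.10, p = .04", recomputes the two-tailed p-value from the test statistic, and flags reports whose stated p-value does not match after rounding. The regular expression, the function names, and the rounding rule are illustrative assumptions.

```python
import math
import re

# Hypothetical pattern for APA-style z-test reports, e.g. "z = 2.10, p = .04".
# Real reporting styles vary far more than this simple pattern allows.
Z_REPORT = re.compile(
    r"\bz\s*=\s*(-?\d+(?:\.\d+)?)\s*,\s*p\s*([<>=])\s*(0?\.\d+)",
    re.IGNORECASE,
)


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_reports(text: str) -> list[dict]:
    """Extract z-test reports and flag those whose p-value does not match."""
    results = []
    for match in Z_REPORT.finditer(text):
        z = float(match.group(1))
        relation = match.group(2)
        reported_p = float(match.group(3))
        computed_p = two_tailed_p_from_z(z)
        if relation == "=":
            # Assumed rule: the computed p must round to the reported p
            # at the number of decimals the authors reported.
            decimals = len(match.group(3).split(".")[1])
            consistent = round(computed_p, decimals) == reported_p
        elif relation == "<":
            consistent = computed_p < reported_p
        else:
            consistent = computed_p > reported_p
        results.append({
            "report": match.group(0),
            "computed_p": computed_p,
            "consistent": consistent,
        })
    return results
```

For example, `check_z_reports("z = 1.20, p = .03")` flags the report as inconsistent, because z = 1.20 corresponds to a two-tailed p of about .23.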

In new research, I am trying to extract test results, figures, tables, and other information reported in papers throughout the majority of the psychology literature. This requires access to the research papers published in psychology so that I can mine them for these data. For this purpose, I started ‘bulk’ downloading research papers from, for instance, ScienceDirect. I was doing this for scholarly purposes and took potential server load into account by limiting the number of papers I downloaded to 9 per minute. I had no intention of redistributing the downloaded materials, had legal access to them because my university pays a subscription, and only wanted to extract facts from these papers.
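
Limiting downloads to a fixed rate can be done with a simple throttle that spaces requests evenly. The sketch below is one assumed way to implement such a limit, not the script used for the downloads described here: `Throttle` and `download_all` are hypothetical names, `fetch` stands in for whatever download function is used, and the clock and sleep functions are injectable so the timing can be tested without actually waiting.

```python
import time
from typing import Any, Callable, Iterable


class Throttle:
    """Cap requests per minute by spacing consecutive calls evenly."""

    def __init__(self, per_minute: int = 9,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / per_minute  # seconds between consecutive requests
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def download_all(urls: Iterable[Any], fetch: Callable[[Any], Any],
                 throttle: Throttle) -> list:
    """Call the (hypothetical) fetch function for each URL, respecting the throttle."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

Spacing requests evenly (one every 60/9, or roughly 6.7, seconds) avoids the bursts that a simple per-minute counter would still allow at the start of each minute.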

Full disclosure: I downloaded approximately 30GB of data from ScienceDirect in approximately 10 days. This boils down to an average server load of 35KB/s, 0.0021GB/min, 0.125GB/h, or 3GB/day.
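
The conversions above follow from simple division, assuming a constant download rate over the 10 days and decimal units (1GB = 1,000,000KB):

```python
TOTAL_GB = 30.0  # approximate total downloaded
DAYS = 10.0      # approximate duration

gb_per_day = TOTAL_GB / DAYS                       # 3.0
gb_per_hour = gb_per_day / 24                      # 0.125
gb_per_minute = gb_per_hour / 60                   # about 0.00208
kb_per_second = gb_per_minute / 60 * 1_000_000     # about 34.7

print(f"{gb_per_day:.0f} GB/day, {gb_per_hour:.3f} GB/h, "
      f"{gb_per_minute:.4f} GB/min, {kb_per_second:.0f} KB/s")
# prints: 3 GB/day, 0.125 GB/h, 0.0021 GB/min, 35 KB/s
```

With binary units (1GB = 1,048,576KB), the per-second figure would be about 36KB/s instead.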

Approximately two weeks after I started downloading psychology research papers, Elsevier notified my university that this was a violation of the access contract, that this could be considered theft of content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading (which I did immediately); otherwise, Elsevier would cut off all access to ScienceDirect for my university.

I am now unable to mine a substantial part of the literature, and because of this Elsevier is directly hampering my research.

[1] Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 1–22. doi: 10.3758/s13428-015-0664-2

[MINOR EDITS: the link to the article was broken and should be fixed now. Also, I made the mistake of using "0.0021GB/s", which has now been changed to "0.0021GB/min"; I also added "35KB/s" for completeness. One last thing: I am aware of Elsevier's TDM License agreement, and I nonetheless thank those who directed me towards it.]


  1. Dear Chris,

    We are happy for you to text mine content that we publish via the ScienceDirect API, but not via screen scraping. You can get an API key via our developer’s portal (http://dev.elsevier.com/myapikey.html). If you have any questions or problems, please do let me know. If helpful, I am also happy to engage with the librarian who is helping you.

    With kind wishes,
    Alicia

    Dr Alicia Wise
    Director of Access & Policy
    Elsevier
    a.wise@elsevier.com
    @wisealic

  2. Hi Alicia,

    Does this mean that, if you go through the API, you’re allowed to mine the full text of all Elsevier articles that you also have access to via ScienceDirect? Unlimited text mining, in other words, as long as you go through the API.

    If so, then what’s the logic behind not allowing text mining through ScienceDirect? What difference does it make to Elsevier if a researcher chooses to be inefficient in the way he/she mines text? (Assuming that the API is more efficient, which I imagine it is.)

    Cheers,
    Sebastiaan

    • Hi Sebastiaan,

      The reason that we require miners to use the API is so that we can meet their needs AND ALSO the needs of our human users who can continue to read, search and download articles and not have their service interrupted in any way. ScienceDirect holds 11 million pieces of content, shares infrastructure with Scopus, ClinicalKey, and other Elsevier products, and serves millions of researchers. I am told we are not alone in providing an API for this sort of high-volume access and that APIs also are used by others including Wikipedia and Twitter. We appreciate that users might wish to text mine across publisher platforms, and this is why we also participate in the multi-publisher cross-platform text and data mining service offered by CrossRef http://tdmsupport.crossref.org/

      With kind wishes,
      Alicia

      Dr Alicia Wise
      Director of Access and Policy
      Elsevier
      a.wise@elsevier.com
      @wisealic

      In response to Sebastiaan, I think there are extremely good reasons not to use the Elsevier API, not least those mentioned by Richard Smith-Unna. For instance, they have rate limits and restrictive terms and conditions on usage. It is not in any way “unlimited”.

      “Elsevier has chosen to provisionally limit researchers to 10,000 articles per week” — Nature News
      http://www.nature.com/news/elsevier-opens-its-papers-to-text-mining-1.14659

      This is far too restrictive to be useful. I support Chris in his decision not to use Elsevier’s API. I have also done mining work at the Natural History Museum, London on ScienceDirect content and I did not use the Elsevier API. Researchers should be free to choose which tools and methods they use to do research.

      • Hi Ross,

        This is incorrect, and there is no hard limit on the number of articles that can be mined per week. We do have some rate limits in place to ensure equal access to the API for all users, but feedback from researchers suggests these are reasonable. You can access up-to-date information about our TDM services here: https://www.elsevier.com/about/company-information/policies/text-and-data-mining/text-and-data-mining-faq

        With kind wishes,
        Alicia

        Dr Alicia Wise
        Director of Access & Policy
        Elsevier
        a.wise@elsevier.com
        @wisealic

        • Dear Alicia,

          Thank you for your comment. At the moment, Elsevier’s API policy is terribly unclear. You state “there is no hard limit on the number of articles that can be mined per week” – thank you for being so specific. However I am intrigued by your next sentence which is not so specific: “We do have some rate limits…”

          If these unspecified limits are not on the number of articles, perhaps they are on bandwidth (or some other property)? It would be extremely helpful if Elsevier were clearer about what its rate limits actually are. Publish this information, clearly! Both on the Elsevier site you linked to and in your comments here, the information given appears to be purposefully vague and unhelpful. I cannot use a service whose limits I honestly still don’t understand.

  3. Pingback: Elsevier stopped me doing my research | Science...

  4. So if it’s only 9 a minute, what’s stopping 20 of my colleagues from downloading an article from ScienceDirect every two minutes for our shared reading group? On the other hand, there could even be hundreds of people at my university alone simultaneously accessing ScienceDirect, thousands across the country, and tens or hundreds of thousands globally. I hope the SD servers can stand up to that. I’m getting worried, given the statements above…

  5. Hi Alicia,

    (I cannot seem to re-reply directly to your comment, so I’ll post it like this.)

    First, thanks for taking the time to reply, and giving Elsevier’s point of view. However, I would like to press you a bit on my main question, which you didn’t answer:

    Does this mean that, if you go through the API, you’re allowed to mine the full text of all Elsevier articles that you also have access to via ScienceDirect? Unlimited text mining, in other words, as long as you go through the API.

    If no, then I feel that your reply is disingenuous: it suggests that all researchers need to do is use the API, while this is in fact restricted. On the other hand, if yes, then you have a point. So…? It’s a simple yes/no question.

    Cheers,
    Sebastiaan

  6. Alicia Wise writes:

    “I am told we are not alone in providing an API for this sort of high-volume access and that APIs also are used by others including Wikipedia and Twitter. ”

    While Wikipedia supports access through an API, it doesn’t use the API as a way to limit access, as Elsevier apparently does. First of all, the Wikimedia API doesn’t have hard limits on access; the documentation simply says “There is no hard and fast limit on read requests, but we ask that you be considerate and try not to take a site down.” (See https://www.mediawiki.org/wiki/API:Etiquette . Some Wikimedia instances can add rate limits, but they’re not built into the API and I’m not aware of Wikipedia imposing a hard limit.)

    Second, Wikipedia regularly makes its full content set available for analysis as well, via direct FTP download or BitTorrent. I use this myself: every month, I download a dump file with all the articles in English Wikipedia in order to run programs over them that derive data for my Forward to Libraries service. That’s over 5 million articles I get every month, or over 100 times as many articles per month as Elsevier lets researchers download, if Ross Mounce’s figures above are correct.

    In other words, a nonprofit with an annual budget of under $70 million supports full data downloads and still allows its users to “continue to read, search and download articles and not have their service interrupted in any way.” If a company with over $3 billion in annual revenue won’t do the same, it’s not for service-continuity or other technical reasons.

  7. I hate to be the devil’s advocate here, but it seems like Alicia is correct: The API indeed allows full access to subscribed content in a way that doesn’t seem much more restrictive than usual. (Although ‘usual’ is very restrictive, of course.) You can see the registration form here:

    https://www.elsevier.com/__data/assets/pdf_file/0012/102234/TDM-sign-up-short-form.pdf

    That’s my understanding of the terms, anyway. And, of course I have no idea whether the API works technically well enough to be useful.

    • There are many reasons why the API is problematic. The main ones at present are:
      * I have to agree to Elsevier’s terms and conditions (even to look at it)
      * I have to disclose personal details about myself and my research to Elsevier.

      That is before I even know whether the API does what I want it to do.

  8. Pingback: Why Elsevier’s “solution” is the problem | Chris H.J. Hartgerink's Notebook

  9. Pingback: Content-mining; Rights versus Licences | petermr's blog

  10. So, the purpose of this blog post is to paint Chris H.J. Hartgerink as the victim of Elsevier and therefore an open-access hero. Nicely done, Chris. In reality, it’s just a solipsistic essay that reveals the author’s ignorance about data mining. Fail.

    • Solipsism:

      2. Extreme preoccupation with and indulgence of one’s feelings, desires etc; egoistic self-absorption

      Jeffrey, would you mind enlightening us all on the API so we might share your vision?

    • Yes, it’s a real shame that content-mining specialist Chris Hartgerink is so ignorant about data mining compared with anti-OA trolling specialist Jeffrey Beall. If only Chris could have had Jeffrey’s skills and experience, all this would have been so much better. Elsevier would never have cut off Jeffrey’s access! Silly Chris.

  11. Pingback: Content-mining; Why do Publishers insist on APIs and forbid screen scraping? | petermr's blog

  12. Pingback: Press and blog review | Blog @HEC Paris Library

  13. Pingback: Corporate censorship of academic research | Pearltrees

  14. Pingback: Copyright Reform: C4C Applauds, Regrets and Opposes | C4C

  15. Pingback: Green Tea and Velociraptors | How to write to your MEPs about European Copyright reform

  16. Pingback: Wiley also stopped me doing my research | Open Notebook Science Network

  17. Pingback: Wiley also stopped my doing my research | Chris H.J. Hartgerink's Notebook

  18. Pingback: Impact of Social Sciences – Announcing OpenCon 2016: Catalyzing collective action for a more open scholarly system.

  19. Pingback: Did I just ‘make’ all of APA Open Access? | Chris H.J. Hartgerink's Notebook

  20. Pingback: Reflections on OpenCon 2016 | PLOS ECR Community

  21. Pingback: Reflections on OpenCon 2016 | PLOS Blogs Network
