Category Archives: Miscellaneous

Symbiodinium cp23S Re-PCR

Yesterday I completed some re-do PCRs of Symbiodinium cp23S from the branching Porites samples Sanoosh worked on over the past summer. Some of the samples did not amplify at all, so I reattempted PCR of these samples (107, 108, 112, 116). Sample 105 amplified last summer but the sequence was lousy, so I redid that one too. After the first PCR, I took 1 µl of the product and diluted it 1:100 in water. I then used 1 µl of this diluted product as the template for a second round of PCR. PCR conditions were the same Sanoosh and I used last summer (based on Santos et al. 2002):

Reagent                 Volume (µl)
water                   17.2
5X Green Buffer         2.5
25 mM MgCl2             2.5
10 mM dNTP mix          0.6
GoTaq (5 U/µl)          0.2
10 µM primer 23S1       0.5
10 µM primer 23S2       0.5
master mix volume       24
sample                  1
total volume            25
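For batches of re-dos like this, the per-reaction volumes above scale linearly. A quick sketch in Python (volumes from the table; the 10% excess is a generic pipetting allowance I added, not part of the protocol):

```python
# Scale the 25 µl cp23S PCR recipe for a batch of reactions.
# Per-reaction volumes (µl) are taken from the table above.
PER_RXN_UL = {
    "water": 17.2,
    "5X Green Buffer": 2.5,
    "25 mM MgCl2": 2.5,
    "10 mM dNTP mix": 0.6,
    "GoTaq (5 U/ul)": 0.2,
    "primer 23S1 (10 uM)": 0.5,
    "primer 23S2 (10 uM)": 0.5,
}

def master_mix(n_samples, excess=0.10):
    """Reagent volumes (µl) for n_samples reactions plus a pipetting excess."""
    factor = n_samples * (1 + excess)
    return {reagent: round(v * factor, 1) for reagent, v in PER_RXN_UL.items()}

# Each reaction then receives 24 µl of master mix plus 1 µl template.
print(master_mix(5))
```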

An initial denaturation of 1 min at 95 °C; 35 cycles of 95 °C for 45 s, 55 °C for 45 s, and 72 °C for 1 min; and a final extension of 7 min at 72 °C.

Samples were then run on a 1% agarose gel for 30 min at 135 V.


Surprisingly, the first round of PCR amplified samples 107 and 112. (Note: two subsamples of each were run: the original extraction diluted (d), and the original extraction cleaned with the Zymo OneStep PCR Inhibitor Removal Kit (c).) The cleaned samples were the ones that amplified. I believe Sanoosh had tried these cleaned samples without success.

The second round of PCR produced faint bands for both of the 108 samples. Sample 116 still did not amplify.

I cleaned the samples with the NEB Monarch Kit and shipped them today to Sequetech. I combined the two 108 samples to ensure enough DNA for sequencing.


RAD library prep

This is a belated post on some RAD library prep I did the week of January 23rd in the Leaché Lab. I followed the same ddRAD/EpiRAD protocol I used in August. Samples included mostly Porites astreoides from the transplant experiment, as well as some geoduck samples from the OA experiment and a handful of green and brown Anthopleura elegantissima. Sample metadata can be found here. The library prep sheet is here. The TapeStation report is here. Below is the gel image from the TapeStation report showing that the size selection was successful. However, this selection produced fragments with a mean size of 519-550 bp, whereas the August size selection produced ~500 bp fragments. This occurred despite identical Pippin Prep settings targeting 415-515 bp fragments. While there will obviously be some overlap between the two libraries, combining samples from them may be problematic. Libraries were submitted to UC Berkeley on 1/31/17 for 100 bp paired-end sequencing on the HiSeq 4000.

Library JD002_A-L



Interview Danish Psychology Association responses

Below, I copy my responses to an interview for the Danish Psychology Association. My responses are in italic. I don’t know when the article will be shared, but I am posting my responses here, licensed CC0. This is also my way of sharing the full responses, which won’t be copied verbatim into the article because they are simply too lengthy.

What do you envision that this kind of technology could do in a foreseeable future?

What do you mean by “this” kind of technology? If you mean computerized tools assisting scholars, I think there is massive potential in both the development of new tools to extract information (for example, what ContentMine is doing) and in their application. Some formidable means are already here. For example, how much time do you spend as a scholar to produce your manuscript when you want to submit it? This does not need to cost half a day when there are highly advanced, modern submission managers. The same goes for submitting revisions. Additionally, annotating documents collaboratively on the Internet is great fun, highly educational, and productive. I could go on and on about the potential of computerized tools for scholars.

Why do you think this kind of computerized statistical policing is necessary in the field of psychology and in science in general?

Again, what is “this kind of computerized statistical policing”? I assume you’re talking about statcheck only for the rest of my answer. Moreover, it is not policing — a spell-checker does not police your grammar, it helps you improve your grammar. statcheck does not police your reporting, it helps you improve your reporting. Additionally, I would like to reverse the question: should science not care about the precision of scientific results? With all the rhetoric going on in the USA about ‘alternative facts’, I think it highlights how dangerous it is to let go of our desire to be precise in what we do. Science’s imprecision has trickle-down effects in, for example, the policies that are subsequently put in place. We put in all kinds of creative and financial effort to progress our society; why should we let it be diminished by simple mistakes that can be prevented so easily? If we agree that science has to be precise in the evidence it presents, we need to take steps to make sure it is. Making a mistake is not a problem; it is all about how you deal with it.

So far the statcheck tool only checks whether the math behind the statistical calculations in published articles is wrong when null-hypothesis significance testing has been used. These are what you refer to as reporting errors in your article from December last year, published in Behavior Research Methods. But these findings aren’t problematic as long as the conclusions in the articles aren’t affected by the reporting errors?

They aren’t problematic? Who is the judge of whether errors aren’t problematic? If you consider just statistical significance, one in eight papers still contains such a problem. Moreover, all errors in reported results affect meta-analyses — is that not also problematic down the line? It shows hubris for any individual to claim they can determine whether something is problematic, when there can be many things that person doesn’t even realize can be affected. It should be open to discussion, so information about problems needs to be shared and discussed. This is exactly what I aimed to do with the statcheck reports on PubPeer for one very specific problem.

In the article in Behavior Research Methods you find that half of all published psychology papers that use NHST contain at least one p-value that is inconsistent with its test statistic and degrees of freedom, and that one in eight papers contains a grossly inconsistent p-value that may have affected the statistical conclusion. What does this mean? I’m not a mathematician.

You don’t need to be a mathematician to understand this. Say we have a set of eight research articles presenting statistical results with certain conclusions. Four of those eight will contain a reported p-value that does not match the accompanying test statistic and degrees of freedom (i.e., an inconsistency), but not in a way that affects the broad strokes of the conclusions. One of those eight contains a result where the mismatch potentially nullifies the conclusion. For example, a study might conclude that a new behavioral therapy is effective at treating depression, while the reported result does not actually support that conclusion. In that case the evidence for the therapy’s effectiveness is undermined, affecting direct clinical benefits as a result.
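What statcheck automates is easy to illustrate. statcheck itself is an R package that parses APA-formatted t, F, r, chi-square, and z results and recomputes the p-value; the toy version below handles only the two-sided z case, using nothing beyond Python’s standard library, and the example numbers are invented for illustration, not taken from the paper:

```python
import math

def p_from_z(z):
    """Two-sided p-value for a z statistic (standard normal)."""
    return math.erfc(abs(z) / math.sqrt(2))

def check_apa_z(z, reported_p, alpha=0.05, decimals=3):
    """statcheck-style consistency check for a reported 'z = ..., p = ...' pair.

    Returns (recomputed p, inconsistent?, grossly inconsistent?). A gross
    inconsistency is one where the significance decision itself flips.
    """
    recomputed = p_from_z(z)
    inconsistent = round(recomputed, decimals) != round(reported_p, decimals)
    gross = inconsistent and ((recomputed < alpha) != (reported_p < alpha))
    return recomputed, inconsistent, gross

# Consistent: z = 2.20 really does give p ~ .028
print(check_apa_z(2.20, 0.028))
# Grossly inconsistent: z = 1.80 gives p ~ .072, not the 'significant' .04
print(check_apa_z(1.80, 0.04))
```

The real tool also handles rounding conventions and one-sided tests, which is exactly why its flags still need a human check.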

Why are these findings important?

Science is vital to our society. Science is based on empirical evidence. Hence, it is vital to our society that empirical evidence is precise and not distorted by preventable or remediable mistakes. Researchers make mistakes; no big deal. People like to believe scientists are more objective and more precise than other humans, but we’re not. The way we build checks and balances to prevent mistakes from proliferating and propagating into (for example) policy is crucial. statcheck contributes to understanding and correcting one specific type of mistake we can all make.

Why did you decide to run the statcheck on psychology papers specifically?

statcheck was designed to extract statistical results reported as prescribed by the American Psychological Association. It is one of the most standardized ways of reporting statistical results. It makes sense to apply software developed on standards in psychology to psychology.

Why do you find so many statistical errors in psychology papers specifically?

I don’t think this is a problem of psychology specifically, but more a problem of how empirical evidence is reported and how manuscripts are written.

Are psychologists not as skilled at doing statistical calculations as other scholars?

I don’t think psychologists are worse at doing statistical calculations. I think point-and-click software has made it easy for scholars to compute statistical results, but not to insert them into manuscripts reliably. Typing in those results is error-prone. I make mistakes when I’m doing my finances at home, because I have to copy the numbers. I wish I had something like statcheck for my finances. But I don’t. For scientific results, I promote writing manuscripts dynamically. This means that you no longer type in the results manually, but inject the code that contains the result. This is already possible with tools such as Rmarkdown and can greatly increase the productivity of the researcher. It has saved my skin multiple times, although you still have to be vigilant for mistakes (wrong code produces wrong results).

Have you run the Statcheck tool on your own statistical NHST-testing in the mentioned article?

Yes! This was the first thing I did, way before I was running it on other papers. Moreover, I was non-selective when I started scanning other people’s papers — I apparently even made a statcheck report that got posted on PubPeer for my supervisor (see here). He laughed, because the paper was on reporting inconsistencies and the gross inconsistency was simply an example of one in the running text. A false positive, highlighting that statcheck’s results always need to be checked by a human before concluding anything definitive.

Critics call Statcheck “a new form of harassment” and accuse you of being “a self appointed data police”. Can you understand these reactions?

Proponents of statcheck praise it as a good service; researchers who study how researchers conduct research have been called methodological terrorists. Any change comes with proponents and critics. Am I a self-appointed data policer? To some, maybe. To others, I am simply providing a service. I don’t chase individuals and I am not interested in that at all — I do not see myself as part of a “data police”. That people experience these reports as a reprimand highlights to me that a taboo still rests on skepticism within science. Skepticism is one of the ideals of science, so let’s aim for that.

Why do you find it necessary to send out thousands of emails to scholars around the world, informing them that their work has been reviewed and pointing out to them where they may have miscalculated?

It was not necessary — I thought it was worthwhile. Why do some scholars find it necessary to e-mail a colleague their thoughts on a paper? Because they think it is worthwhile and can help them or the original authors. Those were exactly my intentions in teaming up with PubPeer and posting those 50,000 statcheck reports.

Isn’t it necessary and important, for ethical reasons, to be able to make a distinction between deliberate miscalculations and miscalculations by mistake when you run this kind of check?

If I were making accusations of gross incompetence against the original authors, such a distinction would clearly be needed. But I did not make accusations at all. I simply stated the information available, without any normative or judgmental statements. Mass-scale post-publication peer review of course brings ethical problems with it, which I carefully weighed before I started posting statcheck reports with the PubPeer team. The formulation of these reports was discussed within our group, and we all agreed this was worthwhile to do.

As a journalist I can write and publish an article with one or two factual errors. This doesn’t mean the article isn’t of a generally high journalistic standard, or that its content isn’t of great relevance to the public. Couldn’t you make the same argument about a scientific article? And when you catalogue these errors online, aren’t you at risk of whipping up a storm in a teacup and turning everybody’s eyes away from the actual scientific findings?

Journalists and scholars are playing different games. An offside in football is not a problem in tennis, and the comparison between journalists and scholars seems similar to me. I am not saying that an article is worthless if it contains an inconsistency; I am just saying it is worth looking at before building new research lines on it. Psychology has wasted millions and millions of euros/dollars/pounds/etc. chasing ephemeral effects that are totally unreasonable, as several replication projects have highlighted in recent years. Moreover, I think the general opinion of science will only improve if we are more skeptical and critical of each other, instead of trusting findings based on reputation, historical precedent, or the ease with which we can assimilate them.


Manuscript Writing – The “Nuances” of Using Authorea

I’m currently trying to write a manuscript covering our genotype-by-sequencing data for the Olympia oyster using the Authorea platform, and am encountering some issues that are a bit frustrating. Here’s what’s happening (and the ways I’ve managed to get around the problems).



PROBLEM: Authorea spits out a browser-crashing “unresponsive script” message (actually, lots and lots of them; clicking “Stop script” or “Continue” just results in additional messages) in Firefox (haven’t tried any other browsers). This renders the browser inoperable and I have to force quit. It doesn’t happen all of the time, so it’s hard to pinpoint what triggers this.



SOLUTION: Edit documents in Git/GitHub. I have my Authorea manuscript linked to a GitHub repo, which allows me to write without using the Authorea web editor. This is how I’ll be doing the majority of my writing anyway, but I would like to use Authorea to insert and manage citations…



PROBLEM: Authorea remains in a perpetual “saving…” state after inserting a citation. It also renders the page strangely, with HTML <br></br> tags (see the “Methods” section in the screen cap below).


SOLUTION: Type additional text somewhere, anywhere. This is an OK solution, but is particularly annoying if I just want to go through and add citations and have no intentions of doing any writing.



PROBLEM: Multi-author citations don’t get formatted with “et al.” By default, Authorea inserts all citations using the following LaTeX format:


Result: (Elshire 2011).

This is a problem because this reference has multiple authors and should be written as: (Elshire et al., 2011).

SOLUTION: Change citation format to:


Other citation formatting options can be found here (including multiple citations within one set of parentheses, and referring to the author’s name in-text with only the publication year in parentheses):

How to add and manage citations and references in Authorea



PROBLEM: When a citation no longer exists in the manuscript, it still persists in the bibliography.

SOLUTION: This is a known bug with no current fix. For now, I have to delete them from the bibliography by hand (or maybe figure out a way to do it programmatically)…




PROBLEM: Cannot click-and-drag some references from Mendeley (I haven’t tested other reference managers) without getting an error. To my knowledge, the BibTeX is valid, as it appears to have the same formatting as other references that can be inserted via the click-and-drag method. There are just some references it won’t work for…


SOLUTION: Use the search bar in the citation insertion dialogue box. Not as convenient and slows down the workflow for citation insertion, but it works…



Chinook salmon and southern resident killer whales occupy similar depths in the Salish Sea

A new paper by UW colleagues, entitled “Interpreting vertical movement behavior with holistic examination of depth distribution: a novel method reveals cryptic diel activity patterns of Chinook salmon in the Salish Sea”, shows some results from Vemco receivers I deployed in the San Juan Islands. Young adult Chinook favor depths less than ~30 meters, with some seasonal variability in their diel activity patterns. Overall, they go deeper and their depths vary more at night.

Dive profiles for two Salish Sea Chinook salmon during the summer and fall.

Interestingly, according to a report to NOAA/NWFSC by Baird et al., 2003 (“Studies of foraging in ‘southern resident’ killer whales during July 2002: dive depths, bursts in speed, and the use of a ‘Crittercam’ system for examining sub-surface behavior”), SRKWs spend >97% of their time at depths of less than 30 m.

This suggests any future deployment of horizontal echosounders should aim to ensonify a depth range centered on ~25 m (e.g., 5-45 m or 10-40 m). Compared to the estimated orientation and surveyed depth range of our 2008-9 salmon-SRKW echosounder pilot studies, we may want to measure inclination more carefully to (a) center the survey on the mean summertime depth range of Chinook and (b) avoid ping reflections from surface waves, boats, and bubbles (which may have confused interpretations of targets >100 m from the transducer). Here’s my diagram for the situation in 2008-9, in which we were centered on 15 m and ensonified a maximum depth range of ~0-30 m (in other words, we may have been aiming a little high):

Screen grab from the 2009 ASA presentation showing echosounder geometry
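The aiming question comes down to simple trigonometry. Here is a back-of-envelope sketch (not the survey code; the transducer depth, tilt, and beamwidth values below are hypothetical placeholders, not our 2008-9 parameters) showing how the ensonified depth interval at a given range is computed:

```python
import math

def ensonified_depths(transducer_depth_m, tilt_deg, half_beamwidth_deg, range_m):
    """Depth interval (top, bottom) covered by the beam at a given range.

    tilt_deg is the beam-axis angle below horizontal (negative = aimed up).
    The top of the beam is clipped at the surface.
    """
    top = transducer_depth_m + range_m * math.sin(
        math.radians(tilt_deg - half_beamwidth_deg))
    bottom = transducer_depth_m + range_m * math.sin(
        math.radians(tilt_deg + half_beamwidth_deg))
    return max(top, 0.0), bottom

# Hypothetical numbers: transducer at 5 m, aimed 11.5 deg down, 10 deg full
# beamwidth. At 100 m range the beam spans roughly the 16-33 m depth layer,
# i.e. centered near 25 m as suggested above.
print(ensonified_depths(5, 11.5, 5, 100))
```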




Curriculum Testing – Viability of Using Dry Ice to Alter pH

Ran some basic tests to get an idea of how well (or poorly) the use of dry ice and universal indicator would work for this lesson.

Instant Ocean mix (per mfg’s recs): 0.036 g/mL

Universal Indicator (per mfg’s recs): 15 µL/mL
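Scaling those two manufacturer ratios to a chosen batch volume is trivial, but for reference, here is the arithmetic as a small helper (my own convenience sketch, not part of the lesson materials):

```python
def seawater_recipe(volume_ml):
    """Instant Ocean mass (g) and Universal Indicator volume (mL)
    for a given solution volume, using the two ratios above."""
    salt_g = 0.036 * volume_ml        # 0.036 g Instant Ocean per mL
    indicator_ml = 15e-3 * volume_ml  # 15 µL indicator per mL = 0.015 mL/mL
    return salt_g, indicator_ml

# A 200 mL batch needs 7.2 g Instant Ocean and 3.0 mL indicator.
print(seawater_recipe(200))
```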

Played around a bit with different solution volumes, different dry ice amounts, and different Universal Indicator amounts.

Indicator Vol (mL)  Solution                    Solution Vol (mL)  Dry Ice (g)  Time to Color Change (min)  Notes
3                   Tap H2O                     200                1.5          <0.5
3                   Tap H2O                     200                0.5          >5                          Doesn’t trigger full color change and not much bubbling (not very exciting)
5                   Tap H2O                     1000               12           <1
3                   Instant Ocean               200                1.5          <0.5                        Begins at higher pH than just tap water. Full color change is slower than just tap water, but still too quick for timing.
2                   1M Na2CO3                   200                5            >5                          No color change and dry ice fully sublimated.
2                   1M Tris Base                200                5            >5                          No color change and dry ice fully sublimated.
2                   Tap H2O + 20 drops 1M NaOH  200                5            2.75                        ~Same color as Na2CO3 and Tris Base solutions to begin. Dry ice gone after ~5 min and final pH color is ~6.0.



  • Universal Indicator amount doesn’t have an effect. It’s solely needed for ease-of-viewing color changes. Use whatever volume is desired to facilitate easy observations of color changes.
  • Larger solution volumes should be used in order to slow the rate of pH change, so that it’s easier to see differences in rates of change between different solutions.
  • 1M solutions of Na2CO3 and Tris Base have too much buffering capacity and will not exhibit a decrease in pH (i.e. color change) from simply using dry ice. May want to try out different dilutions.
  • Use of water + NaOH to match starting color of Na2CO3 and/or Tris Base is a good way to illustrate differences in buffering capacity to students.
  • Overall, dry ice will work as a tool to demonstrate effect(s) of CO2 on pH of solutions!

Some pictures (to add some zest to this entry):





False claims of copyright and STM

Recently, I have become interested in the issue of false claims of copyright (i.e., copyfraud) in publishing. I just wrote to the publishers’ association (STM) to ask them what their perspective on copyfraud is, and whether they condone such behavior by their member publishers. Read my letter here. I will update this blog when I get a response.

An example of copyfraud is this index page from The Lancet, published in 1823. Let’s assume copyright for this index page was actively registered and that it received protection under copyright legislation (copyright was not automatic before the 1886 Berne Convention). The duration of copyright would then have to be at least 192 years for the claim to be valid today! Even under current legislation, copyright does not last that long for organizations (if I am correct, it is around ~120 years).

Regrettably, it is easy for a rightsholder to legally pursue someone who violates their copyright, but when someone falsely claims to be a rightsholder, the public cannot fight back in the same way. This is an inherent asymmetry of power in copyright. The World Intellectual Property Organization (WIPO) does not seem to provide a way to easily report potential copyfraud, and I would like to call on them to make this possible. Opening up a way to reliably report it would at least allow everyone to get a better view of how often copyfraud occurs. Even better would be legislation that empowers the public to fight back against copyfraud.

Copyfraud is a widespread problem that occurs not only with old works, but also with, for example, works by U.S. federal employees, which are uncopyrightable under United States federal law (17 U.S. Code § 105). Recent articles by the 44th President, Barack Obama, have been illegally claimed under copyright, and yet all we can do is ask nicely that the copyright notice be removed.


DNA extraction: Anthopleura elegantissima

In the interest of comparing methylation levels between symbiotic states in Anthopleura elegantissima, I extracted DNA from three zooxanthellate and three zoochlorellate individuals. These were anemones that were collected last summer at Point Lawrence, Orcas Island, and had been residing in indoor sea tables at Shannon Point since then. For each specimen, I excised part of the tentacle crown with scissors and deposited the tissue directly into a microfuge tube. I opted to freeze the tissue at -80ºC, since earlier attempts with fresh tissue did not seem as effective (i.e., the tissue seemed resistant to lysis). After a day or two in the freezer, I pulled the samples out, rinsed them with PBS, and proceeded with the Qiagen DNeasy protocol. After addition of proteinase K, I used a small pestle to homogenize each sample, followed by an overnight lysis at 56ºC. DNA was eluted via two passes with 50 µl AE buffer (100 µl total volume). To further clean the DNA, samples were subjected to ethanol precipitation using this protocol, and were re-eluted in 50 µl AE buffer.

To assess DNA quantity and quality, samples were tested with the Qubit BR DNA assay, followed by electrophoresis on a 1% agarose gel (1X TBE) at 135 V for 25 min.

Sample   ng/µl   ng/µl post-dilution (100 µl)   Total DNA (ng)
Ae-B-1   256     128                            12800
Ae-B-2   183     91.5                           9150
Ae-B-3   304     152                            15200
Ae-G-1   163     81.5                           8150
Ae-G-2   166     83                             8300
Ae-G-3   274     137                            13700
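The arithmetic behind the table is just a dilution calculation; here it is as a quick sketch (this reflects my reading of the workflow above: the 50 µl elution, quantified by Qubit, was brought to 100 µl):

```python
def yield_after_dilution(qubit_ng_per_ul, eluate_ul=50, final_vol_ul=100):
    """Concentration after diluting the elution to final_vol_ul,
    and total DNA (ng), assuming all DNA is carried over."""
    post = qubit_ng_per_ul * eluate_ul / final_vol_ul
    total = post * final_vol_ul
    return post, total

qubit_readings = {"Ae-B-1": 256, "Ae-B-2": 183, "Ae-B-3": 304,
                  "Ae-G-1": 163, "Ae-G-2": 166, "Ae-G-3": 274}
for sample, conc in qubit_readings.items():
    post, total = yield_after_dilution(conc)
    print(sample, post, total)
```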



Data Management – Geoduck RRBS Data Integrity Verification

Yesterday, I downloaded the Illumina FASTQ files provided by Genewiz for Hollie Putnam’s reduced representation bisulfite geoduck libraries. However, Genewiz had not provided a checksum file at the time.

I received the checksum file from Genewiz and have verified that the data is intact. Verification is described in the Jupyter notebook below.

Data files are located here: owl/web/nightingales/P_generosa

Jupyter notebook (GitHub): 20161230_docker_geoduck_RRBS_md5_checks.ipynb
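The details are in the notebook linked above, but in outline the verification boils down to something like the following (the checksum file name here is invented for illustration):

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large FASTQs fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(checksum_file):
    """Compare each 'checksum  filename' line against the local file."""
    with open(checksum_file) as f:
        for line in f:
            expected, name = line.split()
            status = "OK" if md5sum(name) == expected else "FAILED"
            print(f"{name}: {status}")

# verify("genewiz_md5sums.txt")  # hypothetical checksum file name
```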


Data Received – Geoduck RRBS Sequencing Data

Hollie Putnam prepared some reduced representation bisulfite Illumina libraries and had them sequenced by Genewiz.

The data was downloaded and MD5 checksums were generated.

IMPORTANT: MD5 checksums have not yet been provided by Genewiz! We cannot verify the integrity of these data files at this time! Checksums have been requested. Will create new notebook entry (and add link to said entry) once the checksums have been received and we can compare them.

UPDATE 20161230 – Have received and verified checksums.


Jupyter notebook: 20161229_docker_genewiz_geoduck_RRBS_data.ipynb