Author Archives: kubu4

Data Management – SRA Submission of Ostrea lurida GBS FASTQ Files

Prepared a short read archive (SRA) submission for archiving our Olympia oyster genotype-by-sequencing (GBS) data in NCBI. This is in preparation for submission of the manuscript we’re putting together.

I followed my outline/guideline for navigating the SRA submission process, as it’s a bit of a pain in the neck. Glad my notes were actually useful!

The following two files are currently being uploaded via FTP; the process will take about 3hrs, as each file is ~18GB in size:


They are being submitted under the following accession numbers (note: a final accession number will be provided once this is publicly available; I will update this post when that happens):


Goals – February 2017

First goal is to be the first person in lab to post their goals each month. Props to one of our new grad students, Yaamini Venkataraman, on beating me this month!

Next goal is to dominate this year’s Pub-a-thon. I’m working on two different manuscripts, this one and this one, but I still think I can win this!

Stuff that got tackled from last month’s goals:

Freezer organization – This has happened, albeit without much effort on my part. Many thanks to the Big Cheese and Grace for tackling this project!

Data Management Plan – Some progress has been made on this. I improved the instructions on the DMP a bit, but the master spreadsheet around which the DMP revolves (Nightingales) is still in a massive state of flux and needs a lot of attention.

Sequencing data handling – Thanks to Sean for making a serious dent in automating this. He wrote an R script to handle this sort of thing. I’m not entirely sure if he’s done testing it, but it seems to work so far. Next will be incorporating usage instructions for this R script into the DMP so that others can utilize it. On that note, I need to figure out where Sean is keeping this script (can’t seem to locate it in his notebook).


Curriculum Testing – Determination of Most Useful Concentration of Sodium Carbonate Solution

After evaluating whether or not dry ice would be effective to trigger a noticeable change in pH in a solution, I determined which concentration(s) of sodium carbonate (Na2CO3) would be most useful for demonstration and usage within the curriculum. Previously, I used a 1M Na2CO3 solution and the universal pH indicator showed no change in color. What I want is a color change, but one that takes place at a noticeably slower rate than the other solutions that are demonstrated/tested; this will show how sodium carbonate acts as a buffer to CO2-acidification.

Additionally, I tested the difference in rate of pH change between Instant Ocean and sodium chloride (NaCl). The reason for testing this is to use this as a demonstration that salt water (i.e. sea water, ocean water) isn’t just made up of salt. It’s likely that many students simply think of the ocean as salt water and have not considered that the makeup of sea water is much more complex.

Finally, I performed these tests in larger volumes than I did previously to verify that the larger volumes will slow the rate of pH change, thus increasing the time it takes for the universal pH indicator to change color, making it easier to see/monitor/time.

Instant Ocean mix (per mfg’s recs): 0.036g/mL (36g/L)

For the NaCl solution, I used the equivalent weight (36g) that was used to make up the Instant Ocean solution.
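The solution recipes can be sketched as a small calculation. This is only a rough sketch: the molar masses are computed from standard atomic weights, the Na2CO3 is assumed to be the anhydrous form (the post doesn’t say which form was used), and the function names are made up for illustration.

```python
# Approximate molar masses (g/mol) from standard atomic weights.
# ASSUMPTION: anhydrous Na2CO3; a hydrated form would need a larger mass.
MOLAR_MASS = {
    "Na2CO3": 2 * 22.990 + 12.011 + 3 * 15.999,  # ~105.99 g/mol
    "NaCl": 22.990 + 35.45,                      # ~58.44 g/mol
}


def grams_needed(compound, molarity, volume_l):
    """Mass (g) of compound needed for a target molarity (mol/L) and volume (L)."""
    return MOLAR_MASS[compound] * molarity * volume_l


def molarity_from_mass(compound, grams, volume_l):
    """Molarity (mol/L) obtained by dissolving `grams` of compound in `volume_l` liters."""
    return grams / MOLAR_MASS[compound] / volume_l


# Na2CO3 concentrations tested in 1 L volumes.
for molarity in (0.1, 0.01, 0.001):
    mass = grams_needed("Na2CO3", molarity, 1.0)
    print(f"{molarity} M Na2CO3 in 1 L: {mass:.3f} g")

# The NaCl solution was made with the same weight as the Instant Ocean mix (36 g/L).
print(f"36 g NaCl in 1 L: {molarity_from_mass('NaCl', 36, 1.0):.2f} M")
```

Under these assumptions, 0.1M Na2CO3 works out to roughly 10.6g per liter, and 36g/L of NaCl is roughly a 0.6M solution.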



  • Use of 0.001M Na2CO3 is passable, but because it’s a diprotic base, the pH indicator didn’t progress lower than ~pH 6.0 in my limited tests. Adding more dry ice or using an even more dilute solution are options to drive the pH lower.
  • The comparison between salt water and Instant Ocean will work well as a demonstration to introduce the concept that sea water is more complex than just being salty.
  • Using 1L volumes works well to slow the color changes of the universal pH indicator to improve the ability of the students to observe and measure the rate of color change.

The table below summarizes what I tested.

| Solution | Solution Vol (mL) | Dry Ice (g) | Time to Color Change | Notes |
|---|---|---|---|---|
| 0.1M Na2CO3 | 1000 | 3.0 | | No color change. Dry ice gone. |
| 0.01M Na2CO3 | 1000 | 3.3 | | No color change. Dry ice gone. |
| 0.001M Na2CO3 | 1000 | 3.3 | ~20s | Dry ice gone, but final color indicated a pH ~6.0. |
| Instant Ocean | 1000 | 3.3 | 3m | Initial color change noticeable within 10s; full color change after ~3m |
| NaCl | 1000 | 3.0 | instant | Immediate, complete color change. |
| Tap H2O | 1000 | 3.3 | 3m | pH started @ ~7.5. Full color change took place. |

Manuscript Writing – The “Nuances” of Using Authorea

I’m currently trying to write a manuscript covering our genotype-by-sequencing data for the Olympia oyster using the Authorea platform and am encountering some issues that are a bit frustrating. Here’s what’s happening (and the ways I’ve managed to get around the problems).



PROBLEM: Authorea spits out a browser-crashing “unresponsive script” message (actually, lots and lots of them; clicking “Stop script” or “Continue” just results in additional messages) in Firefox (haven’t tried any other browsers). This renders the browser inoperable and I have to force quit. It doesn’t happen all of the time, so it’s hard to pinpoint what triggers this.



SOLUTION: Edit documents in Git/GitHub. I have my Authorea manuscript linked to a GitHub repo, which allows me to write without using Authorea. This is how I’ll be doing my writing the majority of the time anyway, but I would like to use Authorea to insert and manage citations…



PROBLEM: Authorea remains in a perpetual “saving…” state after inserting a citation. It also renders the page strangely, with HTML <br></br> tags (see the “Methods” section in the screen cap below).


SOLUTION: Type additional text somewhere, anywhere. This is an OK solution, but is particularly annoying if I just want to go through and add citations and have no intentions of doing any writing.



PROBLEM: Multi-author citations don’t get formatted with “et al.” By default, Authorea inserts all citations using the following LaTeX format:


Result: (Elshire 2011).

This is a problem because this reference has multiple authors and should be written as: (Elshire et al., 2011).

SOLUTION: Change citation format to:


Other citation formatting options can be found here (including multiple citations within one set of parentheses, and referring in-text author name with only publication year in parentheses):

How to add and manage citations and references in Authorea



PROBLEM: When a citation no longer exists in the manuscript, it still persists in the bibliography.

SOLUTION: A known bug with no current solution. Currently, I have to delete them from the bibliography by hand (or, maybe figure out a way to do it programmatically)…
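One possible way to do this programmatically, sketched in Python. This is not an Authorea feature; it assumes the manuscript text and the BibTeX bibliography are available as plain files in the linked GitHub repo, and that each BibTeX entry starts on its own line with "@type{key,". The citation keys in the example are hypothetical. The regex-based parsing is deliberately simple: anything before the first entry, and things like @string definitions, would need separate handling.

```python
import re

# Matches \cite, \citep, \citet, etc., with optional [..] arguments,
# and captures the comma-separated key list inside the braces.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")

# Matches the first line of a BibTeX entry, e.g. "@article{Some_Key,".
ENTRY_START_RE = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)


def cited_keys(tex):
    """Return the set of citation keys used in a LaTeX string."""
    keys = set()
    for group in CITE_RE.findall(tex):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def prune_bib(bib, keep):
    """Return BibTeX text containing only the entries whose key is in `keep`."""
    starts = list(ENTRY_START_RE.finditer(bib))
    kept = []
    for i, match in enumerate(starts):
        # An entry runs from its "@type{key," line to the start of the next entry.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bib)
        if match.group(2) in keep:
            kept.append(bib[match.start():end].strip())
    return "\n\n".join(kept) + ("\n" if kept else "")


# Hypothetical manuscript text and bibliography.
example_tex = r"Libraries were prepared as described \cite{Example_2011}."
example_bib = """@article{Example_2011,
  title = {Still cited},
}

@article{Unused_2015,
  title = {No longer cited},
}
"""
print(prune_bib(example_bib, cited_keys(example_tex)))
```

Running something like this over all of the manuscript’s section files before committing would keep the bibliography in step with the text, at the cost of maintaining a script outside of Authorea.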




PROBLEM: Cannot click-and-drag some references from Mendeley (haven’t tested other reference managers) without getting an error. To my knowledge, the BibTeX is valid, as it appears to be the same formatting as other references that can be inserted via the click-and-drag method. There are some references it won’t work for…


SOLUTION: Use the search bar in the citation insertion dialogue box. Not as convenient and slows down the workflow for citation insertion, but it works…



Curriculum Testing – Viability of Using Dry Ice to Alter pH

Ran some basic tests to get an idea of how well (or poorly) the use of dry ice and universal indicator would be for this lesson.

Instant Ocean mix (per mfg’s recs): 0.036g/mL

Universal Indicator (per mfg’s recs): 15μL/mL
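For reference, the per-mL recommendations above scale to a given solution volume as in the short sketch below. The function and constant names are just illustrative.

```python
# Manufacturer recommendations quoted above.
INSTANT_OCEAN_G_PER_ML = 0.036   # g of Instant Ocean mix per mL of water
INDICATOR_UL_PER_ML = 15         # uL of Universal Indicator per mL of solution


def recommended_amounts(volume_ml):
    """Return (Instant Ocean in g, Universal Indicator in mL) for a solution volume in mL."""
    salt_g = INSTANT_OCEAN_G_PER_ML * volume_ml
    indicator_ml = INDICATOR_UL_PER_ML * volume_ml / 1000  # convert uL to mL
    return salt_g, indicator_ml


for volume in (200, 1000):
    salt_g, indicator_ml = recommended_amounts(volume)
    print(f"{volume} mL: {salt_g:.1f} g Instant Ocean, {indicator_ml:.1f} mL indicator")
```

For a 200mL solution this gives 3mL of indicator, which matches the volume used in most of the 200mL tests.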

Played around a bit with different solution volumes, different dry ice amounts, and different Universal Indicator amounts.

| Indicator Vol (mL) | Solution | Solution Vol (mL) | Dry Ice (g) | Time to Color Change (m) | Notes |
|---|---|---|---|---|---|
| 3 | Tap H2O | 200 | 1.5 | <0.5 | |
| 3 | Tap H2O | 200 | 0.5 | >5 | Doesn’t trigger full color change and not much bubbling (not very exciting) |
| 5 | Tap H2O | 1000 | 12 | <1 | |
| 3 | Instant Ocean | 200 | 1.5 | <0.5 | Begins at higher pH than just tap water. Full color change is slower than just tap water, but still too quick for timing. |
| 2 | 1M Na2CO3 | 200 | 5 | >5 | No color change and dry ice fully sublimated. |
| 2 | 1M Tris Base | 200 | 5 | >5 | No color change and dry ice fully sublimated. |
| 2 | Tap H2O + 20 drops 1M NaOH | 200 | 5 | 2.75 | ~Same color as Na2CO3 and Tris Base solutions to begin. Dry ice gone after ~5m and final pH color is ~6.0. |



  • Universal Indicator amount doesn’t have an effect. It’s solely needed for ease-of-viewing color changes. Use whatever volume is desired to facilitate easy observations of color changes.
  • Larger solution volumes should be used in order to slow the rate of pH change, so that it’s easier to see differences in rates of change between different solutions.
  • 1M solutions of Na2CO3 and Tris Base have too much buffering capacity and will not exhibit a decrease in pH (i.e. color change) from simply using dry ice. May want to try out different dilutions.
  • Use of water + NaOH to match starting color of Na2CO3 and/or Tris Base is a good way to illustrate differences in buffering capacity to students.
  • Overall, dry ice will work as a tool to demonstrate effect(s) of CO2 on pH of solutions!

Some pictures (to add some zest to this entry):





Hard Drive Replacement – Microscope Computer (Dell Optiplex GX620)

Dan noticed that the computer wouldn’t boot, so I looked into it a bit. When attempting to boot, the hard drive (HDD) was making a clicking noise; this is never a good sign.

I replaced the HDD with a clone of the existing (now dead) HDD that I had created back on 20150422 and everything is mostly back to normal.

What hasn’t returned to normal is the usage of Dropbox. Sometime this summer, Dropbox stopped supporting Windows XP and no longer allows usage of the Dropbox app on Windows XP computers. For the time being, this means that all files saved on this computer should be uploaded to Dropbox via a web browser.

Saving files to the Dropbox folder that still exists on this computer will NOT sync! That means they will NOT be backed up.

To resolve this issue, we would need to upgrade to Windows 7. Once I obtain a new backup HDD to create a new clone, I’ll attempt to upgrade this computer to Windows 7. The main reservation I have about this is that the two key pieces of software installed on this computer (Nikon Elements and SPOT) are extremely old and may not function on a newer Windows version. But, I guess we won’t know until we try!

Below are images of the steps I took to replace the dead HDD:









DNA Isolation – Geoduck gDNA for Illumina-initiated Sequencing Project

We were previously approached by Cindy Lawley (Illumina Market Development) for possible participation in an Illumina product development project, in which they wanted to have some geoduck tissue and DNA on-hand in case Illumina green-lighted the use of geoduck for testing out the new sequencing platform on non-model organisms. Well, guess what, Illumina has given the green light for sequencing our geoduck! However, they need at least 4μg of gDNA, so I’m isolating more.

Isolated DNA from ctenidia tissue from the same Panopea generosa individual used for the BGI sequencing efforts. Tissue was collected by Brent & Steven on 20150811.

Used the E.Z.N.A. Mollusc Kit (Omega) to isolate DNA from five separate ~60mg pieces of ctenidia tissue according to the manufacturer’s protocol, with the following changes:

  • Samples were homogenized with plastic, disposable pestle in 350μL of ML1 Buffer
  • Incubated homogenate at 60C for 1hr
  • No optional steps were used
  • Performed three rounds of 24:1 chloroform:IAA treatment
  • Eluted each in 50μL of Elution Buffer and pooled into a single sample

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 1μL of DNA sample.

Concentration = 162ng/μL (Quant data is here [Google Sheet]: 20170105_gDNA_geoduck_qubit_quant)

Yield is great (total = ~32μg).

Evaluated gDNA quality (i.e. integrity) by running 162ng (1μL) of sample on 0.8% agarose, low-TAE gel stained with ethidium bromide.

Used 5μL of O’GeneRuler DNA Ladder Mix (ThermoFisher).





DNA looks good: bright high molecular weight band, minimal smearing, and minimal RNA carryover (seen as more intense “smear” at ~500bp).

Will send off 10μg (they only requested 4μg) so that they have extra to work with in case they come across any issues.
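The volume of sample needed for a given mass follows directly from the Qubit concentration. A quick sketch is below; the function names are illustrative, and the 200μL in the last line is an assumed example volume, since the recovered volume of the pooled sample isn’t stated in this post.

```python
def volume_for_mass_ul(target_ug, conc_ng_per_ul):
    """Volume (uL) of sample containing `target_ug` of DNA at the given concentration."""
    return target_ug * 1000.0 / conc_ng_per_ul  # convert ug to ng, then divide by ng/uL


def total_yield_ug(conc_ng_per_ul, volume_ul):
    """Total DNA (ug) in `volume_ul` of sample at the given concentration."""
    return conc_ng_per_ul * volume_ul / 1000.0


QUBIT_CONC = 162.0  # ng/uL, from the Qubit dsDNA BR quantification above

print(f"10 ug to send:      {volume_for_mass_ul(10, QUBIT_CONC):.1f} uL")
print(f"4 ug (requested):   {volume_for_mass_ul(4, QUBIT_CONC):.1f} uL")

# ASSUMPTION: 200 uL is an example volume only, not a measured value from the post.
print(f"Yield at 200 uL:    {total_yield_ug(QUBIT_CONC, 200):.1f} ug")
```

At 162ng/μL, 10μg corresponds to a bit under 62μL of sample.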


Goals – January 2017

One of the long-running goals I’ve had is to get this Oly GBS data taken care of and out the door to publication. I think I will finally succeed with this, with the help of Pub-A-Thon. Don’t get too excited, it’s not what you think. It is not the drinking extravaganza that the name implies. Instead, it’s a “friendly” lab competition to get some scientific publications assembled and submitted.

Another goal for this month is to get the -80C organized. We’ve made some major progress on lab organization, with major kudos going to Grace Crandall and her work on cleaning out fridges/freezers and putting together our lab inventory spreadsheet. The -80C organization is the final frontier of getting the lab fully under control and more well-regulated.

Continuing on the organization front, it’d be great if we could get the Data Management Plan finished. Sean Bennett has helped get us much closer to completion. Hopefully this month we can get it finalized and have it be fully functional so that any lab member can easily figure out what to do when they receive new sequencing data.

I’d also like to put together a more automated means of handling our high-throughput sequencing data when we receive it. Ideally, it’d be a Jupyter Notebook and all the user would have to do is enter the desired location (heck, maybe I could even simplify it further by requiring just a species name…) for the files to be stored and then press “play” on the notebook. The files would go through a post-download integrity check, be moved to their final location, have their integrity re-checked, and then the checksum files and readme files would be updated. I have most of the bits here and there in various Jupyter Notebooks already, but haven’t taken the time to put them all together into a single, reusable notebook.
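A minimal sketch of what the core of such a notebook might look like in Python is below. It is not the existing notebook code; it assumes MD5 checksums, and the function names, the "checksums.md5" file name, and the "readme.md" file name are all placeholders.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(src, dest_dir, expected_md5,
                 checksum_name="checksums.md5", readme_name="readme.md"):
    """Verify a downloaded file, move it, re-verify it, then update checksum/readme files."""
    src = Path(src)
    dest_dir = Path(dest_dir)

    # 1. Post-download integrity check.
    if md5sum(src) != expected_md5:
        raise ValueError(f"{src.name}: checksum mismatch after download")

    # 2. Move to the final location.
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = Path(shutil.move(str(src), str(dest_dir / src.name)))

    # 3. Re-check integrity after the move.
    if md5sum(dest) != expected_md5:
        raise ValueError(f"{dest.name}: checksum mismatch after move")

    # 4. Append to the checksum file (md5sum-style "hash  filename" lines).
    with open(dest_dir / checksum_name, "a") as out:
        out.write(f"{expected_md5}  {dest.name}\n")

    # 5. Append the file name to the readme.
    with open(dest_dir / readme_name, "a") as out:
        out.write(f"{dest.name}\n")

    return dest
```

Requiring only a species name would then be a matter of mapping that name to a destination directory before calling `archive_file` on each downloaded file.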


Data Management – Geoduck RRBS Data Integrity Verification

Yesterday, I downloaded the Illumina FASTQ files provided by Genewiz for Hollie Putnam’s reduced representation bisulfite geoduck libraries. However, Genewiz had not provided a checksum file at the time.

I received the checksum file from Genewiz and have verified that the data is intact. Verification is described in the Jupyter notebook below.

Data files are located here: owl/web/nightingales/P_generosa

Jupyter notebook (GitHub): 20161230_docker_geoduck_RRBS_md5_checks.ipynb
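For readers without the notebook handy, this kind of verification can be sketched in a few lines of Python. This is not the code from the notebook above; it assumes the provided checksum file uses the common md5sum layout (one "hash  filename" pair per line), and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def parse_md5_file(text):
    """Parse md5sum-style text into a {filename: md5 hex digest} dictionary."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading "*" on the file name.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_directory(data_dir, checksum_file):
    """Return {filename: "OK" | "FAILED" | "MISSING"} for each file in the checksum file."""
    data_dir = Path(data_dir)
    expected = parse_md5_file(Path(checksum_file).read_text())
    results = {}
    for name, digest in expected.items():
        path = data_dir / name
        if not path.is_file():
            results[name] = "MISSING"
            continue
        md5 = hashlib.md5()
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large FASTQ files don't need to fit in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                md5.update(chunk)
        results[name] = "OK" if md5.hexdigest() == digest else "FAILED"
    return results
```

Any result other than "OK" would mean the file should be downloaded again before it is used.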