Category Archives: BS-seq Libraries for Genewiz

BS-seq Libraries for Genewiz

Computing – Oly BGI GBS Reproducibility Fail (but, less so than last time)…

Well, my previous attempt at reproducing the demultiplexing that BGI performed was an exercise in futility. BGI got back to me with the following message:


Hi Sam,

We downloaded it and it seems fine when compiling. You can compile it with the below command under Linux system.

tar -zxvf ReSeqTools_XXX.tar.gz ; cd iTools_Code; chmod 775 iTools ; ./ iTools -h


I gave that whirl and got the following message:

Error opening terminal: xterm

Some internet searching got me sucked into a useless black hole about 64 bit systems running 32 bit programs and enabling the 64 bit kernel on Mac OS X 10.7.5 (Lion) since it’s not enabled by default and on and on. In the end, I can’t seem to enable the 64 bit kernel on my Mac Pro, likely due to hardware limitations related to the graphics card and/or displays that are connected.

Anyway, I decided to try getting this program installed again, using a Docker container (instead of trying to install locally on my Mac).



It didn’t work again, but for a different reason! Despite the instructions in the readme file provided with iTools, you don’t actually need to run make! All that has to be done is unzipping the tarball!! However, despite figuring this out, the program fails with the following error message: “Warming : sample double in this INDEX Files. Sample ID: OYSzenG1AAD96FAAPEI-109; please renamed it diff” (note: this is copied/pasted – the spelling errors are note mine). So, I think there’s something wrong with the formatting of the index file that BGI provided me with.

I’ve contacted them for more info.

See the Jupyter notebook linked below to see what I tried.

Jupyter notebook (GitHub): 20170314_docker_Oly_BGI_GBS_demultiplexing_reproducibility.ipynb


Data Received – Bisulfite-treated Illumina Sequencing from Genewiz

Received notice the sequencing data was ready from Genewiz for the samples submitted 20151222.

Download the FASTQ files from Genewiz project directory:

wget -r -np -nc -A "*.gz"

Since two species were sequenced (C.gigas & O.lurida), the corresponding files are in the following locations:


In order to process the files, I needed to identify just the FASTQ files from this project and save the list of files to a bash variable called ‘bsseq':

bsseq=$(ls | grep '^[0-9]{1}_*' | grep -v "2bRAD")


  • This initializes a variable called “bsseq” to the values contained in the command following the equals sign.
$(ls | grep '^[0-9]{1}_*' | grep -v "2bRAD")
  • This lists (ls) all files, pipes them to the grep command (|), grep finds those files that begin with (^) one or two digits followed by an underscore ([0-9{1}_*), pipes those results (|) to another grep command which excludes (-v) any results containing the text “2bRAD”.


1_ATCACG_L001_R1_001.fastq.gz 1NF11 O.lurida
2_CGATGT_L001_R1_001.fastq.gz 1NF15 O.lurida
3_TTAGGC_L001_R1_001.fastq.gz 1NF16 O.lurida
4_TGACCA_L001_R1_001.fastq.gz 1NF17 O.lurida
5_ACAGTG_L001_R1_001.fastq.gz 2NF5 O.lurida
6_GCCAAT_L001_R1_001.fastq.gz 2NF6 O.lurida
7_CAGATC_L001_R1_001.fastq.gz 2NF7 O.lurida
8_ACTTGA_L001_R1_001.fastq.gz 2NF8 O.lurida
9_GATCAG_L001_R1_001.fastq.gz M2 C.gigas
10_TAGCTT_L001_R1_001.fastq.gz M3 C.gigas
11_GGCTAC_L001_R1_001.fastq.gz NF2_6 O.lurida
12_CTTGTA_L001_R1_001.fastq.gz NF_18 O.lurida


I wanted to add some information about the project to the readme file, like total number of sequencing reads generated and the number of reads in each FASTQ file.

Here’s how to count the total of all reads generated in this project

totalreads=0; for i in $bsseq; do linecount=`gunzip -c "$i" | wc -l`; readcount=$((linecount/4)); totalreads=$((readcount+totalreads)); done; echo $totalreads

Total reads = 138,530,448

C.gigas reads: 22,249,631

O.lurida reads: 116,280,817

Code explanation:

  • Creates variable called “totalreads” and initializes value to 0.
for i in $bsseq;
  • Initiates a for loop to process the list of files stored in $bsseq variable. The FASTQ files have been compressed with gzip and end with the .gz extension.
do linecount=
  • Creates variable called “linecount” that stores the results of the following command:
`gunzip -c "$i" | wc -l`;
  • Unzips the files ($i) to stdout (-c) instead of actually uncompressing them. This is piped to the word count command, with the line flag (wc -l) to count the number of lines in the files.
  • Divides the value stored in linecount by 4. This is because an entry for a single Illumina read comprises four lines. This value is stored in the “readcount” variable.
  • Adds the readcount for the current file and adds the value to totalreads.
  • End the for loop.
echo $totalreads
  • Prints the value of totalreads to the screen.

Next, I wanted to generate list of the FASTQ files and corresponding read counts, and append this information to the readme file.

for i in $bsseq; do linecount=`gunzip -c "$i" | wc -l`; readcount=$(($linecount/4)); printf "%st%sn%sttn" "$i" "$readcount" >>; done

Code explanation:

for i in $bsseq; do linecount=`gunzip -c "$i" | wc -l`; readcount=$(($linecount/4));
  • Same for loop as above that calculates the number of reads in each FASTQ file.
printf "%st%snn" "$i" "$readcount" >>;
  • This formats the the printed output. The “%st%snn” portion prints the value in $i as a string (%s), followed by a tab (t), followed by the value in $readcount as a string (%s), followed by two consecutive newlines (nn) to provide an empty line between the entries. See the readme file linked above to see how the output looks.
>>; done
  • This appends the result from each loop to the file and ends the for loop (done).



Illumina Methylation Library Quantification – BS-seq Oly/C.gigas Libraries

Re-quantified the libraries that were completed yesterday using the Qubit3.0 dsDNA HS (high sensitivity) assay because the library concentrations were too low for the normal broad range kit.


Qubit Quants and Library Normalization Calcs: 20151222_qubit_illumina_methylation_libraries

1NF11 2.42
1NF15 1.88
1NF16 2.74
1NF17 2.54
2NF5 2.72
2NF6 2.44
2NF7 2.38
2NF8 1.88
M2 2.18
M3 2.56
NF2_6 2.5
NF_18 2.66


Things look pretty good. The TruSeq DNA Methylation Library Kit (Illumina) suggests that the libraries produced should end up with concentrations >3ng/μL, but we have plenty of DNA here to make a pool for running on the HiSeq2500.


Illumina Methylation Library Construction – Oly/C.gigas Bisulfite-treated DNA

Took the bisulfite-treated DNA from 20151218 and made Illumina libraries using the TruSeq DNA Methylation Library Kit (Illumina).

Quantified the completed libraries using the Qubit 3.0 dsDNA BR Kit (ThermoFisher).

Evaluated the DNA with the Bioanalyzer 2100 (Agilent) using the DNA 12000 assay. Illumina recommended using the High Sensitivity assay, but we don’t have access to that so I figured I’d just give the DNA 12000 assay a go.

SampleName IndexNumber BarCode



Library Quantification (Google Sheet): 20151221_quantification_illumina_methylation_libraries

Test Name Concentration (ng/μL)
1NF11 Out of range
1NF15 2.14
1NF16 2.74
1NF17 2.64
2NF5 2.92
2NF6 Out of range
2NF7 2.42
2NF8 2.56
M2 Out of range
M3 2.1
NF2_6 2.38
NF2_18 Out of range


I used the Qubit’s BR (broad range) kit because I wasn’t sure what concentrations to expect. I need to use the high sensitivity kit to get a better evaluation of all the samples’ concentrations.



Bioanalyzer Data File (Bioanalyzer 2100): 2100_20expert_DNA_2012000_DE72902486_2015-12-21_16-58-43.xad


Ha! Well, looks like you definitely need to use the DNA High Sensitivty assay for the Bioanalyzer to pick up anything. Although, I guess you can see a slight hump in most of the samples at the appropriate sizes (~300bp); you just have to squint. ;)


Bioanalyzer – Bisulfite-treated Oly/C.gigas DNA

Following the guidelines of the TruSeq DNA Methylation Library Prep Guide (Illumina), I ran 1μL of each sample on an RNA Pico 6000 chip on the Seeb Lab’s Bioanalyzer 2100 (Agilent) to confirm that bisulfite conversion from earlier today worked.


Data File 1(Bioanlyzer 2100): 2100 expert_Eukaryote Total RNA Pico_DE72902486_2015-12-18_21-05-04.xad

Data File 1(Bioanlyzer 2100): 2100 expert_Eukaryote Total RNA Pico_DE72902486_2015-12-18_21-42-55.xad



Firstly, the ladder failed to produce any peaks. Not sure why this happened. Possibly not denatured? Seems unlikely, but next time I run the Pico assay, I’ll denature the ladder aliquot I use prior to running.

Overall, the samples look as they should (see image from TruSeq DNA Methylation Kit manual below), albeit some are a bit lumpy.


Bisulfite Treatment – Oly Reciprocal Transplant DNA & C.gigas Lotterhos DNA for BS-seq

After confirming that the DNA available for this project looked good, I performed bisulfite treatment on the following gDNA samples:

  • 1NF11
  • 1NF15
  • 1NF16
  • 1NF17
  • 2NF5
  • 2NF6
  • 2NF7
  • 2NF8
  • NF2_6
  • NF2_18
  • M2
  • M3

Sample names breakdown like this:


1 = Fidalgo Bay outplants

NF = Fidalgo Bay broodstock origination

= Sample number


Same as above, but:

2 = Oyster Bay outplants

NF2_# (Oysters grown in Oyster Bay; DNA provided by Katherine Silliman)

NF2 = Fidalgo Bay broodstock origination, family #2

= Sample number

M2/M3 = C.gigas from Katie Lotterhos


Followed the guidelines of the TruSeq DNA Methylation Library Prep Guide (Illumina).

Used the EZ DNA Methylation-Gold Kit (ZymoResearch) according to the manufacturer’s protocol with the following changes/notes:

  • Used 100ng DNA (per Illumina recs; Zymo recommends at least 200ng for “optimal results”).
  • Thermal cycling was performed in 0.5mL thin-wall tubes in a PTC-200 (MJ Research) using a heated lid
  • Centrifugations were performed at 13,000g
  • Desulphonation incubation for 20mins.

DNA quantity calculations are here (Google Sheet): 20151218_oly_bisulfite_calcs

Samples were stored @ -20C. Will check samples via Bioanalyzer before proceeding to library construction.


Agarose Gel – Oly gDNA for BS-seq Libraries, Take Two

The gel I ran earlier today looked real rough, due to the fact that I didn’t bother to equalize loading quantities of samples (I just loaded 1μL of all samples regardless of concentration). So, I’m repeating it using 100ng of DNA from all samples.

Additionally, this gel also includes C.gigas samples that Katie Lotterhos sent to us to see how they look.

Ran a 0.8% agarose, low-TAE gel, stained with ethidium bromide.



Look at that! The samples look MUCH nicer when they’re not overloaded and uniformly loaded!

Most have a prominent high molecular weight band (the band that’s closes to the top of the ladder, not the DNA visible in the wells). All exhibit smearing, but 2NF1 shows a weird accumulation of low molecular weight DNA.

Katie’s C.gigas samples (M1, M2, M3) look similar to the Olympia oyster gDNA, however her samples appear to have residual RNA in them (the fuzzy band ~500bp).

Will discuss with Steven which samples he wants to use for bisulfite treament and library construction.


Agarose Gel – Oly gDNA for BS-seq Libraries

Ran 1μL of each sample from yesterday’s DNA isolation on a 0.8% agarose, low-TAE gel, stained with ethidium bromide.





Since I didn’t load equal quantities of DNA, the intensities across the various samples is highly variable.

Those samples with high degree of smearing are also those with the highest concentrations. Thus, one would expect to be able to visualize a greater range of DNA sizes in a gel (because more DNA is present). Notice the samples with nice, high molecular weight bands and little smearing (1NF16, 1NF17). These are less than half the concentrations of all the samples that exhibit extensive smearing (2NF3, 2NF8, 1NF12). So, I think all samples will be fine for proceeding with bisulfite conversion and subsequent library construction.

However, I should re-run this gel using equalized DNA quantities for all samples…



DNA Isolation – Oly gDNA for BS-seq

Need DNA to prep our own libraries for bisulfite-treated high-throughput sequencing (BS-seq).

Isolated gDNA from the following tissue samples stored in RNAlater (tissue was not weighed) using DNAzol:


The sample coding breaks down as follows (see the project wiki for a full explanation):


2 = Oysters outplanted in Fidalgo Bay

NF = Broodstock originated in Fidalgo Bay

= Sample number


1 = Oysters outplanted in Oyster Bay

NF = Broodstock originated in Fidalgo Bay

= Sample number


DNA was isolated in the following manner:

  • Homogenized tissues in 500μL of DNAzol (Molecular Research Center; MRC).
  • Added additional 500μL of DNAzol.
  • Added 10μL of RNase A (10mg/mL, ThermoFisher); incubated 10mins @ RT.
  • Added 300μL of chloroform and mixed moderately fast by hand.
  • Incubated 5mins @ RT.
  • Centrifuged 12,000g, 10mins, RT.
  • Transferred aqueous phase to clean tube.
  • Added 500μL of 100% EtOH and mixed by inversion.
  • Pelleted DNA 5,000g, 5mins @ RT.
  • Performed 3 washes w/70% EtOH.
  • Dried pellet 3mins.
  • Resuspended in 100μL of Buffer EB (Qiagen).
  • Centrifuged 12,000g, 10mins, RT to pellet insoluble material.
  • Transferred supe to clean tube.

The samples were quantified using the Qubit dsDNA BR reagents (Invitrogen) according to the manufacturer’s protocol and used 1μL of sample for measurement.


Qubit data (Google Sheet): 20151216_Oly_gDNA_qubit_quants

2NF1 76.4
2NF2 175
2NF3 690
2NF4 11.7
2NF5 142
2NF6 244
2NF7 25
2NF8 456
1NF11 182
1NF12 432
1NF13 155
1NF14 21
1NF15 244
1NF16 112
1NF17 25.2
1NF18 278


Will run samples on gel tomorrow to evaluate gDNA integrity.