# DNA Isolation – Geoduck gDNA for Illumina-initiated Sequencing Project

We were previously approached by Cindy Lawley (Illumina Market Development) for possible participation in an Illumina product development project, in which they wanted to have some geoduck tissue and DNA on-hand in case Illumina green-lighted the use of geoduck for testing out the new sequencing platform on non-model organisms. Well, guess what, Illumina has give the green light for sequencing our geoduck! However, they need at least 4μg of gDNA, so I’m isolating more.

Isolated DNA from ctenidia tissue from the same Panopea generosa individual used for the BGI sequencing efforts. Tissue was collected by Brent & Steven on 20150811.

Used the E.Z.N.A. Mollusc Kit (Omega) to isolate DNA from five separate ~60mg pieces of ctenidia tissue according to the manufacturer’s protocol, with the following changes:

• Samples were homogenized with plastic, disposable pestle in 350μL of ML1 Buffer
• Incubated homogenate at 60C for 1hr
• No optional steps were used
• Performed three rounds of 24:1 chloroform:IAA treatment
• Eluted each in 50μL of Elution Buffer and pooled into a single sample

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 1μL of DNA sample.

Concentration = 162ng/μL (Quant data is here [Google Sheet]: 20170105_gDNA_geoduck_qubit_quant

Yield is great (total = ~32μg).

Evaluated gDNA quality (i.e. integrity) by running 162ng (1μL) of sample on 0.8% agarose, low-TAE gel stained with ethidium bromide.

Used 5μL of O’GeneRuler DNA Ladder Mix (ThermoFisher).

Results:

DNA looks good: bright high molecular weight band, minimal smearing, and minimal RNA carryover (seen as more intense “smear” at ~500bp).

Will send off 10μg (they only requested 4μg) so that they have extra to work with in case they come across any issues.

# Data Received – Geoduck RRBS Sequencing Data

Hollie Putnam prepared some reduced representation bisulfite Illumina libraries and had them sequenced by Genewiz.

IMPORTANT: MD5 checksums have not yet been provided by Genewiz! We cannot verify the integrity of these data files at this time! Checksums have been requested. Will create new notebook entry (and add link to said entry) once the checksums have been received and we can compare them.

UPDATE 20161230 – Have received and verified checksums.

Jupyter notebook: 20161229_docker_genewiz_geoduck_RRBS_data.ipynb

# RNAseq Data Receipt – Geoduck Gonad RNA 100bp PE Illumina

Received notification that the samples sent on 20150601 for RNAseq were completed.

Downloaded the following files from the GENEWIZ servers using FileZilla FTP and stored them on our server (owl/web/nightingales/P_generosa):

Geo_Pool_F_GGCTAC_L006_R1_001.fastq.gz
Geo_Pool_F_GGCTAC_L006_R2_001.fastq.gz
Geo_Pool_M_CTTGTA_L006_R1_001.fastq.gz
Geo_Pool_M_CTTGTA_L006_R2_001.fastq.gz

Generated md5 checksums for each file:

$for i in *; do md5$i >> checksums.md5; done

# Sample Submission – Geoduck Gonad for RNA-seq

Prepared two pools of geoduck RNA for RNA-seq (Illumina HiSeq2500, 100bp, PE) with GENEWIZ, Inc.

I pooled a set of female and a set of male RNAs that had been selected by Steven based on the Bioanalyzer results from Friday.

The female RNA pool used 210ng of each sample, with the exception being sample #08. This sample used 630ng. The reason for this was due to the fact that there weren’t any other female samples to use from this developmental time point. The two other developmental time points each had three samples contributing to the pool. So, three times the quantity of the other individual samples was used to help equalize the time point contribution to the pooled sample. Additionally, 630ng used the entirety of sample #08.

The male RNA pool used 315ng of each sample. This number differs from the 210ng used for the female RNAs so that the two pools would end up with the same total quantity of RNA. However, now that I’ve typed this, this doesn’t matter since the libraries will be equalized before being run on the Illumina HiSeq2500. Oh well. As long as each sample in each pool contributed to the total amount of RNA, then it’s all good.

The two pools were shipped O/N on dry ice.

• Geo_pool_M
• Geo_pool_F

# Bioinformatics – Trimmomatic/FASTQC on C.gigas Larvae OA NGS Data

Previously trimmed the first 39 bases of sequence from reads from the BS-Seq data in an attempt to improve our ability to map the reads back to the C.gigas genome. However, Mac (and Steven) noticed that the last ~10 bases of all the reads showed a steady increase in the %G, suggesting some sort of bias (maybe adaptor??):

Although I didn’t mention this previously, the figure above also shows an odd “waves” pattern that repeats in all bases except for G. Not sure what to think of that…

Quick summary of actions taken (specifics are available in Jupyter notebook below):

• Trim first 39 bases from all reads in all raw sequencing files.
• Trim last 10 bases from all reads in raw sequencing files
• Concatenate the two sets of reads (400ppm and 1000ppm treatments) into single FASTQ files for Steven to work with.

Raw sequencing files:

Notebook Viewer: 20150521_Cgigas_larvae_OA_Trimmomatic_FASTQC

Jupyter (IPython) notebook: 20150521_Cgigas_larvae_OA_Trimmomatic_FASTQC.ipynb

### Output files

Trimmed, concatenated FASTQ files
20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz

Example of FASTQC analysis pre-trim:

Example FASTQC post-trim (from 400ppm data):

Trimming has removed the intended bad stuff (inconsistent sequence in the first 39 bases and rise in %G in the last 10 bases). Sequences are ready for further analysis for Steven.

However, we still see the “waves” pattern with the T, A and C. Additionally, we still don’t know what caused the weird inconsistencies, nor what sequence is contained therein that might be leading to that. Will contact the sequencing facility to see if they have any insight.

# Illumina RNAseq Library Construction – 32 C.gigas Individuals

Took heat-fragmented RNA provided by Emma (see Emma’s Notebook, 7/3/2011) and proceeded to make first strand cDNA, as described in the Eli Meyer protocol for Illumina HiSeq. Master mix calcs are here. Samples were stored @ -20C after the reverse transcription and library construction will be continued tomorrow.

# Oligo Reconstitution – Illumina RNAseq Library Oligos and Barcodes

Reconstituted all of the oligos and barcodes for library construction in TE (pH = 8.0) to a final concentration of 100uM. Created 10uM working stocks of all oligos and barcodes. All samples (stocks and working stocks) are stored @ -80C in their own box (Illumina Library Oligos & Barcodes) due to the fact that one of the oligos is an RNA oligo and requires storage at -80C.