# Sanger Sequencing – pCR2.1/OsHV-1 ORF117 Sequencing Data

Received the Sanger sequencing data back from Genewiz for the samples I submitted last week.

AB1 files were downloaded as a zip file and stored on the Friedman Lab server: backupordie/lab/sequencing_data/Sanger/30-19717124_ab1.zip

Files were analyzed using Geneious 10.2.3.

Geneious analysis was exported (compatible with version 6.0 and up) and saved to the Friedman Lab server:

backupordie/lab/sam/Sequencing_Analysis/Sanger/20170821_oshv_orf117_sanger.geneious

Results:

After vector ID and trimming, all sequences from both colonies were aligned, resulting in an 867bp contig. The size of this contig jibes with the bright PCR band at ~1000bp I saw when screening the two colonies (the ~1000bp includes 300bp of vector sequence from using the M13 primers).

The alignment above shows that there were no gaps in the sequencing between the two sequencing primers (M13 forward and M13 reverse). I point this out because the insert in this plasmid was supposed to be the full-length OsHV-1 ORF117 (~1300bp), as described in Martenot et al. 2015, "Detection of undescribed ostreid herpesvirus 1 (OsHV-1) specimens from Pacific oyster, Crassostrea gigas". As the sequencing shows, that is not what is cloned in this vector.

To determine what was actually cloned in this vector, I performed a BLASTx against the nr database, using the consensus sequence generated from the alignment above:

BLASTx generated a total of six matches, five of which match OsHV-1 ORF117 (the hypothetical and RING finger proteins listed above actually have alternate accession numbers that all point to ORF117). However, notice in the one alignment example provided at the bottom of the above image, the Query (i.e. our consensus sequence) only starts aligning at nucleotide 109 and matches up with the NCBI OsHV-1 ORF117 beginning at amino acid 158.

The results clearly show that the insert in this vector is OsHV-1 ORF117, but not the full-length ORF. To confirm this, I aligned the consensus sequence to the OsHV-1 genome (GenBank: AY509253.2) using Geneious:

In the image above, I have zoomed in on the region of the OsHV-1 genome where our sequencing consensus aligned. There are two noticeable things in this alignment:

1. The insert we sequenced doesn’t span the entire ORF117 coding sequence (the yellow annotation in the image above).

2. There’s a significant amount of sequence mismatch (112bp; indicated by black hash marks) between the sequenced insert and the OsHV-1 ORF117 genomic sequence from GenBank, at the 5′ end of the insert.
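The second observation can be checked outside of Geneious once the aligned insert and reference sequences are exported. The following is only a minimal sketch: it assumes two equal-length, gap-free aligned strings, and the function name and toy sequences are hypothetical, not the actual ORF117 data.

```python
def mismatch_positions(query, reference):
    """Return 0-based positions where two aligned sequences differ.

    Assumes both strings come from the same gap-free alignment and
    therefore have equal length; comparison is case-insensitive.
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    pairs = zip(query.upper(), reference.upper())
    return [i for i, (q, r) in enumerate(pairs) if q != r]


# Toy example (made-up sequences): the first two bases of the insert differ.
insert_seq = "TTGACGTACGT"
genome_seq = "ACGACGTACGT"
positions = mismatch_positions(insert_seq, genome_seq)
print(len(positions), "mismatches at positions", positions)
```

If the mismatches cluster at low positions, as in the toy example, that matches the picture of a divergent 5′ end followed by a matching region.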

Will pass this info along to Carolyn and Tim to see how they want to proceed.

# Data Received – Geoduck RRBS Sequencing Data

Hollie Putnam prepared some reduced representation bisulfite Illumina libraries and had them sequenced by Genewiz.

IMPORTANT: MD5 checksums have not yet been provided by Genewiz! We cannot verify the integrity of these data files at this time! Checksums have been requested. Will create new notebook entry (and add link to said entry) once the checksums have been received and we can compare them.

UPDATE 20161230 – Have received and verified checksums.

Jupyter notebook: 20161229_docker_genewiz_geoduck_RRBS_data.ipynb
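For reference, checksum verification of this kind can be sketched in a few lines of Python. This is an illustration only, not the code from the linked notebook: it assumes the provider's checksums arrive as a text file in `md5sum` format (`<md5>  <filename>` per line), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file, data_dir):
    """Compare files in data_dir against an md5sum-style checksum file.

    Assumption: each non-empty line is '<md5>  <filename>' (a leading '*'
    on the filename, as written by md5sum in binary mode, is ignored).
    Returns a dict mapping filename -> True (match) or False (mismatch).
    """
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")
        results[name] = md5_of_file(Path(data_dir) / name) == expected.lower()
    return results
```

Calling `verify_checksums("checksums.md5", ".")` (the file name is a placeholder) returns one entry per listed file; any `False` value means that file should be downloaded again.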

# Data check on Oly BS-Seq samples

For the 12 samples

Select 4 samples from 1NF gel take 2
Select 4 samples from 2NF gel take 2

Select 2 from gel take 2 Lotterhos
M1
M2
M3

Select 2 from the following sent to Katie (do not have to run on gel)
NF2 14
NF2 6
NF2 18
NF2 15
NF2 17

## Short term will just check out the first 8.

These are samples outplanted at Oyster Bay and Fidalgo; in both cases, the parents were from Fidalgo.

The hypothesis is that epigenetic patterns will differ between outplant sites (and that we can attribute the difference to environment).

## Quick look at raw data

Sequencing Platform: Illumina HiSeq2500

Read Type/Length: 50bp single-end, single index

11_GGCTAC_L001_R1_001.fastq.gz    10933121

12_CTTGTA_L001_R1_001.fastq.gz    10816647

1_ATCACG_L001_R1_001.fastq.gz    9402890

2_CGATGT_L001_R1_001.fastq.gz    11954873

3_TTAGGC_L001_R1_001.fastq.gz    11817358

4_TGACCA_L001_R1_001.fastq.gz    11606618

5_ACAGTG_L001_R1_001.fastq.gz    12589609

6_GCCAAT_L001_R1_001.fastq.gz    12489766

7_CAGATC_L001_R1_001.fastq.gz    10295293

8_ACTTGA_L001_R1_001.fastq.gz    14374642
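Assuming the numbers listed next to each file are per-file read counts (the entry does not say how they were produced), they could be reproduced with something like the sketch below. It counts lines and divides by four, which holds only for standard four-line FASTQ records; the function name is hypothetical.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file (gzipped or plain).

    Assumes standard 4-line FASTQ records; raises if the line count
    is not a multiple of 4, which would suggest a truncated file.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Looping this over `sorted(glob.glob("*.fastq.gz"))` in the data directory would print one count per file, in the same lexical order as the listing above.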

## Unzip

In [1]:
cd /Volumes/Histidine/hectocotylus/whole-BS-01

/Volumes/Histidine/hectocotylus/whole-BS-01

In [2]:
%%bash
for f in *.gz
do
STEM=$(basename "${f}" .gz)
gunzip -c "${f}" > /Volumes/Histidine/hectocotylus/whole-BS-01/fq/"${STEM}"
done


## FastQC

In [3]:
!/Applications/bioinfo/FastQC/fastqc \
-o /Volumes/Histidine/hectocotylus/whole-BS-01/fq/ \
-t 4 \
/Volumes/Histidine/hectocotylus/whole-BS-01/fq/*

Started analysis of 1_ATCACG_L001_R1_001.fastq
Started analysis of 2_CGATGT_L001_R1_001.fastq
Started analysis of 3_TTAGGC_L001_R1_001.fastq
...


This unusual pattern seems to hold true.
