Sequence Data Analysis – pCR2.1/Clam RLO 16s, EHR, EUB

Sequencing data was received back from GENEWIZ on Friday. The ZIP file containing all six sequence trace files (.ab1) was moved to the lab server:

backupordie/lab/Sequencing Data/Sanger/

The data files were copied to Geneious (v.7.1.7; Biomatters Ltd.) for initial manipulation.

The Geneious files are here (Geneious archive format):


Quality trimming and vector sequence identification were performed.
All trimmed pairs of files were aligned using the built-in Geneious aligner. Default settings were used, except that the “Automatically determine sequence direction” box was checked. The alignments were visually inspected for mis-called bases, which were corrected where necessary. The resulting consensus sequence from each clone was exported to its own file, as well as to a single multi-FASTA file:


Resulting sequence lengths (bp):

 16s_consensus_sequence  1507
 EHR_consensus_sequence   198
 EUB_consensus_sequence  1532
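
For anyone wanting to double-check lengths like these outside of Geneious, here is a minimal Python sketch that reads a multi-FASTA file and reports the length of each record. The function names and the toy records are made up for illustration; they are not the real consensus sequences.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes out the previous record, if there was one
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sequence_lengths(handle):
    """Return a dict mapping each record name to its sequence length."""
    return {name: len(seq) for name, seq in read_fasta(handle)}


# Toy records for illustration only (NOT the real consensus sequences)
demo = StringIO(">clone_a\nACGTAC\nGT\n>clone_b\nACG\n")
print(sequence_lengths(demo))  # {'clone_a': 8, 'clone_b': 3}
```

To run it on a real file, pass an open file handle instead of the StringIO object, e.g. `with open(path) as fh: sequence_lengths(fh)`.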

These consensus sequences were aligned to each other using the MUSCLE aligner in Geneious with default settings.


The alignments below show two things:

  1. Similarity (identity) between the sequences being aligned. This is represented as the green bar(s) above the alignments. The more green, the more sequence identity is shared between the two sequences.

  2. The alignments between the two sequences are represented as black bars next to the corresponding sequence name. A black bar/box indicates exact sequence matches between the two sequences. A black line indicates region(s) where the sequences do not match.
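
As a rough illustration of what a pairwise similarity (identity) percentage means, here is a small Python sketch that counts identical columns in a gapped pairwise alignment. This is an assumed definition (identical columns divided by all alignment columns, with a gap opposite a base counted as a mismatch); Geneious may calculate its figure differently, and the function name and example strings are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences carry the same base.

    Assumed definition: a gap opposite a base counts as a mismatch, and
    columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # gap in both sequences: not a real column
        columns += 1
        if a == b:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


# Toy alignment for illustration only: 4 of 6 columns are identical
print(round(percent_identity("ACGT-A", "ACCTTA"), 2))  # 66.67
```

Under this definition, a short sequence aligned against a much longer one can never score high, because every unpaired base in the longer sequence counts against it.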


16s vs. EHR

Similarity: 11.25%


16s vs. EUB

Similarity: 85.18%



EHR vs. EUB

Similarity: 12.37%


The EHR sequence shares little similarity to the other two sequences.

The 16s & EUB sequences are highly similar, but not identical.


Each of the three sequences (using the multi-FASTA file referenced above) was BLAST’d (blastn) against the NCBI nr database.
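
For reference, a similar search could be scripted with the NCBI BLAST+ command-line tools instead. The Python sketch below only builds and prints the command; it does not reflect how the searches above were actually run. The file names are hypothetical, and "nt" is assumed here as the name of NCBI's nucleotide collection for remote blastn searches.

```python
import shlex


def build_blastn_command(query_fasta, out_path, db="nt"):
    """Build an argument list for a remote blastn search with BLAST+.

    The database name and both file paths are assumptions for illustration.
    """
    return [
        "blastn",
        "-query", query_fasta,  # multi-FASTA file of query sequences
        "-db", db,              # database to search
        "-remote",              # run the search on NCBI's servers
        "-outfmt", "6",         # tabular output, one hit per line
        "-out", out_path,       # file that receives the results
    ]


# Hypothetical file names; the command is printed rather than executed
cmd = build_blastn_command("consensus_sequences.fasta", "blast_hits.tsv")
print(shlex.join(cmd))
```

With BLAST+ installed, the list could be handed to `subprocess.run(cmd, check=True)` to actually launch the search.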



The sequence produced using the 16s primers clearly matches the 16s sequence of Vibrio tapetis, a pathogen of cultured clams.



The sequence captured by the EHR primers has no matches at all in the NCBI nr database. This is likely due to the length of the sequence (only 198bp). However, it’s still long enough that I feel it should match something. Also, just putting this here as a reminder: the EHR primer set is the only set that didn’t produce amplification in the no template controls (NTC).



The product of the EUB primers matches very well to the 16s sequences of a variety of uncultured bacterial species.


I will relay the results to Carolyn and see how she’d like to proceed. Due to the nature of what’s being done here (using universal 16s bacterial primers), I think it would be good to sequence additional clones from each of the three cloning reactions to see if we pick up additional sequences.