Attempted to improve results for virus sequence detection/identification in the abalone NGS RNAseq samples (Black Ab Exp 1).
Previously, I had tried BLASTing the trimmed short reads against all virus sequences in GenBank. Results were mediocre at best, producing many “matches” that usually consisted of very low-complexity (i.e. highly repetitive) sequence, which reduced confidence that the matches were “real.”
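One rough way to put a number on “low complexity” is the Shannon entropy of a read's dinucleotide composition: highly repetitive sequence scores near zero. The sketch below is only an illustration, not what was done for these samples; the function name and the example sequences are made up.

```python
import math
from collections import Counter


def dinucleotide_entropy(seq):
    """Shannon entropy (bits) of the overlapping dinucleotides in seq.

    Illustrative helper (hypothetical, not part of any BLAST tool):
    lower values mean more repetitive, lower-complexity sequence.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0
    counts = Counter(pairs)
    total = len(pairs)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up sequences for illustration only.
homopolymer = dinucleotide_entropy("AAAAAAAAAAAAAAAAAAAA")
repeat = dinucleotide_entropy("ATATATATATATATATATAT")
mixed = dinucleotide_entropy("ACGTTGCAAGCTAGGATCCA")
```

Reads or matched regions scoring below some cutoff could then be set aside before BLASTing; any such cutoff would have to be chosen empirically.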
To try to improve on this, de novo assembled contigs from all four Black Ab Exp 1 SOLiD runs (Carmel control, exposed; San Nick control, exposed). All four runs were de novo assembled together by Steven (FASTA file is here. Scroll down ’til you find the text file “Consensus_SanCar_5854.fa”).
I BLASTed this file against all virus sequences in the GenBank nucleotide database (megablast, nr/nt, limited to Viruses).
Initial results were a bit confusing because the BLAST output contained a large number of bacterial genes, despite the search having been limited to just viruses. However, after masking bacterial hits in the BLAST results, there were only 9 total virus matches, all to “Stealth virus 1 clone 3B43” sequences (either full genome or the T3/T7 genes). The BLAST results, with bacterial matches excluded, can be seen here.
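For reference, a filtering step like this could be scripted along the following lines. This is a minimal sketch, not the procedure actually used here: it assumes the BLAST hits were exported as tab-delimited text with the subject title in the last column, and the keyword list, function name and example rows are all hypothetical.

```python
import csv
import io

# Assumed keywords; not necessarily the criteria used for the results above.
VIRUS_KEYWORDS = ("virus", "phage")


def keep_virus_hits(handle, title_col=-1, keywords=VIRUS_KEYWORDS):
    """Return tab-delimited BLAST rows whose subject title mentions a keyword.

    Assumes one hit per row, with the subject title in column title_col.
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        title = row[title_col].lower()
        if any(word in title for word in keywords):
            kept.append(row)
    return kept


# Made-up rows for illustration only (query, subject, % identity, title).
example = io.StringIO(
    "contig_1\tsubj_A\t98.5\tHypothetical bacterium gene X\n"
    "contig_2\tsubj_B\t97.0\tStealth virus 1 clone 3B43\n"
)
hits = keep_virus_hits(example)
```

Keyword matching on titles is crude, so anything it keeps or drops would still need to be checked by eye.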
The low number and low diversity of the virus hits are likely due to a number of reasons:
Library prep and source seqs – Library was made from RNA samples using an oligo dT (i.e. poly A) selection. Our target may not be an RNA virus. And since it is a phage, any RNA it does produce won’t have poly A tails, even if it is a DNA virus.