Tag Archives: trinity

Second look at Geoduck transcriptome

Last week I popped out a quick assembly and annotation of our geoduck gonadal transcriptome. I have now done a second assembly using Trinity.


Updates
August 3 – Confirmed that the double slash (//) in the file paths had no impact on the assembly.
July 14 – TransDecoder protein annotations
10:40am – added TransDecoder results
10:29am – added Stats via Trinity


Trinity.pl \
--seqType fq \
--JM 24G \
--left /Volumes/web/cnidarian/Geo_Pool_F_GGCTAC_L006_R1_001_val_1.fq /Volumes/web/cnidarian/Geo_Pool_M_CTTGTA_L006_R1_001_val_1.fq \
--right /Volumes/web/cnidarian//Geo_Pool_F_GGCTAC_L006_R2_001_val_2.fq /Volumes/web/cnidarian//Geo_Pool_M_CTTGTA_L006_R2_001_val_2.fq \
--CPU 16

trinity_out_dir_1B54203C.png

Output

0:999           127840
1000:1999       18164
2000:2999       5321
3000:3999       1817
4000:4999       762
5000:5999       291
6000:6999       135
7000:7999       73
8000:8999       22
9000:9999       29
10000:10999     4
11000:11999     5
12000:12999     3
13000:13999     4
14000:14999     4
15000:15999     3
16000:16999     0
17000:17999     2
18000:18999     1

Total length of sequence:   101862868 bp
Total number of sequences:  154480
N25 stats:          25% of total sequence length is contained in the 8095 sequences >= 2045 bp
N50 stats:          50% of total sequence length is contained in the 26158 sequences >= 1014 bp
N75 stats:          75% of total sequence length is contained in the 64574 sequences >= 446 bp
Total GC count:         37657770 bp
GC %:               36.97 %
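
The length bins and GC figure above can be reproduced from any FASTA file with a few lines of Python. This is only a minimal sketch, not the script that produced the output above: the function names are made up for illustration, and the 1000 bp bin width is inferred from the table.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(lengths, bin_size=1000):
    """Count sequences per bin, keyed by the lower bound of each bin."""
    return Counter((n // bin_size) * bin_size for n in lengths)


def gc_percent(seqs):
    """Percentage of G and C bases over all sequences combined."""
    total = gc = 0
    for seq in seqs:
        seq = seq.upper()
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
    return 100.0 * gc / total if total else 0.0


# Tiny in-memory example (toy sequences, not geoduck data).
demo = io.StringIO(">a\nGGCC\nAT\n>b\nAAAA\n")
seqs = [seq for _, seq in read_fasta(demo)]
hist = length_histogram(len(s) for s in seqs)
for low in sorted(hist):
    print(f"{low}:{low + 999}\t{hist[low]}")
print(f"GC %: {gc_percent(seqs):.2f}")

# Sanity check using the totals reported above: GC count / total length.
print(f"{100 * 37657770 / 101862868:.2f}")  # 36.97
```

To run it on the real assembly, open Trinity.fasta and pass the file handle to read_fasta in place of the in-memory demo.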
hummingbird:Geo-trinity steven$ /Users/gilesg/compile/trinityrnaseq_r20131110/util/TrinityStats.pl /Volumes/web/cnidarian/Geo-trinity/trinity_out_dir/Trinity.fasta 


################################
## Counts of transcripts, etc.
################################
Total trinity transcripts:  154480
Total trinity components:   100155
Percent GC: 36.97

########################################
Stats based on ALL transcript contigs:
########################################

    Contig N10: 3444
    Contig N20: 2385
    Contig N30: 1766
    Contig N40: 1343
    Contig N50: 1014

    Median contig length: 371
    Average contig: 659.39
    Total assembled bases: 101862868


#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################

    Contig N10: 2999
    Contig N20: 2026
    Contig N30: 1462
    Contig N40: 1067
    Contig N50: 768

    Median contig length: 321
    Average contig: 553.88
    Total assembled bases: 55473621
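
For anyone wanting to check the N50 values, here is a hedged sketch of how an Nx statistic and the "longest isoform per component" subset can be computed. It follows the definition printed earlier (the contig length at which the running total of the longest contigs first reaches the given fraction of all bases). It is not TrinityStats.pl itself, and the assumption that the component ID is the `compN_cM` prefix of a `compN_cM_seqK` name is based only on the sequence names that appear in this post.

```python
import re

# Assumed ID layout for this Trinity release: comp<N>_c<M>_seq<K>
COMPONENT_RE = re.compile(r"^(comp\d+_c\d+)_seq\d+")


def nx(lengths, fraction=0.5):
    """Return the contig length at which the longest contigs, summed in
    descending order, first cover `fraction` of the total bases."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for n in lengths:
        running += n
        if running >= fraction * total:
            return n
    return 0  # empty input


def longest_per_component(records):
    """Map component ID -> length of its longest isoform.

    `records` is an iterable of (sequence_id, length) pairs.
    IDs that do not match the assumed layout are kept as their own group.
    """
    best = {}
    for seq_id, length in records:
        match = COMPONENT_RE.match(seq_id)
        comp = match.group(1) if match else seq_id
        if length > best.get(comp, 0):
            best[comp] = length
    return best


# Toy example with made-up lengths.
toy = [("comp1_c0_seq1", 100), ("comp1_c0_seq2", 250), ("comp2_c0_seq1", 80)]
all_lengths = [length for _, length in toy]
longest = longest_per_component(toy)
print("N50, all contigs:", nx(all_lengths))
print("N50, longest isoform only:", nx(longest.values()))
```

Dropping the shorter isoforms removes bases from the larger components, which is why the N50 for the longest-isoform set (768) comes out lower than the N50 over all contigs (1014).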

Rerunning to see whether the double slash (//) in the --right paths was a problem; I did not see anything in the error output. Also running TransDecoder.


TransDecoder Results

Ran the following

/Users/gilesg/compile/trinityrnaseq_r20131110/trinity-plugins/TransDecoder_r20131110/TransDecoder -t  /Volumes/web/cnidarian/Geo-trinity/trinity_out_dir/Trinity.fasta

This provided a peptide file with 36003 sequences.

!head /Volumes/web-1/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep

>cds.comp100047_c0_seq2|m.5982 comp100047_c0_seq2|g.5982 ORF comp100047_c0_seq2|g.5982 comp100047_c0_seq2|m.5982 type:internal len:142 (-) comp100047_c0_seq2:3-425(-)
NAECRDLYKIFTQILSVRSQEGKIVIPDEFATKIRNWLGNKEELFKEAHNQKIITFYNEY
TREENTFNPIRGKRPMSVPDMPERKYIDQLSRKTQSQCDFCKYKTFTAEDTFGRIDSNFS
CSASNAFKLDHWHALFLLKTH
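
The TransDecoder header carries the ORF type and length, so the peptide file can be summarized by ORF type. The sketch below only assumes that each header contains `type:<name>` followed by `len:<number>`, as in the record above; the function names are illustrative and not part of TransDecoder.

```python
import re
from collections import Counter

# Matches e.g. "type:internal len:142" in a TransDecoder header line.
ORF_RE = re.compile(r"type:(\S+)\s+len:(\d+)")


def parse_orf_header(header):
    """Return (orf_type, length) from a header line, or None if absent."""
    match = ORF_RE.search(header)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def tally_orf_types(lines):
    """Count ORF types over the header lines of a peptide FASTA."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        parsed = parse_orf_header(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


# The header shown above, used as a one-record example.
header = (">cds.comp100047_c0_seq2|m.5982 comp100047_c0_seq2|g.5982 ORF "
          "comp100047_c0_seq2|g.5982 comp100047_c0_seq2|m.5982 "
          "type:internal len:142 (-) comp100047_c0_seq2:3-425(-)")
print(parse_orf_header(header))  # ('internal', 142)
print(tally_orf_types([header, "NAECRDLYKIF"]))
```

Passing an open handle on Trinity.fasta.transdecoder.pep to tally_orf_types would give the per-type counts for the whole file.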


Running blastp on Trinity.fasta.transdecoder.pep

!blastp \
-query /Volumes/web/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep \
-db /usr/local/bioinformatics/dbs/uniprot_sprot.fasta \
-evalue 1e-5 \
-max_target_seqs 1 \
-max_hsps 1 \
-outfmt 6 \
-num_threads 4 \
-out /Volumes/web/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep-blastp-uniprot-2.out

results: http://eagle.fish.washington.edu/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep-blastp-uniprot-2.out
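
Since `-outfmt 6` writes a 12-column tab-delimited table (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), the results file is easy to read back in. A minimal sketch follows; the `Hit` type and function names are illustrative, and the two sample rows are invented placeholders rather than real hits from this run.

```python
import io
from typing import NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(handle):
    """Yield a Hit for each row of default BLAST tabular (outfmt 6) output."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            raise ValueError(f"expected 12 columns, got {len(cols)}: {line!r}")
        yield Hit(cols[0], cols[1], float(cols[2]), int(cols[3]),
                  float(cols[10]), float(cols[11]))


def queries_with_hits(hits):
    """Return the set of distinct query IDs that have at least one hit."""
    return {hit.qseqid for hit in hits}


# Invented rows for illustration only (IDs and numbers are placeholders).
sample = io.StringIO(
    "query1\tsp|X00000|FAKE_PROT\t45.50\t140\t70\t2\t1\t140\t10\t150\t1e-30\t120\n"
    "query2\tsp|X00001|FAKE_PROT2\t60.00\t200\t80\t0\t5\t204\t1\t200\t2e-50\t310.5\n"
)
hits = list(parse_outfmt6(sample))
print(len(queries_with_hits(hits)), "queries with a hit")
```

Dividing the number of queries with a hit by the 36003 peptides would give the fraction of predicted proteins annotated at the 1e-5 cutoff.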


Transcriptome Assembly

 

Trinity

################################
## Counts of transcripts, etc.
################################
Total trinity 'genes':  117729
Total trinity transcripts:  145222
Percent GC: 40.87

########################################
Stats based on ALL transcript contigs:
########################################

    Contig N10: 3392
    Contig N20: 2262
    Contig N30: 1685
    Contig N40: 1268
    Contig N50: 946

    Median contig length: 335
    Average contig: 617.51
    Total assembled bases: 89676218


#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

    Contig N10: 3252
    Contig N20: 2047
    Contig N30: 1465
    Contig N40: 1065
    Contig N50: 767

    Median contig length: 305
    Average contig: 547.96
    Total assembled bases: 64511364


#!/bin/bash

TRIN="/home/ggoetz/compile/trinityrnaseq_r20140413p1"

export PATH=~/compile/rsem-1.2.3/sam:${PATH}

${TRIN}/Trinity \
    --seqType fq \
    --JM 44G \
    --left left.fq \
    --right right.fq \
    --CPU 6 \
    --normalize_reads \
    --min_kmer_cov 2 \
    --quality_trimming_params \
    "LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLENGTH:36"

In [3]:
!head //Volumes/web/cnidarian/SeaStar/trinity_assemblies/run1/Trinity.fasta
>c2_g1_i1 len=233 path=[20:0-32 20:33-65 20:66-98 52:99-232]
TCTTGGTCTTGGACGTGGACTTGCTGGTCTTGGTCTTGGTCTTGGACGTGGACTTGCTGG
TCTTGGTCTTGGTCTTGGACGTGGACTTGCTGGTCTTGGTCTTGTTCTTGGTCTTGTTCT
TTGTCTTGTTCTTGTTCTTGTTTATGTCCTTGTTTAGGGTTGTTGTTGGGTTTGTTGCTG
TGTTTTGGCGGGTTGTTGTTGTTTTGGGGGTTTTGGTTGTTTGTTTGTTTGTG
>c108_g1_i1 len=239 path=[1:0-131 133:132-238]
CACTTCGTATATGCTTTATAGACTTCTTGTACGATGTAAAACTCAGACTTTTAAAATCTT
TTCTCATTTTTTGTAAAACTTTATAGAATAATTTTTTCTCTCTTGGGATATATCTACACT
TTCAACTTGCTTAAAAAAAATATAGATAGTGTATGGTGTATGGAGGATTGTGTATTTCAC
ATGTGAGGTACTGTGTTACTAAATTTAGTTGTCGTGACAGAGAGAGGAACAGAGCAGGG


In [5]:
!fgrep -c ">" /Volumes/web/cnidarian/SeaStar/trinity_assemblies/run1/Trinity.fasta
145222


In [6]:
!fgrep -c ">" /Volumes/web/cnidarian/SeaStar/trinity_assemblies/run2/Trinity.Cufffly.fasta
160038
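
The `fgrep -c ">"` calls above count lines that contain a `>` anywhere. A stricter count that only looks at lines starting with `>` can be sketched in Python as below. The function name is illustrative, and for these Trinity files the two counts should agree as long as `>` only appears at the start of header lines.

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting lines that begin with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Toy example: the second header also contains '>' in its description,
# but each record is still counted once because only line starts matter.
demo = io.StringIO(">seq1 len=4\nACGT\n>seq2 note=a>b\nGGCC\n")
print(count_fasta_records(demo))  # 2
```

For the real files, open each Trinity.fasta and pass the handle to count_fasta_records to compare run1 and run2.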


CLC

Trimming

greenbird_1977290F.png

de novo assembly

greenbird_19772A25.png

summary stats

greenbird_19772955.png

In [8]:
!head /Volumes/web/cnidarian/SeaStar_transc_v2.fa
>3291_5903_10007_H94MGADXX_V_CF71_ATCACG_R1_(paired)_trimmed_(paired)_contig_1
CAAATATATGAACGGTTGATTGTCAACGATTAGTACATGTTTTCATTGTTCCCCACGCCC
GCCCCCCCCCACTCAAACATTTAAAGTGTGAAATATTATTTATCCACAAATTTCCTTAAA
CCTGCAAACTTGTCTGCTGTCTCTTATTGGAAGTTATGAAAAAGAACAACGGGTTTTCTT
TAAAGGGTCTGCGTGCGATTTTCAACCTTTTGAGTAATAGCAGTTATTTTGATAACCGAT
TTTTTTCAAAGCTCAACAGCTTTTTAAAATAAGGAATCCTATAATGGCCAAACGAATACT
ATAAAAATAAGGGTTCTCTTAATTGTATAAAACGTATAATTTTATCAATTTTGGGACCGT
GTAATTTTTTAAAGACCACAAGAATGTTACATACAACAAATAGACGAAACTCGTAGCTTT
GGAAACTACGTCATGGGCGTTTGGTCAAAAGCTGGAGAGAAAGAGAGGTGGGGTGCCAGA
CTTAAGTAGTCACGTGATCTGACCAACGCACATCGGAAGCTCGATCGGATGAAATCTTCT


In [9]:
!fgrep -c ">" /Volumes/web/cnidarian/SeaStar_transc_v2.fa
30578

