Tag Archives: clc

Getting back on tracks

Yesterday I uploaded v0.0.1 of the Geoduck genome to CoGe.

Now I want to start adding tracks. To do this I used CLC to create RNA-seq tracks from our male and female gonad transcriptome data.

As expected, only a small fraction of reads mapped, because we are limiting the genome to the 22 scaffolds longer than 100 kb.
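For reference, a length filter like the one behind the 22-scaffold subset can be sketched in a few lines of Python. This is only an illustration, not how the v0.0.1 genome was actually built: the function names and the toy FASTA records are made up, and the real threshold would be 100,000.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def long_scaffolds(handle, min_length=100_000):
    """Return (name, length) for scaffolds strictly longer than min_length."""
    return [(name, len(seq)) for name, seq in read_fasta(handle)
            if len(seq) > min_length]

# Toy input (hypothetical records) with a tiny threshold for demonstration.
demo = StringIO(">scaffoldA\nACGT\nACGT\n>scaffoldB\nAC\n")
print(long_scaffolds(demo, min_length=5))
```

Reads that derive from the scaffolds dropped by such a filter have nowhere to map, which is consistent with the low mapping rate.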

Males
CLC_Genomics_Workbench_8_5_1_1CAD8B6E.png

Females
CLC_Genomics_Workbench_8_5_1_1CAD8BCB.png

One thing to point out (and that will need to be followed up on) is that many more female reads mapped back.

I took the Reads data and exported it to BAM.
CLC_Genomics_Workbench_8_5_1_1CAD8C40.png

Then I uploaded it to CoGe.
I called this Version 1 and, interestingly, got some cool options, so I selected them.

CoGe__My_Data_1CAD8C90.png

This included saving as a Notebook.

CoGe__My_Data_1CAD8CC2.png


This finished in less than 5 minutes!
CoGe__My_Data_1CAD8CEB.png

The SNP view.
CoGe__My_Data_1CAD8D42.png

Voila – we have it in a Browser.
JBrowse_scaffold860_1__221360_and_Getting_back_on_track_md_1CAD9095.png

and you can zoom in
JBrowse_scaffold4546_56623__57472_1CAD8E29.png

Here we have a Notebook view
CoGe__My_Data_1CAD8E81.png
It is now public, though I am not quite sure whether there is a URL.

Everything is public so please give it a look / twirl.
CoGe__My_Data_1CAD8F10.png


Passing Flanks

A first look at population differences at qPCR primer sites for three populations of Olympia oysters


Plate 1 (samwhite_112381) included BMP2, CARM, HSPb11, and PGEEP4. A full list of qPCR primers is at the bottom.

BMP2

Limited coverage

CARM

Better coverage

Conflicts were ambiguity codes (i.e., S, W, R)
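As a reminder of what those codes stand for, here is a small lookup based on the standard IUPAC nucleotide ambiguity codes. The helper function and the toy consensus string are hypothetical, just to show how ambiguous positions could be pulled out of an exported consensus.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def ambiguous_sites(consensus):
    """Return (1-based position, code, possible bases) for each ambiguity code."""
    return [(i + 1, base, IUPAC[base])
            for i, base in enumerate(consensus.upper())
            if base in IUPAC]

# Toy consensus (made up) containing the three codes mentioned above.
for pos, code, bases in ambiguous_sites("ACSGWTR"):
    print(pos, code, "/".join(bases))
```

A two-base code such as S (C or G) at a site where individuals differ is a candidate SNP rather than a sequencing conflict.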

HSPb11

Missed qPCR primer (R did not seem to work)

PGEEP4

Nothing assembled – everything under 100 bp.


Plate 2 (samwhite_112404) included H2A, H2AV, p291N, CRAF, GABABR, GRB2, and H3-3.

H2A

comp23253_c0_seq1
One primer not covered

H2AV

comp25000_c0_seq1
Not much coverage

p291N

comp22144_c0_seq1
Not much coverage

CRAF

comp25313_c0_seq1
Decent coverage, only conflict = ambig, SNP!

SNP

GABABR

comp19002_c0_seq1
Great coverage, did find some SNPs. Missed qPCR primer

SNPs

GRB2

comp10127_c0_seq1
Not great coverage

H3-3

comp19571_c0_seq1


List of QPCR Primers

Primer Sequence Protein
HSP70c_FWD AGGAAAGGTCGGGAGAGGAA Heat shock 70 kDa protein 12A
HSP70c_REV ACCTCGGACTTTGGACGAAC Heat shock 70 kDa protein 12A
p29ING4_FWD TACCTTTGGGCTTCACCGTC Inhibitor of growth protein 4 (p29ING4)
p29ING4_REV GTCCATCACACACCCCTCAG Inhibitor of growth protein 4 (p29ING4)
CerS2_FWD TTGTCGGTCTCCTCCTGCTA Ceramide synthase 2 (CerS2) (LAG1 longevity assurance homolog 2)
CerS2_REV CCGTCTTCTGAGCCATCGTT Ceramide synthase 2 (CerS2) (LAG1 longevity assurance homolog 2)
GABABR1_FWD CCGAGGAGGACACGAAACTC Gamma-aminobutyric acid type B receptor subunit 1 (GABA-B receptor 1) (GABA-B-R1) (GABA-BR1) (GABABR1) (Gb1)
GABABR1_REV CGGACAGGTTCTGGATTCCG Gamma-aminobutyric acid type B receptor subunit 1 (GABA-B receptor 1) (GABA-B-R1) (GABA-BR1) (GABABR1) (Gb1)
HSP70d_FWD TTTGTCTCACCGGCTTTGTG Heat shock 70 kDa protein 6 (Heat shock 70 kDa protein B’)
HSP70d_REV GACATGAGACCAAAGACGCC Heat shock 70 kDa protein 6 (Heat shock 70 kDa protein B’)
THRa_FWD GACACTATCCTCACTCGGCG Thyroid hormone receptor alpha (Nuclear receptor subfamily 1 group A member 1)
THRa_REV GGGTGCCGAGTAAACAAGGA Thyroid hormone receptor alpha (Nuclear receptor subfamily 1 group A member 1)
Defensin_FWD TCTAGCGGAGTTTGTTGGGG Big defensin
Defensin_REV ATGGCTGTCGGAGGAGGATT Big defensin
GRB2_FWD AACTTTGTCCACCCAGACGG Growth factor receptor-bound protein 2 (Adapter protein GRB2) (Protein Ash) (SH2/SH3 adapter GRB2)
GRB2_REV CCAGTTGCAGTCCACTTCCT Growth factor receptor-bound protein 2 (Adapter protein GRB2) (Protein Ash) (SH2/SH3 adapter GRB2)
H3.3_FWD CACGCTCTCCTCGAATCCTC Histone H3.3
H3.3_REV AAGTTGCCTTTCCAGCGTCT Histone H3.3
H2A.V_FWD TGCTTTCTGTGTGCCCTTCT Histone H2A.V (H2A.F/Z) (Fragment)
H2A.V_REV TATCACACCCCGTCACTTGC Histone H2A.V (H2A.F/Z) (Fragment)
H2A_FWD GCTGGGGTTTTTCTGGGTCT Histone H2A
H2A_REV GGAACTACGCCGAGAGAGTG Histone H2A
Hspb11_FWD ATGTTTCCTGGTCTCCGTCA Heat shock protein beta-11 (Hspb11) (Placental protein 25) (PP25)
Hspb11_REV CATCAACGCCAGGGGAACTT Heat shock protein beta-11 (Hspb11) (Placental protein 25) (PP25)
GDF-8_FWD CCGTGGATGTCGCAGAAAGA Growth/differentiation factor 8 (GDF-8) (Myostatin) (Myostatin-1) (zfMSTN-1) (Myostatin-B)
GDF-8_REV CTGCTTTCTCCGTCCCCTTT Growth/differentiation factor 8 (GDF-8) (Myostatin) (Myostatin-1) (zfMSTN-1) (Myostatin-B)
HSP70b_FWD AAGTACCTTGGGGAGCTTGC Heat shock 70 kDa protein 12B
HSP70b_REV TCCACAGACTTTCCTCCCCA Heat shock 70 kDa protein 12B
GRP-78_FWD GAGAAACCACGCAGGGAGAA 78 kDa glucose-regulated protein (GRP-78) (Heat shock 70 kDa protein 5) (Immunoglobulin heavy chain-binding protein) (BiP)
GRP-78_REV CATCAGCATCGAAGGCAACG 78 kDa glucose-regulated protein (GRP-78) (Heat shock 70 kDa protein 5) (Immunoglobulin heavy chain-binding protein) (BiP)
CARM1_FWD TGGTTATCAACAGCCCCGAC Histone-arginine methyltransferase CARM1 (EC 2.1.1.-) (EC 2.1.1.125) (Coactivator-associated arginine methyltransferase 1) (Protein arginine N-methyltransferase 4)
CARM1_REV GTTGTTGACCCCAGGAGGAG Histone-arginine methyltransferase CARM1 (EC 2.1.1.-) (EC 2.1.1.125) (Coactivator-associated arginine methyltransferase 1) (Protein arginine N-methyltransferase 4)
BMP-2_FWD TGAAGGAACGACCAAAGCCA Bone morphogenetic protein 2 (BMP-2) (Bone morphogenetic protein 2A) (BMP-2A)
BMP-2_REV TCCGGTTGAAGAACCTCGTG Bone morphogenetic protein 2 (BMP-2) (Bone morphogenetic protein 2A) (BMP-2A)
PGE/EP4_FWD ACAGCGACGGACGATTTTCT Prostaglandin E2 receptor EP4 subtype (PGE receptor EP4 subtype) (PGE2 receptor EP4 subtype) (Prostanoid EP4 receptor)
PGE/EP4_REV ATGGCAGACGTTACCCAACA Prostaglandin E2 receptor EP4 subtype (PGE receptor EP4 subtype) (PGE2 receptor EP4 subtype) (Prostanoid EP4 receptor)
CRAF1_FWD AGCAGGGCATCAAACTCTCC TNF receptor-associated factor 3 (EC 6.3.2.-) (CD40 receptor-associated factor 1) (CRAF1) (TRAFAMN)
CRAF1_REV ACAAGTCGCACTGGCTACAA TNF receptor-associated factor 3 (EC 6.3.2.-) (CD40 receptor-associated factor 1) (CRAF1) (TRAFAMN)
NFKBina_FWD GATGGCGGTGCATGTGTTAG NF-kappa-B inhibitor alpha (I-kappa-B-alpha) (IkB-alpha) (IkappaBalpha) (REL-associated protein pp40)
NFKBina_REV CGAGGAGAACCTTGTGCAGT NF-kappa-B inhibitor alpha (I-kappa-B-alpha) (IkB-alpha) (IkappaBalpha) (REL-associated protein pp40)
PGRP-S_FWD GAGACTTCACCTCGCACCAA Peptidoglycan recognition protein 1 (Peptidoglycan recognition protein short) (PGRP-S)
PGRP-S_REV AACTGGTTTGCCCGACATCA Peptidoglycan recognition protein 1 (Peptidoglycan recognition protein short) (PGRP-S)
TLR2.1_FWD ACAAAGATTCCACCCGGCAA Toll-like receptor 2 type-1
TLR2.1_REV ACACCAACGACAGGAAGTGG Toll-like receptor 2 type-1
GDF-8b_FWD AACTGATTCTGCTCGTCGCA Growth/differentiation factor 8 (GDF-8) (Myostatin)
GDF-8b_REV TGTTCTTCCACCCACCACTG Growth/differentiation factor 8 (GDF-8) (Myostatin)
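To check whether a primer pair falls inside an assembled contig (the "missed qPCR primer" notes above), one option is an exact-match search on both strands. This is a minimal sketch, not what CLC does: the helper names are made up and the contig is a toy sequence built from the HSP70c primers in the table, not real assembly data.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_primer(contig, primer):
    """Return (strand, 0-based start) for every exact primer match."""
    contig, primer = contig.upper(), primer.upper()
    hits = []
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        start = contig.find(query)
        while start != -1:
            hits.append((strand, start))
            start = contig.find(query, start + 1)
    return hits

# HSP70c primers from the table above.
fwd = "AGGAAAGGTCGGGAGAGGAA"
rev = "ACCTCGGACTTTGGACGAAC"

# Hypothetical contig: forward primer, a spacer, then the reverse primer site.
toy_contig = "TTT" + fwd + "ACGTACGT" + reverse_complement(rev) + "GGG"

print(find_primer(toy_contig, fwd))
print(find_primer(toy_contig, rev))
```

An exact search will miss a primer site that contains a SNP or an ambiguity code, so an empty result means "no perfect match" rather than "primer not covered".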

Quick Carmalign

The first batch of sequencing came in today to verify the sequences of the Olympia oyster qPCR primers.

1) imported .ab1 files into CLC,

2) trimmed “CARM” sequences

Remove old trimming = Yes
Quality trimming = Yes
Quality limit = 0.05
Ambiguity trimming = Yes
Ambiguity limit = 2
Vector trimming = No
User vector trimming = No
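For context on "Quality limit = 0.05": as I understand the CLC manual, quality trimming uses a modified-Mott style running sum over per-base error probabilities. The sketch below illustrates that idea only. It is not CLC's code, the function name and Phred scores are made up, and ambiguity trimming is not included.

```python
def mott_trim(quals, limit=0.05):
    """Return (start, end) of the region to keep, as a 0-based half-open slice.

    Each Phred score Q is converted to an error probability 10**(-Q/10).
    The running sum of (limit - p_error) is reset whenever it drops to zero
    or below, and the kept region ends where the running sum peaks.
    """
    running, best = 0.0, 0.0
    start, best_start, best_end = 0, 0, 0
    for i, q in enumerate(quals):
        p_error = 10 ** (-q / 10)
        running += limit - p_error
        if running <= 0:
            # Low-quality stretch: restart the candidate region after this base.
            running = 0.0
            start = i + 1
        elif running > best:
            best = running
            best_start, best_end = start, i + 1
    return best_start, best_end

# Hypothetical Phred scores: poor ends, good middle.
quals = [2, 2, 30, 30, 30, 30, 2, 2]
print(mott_trim(quals))
```

With a limit of 0.05, any base whose error probability is above 5% (roughly below Q13) pulls the running sum down, which is why poor read ends get clipped.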

3) aligned to comp7220_c0_seq2

img
fwd
rev


Wayback to just-MBD

Prior to bisulfite sequencing, we made a couple of MBD enrichment libraries to describe DNA methylation in oysters. The results were even snuck into this perspective.

mbd

While I am sure there are genome tracks around, I am ending up #doingitagain.

In short, I took the raw SOLiD reads, aligned them to Crassostrea_gigas.GCA_000297895.1.26.dna.genome in CLC, exported BAM, converted to bedgraph, and converted to TDF.


In long:
The raw files
raw

1) Imported into CLC v8.0.1

          Discard read names = Yes
          Discard quality scores = No
          Original resource = /Users/sr320/data-genomic/tentacle/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_SB_METH/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SB_MOTH.qual
          Original resource = /Users/sr320/data-genomic/tentacle/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_SB_METH/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_SB_MOTH.csfasta

(yes the core called them MOTH)

2) Reads were mapped

mapped

3) Exported as BAM.

4) Converted to bedgraph

!/Applications/bioinfo/bedtools2/bin/genomeCoverageBed \
-bg \
-ibam /Users/sr320/data-genomic/tentacle/solid0078_moth.bam \
-g /Volumes/web/halfshell/qdod3/Cg.GCA_000297895.1.25.dna_sm.toplevel.genome \
> /Users/sr320/data-genomic/tentacle/MBD-meth.bedgraph
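To make the bedgraph output less of a black box: with -bg, the coverage is reported as runs of equal, non-zero depth in 0-based, half-open coordinates. The sketch below mimics that collapsing step on a toy list of per-base depths. The function name and the depths are made up, and it does not read BAM files.

```python
from itertools import groupby

def depth_to_bedgraph(chrom, depths):
    """Collapse per-base depths into (chrom, start, end, depth) runs.

    Coordinates are 0-based and half-open; zero-depth runs are skipped.
    """
    rows, pos = [], 0
    for depth, run in groupby(depths):
        length = sum(1 for _ in run)
        if depth > 0:
            rows.append((chrom, pos, pos + length, depth))
        pos += length
    return rows

# Hypothetical per-base depths along a short scaffold.
for row in depth_to_bedgraph("scaffold1", [0, 0, 2, 2, 3, 0, 1]):
    print("\t".join(map(str, row)))
```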

5) Converted to TDF (toTDF)

tdf


Rinse and repeat with the unmethylated fraction (UNMOTH) and import the TDF files into IGV!


Transcriptome Assembly

 

Trinity

################################
## Counts of transcripts, etc.
################################
Total trinity 'genes':  117729
Total trinity transcripts:  145222
Percent GC: 40.87

########################################
Stats based on ALL transcript contigs:
########################################

    Contig N10: 3392
    Contig N20: 2262
    Contig N30: 1685
    Contig N40: 1268
    Contig N50: 946

    Median contig length: 335
    Average contig: 617.51
    Total assembled bases: 89676218


#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

    Contig N10: 3252
    Contig N20: 2047
    Contig N30: 1465
    Contig N40: 1065
    Contig N50: 767

    Median contig length: 305
    Average contig: 547.96
    Total assembled bases: 64511364
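For anyone wanting to recompute these numbers, here is a minimal sketch of the usual Nx definition (the contig length at which the longest contigs together reach a given fraction of the total assembled bases). The function name and the toy lengths are made up; this is not Trinity's own stats script.

```python
from statistics import mean, median

def nx(lengths, fraction=0.5):
    """Contig length at which the sorted cumulative sum reaches `fraction` of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    return 0

# Hypothetical contig lengths.
lengths = [10, 8, 6, 4, 2]
print("N10:", nx(lengths, 0.1))
print("N50:", nx(lengths, 0.5))
print("Median contig length:", median(lengths))
print("Average contig:", mean(lengths))
print("Total assembled bases:", sum(lengths))
```

Because N50 weights long contigs heavily, it sits well above the median here, just as it does in the stats above (946 versus 335).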


#!/bin/bash

TRIN="/home/ggoetz/compile/trinityrnaseq_r20140413p1"

export PATH=~/compile/rsem-1.2.3/sam:${PATH}

# De novo assembly of the paired-end reads with Trinity
${TRIN}/Trinity \
    --seqType fq \
    --JM 44G \
    --left left.fq \
    --right right.fq \
    --CPU 6 \
    --normalize_reads \
    --min_kmer_cov 2 \
    --quality_trimming_params \
    "LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLENGTH:36"
In [3]:
!head //Volumes/web/cnidarian/SeaStar/trinity_assemblies/run1/Trinity.fasta
>c2_g1_i1 len=233 path=[20:0-32 20:33-65 20:66-98 52:99-232]
TCTTGGTCTTGGACGTGGACTTGCTGGTCTTGGTCTTGGTCTTGGACGTGGACTTGCTGG
TCTTGGTCTTGGTCTTGGACGTGGACTTGCTGGTCTTGGTCTTGTTCTTGGTCTTGTTCT
TTGTCTTGTTCTTGTTCTTGTTTATGTCCTTGTTTAGGGTTGTTGTTGGGTTTGTTGCTG
TGTTTTGGCGGGTTGTTGTTGTTTTGGGGGTTTTGGTTGTTTGTTTGTTTGTG
>c108_g1_i1 len=239 path=[1:0-131 133:132-238]
CACTTCGTATATGCTTTATAGACTTCTTGTACGATGTAAAACTCAGACTTTTAAAATCTT
TTCTCATTTTTTGTAAAACTTTATAGAATAATTTTTTCTCTCTTGGGATATATCTACACT
TTCAACTTGCTTAAAAAAAATATAGATAGTGTATGGTGTATGGAGGATTGTGTATTTCAC
ATGTGAGGTACTGTGTTACTAAATTTAGTTGTCGTGACAGAGAGAGGAACAGAGCAGGG


In [5]:
!fgrep -c ">" /Volumes/web/cnidarian/SeaStar/trinity_assemblies/run1/Trinity.fasta
145222
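The fgrep count gives transcripts; the Trinity 'gene' count can be recovered from the same headers if one assumes, as the head output above suggests, that names follow the c<N>_g<N>_i<N> pattern and that the gene is the part before the isoform suffix. The sketch below is a hypothetical helper run on toy headers, not on the real Trinity.fasta.

```python
import re
from io import StringIO

# Assumed header layout: >c<component>_g<gene>_i<isoform> ...
HEADER = re.compile(r"^>(c\d+_g\d+)_i\d+")

def count_genes_and_transcripts(handle):
    """Return (number of genes, number of transcripts) from FASTA headers."""
    genes, transcripts = set(), 0
    for line in handle:
        if line.startswith(">"):
            transcripts += 1
            match = HEADER.match(line)
            # Fall back to the whole header if it does not fit the pattern.
            genes.add(match.group(1) if match else line[1:].strip())
    return len(genes), transcripts

# Toy FASTA (made up): two isoforms of one gene plus a second gene.
demo = StringIO(
    ">c2_g1_i1 len=4\nACGT\n"
    ">c2_g1_i2 len=4\nACGA\n"
    ">c108_g1_i1 len=4\nTTTT\n"
)
print(count_genes_and_transcripts(demo))
```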


In [6]:
!fgrep -c ">" /Volumes/web/cnidarian/SeaStar/trinity_assemblies/run2/Trinity.Cufffly.fasta
160038


CLC

Trimming

greenbird_1977290F.png

de novo assembly

greenbird_19772A25.png

summary stats

greenbird_19772955.png

In [8]:
!head /Volumes/web/cnidarian/SeaStar_transc_v2.fa
>3291_5903_10007_H94MGADXX_V_CF71_ATCACG_R1_(paired)_trimmed_(paired)_contig_1
CAAATATATGAACGGTTGATTGTCAACGATTAGTACATGTTTTCATTGTTCCCCACGCCC
GCCCCCCCCCACTCAAACATTTAAAGTGTGAAATATTATTTATCCACAAATTTCCTTAAA
CCTGCAAACTTGTCTGCTGTCTCTTATTGGAAGTTATGAAAAAGAACAACGGGTTTTCTT
TAAAGGGTCTGCGTGCGATTTTCAACCTTTTGAGTAATAGCAGTTATTTTGATAACCGAT
TTTTTTCAAAGCTCAACAGCTTTTTAAAATAAGGAATCCTATAATGGCCAAACGAATACT
ATAAAAATAAGGGTTCTCTTAATTGTATAAAACGTATAATTTTATCAATTTTGGGACCGT
GTAATTTTTTAAAGACCACAAGAATGTTACATACAACAAATAGACGAAACTCGTAGCTTT
GGAAACTACGTCATGGGCGTTTGGTCAAAAGCTGGAGAGAAAGAGAGGTGGGGTGCCAGA
CTTAAGTAGTCACGTGATCTGACCAACGCACATCGGAAGCTCGATCGGATGAAATCTTCT


In [9]:
!fgrep -c ">" /Volumes/web/cnidarian/SeaStar_transc_v2.fa
30578

