Bedgraph – Olympia oyster transcriptome (FAIL)

Progress on generating bedgraphs from our Olympia oyster transcriptome continues.

Transcriptome assembly with Trinity completed 20180919.

I then aligned the assembled transcriptome to our genome using Bowtie2.

Finally, I used BEDTools to convert the BAM to BED, and then the BED to bedgraph.

This required an initial indexing of our Olympia oyster genome FastA using the samtools faidx tool.

SBATCH script file:

#!/bin/bash
## Job Name
#SBATCH --job-name=20180924_oly_bedgraphs
## Allocation Definition 
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=5-00:00:00
## Memory per node
#SBATCH --mem=500G
## Turn on e-mail notification
#SBATCH --mail-type=ALL
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/20180924_oly_RNAseq_bedgraphs

# Load Python Mox module for Python module availability

module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)

date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
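printf "%0.s-" {1..10} >> system_path.log
# End the dashed separator line so the first PATH entry starts on its own line
echo "" >> system_path.log
echo "${PATH}" | tr : \\n >> system_path.log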

# Set genome assembly FastA

# Set indexed genome assembly file

# Set sorted transcriptome assembly bam file

# Set program paths

# Index genome FastA
${samtools} faidx ${oly_genome_fasta}

# Format indexed genome for bedtools
## Requires only two tab-separated columns: name and length
awk -v OFS='\t' '{print $1, $2}' "${oly_genome_indexed}" > Olurida_v081.fa.fai.genome

# Create bed file
${bedtools}/bamToBed \
-i ${oly_transcriptome} \
> 20180924_oly_RNAseq.bam.bed

# Create bedgraph
## Reports depth in bedgraph format, including regions with zero coverage (-bga).
## Treats split BAM or BED12 entries as separate intervals (-split); intended to count only the exon portions of spliced alignments.
## Adds a genome browser track line to the header of the bedgraph file (-trackline).
${bedtools}/genomeCoverageBed \
-i ${PWD}/20180924_oly_RNAseq.bed \
-g Olurida_v081.fa.fai.genome \
-bga \
-split \
-trackline \
> 20180924_oly_RNAseq.bed
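
To illustrate the genome-file formatting step in the script above: a samtools faidx index (.fai) has five tab-separated columns (name, length, offset, linebases, linewidth), and the awk command keeps only the first two. The sketch below runs the same kind of awk call on a small mock index. The file names and the offset and line-width values are made up for illustration; only the two contig lengths are borrowed from the bedgraph shown further down.

```shell
#!/bin/bash
# Mock .fai index (hypothetical values; a real one comes from `samtools faidx`).
# Columns: name, length, offset, linebases, linewidth
printf 'Contig0\t116746\t9\t60\t61\n'     >  mock.fa.fai
printf 'Contig1\t87411\t118710\t60\t61\n' >> mock.fa.fai

# Keep only name and length, tab-separated, as bedtools expects for -g
awk -v OFS='\t' '{print $1, $2}' mock.fa.fai > mock.fa.fai.genome

# Show the resulting two-column genome file
cat mock.fa.fai.genome
```

Each line of the resulting file should hold just a contig name and its length, separated by a tab.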

Alignment was done using the following version of the Olympia oyster genome assembly:


Output folder:

Indexed and formatted genome file:

Bedgraph file (for IGV):

This doesn’t appear to have worked properly. Here’s a view of the bedgraph file:

track type=bedGraph
Contig0 0   116746  0
Contig1 0   87411   0
Contig2 0   139250  0
Contig3 0   141657  0
Contig4 0   95692   0
Contig5 0   130522  0
Contig6 0   94893   0
Contig7 0   109667  0
Contig8 0   95943   0

I’d expect multiple entries for each contig (ideally), indicating the start/stop positions where transcripts align within that contig. However, this appears to simply be a list of all the genome contigs and their lengths (start = 0, stop = contig length), each with a coverage value of 0.

I would expect to see something li
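
A quick way to tell these two situations apart without opening the file is to count the bedgraph lines that are something other than a zero-coverage interval starting at position 0. The sketch below is only an illustration: count_informative is a hypothetical helper, failed.bedgraph copies the first few lines of the output above, and mock_ok.bedgraph contains made-up coordinates and depths, not real results.

```shell
#!/bin/bash
# Hypothetical helper: prints the number of data lines in a bedgraph that
# are NOT a zero-coverage interval starting at position 0.
# (Simplification: it does not check that the end equals the contig length.)
count_informative() {
    awk '!/^track/ && !($2 == 0 && $4 == 0) {n++} END {print n + 0}' "$1"
}

# First lines of the bedgraph shown above
cat > failed.bedgraph <<'EOF'
track type=bedGraph
Contig0 0   116746  0
Contig1 0   87411   0
Contig2 0   139250  0
EOF

# Made-up example of a contig with some coverage (values are illustrative only)
cat > mock_ok.bedgraph <<'EOF'
track type=bedGraph
Contig0 0   1500    0
Contig0 1500    2100    3
Contig0 2100    116746  0
EOF

count_informative failed.bedgraph    # prints 0
count_informative mock_ok.bedgraph   # prints 2
```

A count of 0 means the file contains nothing but whole-contig, zero-coverage entries, which is the pattern seen here.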

I’ll look into this further and see where this pipeline goes wrong.
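
One thing worth checking first, going only by the script as posted: the bamToBed step writes 20180924_oly_RNAseq.bam.bed, but genomeCoverageBed reads 20180924_oly_RNAseq.bed, which is also the file its output is redirected to. The shell truncates a redirect target before the command starts, so a command that reads and writes the same file sees empty input. This is a possible explanation rather than a confirmed one; an empty input would at least be consistent with every contig being reported at zero coverage. A minimal demonstration of the truncation behavior with plain awk and a throwaway file (demo.bed is a made-up name, and the records in it are invented):

```shell
#!/bin/bash
# Throwaway file with two made-up BED-like records
printf 'Contig0\t10\t20\nContig0\t30\t40\n' > demo.bed
wc -l < demo.bed    # two lines at this point

# Read and write the same file: the shell empties demo.bed for the
# redirect before awk opens it, so awk has nothing to read
awk '{print}' demo.bed > demo.bed
wc -l < demo.bed    # zero lines now
```

If that is what happened here, pointing -i at the file bamToBed actually wrote, and redirecting the output to a differently named file, would be the first thing to try.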
