Tag Archives: racon

Genome Assembly – Olympia oyster Redundans/Canu vs. Redundans/Racon

I decided to compare Redundans run with the Canu assembly as the reference against Redundans run with the Racon assembly as the reference. Both reference assemblies were built from our PacBio data only.

Jupyter notebook (GitHub): 20171005_docker_oly_redundans.ipynb

Notebook is also embedded at the end of this post.


Note that the following paired-read files from the BGI mate-pair Illumina data did not assemble, just as when I last used them:

  • 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L5_WHOSTibkDCAADWAAPEI-74_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L6_WHOSTibkDCAADWAAPEI-74_2.fq.gz

The Redundans assembly that used the Canu reference is the better of the two, which suggests that Canu produced the better of the two PacBio assemblies (as we had already suspected).

QUAST comparison using default settings:

Interactive link: http://owl.fish.washington.edu/Athaliana/quast_results/results_2017_10_06_22_21_06/report.html

QUAST comparison using the --scaffolds setting:

Interactive link: http://owl.fish.washington.edu/Athaliana/quast_results/results_2017_10_06_22_27_26/report.html

Genome Assembly – minimap/miniasm/racon Overview

Previously, I used the following three tools to do a quick assembly of our Olympia oyster PacBio data:

  • minimap
  • miniasm
  • racon

I’m just posting this quick overview to make it easier to follow what was actually done without having to read through three different notebook entries and corresponding Jupyter notebooks.

When I say “quick assembly”, I mean it. The entire assembly process probably takes about an hour on the computer I used – that seems fast.

Here’s the quick and dirty of what was done:

1 Run minimap:

This uses a pre-built set of defaults (the ava-pb preset in the code below) for finding overlaps in PacBio data. The command takes two input files, and to find overlaps among all reads you map your FASTQ file against itself. So, if you have multiple FASTQ sequencing files, you have to concatenate them into a single file before running minimap.

minimap2 -x ava-pb -t 23 \
20170911_oly_pacbio_cat.fastq \
20170911_oly_pacbio_cat.fastq \
> 20170911_minimap2_pacbio_oly.paf
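
The concatenation mentioned above can be sketched as follows. This is a minimal example, not the exact command used for this assembly: the working directory and the two toy input files are hypothetical stand-ins, created here only so the snippet runs on its own. Uncompressed FASTQ files can simply be joined with cat.

```shell
# Hypothetical working directory and toy FASTQ files (stand-ins for real PacBio read files).
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\n!!!!\n' > "$workdir/smrtcell_01.fastq"
printf '@read2\nTTGA\n+\n!!!!\n' > "$workdir/smrtcell_02.fastq"

# Join every per-run FASTQ file into one file; minimap2 is then given this file twice.
# The output goes into its own subdirectory so it can never match the *.fastq glob.
mkdir "$workdir/cat"
cat "$workdir"/*.fastq > "$workdir/cat/pacbio_cat.fastq"

# A FASTQ record is four lines, so the record count is the line count divided by four.
records=$(( $(wc -l < "$workdir/cat/pacbio_cat.fastq") / 4 ))
echo "records: $records"
```

Comparing the record count of the combined file against the sum of the inputs is a cheap way to confirm that nothing was dropped.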

2 Run miniasm:

This uses your concatenated FASTQ file and the PAF file output from the minimap step (Step 1). The code below is taken from the example provided in the miniasm documentation; there are other options available.

miniasm \
-f \
/home/data/20170911_oly_pacbio_cat.fastq \
/home/data/20170911_minimap2_pacbio_oly.paf \
> /home/data/20170918_oly_pacbio_miniasm_reads.gfa

3 Convert miniasm output GFA to FASTA

The FASTA file is needed to re-run minimap in Step 4 below.

awk '$1 ~/S/ {print ">"$2"\n"$3}' 20170918_oly_pacbio_miniasm_reads.gfa > 20170918_oly_pacbio_miniasm_reads.fasta
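
To see what the awk one-liner does, here is a small sketch on a made-up GFA file (the segment name and sequence are invented for illustration). It uses a stricter test, $1 == "S", in place of the regular-expression match above, so that only segment lines are selected; that is a variation for illustration, not the command that was run.

```shell
# Toy GFA: one segment (S) line holding a contig, plus one non-segment line. Contents are invented.
workdir=$(mktemp -d)
printf 'S\tutg000001l\tACGTACGT\n' > "$workdir/toy.gfa"
printf 'a\tutg000001l\t0\tread1:1-8\t+\t8\n' >> "$workdir/toy.gfa"

# For each segment line, print the name (field 2) as a FASTA header and the sequence (field 3) below it.
awk '$1 == "S" {print ">"$2"\n"$3}' "$workdir/toy.gfa" > "$workdir/toy.fasta"

cat "$workdir/toy.fasta"
# prints:
# >utg000001l
# ACGTACGT
```

Only the S line ends up in the FASTA file; the other line is skipped because its first field is not S.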

4 Run minimap with default settings

With the default settings, minimap maps the FASTQ reads back to the contigs (the FASTA file) created in Step 3. These mappings are required by Racon (Step 5).

# Target: contigs FASTA from Step 3. Query: the concatenated FASTQ reads used in Step 1.
minimap2 \
-t 23 \
20170918_oly_pacbio_miniasm_reads.fasta \
20170911_oly_pacbio_cat.fastq \
> 20170918_minimap2_mapping_fasta_oly_pacbio.paf

5 Run racon

The output file is the FASTA file listed below.

racon -t 24 \
20170911_oly_pacbio_cat.fastq \
20170918_oly_pacbio_minimap_mappings.paf \
20170918_oly_pacbio_miniasm_assembly.gfa \

Genome Assembly – Olympia oyster PacBio minimap/miniasm/racon

In this GitHub Issue, Steven had suggested I try out the minimap/miniasm/racon pipeline for assembling our Olympia oyster PacBio data.

I followed the pipeline described by this paper: http://matzlab.weebly.com/uploads/7/6/2/2/76229469/racon.pdf.

Previously, I ran the first part of the pipeline: minimap.

This notebook entry just contains the miniasm execution. Will follow with racon.

Jupyter Notebook (GitHub): 20170918_docker_pacbio_oly_miniasm0.2.ipynb