
Genome Assembly – minimap/miniasm/racon Overview

Previously, I used three tools (minimap, miniasm, and racon) to do a quick assembly of our Olympia oyster PacBio data.

I’m just posting this quick overview to make it easier to follow what was actually done without having to read through three different notebook entries and corresponding Jupyter notebooks.

When I say “quick assembly”, I mean it. The entire assembly process probably takes about an hour on the computer I used – that seems fast.

Here’s the quick and dirty of what was done:

1 Run minimap:

This uses a pre-built set of defaults (the ava-pb preset in the code below) for finding overlaps among PacBio reads. Minimap takes two input files, and for this step you map your FASTQ file against itself. So, if you have multiple FASTQ sequencing files, you have to concatenate them into a single file before running minimap.

minimap2 -x ava-pb -t 23 
> 20170911_minimap2_pacbio_oly.paf
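
As a minimal sketch of the concatenation described above, the following combines several FASTQ files into one with cat. The directory and file names are hypothetical placeholders, not the actual Olympia oyster files, and the tiny one-read inputs exist only so the example runs on its own.

```shell
# Hypothetical demo directory and file names, for illustration only.
demo_dir=$(mktemp -d)

# Create two tiny FASTQ files (one read each; four lines per read).
printf '@read1\nACGT\n+\nIIII\n' > "${demo_dir}/run1.fastq"
printf '@read2\nTTGA\n+\nIIII\n' > "${demo_dir}/run2.fastq"

# Concatenate all runs into a single FASTQ file for the all-vs-all mapping.
cat "${demo_dir}/run1.fastq" "${demo_dir}/run2.fastq" > "${demo_dir}/combined.fastq"

# Sanity check: a FASTQ file has four lines per read.
combined_lines=$(wc -l < "${demo_dir}/combined.fastq" | tr -d ' ')
echo "reads in combined file: $((combined_lines / 4))"
```

The combined file would then be supplied to minimap twice, once as the target and once as the query, which is how a file gets mapped against itself.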

2 Run miniasm:

This uses your concatenated FASTQ file and the PAF file output from the minimap step (Step 1). The code below is taken from the example provided in the miniasm documentation; there are other options available.

/home/data/20170911_oly_pacbio_cat.fastq /home/data/20170911_minimap2_pacbio_oly.paf > /home/data/20170918_oly_pacbio_miniasm_reads.gfa

3 Convert miniasm output GFA to FASTA

The FASTA file is needed to re-run minimap in Step 4 below.

awk '$1 ~/S/ {print ">"$2"\n"$3}' 20170918_oly_pacbio_miniasm_reads.gfa > 20170918_oly_pacbio_miniasm_reads.fasta
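
To illustrate what this conversion does, here is a small sketch that runs the same kind of awk filter on a made-up GFA file with two segments and one link. The file contents and contig names are hypothetical; real miniasm output will be much larger.

```shell
# Hypothetical GFA with two segments and one link; fields are tab-separated.
demo_gfa=$(mktemp)
printf 'S\tutg000001l\tACGTACGT\n' > "$demo_gfa"
printf 'S\tutg000002l\tTTGACC\n' >> "$demo_gfa"
printf 'L\tutg000001l\t+\tutg000002l\t+\t4M\n' >> "$demo_gfa"

# Segment (S) lines hold the contig name in field 2 and its sequence in field 3.
# Each one becomes a FASTA header line followed by a sequence line.
demo_fasta=$(awk '$1 ~/S/ {print ">"$2"\n"$3}' "$demo_gfa")
echo "$demo_fasta"
```

Only the two S lines are converted; the L line, which describes a link between segments, is skipped.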

4 Run minimap with default settings

Using the default settings maps the FASTQ reads back to the contigs (the PAF file) created in the first step. These mappings are required for Racon assembly (Step 5).

-t 23 
20170918_oly_pacbio_miniasm_reads.fasta 20170905_minimap2_pacibio_oly.paf > 20170918_minimap2_mapping_fasta_oly_pacbio.paf

5 Run racon

The output file is the FASTA file listed below.

racon -t 24 

Genome Assembly – Olympia oyster PacBio minimap/miniasm/racon

In this GitHub Issue, Steven had suggested I try out the minimap/miniasm/racon pipeline for assembling our Olympia oyster PacBio data.

I followed the pipeline described by this paper: http://matzlab.weebly.com/uploads/7/6/2/2/76229469/racon.pdf.

This notebook entry contains only the initial minimap execution. I followed up with miniasm and then racon.

Jupyter Notebook (GitHub): 20170907_docker_pacbio_oly_minimap2.ipynb