- Contigs >10kbp
- Contigs >30kbp
I used pyfaidx on Roadrunner with the following commands:
faidx --size-range 10000,100000000 PGA_assembly.fasta > PGA_assembly_10k_plus.fasta
faidx --size-range 30000,100000000 PGA_assembly.fasta > PGA_assembly_30k_plus.fasta
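For reference, here is a rough plain-Python sketch of what a size filter like this does. It assumes both bounds are inclusive. The names `read_fasta` and `filter_by_length` are made up for illustration and are not part of pyfaidx.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], low: int, high: int
) -> Iterator[Tuple[str, str]]:
    """Keep records whose sequence length is in [low, high] (assumed inclusive)."""
    for header, seq in records:
        if low <= len(seq) <= high:
            yield header, seq
```

With an open file handle `fh` on the assembly, `filter_by_length(read_fasta(fh), 10000, 100000000)` would mirror the first command above. Unlike pyfaidx, this reads every sequence into memory, so it is only a sketch for a genome-sized file.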
Afterwards, I ran Quast on the new FastA files to get stats and to confirm that the upper cutoff value (100,000,000bp) was high enough to keep the largest contig(s).
faidx Output folder: 20180512_geoduck_fasta_subsets/
10kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_10k_plus.fasta
30kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_30k_plus.fasta
Quast output folder: results_2018_05_14_06_26_26/
Quast report (HTML): results_2018_05_14_06_26_26/report.html
Everything looks good. The main thing I wanted to confirm by running Quast was that the largest contig in each subset was the same as in the original PGA assembly (95,480,635bp).
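The same largest-contig check could also be done without Quast. Below is a minimal sketch, not what I actually ran. The helper names `contig_lengths` and `same_largest_contig` are made up for illustration, and the sketch assumes each contig name appears only once per file.

```python
from typing import Dict, Iterable


def contig_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each contig name (first word of its header) to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so add up every line.
            lengths[name] += len(line)
    return lengths


def same_largest_contig(original: Iterable[str], subset: Iterable[str]) -> bool:
    """True if the longest contig has the same length in both FASTA inputs."""
    largest_original = max(contig_lengths(original).values(), default=0)
    largest_subset = max(contig_lengths(subset).values(), default=0)
    return largest_original == largest_subset
```

Passing open file handles for `PGA_assembly.fasta` and one of the subset files to `same_largest_contig` should return `True` if the upper cutoff did not drop the largest contig.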