OK, so things have improved since the last attempt at getting this BGI script to run and demultiplex the raw data.
I played around with the index.lst file format (based on the error I received last time, it seemed like a good possibility that the file formatting was incorrect) and actually got the script to run to completion! Granted, it took over 16hrs (!!), but it completed!
See the Jupyter notebook link below.
Well, although the script finished and kicked out all the demultiplexed FASTQ files, the contents of the FASTQ files don’t match (the read counts differ between these results and the BGI files) the original set of demultiplexed files. I’m not entirely sure if this is to be expected or not, since the script allows for a single nucleotide mismatch when demultiplexing. Is it possible that the mismatch could be interpreted slightly differently each time this is run? I’m not certain.
Theoretically, you should get the same results every time…
Maybe I’ll re-run this again over the weekend and see how the results compare to this run and the original BGI demultiplexing…
Jupyter notebook (GitHub): 20170314_docker_Oly_BGI_GBS_demultiplexing_reproducibility.ipynb
Jupyter notebook (may be easier to view in GitHub link above):