Tag Archives: epiRAD

Data Received – Jay’s Coral epiRADseq – Not Demultiplexed

We previously downloaded Jay’s epiRADseq data from the Genomic Sequencing Laboratory at UC-Berkeley. It was provided already demultiplexed (which was very nice of them!), but to be completionists on our end, we requested the non-demultiplexed data set.

Downloaded the FASTQ files from the project directory to Owl/nightingales/Porites_spp:

time wget -r -np -nc --ask-password ftp://gslftp@gslserver.qb3.berkeley.edu/160830_100PE_HS4KB_L4


It took a while:

FINISHED --2016-09-19 11:39:21--
Total wall clock time: 4h 26m 21s
Downloaded: 11 files, 36G in 4h 17m 18s (2.39 MB/s)
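As a quick sanity check, the reported average rate can be reproduced from the totals above, assuming wget’s “36G” is in binary units (36 × 1024 MB):

```shell
# Sanity check on the reported transfer rate: 36G over 4h 17m 18s.
# Assumes wget reports sizes in binary units (36G = 36 * 1024 MB).
seconds=$(( 4*3600 + 17*60 + 18 ))            # 15438 s
awk -v s="$seconds" 'BEGIN { printf "%.2f MB/s\n", 36 * 1024 / s }'
# prints "2.39 MB/s"
```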

Here are the files:

  • JD001_A_S1_L004_R2_001.fastq.gz
  • JD001_A_S1_L004_R1_001.fastq.gz
  • JD001_A_S1_L004_I1_001.fastq.gz
  • 160830_100PE_HS4KB_L4_Stats/
    • AdapterTrimming.txt
    • ConversionStats.xml
    • DemultiplexingStats.xml
    • DemuxSummaryF1L4.txt
    • FastqSummaryF1L4.txt

 

Generated MD5 checksums for each file:

for i in *.gz; do md5 "$i" >> checksums.md5; done
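The checksum file can later be used to detect corruption after a transfer. A minimal sketch, assuming GNU coreutils’ `md5sum` (the `md5` used above is the BSD/macOS equivalent, which lacks a built-in check mode); the filename is illustrative:

```shell
# Sketch: verify files against a stored checksum list.
# Assumes GNU coreutils' md5sum; filename is a stand-in for illustration.
printf 'demo data\n' > demo.fastq.gz      # stand-in file
md5sum demo.fastq.gz > checksums.md5      # record checksum
md5sum -c checksums.md5                   # prints "demo.fastq.gz: OK"
```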


Calculate total number of reads for this sequencing run:

totalreads=0; for i in *S1*R*.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); totalreads=$((readcount+totalreads)); done; echo "$totalreads"

Total reads: 662,868,166 (this isn’t entirely accurate, as it is counting all three files; probably should’ve just counted the R1 and R2 files…)
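One way to get an unambiguous count is to restrict the glob to R1 files, so each paired-end fragment is counted exactly once (R1 and R2 contain the same number of records). A sketch, with an illustrative filename pattern:

```shell
# Count FASTQ records (4 lines each) in R1 files only, so each
# paired-end fragment is counted once. Filename pattern is illustrative.
totalreads=0
for i in *_R1_*.fastq.gz; do
  linecount=$(gunzip -c "$i" | wc -l)
  totalreads=$(( totalreads + linecount / 4 ))
done
echo "$totalreads"
```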


Calculate read counts for each file and write the data to the readme.md file in the Owl/web/nightingales/Porites_spp directory:

for i in *S1*R*.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); printf "%s\t%s\n" "$i" "$readcount" >> readme.md; done


See this Jupyter notebook for code explanations.


Added sequencing info to the Next_Gen_Seq_Library_Database (Google Sheet), the Nightingales Spreadsheet (Google Sheet), and the Nightingales Fusion Table (Google Fusion Table).


Data Received – Jay’s Coral epiRADseq

We received notice that Jay’s coral (Porites spp) epiRADseq data was available from the Genomic Sequencing Laboratory at UC-Berkeley.

Downloaded the FASTQ files from the project directory to Owl/nightingales/Porites_spp:

time wget -r -np -nc -A "*.gz" --ask-password ftp://gslftp@gslserver.qb3.berkeley.edu/160830_100PE_HS4KB/Roberts


Generated MD5 checksums for each file:

for i in *.gz; do md5 "$i" >> checksums.md5; done


Calculate total number of reads for this sequencing run:

totalreads=0; for i in *.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); totalreads=$((readcount+totalreads)); done; echo "$totalreads"

Total reads: 573,378,864


Calculate read counts for each file and write the data to the readme.md file in the Owl/web/nightingales/Porites_spp directory:

for i in *.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); printf "%s\t%s\n" "$i" "$readcount" >> readme.md; done


See this Jupyter notebook for code explanations.


Added sequencing info to the Next_Gen_Seq_Library_Database (Google Sheet), the Nightingales Spreadsheet (Google Sheet), and the Nightingales Fusion Table (Google Fusion Table).
