Tag Archives: sqlite3

Data Aggregation – Ava’s Complete Sample List

I received Ava’s master sheet of all the samples she collected for this project. I needed to aggregate a full list of the samples I’ve previously extracted DNA from, so that I can compare to her master sample list and generate a list of the remaining samples that I need to extract DNA from..

Here are the files I needed to work with (Google Sheets):

The files required multiple formatting steps in order to produce accession numbers that were formatted in the same fashion across all three sheets. This was needed in order to be able to successfully merge all of the sheets into a single sheet containing all of the data, which will make it easy to sort, and generate a list of samples that need to be extracted.

Text file manipulations were performed in a Jupyter notebook, which is linked below. All files were downloaded from Google Sheets as tab-delimited files prior to working on them.

Jupyter Notebook file: 20170831_ava_ab_samples_aggregation.ipynb
Jupter Notebook on NBviewer: 20170831_ava_ab_samples_aggregation.ipynb

Now that we have the tables formatted, we can use the accession number as a common field by which to combine the two tables. This will allow easy sorting and identification of the remaining samples that I need to extract. I’ll do this by using SQLite3.

Use SQLite3 (in Linux Ubuntu):

Change to directory containing files:

cd ~/Dropbox/Sam Friedman Lab/tmp

Start SQLite3:

sqlite3

Set field separator as tab-delimited:

.separator "t"

Create databases by importing files and providing a name for corresponding databases:

.import ava_master_ab_list_formatted.tsv master_list
.import Ava_WS_Transmission_DNA_Extractions_all.tsv extracted_list

Set output display mode to tabs:

.mode tabs

Set output display to include column headers:

.headers on

Set the output to write to a file instead of the screen:

.output 20170905_master_extraction_list.tsv

SELECT statement to combine the two tables:

SELECT * FROM (SELECT * FROM master_list UNION ALL SELECT * FROM extracted_list) s GROUP BY accession_number ORDER BY accession_number;

The SELECT statement above works in the following fashion:

Uses a sub-query (contained in the parentheses) that combines all of the rows in both tables and creates an intermediate table (that’s the s after the sub-query). Then, all of the columns in that intermediate table are selected by the initial SELECT * FROM and organized by the GROUP BY clause (which combines any rows with identical values in the accession_number column) and then sorts them with the ORDER BY clause.

After that’s finished, we want to reset the output to the screen so we don’t overwrite our file:

.output stdout

The output file is here (Google Sheet): ava_abalone_master_extraction_list

Share

Data Aggregation – Black Abalone DNA for RLO, RLOv, & Prophage Portal qPCRs

I need to identify samples from the 1st and 2nd black abalone experiments in order to run qPCRs on them, using the primers mentioned in the title of this post. However, I only want to use samples that are RLO+. The existing qPCR data is all over the place (in multiple spreadsheets and split across multiple tabs within those spreadsheets) and is a serious pain in the neck to track down.

Here are the three different spreadsheets that I’ve found that have existing withering syndrome RLO (RLO) qPCR data:

I took the time to aggregate all of this data into a somewhat messy spreadsheet that contains the “raw” qPCR data from all of the black abalone qPCR data. I also calculated the mean copy number for all of the replicates:

Google Sheet: 20160420_BlackAb_RLO_qPCR_compilation_raw

 

 

 

 

Since that spreadsheet isn’t pleasant to look at, I’ve slimmed it down with just the mean qPCR data:

Google Sheet: 20160420_BlackAb_RLO_qPCR_compilation_means

 

 

Additionally, I’ve also added the mean RLO qPCR data to Lisa’s Black Abalone Expt 1 spreadsheet. This spreadsheet is currently the most comprehensive aggregation of black abalone data, since it also contains histology scoring of various tissues, as well as qPCR data for a slew of black abalone genes:

Google Sheet: Black Abalone: Expt 1 – WS & Phage

This will greatly simplify locating qPCR data for any black abalone samples in the future!

I will select samples from this list for qPCR.

 

For posterity, here’s how I slimmed down the messy “raw” spreadsheet using SQLite.

Created a sqlite database using GitBash for Windows:
Change to directory where file is located:

$cd Downloads

Start sqlite:

$sqlite3

Tell sqlite that the field separator will be commas (i.e. CSV file):

sqlite>.separator ","

Import the CSV file and provide a name for the resulting database:

sqlite>.import 20160420_BlackAb_RLO_qPCR_compilation_raw.csv BlackAbRLOqPCR

Set output display mode to column for easier reading:

sqlite>.mode column

Set output display to include column headers:

sqlite>.headers on

Change output from screen to designated .csv file:

sqlite> .output 20160420_BlackAb_RLO_qPCR_compilation_means.csv 

Change mode to comma separated:

sqlite> .mode csv

SELECT statement:

sqlite> SELECT Sample, qPCR_date, mean_copies, data_source FROM BlackAbRLOqPCR WHERE mean_copies>=0;

The SELECT statement above selects the columns (Sample, qPCR_date, etc..) from the database (i.e. table) we created earlier and only picks the rows where the value in the “mean_copies” column is greater than or equal to 0. This ensures that only the rows with values are selected and gets rid off all the “mess” that we don’t want in the final spreadsheet.

 

Change output back to screen (so we don’t continue to write to the csv file we made a few steps ago):

sqlite> .output stdout

 

Share

Sample ID- XenoCal Prophage Portal Tests

Now that the XenoCal prophage portal primers appear to be in working order, Carolyn wants me to test them out on 10 samples with the following status':

  • RLO-/RLOv-
  • RLO-/RLOv+
  • RLO+/RLOv-
  • RLO+/RLOv+

In order to quickly identify samples with these qualifications, I ran a SQL query on the following spreadsheet that contain qPCR data for both withering syndrome (RLO) and the phage (RLOv):

I saved the following worksheets from the above Google Sheet as CSV files:

  • water 2010
  • water 2011

These were imported to SQLite as I’ve previously done.

The two sheets were renamed for use in SQLite, respectively:

  • AbEndoWater2010
  • AbEndoWater2011

Here are the four queries I ran to obtain the four combinations of RLO/RLOv samples listed above

RLO-/RLOv-

sqlite> SELECT '2011_H2O', "DNA Tube Label", "Mean Cq", "RLOv_mean_Cq" FROM AbEndoWater2011 WHERE "Mean Cq"=0 AND "RLOv_mean_Cq"=0

 

RLO-/RLOv+

sqlite> SELECT '2011_H2O', "DNA Tube Label", "Mean Cq", "RLOv_mean_Cq" FROM AbEndoWater2011 WHERE "Mean Cq"=0 AND "RLOv_mean_Cq">0

 

RLO+/RLOv-

sqlite> SELECT '2011_H2O', "DNA Tube Label", "Mean Cq", "RLOv_mean_Cq" FROM AbEndoWater2011 WHERE "Mean Cq">0 AND "RLOv_mean_Cq"=0

 

RLO+/RLOv+

sqlite> SELECT '2011_H2O', "DNA Tube Label", "Mean Cq", "RLOv_mean_Cq" FROM AbEndoWater2011 WHERE "Mean Cq">0 AND "RLOv_mean_Cq">0

 

Results:

It looks like we do not currently have 10 samples that are RLO+/RLOv-. I will contact Carolyn to see if she happens to know of any samples that are RLO+, but do not contain (or, should not) any RLOv.

The full list of results can be seen in the Google Sheet below.

Google Sheet: 20160322_RLO_RLOv_pos_negs

Share

Sample ID – Black Abalone DNA for RLOv qPCRs

Carolyn & Stan Langevin have agreed that the DNA helicase qPCR should be tested on 10 black abalone DNA extractions that fall into multiple levels of the Friedman Lab’s withering syndrome histology scoring.

Downloaded the (Google Sheet) Black Abalone: Expt 1 – WS & Phage as a CSV file. After downloading, I renamed the file (Black_Abalone.csv) to facilitate easier usage in the following steps.

Created a sqlite database using GitBash for Windows:
Change to directory where file is located:

$cd Downloads

Start sqlite:

$sqlite3

Tell sqlite that the field separator will be commas (i.e. CSV file):

sqlite>.separator ","

Import the CSV file and provide a name for the resulting database:

sqlite>.import Black_Abalone.csv BlackAbs

Set output display mode to column for easier reading:

sqlite>.mode column

Set output display to include column headers:

sqlite>.headers on

 

To select all the samples that have scores of 0 in both PE and DG RLO fields (screen cap does not show entire output list):

 

To select all the samples that have scores of 1 in both PE and DG RLO fields:

 

To select all the samples that have scores of 2 in both PE and DG RLO fields:

 

Here are the full set of results in a table

RLO/RLOv 0 RLO/RLOv 1 RLO/RLOv 2
06:5-03 06:5-35A 06:5-31
06:5-04 06:50-08 06:5-32B
06:5-08 06:50-10 06:6-46
06:5-09 06:6-32 06:6-49
06:5-10 06:6-39 08:3-05
06:5-11 06:6-42 08:3-07
06:5-14 06:6-44 08:3-15
06:5-16 06:6-52 08:3-16
06:5-18 06:6-54
06:5-20 07:12-18
06:5-21 08:3-08
06:5-22 08:3-10
06:5-24
06:5-30
06:50-04
06:50-05
06:50-11
06:50-12
06:50-13
06:50-15
06:50-16
06:6-01
06:6-02
06:6-03
06:6-05
06:6-08
06:6-11
06:6-12
06:6-13
06:6-15
06:6-16
06:6-17
06:6-18
06:6-20
06:6-21
06:6-22
06:6-23
06:6-24
06:6-25
06:6-26
06:6-27
06:6-28
07:12-01
07:12-02
07:12-03
07:12-04
07:12-05
07:12-06
07:12-07
07:12-09
07:12-10
07:12-13
07:12-19
08:3-01
08:3-02
08:3-03
08:3-04
08:3-13
08:4-01
08:4-02
08:4-03
08:4-04
08:4-05
08:4-06
08:4-07
08:4-08
08:4-09
08:4-10
08:4-11
08:4-12
08:4-13
08:4-14
08:4-15
08:4-16
08:4-17
08:4-18
08:4-19
08:4-20
08:4-21
08:4-22
08:4-23
08:4-24
08:4-25
08:5-06

Will select just 10 of those in the RLO/RLOv 0 column for use in qPCR.

I was able to track down the boxes where are these DNAs were stored (see images below).

Boxes that were not labeled with accession numbers of the samples contained therein are now labeled.

Boxes that contained samples that belonged in other boxers were transferred to the appropriate box.

All boxes were located, and returned, to the big -20C in 240 on Lisa’s shelf.

Share