After using SQLShare to join our BLAST results to Steven’s GOSlim terms, we were able to get an idea of what the different sequences do and which organisms they are associated with. The sequences can be grouped by molecular function, cellular component, or biological process. We chose to look into biological processes (denoted as P).
Here’s what we did in SQLShare to join the two tables, matching Column3 in our table to the SPID column in Steven’s table:
SELECT * FROM [email@example.com].[table_seastar_clc_uniprot_sprot_separated.tab]blast
on blast.Column3=unp.Entry -- which column in table "blast" matches which column in table "unp"
where -- terms of the join, filtering by column
Length > 500 and Organism like '%frog%' or Organism like '%Human%'
join [firstname.lastname@example.org].[SPID and GO Numbers]go
on blast.Column3=go.SPID -- match the blast column to the SPID
where aspect = 'P' -- only select records where aspect = P
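For anyone who wants to try the same kind of join outside SQLShare, here is a minimal sketch using Python's built-in sqlite3 module. It is not the query we ran: the table names `blast_hits` and `go_terms`, the `query_id` column, and every row of data are made up for illustration. Only the idea of matching Column3 to SPID and keeping rows where aspect is P comes from our query.

```python
import sqlite3


def join_blast_to_go(blast_rows, go_rows, aspect="P"):
    """Join BLAST hits to GO annotations on the SPID and keep one aspect.

    blast_rows: (query_id, Column3) tuples; Column3 holds the SPID.
    go_rows: (SPID, GO_ID, aspect) tuples.
    Table and column layout here is hypothetical, not the SQLShare schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blast_hits (query_id TEXT, Column3 TEXT)")
    conn.execute("CREATE TABLE go_terms (SPID TEXT, GO_ID TEXT, aspect TEXT)")
    conn.executemany("INSERT INTO blast_hits VALUES (?, ?)", blast_rows)
    conn.executemany("INSERT INTO go_terms VALUES (?, ?, ?)", go_rows)
    rows = conn.execute(
        """
        SELECT blast.query_id, blast.Column3, go.GO_ID, go.aspect
        FROM blast_hits AS blast
        JOIN go_terms AS go ON blast.Column3 = go.SPID
        WHERE go.aspect = ?
        ORDER BY blast.query_id, go.GO_ID
        """,
        (aspect,),
    ).fetchall()
    conn.close()
    return rows


# Made-up placeholder rows, not real accessions or GO identifiers.
blast_rows = [("contig_1", "SPID_A"), ("contig_2", "SPID_B"), ("contig_3", "SPID_C")]
go_rows = [
    ("SPID_A", "GO_ID_1", "P"),
    ("SPID_A", "GO_ID_2", "F"),
    ("SPID_B", "GO_ID_3", "P"),
]

joined = join_blast_to_go(blast_rows, go_rows)
for row in joined:
    print(row)
```

Because the filter on aspect only keeps rows that found a match in the GO table, contigs with no annotation (like the toy `contig_3`) drop out of the result.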
Here’s the same data in Excel:
We then put our joined table into Galaxy, which is great because it allows very quick sorting of sequences by almost any taxonomic level. I chose to break it down by phylum, and about half of the sequences that were picked up in the sea star transcriptome belonged to the phylum Echinodermata, which makes a lot of sense, since sea stars are echinoderms. Galaxy is also great because it generates graphics very quickly, although it doesn’t offer the same level of customization as graphing within IPython. Personally, I think the bar graphs and pie charts it offers look quite nice and are very easy to save for a presentation.
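The phylum breakdown can also be approximated in plain Python if Galaxy isn't handy. This is only a rough sketch of the counting step, not what Galaxy does internally. The function name `phylum_fractions` and the short list of phyla are invented for the example and are not our actual results.

```python
from collections import Counter


def phylum_fractions(phyla):
    """Return each phylum's share of the sequences, most common first."""
    counts = Counter(phyla)
    total = sum(counts.values())
    # most_common() sorts by count, so the largest group comes first.
    return {phylum: n / total for phylum, n in counts.most_common()}


# Made-up list with one phylum label per sequence (illustration only).
phyla = ["Echinodermata", "Chordata", "Echinodermata", "Arthropoda"]

fractions = phylum_fractions(phyla)
for phylum, share in fractions.items():
    print(f"{phylum}: {share:.0%}")
```

The resulting dictionary can be handed to any plotting tool to draw a bar graph or pie chart.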