Let’s make graphs

After using SQL share to join the results of our BLAST file to Steven’s GOSlim terms, we were able to get an idea of what the different sequences did and what organisms they were associated with. The sequences can be divided up by molecular function, cellular components, or biological processes. We chose to look into biological processes (denoted as P).

Here’s what we did in SQLShare to join the two tables based on Column 3 in our table and the GO_ID column in Steven’s table:

SELECT FROM [moritsch@washington.edu].[table_seastar_clc_uniprot_sprot_separated.tab]blast

Left Join
  blast.Column3=unp.Entry #which column in table “blast” to which column in table “unp”
  where #terms of the join, sorting by column
  Length >500 and Organism like ‘%frog%’ or Organism like ‘%Human%’

(Now make a join on a join – 3 tables together)
SELECT FROM [moritsch@washington.edu].[table_seastar_clc_uniprot_sprot_separated.tab]blast
  Left Join
  [sr320@washington.edu].[SPID and GO Numbers]go
  blast.Column3=go.SPID #matched the blast column to the SPID
(now need to match this to the GO ID)
  Left join
  aspect ‘P’ (only select records where aspect = P)
(use whatever column you need, but specify the table that has that column name)
Then we exported the table into a iPython and Excel to make a visual displaying relative GOslims (processes only or all GOslim types). About half of the genes had to do with cell-cell signaling, death, and other metabolic or biological processes.
iPython work produced nice bar graphs, but it’s hard to make more than one color or label axes without having to look up the codes. For the record, their pie graphs don’t look that great.
#did our joins in SQL Share, now need to make graphs
#must import pandas as pd
import pandas as pd
pylab inline #displays your graph in the notebook instead of outside
#for more graphing resources, go to http://pandas.pydata.org/
jslim = pd.read_csv(‘mmm_GOSlim_query_P.csv’)
jslim.groupby(‘GOSlim_bin’).Column1.count().plot(kind=’bar’) #make a bar graph, not displayed in the wordpress blog
<matplotlib.axes.AxesSubplot at 0x108b332d0>

Here’s the same data in excel:

Processes output 8-5-14

We then put our joined table into Galaxy, which is great because it allows very quick sorting of sequences by almost any taxonomic level. I chose to break it down by phylum, and about half of the sequences that were picked up in the sea star transcriptome belonged to the phylum Echinodermata, which makes a lot of sense. Galaxy is also great because it allows very fast graphics generation, although it doesn’t have the same sort of customization as graphing within iPython. Personally, I think the bar graphs and pie charts they offer look quite nice and are very easy to save for a presentation.

Leave a Reply