Author Archives: Allison Tracy

Two recent tasks, including some R code!

In addition to hashing out the writing plan and starting the writing itself, which has been a terrific group effort, the past couple days have consisted of two main tasks.

First, I went through the transcriptome that had been blasted against the nucleotide database to remove any bacterial sequences with an e-value of less than 1e-200. We had agreed that these sequences would most likely represent contamination, because the sequencing method targeted mRNA with poly-A tails, which should exclude bacterial transcripts. It turns out that the smallest nonzero e-values were around 1e-179, and everything below that was reported as zero, so in practice the 1e-200 cutoff picks out the bacterial hits with an e-value of zero. I removed 1102 sequences with the following R code:

phel5 <- read.csv("Phel_clc_blastn_nt_edit.csv") # this is the nucleotide-blasted transcriptome that I edited to contain fewer columns; the contig names are the key result from this code anyway, so it's fine
length(phel5$e) # 10768 total

# NOTE: the object names on the left of each assignment were lost when this post was published.
# removed, not_bact, bact_keep, and final are placeholder names.

# How many to take out? 1102, same as in IPython. This code makes a data frame of the REMOVED contigs.
removed <- subset(phel5, tax == "Bacteria" & e < 1e-200)

# This code makes the final data frame without the REMOVED contigs:
# all those that aren't bacteria + those that are bacteria but have e > 1e-200
not_bact <- subset(phel5, tax != "Bacteria")
summary(not_bact$tax)
bact_keep <- subset(phel5, tax == "Bacteria" & e > 1e-200)
final <- rbind(not_bact, bact_keep)
length(final$tax) # 9666

# length of final should = 10768 - 1102 = 9666

# Write final file to directory
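The same removal can also be written as a single logical filter, which avoids the rbind step and guarantees that every row ends up in exactly one of the two tables. This is only a sketch on made-up data: the `blast` data frame and its values are invented for illustration, and only the column names `tax` and `e` come from the code above.

```r
# Toy stand-in for the BLAST table (the real file has 10768 rows).
blast <- data.frame(
  contig = paste0("contig_", 1:6),
  tax    = c("Bacteria", "Bacteria", "Eukaryota", "Bacteria", "Eukaryota", "Viruses"),
  e      = c(0, 1e-250, 0, 1e-50, 1e-30, 1e-210),
  stringsAsFactors = FALSE
)

# Flag likely contamination: bacterial hits with an e-value below 1e-200.
# Assumes no missing values in tax or e.
is_contam <- blast$tax == "Bacteria" & blast$e < 1e-200

removed <- blast[is_contam, ]   # contigs to drop
final   <- blast[!is_contam, ]  # everything else

# Every row lands in exactly one of the two tables.
stopifnot(nrow(removed) + nrow(final) == nrow(blast))
nrow(removed)  # 2 in this toy example
nrow(final)    # 4 in this toy example
```

Because `final` is defined as the complement of `removed`, a hit with an e-value of exactly 1e-200 cannot fall through the gap between the `<` and `>` comparisons.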

Steven then checked the fasta file to see whether this process had removed the unusually large contigs, which was mostly the case. Ultimately, this process led to removing approximately 1000 contigs from the transcriptome file of ~30,000 contigs that we have been using as the basis of our analyses.

  • This will affect our characterization of the transcriptome (e.g. species distributions, gene annotations), which is good timing as we’re working on that now
  • It will not affect the enrichment analyses based on DEGs with SPIDs as none of the bacterial sequences removed belonged to the SPID-annotated DEGs

The second major thing I’ve been working on is sorting through the differentially expressed genes in the enriched processes that map to the Toll pathway. This has been a good way to learn a lot more about the details of the Toll pathway, and above all the variability between species as well as the gaps in the knowledge!

All that glitters is not enrichment

This afternoon we started a coordinated approach to the differentially expressed genes in the sea star transcriptome. While we have ideas lined up for additional analyses of the transcriptome that are bound to prove fruitful, we are currently concentrating efforts on understanding the genes that belong to enriched processes.

After constructing a detailed filtering process for prioritizing which genes to look at, we divided up the 3 enriched processes among the group of people working on the transcriptome. At first I thought it would be important to coordinate, but it was pointed out that we could learn a lot from the different stories and interpretations people bring to the table after analyzing the same subset of genes. I started an in-depth look at the genes involved in the “cytokines” enriched process. The other two processes are immune function and cellular adhesion.

I’m still in the process of going through genes, but it has been a really interesting exercise. My comparative immunity cap is on! I think we will be best served by finding the most specific information available on the listed genes in closely related species. Already we are finding some of them in echinoderms and even sea stars, so that will help ground our coming interpretations.

This is our work flow for a standardized sorting methodology:

1) Find the genes in your process from a master spreadsheet

2) Ensure that, for each contig, all 3 samples have values in at least one of the conditions

3) Sort by log fold change – most positive AND most negative

4) Assess whether a p-value cutoff of 0.01 is needed to give you a manageable number of genes to work with

5) Take SPID and put it into (+your SPID)

6) Note biological functions the gene is involved with and begin literature search; offer possible interpretation(s) of the involvement in the broader biological process (i.e. immune, cell adhesion, or cytokines)
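Steps 1 to 4 of this workflow could be sketched in R roughly as follows. Everything here is hypothetical: the `master` table, its column names (`condA_*`, `condB_*`, `log2fc`, `pvalue`), and the values are invented for illustration, and "have values" is interpreted as "nonzero counts", which is an assumption.

```r
# Hypothetical master table: one row per contig, three count columns per condition.
master <- data.frame(
  contig  = paste0("contig_", 1:5),
  process = c("cytokines", "cytokines", "immune", "cytokines", "cytokines"),
  condA_1 = c(10, 0, 5, 3, 8),
  condA_2 = c(12, 4, 6, 0, 9),
  condA_3 = c(11, 2, 7, 1, 7),
  condB_1 = c(40, 9, 5, 0, 1),
  condB_2 = c(38, 3, 6, 2, 2),
  condB_3 = c(45, 7, 4, 0, 1),
  log2fc  = c(1.9, 1.4, -0.2, -1.1, -2.8),
  pvalue  = c(0.001, 0.04, 0.6, 0.03, 0.004),
  stringsAsFactors = FALSE
)
cond_a <- c("condA_1", "condA_2", "condA_3")
cond_b <- c("condB_1", "condB_2", "condB_3")

# 1) Genes in your process
genes <- master[master$process == "cytokines", ]

# 2) All 3 samples nonzero in at least one condition
complete_a <- rowSums(genes[, cond_a] > 0) == 3
complete_b <- rowSums(genes[, cond_b] > 0) == 3
genes <- genes[complete_a | complete_b, ]

# 3) Sort by log fold change, most positive first (most negative ends up last)
genes <- genes[order(genes$log2fc, decreasing = TRUE), ]

# 4) Apply the p = 0.01 cutoff only if the list is still too long
shortlist <- genes[genes$pvalue <= 0.01, ]
shortlist$contig
```

Keeping the sorted `genes` table around before applying the cutoff makes it easy to compare how many genes survive with and without step 4.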


Learning more about sea star physiology

I spent today diving into the literature on sea star physiology including immunity, wound healing, and structural processes. We’ve found a lot of helpful information by working as a team to dig up papers, but we know it’s only just begun.

I’ve added the papers I was reading through to a shared folder so that we can save time by reviewing what others have found as well.

Exploring the literature will help provide context for the genes in the sea star transcriptome and the DEGs. Another interesting way to use the information from past work on sea stars and other echinoderms would be to provide further detail on genes that are present in the transcriptome, yet not differentially expressed.

PCR mastery and ISH prep

Today we started off troubleshooting our gels from late last night (thanks Sarah and Ruth!). This is one of the most helpful processes for me. For example, we talked through how unexpected PCR results could come from:

  • Using too many cycles so that the reagents are used up and the amplified product degrades
  • Incorrect temperatures
  • Primer dimers and how to tell the difference (make your product bigger!)
  • Nonspecific amplification

We started off with a bunch of PCR. The others started making DIG probes using the primers that had worked for the target pathogens in sea stars and corals yesterday, along with the PCR DIG Probe Synthesis Kit (Roche cat. no 11 636 090 910). Their notes should have more detail on how the abalone RLO protocol was modified.

I re-ran the CAR-1 (coral-associated rickettsia) primers with a different protocol: the same touchdown PCR I described yesterday, but nested. This means I ran it once, then took 1 uL of product from each of the 5 sample tubes from this first run and used that as the template for a second run.

I also ran the Ehrlichia primers with an annealing temperature of 56.5 C (otherwise our standard PCR steps and reagents). Yesterday it ran best at 55 C, and we got this improved protocol from Lisa.

Sarah photographed the gel. Overall, the Ehrlichia primers did not amplify anything while the CAR-1 primers amplified a clear, bright band in all the coral samples and not the negative control. However, that band was not around 170 bp as we expected based on Casas et al. so we’re not sure what these primers are amplifying. Looking forward to more troubleshooting discussion tomorrow!

Step “-1”: Checking primers pre-ISH

I spent the majority of today with the group figuring out the PCR primers. We ran so many! It was good to take the time in the morning to really figure out what we were doing – annealing temperatures, how many samples to run, which primer sets could be run together, etc. We were also dealing with two different organisms, corals and sea stars, so clarity was key.

I ended up with one set of primers to target the SSWD putative pathogen in sea star DNA and one set of primers to target the putative pathogen in coral samples with White Band. The coral primers and the PCR conditions were derived from Casas et al. 2004. I learned an entirely new way to run PCR via the touchdown method they describe and I successfully programmed it into the machine – score. The successively decreasing annealing temperatures across the 30 cycles are meant to increase specificity.

  1. 10 min at 95 C
  2. 30 sec at 95 C
  3. 30 sec at 65 C (subtract 0.5 degrees every cycle)
  4. 60 seconds at 72 C
  5. Repeat steps 2-4 for 30 cycles, decreasing the annealing temperature each cycle
  6. 10 min at 72 C
  7. Hold at 4 C
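To see where the annealing temperature ends up, the touchdown schedule can be tabulated in R. This is a small sketch that assumes the first cycle anneals at 65 C and the 0.5 C drop applies from the second cycle onward; the variable names are made up for illustration.

```r
start_temp <- 65    # annealing temperature in the first cycle (C)
step_down  <- 0.5   # drop in annealing temperature per cycle (C)
n_cycles   <- 30

cycle       <- seq_len(n_cycles)
anneal_temp <- start_temp - step_down * (cycle - 1)

schedule <- data.frame(cycle = cycle, anneal_temp = anneal_temp)
head(schedule, 3)   # 65.0, 64.5, 64.0
tail(schedule, 1)   # cycle 30 anneals at 50.5 C
```

Under that assumption, the last cycle anneals at 50.5 C, so the program sweeps 14.5 C in total.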

I planned to run 14 coral samples and a blank, but only had enough master mix for 9 samples and a blank after some misadventures with a pipette that turned out to be inaccurate. That’s fine, because it should still give us a good idea of whether these primers work across a good number of samples.

I planned to run 12 sea star samples but ended up with 4 (2 healthy and 2 diseased so it was at least balanced) due to the same pipette issue.

Sea star master mix:

  • 187.5 uL GoTaq (12.5*15)
  • 22.5 uL BSA (1.5*15)
  • 12 uL FWD primer SS1 (.8*15)
  • 12 uL REV primer SS1 (.8*15)
  • “111” uL nuclease free water (7.4*15) – This is the pipette issue step. A functional one was used for the other additions.
  • Added 23 uL to all PCR tubes with 2 uL DNA

Coral CAR1 primer master mix:

  • 212.5 uL GoTaq (12.5*17)
  • 25.5 uL BSA (1.5*17)
  • 13.6 uL FWD primer SS1 (.8*17)
  • 13.6 uL REV primer SS1 (.8*17)
  • “142.8” uL nuclease free water (8.4*17) – This is the pipette issue step. A functional one was used for the other additions.
  • Added 23 uL to all PCR tubes with 1 uL DNA
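The scaling behind both master mixes is just per-reaction volume times number of reactions, which is easy to double-check in R. The per-reaction volumes and reaction counts below are taken from the two lists above; the function and variable names are made up for this sketch.

```r
# Per-reaction volumes in uL, as listed in the two recipes.
per_rxn_sea_star <- c(GoTaq = 12.5, BSA = 1.5, fwd_primer = 0.8,
                      rev_primer = 0.8, water = 7.4)
per_rxn_coral    <- c(GoTaq = 12.5, BSA = 1.5, fwd_primer = 0.8,
                      rev_primer = 0.8, water = 8.4)

# Scale a recipe to n_rxn reactions, rounded to what a pipette can deliver.
master_mix <- function(per_rxn, n_rxn) {
  round(per_rxn * n_rxn, 1)
}

sea_star_mix <- master_mix(per_rxn_sea_star, 15)
coral_mix    <- master_mix(per_rxn_coral, 17)

sea_star_mix
coral_mix

# Volume of mix that each tube should receive before DNA is added.
sum(per_rxn_sea_star)  # 23 uL
sum(per_rxn_coral)     # 24 uL
```

The per-reaction volumes in the coral recipe sum to 24 uL, while the notes record 23 uL of mix per tube, so that step may be worth checking against the bench notes.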

Morgan, Ruth, and I made up some 1% agarose gels for the first run before dinner and Sarah and Ruth are running them now – it will be interesting to see what we get!

I hope they ALL work.

If some do, we’ll run the probe PCR and further steps for ISH tomorrow.

Could we speak in code now?

iPython, Revigo, mini eyes, DESeq, iPath, github, galaxy, SQLshare, WGNWCWGNCWGC… we’ve learned a lot of new words.

By this point, we’ve looked at the transcriptome and the differentially expressed genes in many different ways! This morning Casey and I finished off our attempts with iPath, so now we know how to use that. Check out the interesting figure in my notebook. I’ve also included notes on how to use it, though of course it has limitations. Maybe more of a viewing/brainstorming tool?

It was also really interesting to learn a new R package from Lauren for analyzing genes that are expressed together!

I worked with the transcriptome itself in the afternoon. My goal is to graph the proportion of contigs that match GO slim terms in IPython for molecular functions, cellular processes, and biological processes, both for the whole transcriptome and for the contigs annotated as echinoderms, to see what we have. We’ll see if this fits into the overall plan down the line, but I thought it was an interesting way to delve further into the genes that are present. I’ve started that in my script at the process level, and it is actually very similar to the division of processes for the whole transcriptome that we did on Tuesday.
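The comparison itself boils down to two sets of proportions over the same GO slim terms. Here is a minimal sketch in R (the same idea carries over to IPython); the `annot` table, its column names, and the terms and taxa in it are all invented for illustration.

```r
# Hypothetical annotation table: one row per contig.
annot <- data.frame(
  contig = paste0("contig_", 1:8),
  goslim = c("stress response", "cell adhesion", "stress response", "transport",
             "transport", "stress response", "cell adhesion", "transport"),
  phylum = c("Echinodermata", "Chordata", "Echinodermata", "Arthropoda",
             "Echinodermata", "Chordata", "Echinodermata", "Chordata"),
  stringsAsFactors = FALSE
)

# Use one fixed set of terms so both groups line up column by column.
all_terms <- sort(unique(annot$goslim))

# Proportion of contigs per GO slim term (terms with no contigs get 0).
prop_of <- function(terms) {
  counts <- table(factor(terms, levels = all_terms))
  setNames(as.numeric(counts) / sum(counts), all_terms)
}

echino <- annot[annot$phylum == "Echinodermata", ]

comparison <- rbind(
  whole      = prop_of(annot$goslim),
  echinoderm = prop_of(echino$goslim)
)
comparison
```

The resulting matrix can be handed straight to `barplot(comparison, beside = TRUE, legend.text = TRUE)` to draw the two distributions side by side.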


DAVID -> Revigo -> “playing”

Today I took my file and explored it a lot more in DAVID in the morning. There are so many cool functions! Check out my notebook for some figures on KEGG networks of genes in enriched processes.

In the PM I took my file and analyzed it in Revigo. The class was much more free-form today, and I feel like we got through a glut of information and analyses! I will therefore be devoting more time to reviewing other people’s notebooks to see what they did.

In short, I put the GO terms for the enriched processes from DAVID into Revigo to visualize them in a couple of ways. Then Casey and I spent the remainder of the afternoon working to get data into iPath to visualize both up- and down-regulated differentially expressed genes on a KEGG network… to be continued, but we’re close.

P.S. Today was a convenient day for an invert immunity lecture! We’re starting to think about it a lot more with the data that’s emerging.


Working with 6 libraries and DAVID

Today we extended the skills we learned working with the full transcriptome to a file of the contig counts for the six sea stars.

Here’s my notebook, which I think will update when the mirror is refreshed –

In short, this recounts our work in the morning looking at the counts file and testing for significantly differentially expressed genes using an awesome new R package, DESeq2. In the afternoon, I joined the DESeq2 results for the differentially expressed genes to their Swiss-Prot IDs and input this information into DAVID to begin looking at enrichment. We were all fascinated to see the enrichment of many immune-related processes!
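The join step can be sketched in base R with `merge`. This is an illustration only: the two tables are invented, the SPIDs are fake placeholders, the 0.05 cutoff on `padj` is an assumption, and the column names `log2FoldChange` and `padj` simply mimic what a DESeq2 results table contains.

```r
# Hypothetical differential expression results, one row per contig.
de_results <- data.frame(
  contig         = c("contig_1", "contig_2", "contig_3", "contig_4"),
  log2FoldChange = c(2.1, -1.7, 0.3, 3.0),
  padj           = c(0.001, 0.02, 0.8, 0.04),
  stringsAsFactors = FALSE
)

# Hypothetical contig-to-Swiss-Prot mapping; not every contig has an annotation.
spid_map <- data.frame(
  contig = c("contig_1", "contig_3", "contig_4"),
  spid   = c("SPID_A", "SPID_B", "SPID_C"),
  stringsAsFactors = FALSE
)

# Keep significant contigs (padj can be NA in DESeq2 output, so guard for it).
sig <- de_results[!is.na(de_results$padj) & de_results$padj < 0.05, ]

# Inner join: only significant contigs that also have a Swiss-Prot ID survive.
sig_annotated <- merge(sig, spid_map, by = "contig")

deg_spids        <- unique(sig_annotated$spid)  # gene list for enrichment
background_spids <- unique(spid_map$spid)       # all annotated contigs
deg_spids
```

Here contig_2 is significant but drops out of the join because it has no Swiss-Prot ID, which is worth keeping in mind when interpreting enrichment results.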


Still, the homology is a key consideration as highlighted in my list of 10 genes below.

We worked with the transcriptome to explore which genes are included. I made a different graph of the data (an alternative to yesterday’s pie chart) in Google Fusion of the various categories:

This should be a public link.

Here are 10 genes I thought were interesting in the total transcriptome:

1. This is from a finch = Geospiza fortis immunoglobulin mu binding protein 2 (IGHMBP2), mRNA

2. But at least this immune gene is from a sea urchin = Strongylocentrotus purpuratus immunoglobulin-binding protein 1-like (LOC580592), mRNA

3. And another – Strongylocentrotus purpuratus LIM and senescent cell antigen-like-containing domain protein 2-like (LOC588224), mRNA

4. And an urchin apoptosis regulator (!) – Strongylocentrotus purpuratus apoptosis regulator BAX-like (LOC586236), mRNA

5. This is a rodent immune gene – Spermophilus tridecemlineatus progesterone immunomodulatory binding factor 1 (Pibf1), mRNA

What are these adaptive immunity homologs doing?

6. A stress-related gene from bees – Apis florea stress-activated protein kinase JNK-like (LOC100872795), mRNA

7. An interesting one from Firmicutes bacteria – Staphylococcus aureus antibiotic resistance island carrying fusB: SaRIfusB, strain CS6-EEFIC

8.  A complete viral genome – Propionibacterium phage P105, complete genome

9. A bony fish apoptosis inhibitor – Salmo salar apoptosis inhibitor 5 (api5), mRNA

10. Frog apoptosis inhibitor – Xenopus tropicalis apoptosis antagonizing transcription factor, mRNA (cDNA clone MGC:197699 IMAGE:9040231), complete cds