Data, data, data! That is the theme for this month.
Olympia oyster data – Going to assess and continue working on assembling the Olympia oyster genome from our BGI sequencing project data, along with the PacBio data we have. To date, it’s been difficult to get these two datasets to play nicely with one another. I’ll take a look to see what’s worked and what hasn’t, and try out some other means of getting a decent assembly.
Geoduck data – We just got back an insane amount of RNAseq and genome sequencing data from an Illumina pilot project. This data needs to be properly documented and catalogued.
Hackweek 2017 – Although this is only a brief period during this month, it’s designed to tackle some lower priority tasks that will help streamline lab operations. Check out the GitHub repo issues for Hackweek 2017 to get an idea of some of the projects.
Well, my previous goal was to tidy up an existing manuscript and get it re-submitted to PeerJ. That’s pretty much done, as Steven will be giving it a final once-over and formatting the rebuttal letter prior to resubmission.
June will be a bit of a short month for me, due to some travel, but here’re some things on the agenda:
Update the Oly Genome Wiki to accurately reflect the most recent PacBio Sequencing we had done.
Related to the above goal is updating Nightingales to house just the raw sequencing data files for the Oly PacBio sequencing, while archiving the associated metadata (QC files, reports, etc.).
Related to THAT goal is then updating our Nightingales spreadsheet to reflect, and provide links to, the raw sequencing files.
Establish (and build out) an “On Boarding” repo in the Roberts Lab GitHub Page. This should make it easier for new lab members to find the various resources they need. More importantly, it should make it easier for us to direct people to find that info!
A day late, but definitely not a dollar short!
No goals posted last month because I didn’t want anyone to think they were just an April Fool’s joke.
This month my goal is to continue my domination of Pub-a-Thon 2017!
I plan on doing so by getting a second paper submitted this month! That’s right! I’m working on getting the following paper re-submitted:
Differential response to stress in Ostrea lurida (Carpenter 1864)
Goal, singular: Get Oly GBS manuscript completed/submitted.
Oh, actually, there is another, smaller goal that will be very difficult to achieve: win Pub-a-Thon. Jay’s taken a massive lead and has a nearly complete manuscript ready for submission. His manuscript is pretty well fleshed out, so it’ll be very difficult to surpass him at this point. However, I’m always up for a challenge, so I’ll see what I can do…
Anyway, back to my main goal of completing my manuscript.
This should be do-able. I’ve completed the SRA submission process for the raw sequencing data. The stuff that remains is as follows:
- Generate FastQC analysis on FASTQ files (this is currently running – takes a while)
- Try to replicate BGI’s FASTQ demultiplexing pipeline to verify that it is functional
- Make decisions with Steven (and Brent?) about what information tables should contain
The beauty of submitting this to the journal Scientific Data is that it doesn’t require in-depth analysis of your data sets. It merely requires an examination of the data to ensure its integrity, as well as a cursory assessment of the data to evaluate its usefulness to the scientific community. No need to delve deeper into the data and attempt to interpret, or draw conclusions about, what the data might mean; that can be left to other researchers who deem this data worthwhile to explore.
First goal is to be the first person in lab to post their goals each month. Props to one of our new grad students, Yaamini Venkataraman, for beating me this month!
Next goal is to dominate this year’s Pub-a-thon. I’m working on two different manuscripts, this one and this one, but I still think I can win this!
Stuff that got tackled from last month’s goals:
Freezer organization – This has happened, albeit without much effort on my part. Many thanks to the Big Cheese and Grace for [tackling this project](https://genefish.wordpress.com/2017/01/28/80-organization)!
Data Management Plan – Some progress has been made on this. I improved the instructions on the DMP a bit, but the master spreadsheet around which the DMP revolves (Nightingales) is still in a massive state of flux that needs a lot of attention.
Sequencing data handling – Thanks to Sean for making a serious dent in automating this. He wrote an R script to handle this sort of thing. I’m not entirely sure if he’s done testing it, but it seems to work so far. Next will be incorporating usage instructions for this R script into the DMP so that others can utilize it. On that note, I need to figure out where Sean is keeping this script (I can’t seem to locate it in his notebook).
One of the long-running goals I’ve had is to get this Oly GBS data taken care of and out the door to publication. I think I will finally succeed with this, with the help of Pub-A-Thon. Don’t get too excited, it’s not what you think. It is not the drinking extravaganza that the name implies. Instead, it’s a “friendly” lab competition to get some scientific publications assembled and submitted.
Another goal for this month is to get the -80C organized. We’ve made some major progress on lab organization, with major kudos going to Grace Crandall and her work on cleaning out fridges/freezers and putting together our lab inventory spreadsheet. The -80C organization is the final frontier of getting the lab fully under control and better regulated.
Continuing on the organization front, it’d be great if we could get the Data Management Plan finished. Sean Bennett has helped get us much closer to completion. Hopefully this month we can get it finalized and have it be fully functional so that any lab member can easily figure out what to do when they receive new sequencing data.
I’d also like to put together a more automated means of handling our high-throughput sequencing data when we receive it. Ideally, it’d be a Jupyter Notebook and all the user would have to do is enter the desired location (heck, maybe I could even simplify it further by requiring just a species name…) for the files to be stored and then press “play” on the notebook. The notebook would run a post-download integrity check on the files, move them to their final location, re-check their integrity, update the checksum files, and update the readme files. I have most of the bits here and there in various Jupyter Notebooks already, but haven’t taken the time to put them all together into a single, reusable notebook.
Well, I’ve finally progressed with the Oly GBS analysis!
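As a rough illustration of that workflow, here’s a minimal Python sketch of the check, move, re-check, and record steps. It is not the lab’s actual notebook: the function name `store_sequencing_files`, the `checksums.md5` and `readme.md` file names, and the assumption that the sequencing facility supplies MD5 checksums are all hypothetical choices made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_sequencing_files(download_dir, final_dir, expected):
    """Verify, move, re-verify, and record each downloaded file.

    `expected` maps file names to the MD5 checksums supplied with the
    data (an assumption; a facility may use a different hash).
    """
    download_dir, final_dir = Path(download_dir), Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)
    stored = []
    for name, expected_md5 in sorted(expected.items()):
        source = download_dir / name
        # Post-download integrity check.
        if md5sum(source) != expected_md5:
            raise ValueError(f"Checksum mismatch after download: {name}")
        # Move to the final location, then re-check integrity.
        target = final_dir / name
        shutil.move(str(source), str(target))
        if md5sum(target) != expected_md5:
            raise ValueError(f"Checksum mismatch after move: {name}")
        # Update the checksum and readme files (hypothetical file names).
        with open(final_dir / "checksums.md5", "a") as checksums:
            checksums.write(f"{expected_md5}  {name}\n")
        with open(final_dir / "readme.md", "a") as readme:
            readme.write(f"- {name}\n")
        stored.append(target)
    return stored
```

In a notebook, the user would only need to set the download directory, the destination (which could be derived from a species name), and the checksum table, then run the cell that calls `store_sequencing_files`.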
I’m nearly finished with the analysis of the de novo PyRad data. Next, I’ll run PyRad using our Oly partial genome that we have from BGI. This will allow a more descriptive evaluation of SNP loci, since we’ll actually be able to associate the SNPs with various gene annotations, thus providing more meaningful insight.
On the Oly genome front, I also need to submit samples for PacBio sequencing. This will be an attempt to fill in the gaps of the Oly genome scaffold we currently have.
Finally, if all goes well, I’ll get something written up and submitted to Scientific Data.
Well, I’m serious this time. My goal for this month is to complete the Oly GBS data analysis and get the data sets and data analysis prepared/placed in satisfactory repositories in preparation for publication in Scientific Data.
Additionally, I feel like I need to better document what I spend (waste?) my time on. For example, last month, I certainly got sidetracked trying to help/troubleshoot working with Docker. Here are just some of the issues that were encountered:
Despite having that list, I really should have notebook entries for each day I’m in lab, even if my day is spent struggling to get software installed and I don’t have any “product” for the day. Having the documentation of what I tried, what worked/didn’t work, will be helpful for future troubleshooting, and will provide some evidence that I actually did stuff.
So, I guess that’s a second goal for the month: Improve notebook documentation for days when I don’t generate a “product.”
Last month’s goals, as it turns out, were way too ambitious. This month’s goal will be to get the Oly GBS data analysis fully completed (I currently have the data for individuals, but still need a summary of the data for the three populations). I’ll also get the data sets and data analysis prepared/placed in satisfactory repositories in preparation for publication in Scientific Data.
Whoops! It’s already September 6th! The 1st of the month came and went without me noticing.
One goal for this month: Write up and submit Olympia oyster genotyping-by-sequencing (GBS) data to Scientific Data for publication.