Wednesday, August 6, 2014

EnGGen bioinformatics computer

Bioinformatic processing is an essential part of any lab that produces or simply utilizes NGS data.  The grant that funded our MiSeq also funded acquisition of a new computer to provide some bioinformatic power.  The process was a learning experience for me, albeit an enjoyable one, as I like playing with hardware and getting software to work even if it means a lot of time spent.  Most of the time the computer is very reliable and provides the needed power for most questions we might have.  Occasionally something happens, and then some time is needed to find the problem and work it out.  This can take 5 minutes or it can take a week, and you must be prepared to persevere so that the hardware you have invested money in can continue to function as desired.  Hopefully with every rebuild you become more adept at setting up your system and see ways in which better performance can be achieved.  I am writing this post not only to illustrate to others how to set up a decent bioinformatic system but also for myself (or my successor) so that all the necessary details are in one place.

The system is a Mac Pro tower from late 2012.  It has dual Intel Xeon processors (E5645 @ 2.4 GHz, offering 24 virtual cores) and 64GB RAM.  It came with Mac OSX on a single 1TB drive, but we purchased four 3TB drives (the tower has four SATA bays) and I initially installed Ubuntu Linux 12.04 on them in a RAID 10 configuration.  This means that the drives were paired up to create two 6TB volumes, and each 6TB volume mirrored the other (redundancy).  Externally we also have a Synology DS1813+ RAID enclosure containing eight 4TB NAS drives on Synology Hybrid RAID with two-drive fault tolerance.

Last week I came back from vacation and my external RAID had finished expanding, as I had added drives to it just before leaving.  The computer also wanted to reboot to finish installing updates, so I figured it was an ideal time for a restart, after which I would finish administering the external RAID.  Unfortunately, the computer failed to reboot and refused every subsequent attempt to do so.  It could have been a number of things (I was immediately suspicious of the software updates), but I eventually learned that one of the drives had failed.  It turns out that my initial install on RAID 10 was a smart move, but I had to overcome some other problems as well, which I will detail here.

Desktop computers are moving toward something called EFI (Extensible Firmware Interface) or UEFI (Universal EFI).  It makes sense to update hardware standards sometimes, but we are in that painful time right now when most computers in use still rely on a BIOS (Basic Input Output System) instead.  Our Mac is EFI while all other computers in the lab are still BIOS-based.  Thus, every attempt to make a new bootable USB Ubuntu drive failed, since the Mac hardware was incompatible with the boot partition set up by the BIOS-based computers.  Luckily I still had the old 1TB OSX drive lying around, so I swapped it in to boot OSX and produced a compatible USB drive that way (Ubuntu 14.04 AMD64/Mac version).  That problem solved, I was finally able to get the computer functioning and attempted to examine the RAID array.  Since I had a RAID 10, I decided to simply install the OS over the first drive, though in retrospect this is a bit of Russian roulette and I should have simply worked from the USB install.  The first thing I did was install gparted (sudo apt-get install gparted).  Then I ran gparted to examine the state of each disk (sudo gparted).  This showed there was a problem with volume sdb, which was the second drive in the array.  Volumes sdc and sdd still had healthy RAID partitions on them.  To administer software RAID in Linux, you should install something called mdadm (sudo apt-get install mdadm).  To get things going I first had to stop the RAID (puzzling, but nothing worked until this was done: sudo mdadm --stop /dev/md0).  md0 is the name of the RAID array (multi-disk array).  Then a simple command to assemble the RAID got things functioning immediately (sudo mdadm --assemble --scan).  This started up the remnant RAID 10 with two of the four original drives and mounted it automatically.  I was lucky that the drive used for the OS and the failed drive together constituted a RAID0 pair and not two identical components of the RAID1 (RAID 10 is sometimes called RAID1+0), or some data would have been lost.
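
For reference, here is the recovery command sequence from above in one place (the /proc/mdstat check at the end is an extra I would recommend; device and array names are as they appeared on my system and may differ on yours):

sudo apt-get install gparted mdadm   # partition viewer and the Linux software RAID manager
sudo gparted                         # inspect each disk; sdb showed damage, sdc and sdd were healthy
sudo mdadm --stop /dev/md0           # stop the half-assembled array first; nothing worked until this was done
sudo mdadm --assemble --scan         # reassemble and start the remnant RAID 10 from the surviving drives
cat /proc/mdstat                     # check the state of the assembled array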

Before I could back up the contents of the remaining RAID I had some other things to do first.  I planned to back up the data to the external Synology RAID, but all it had done was expand the RAID array, which doesn't change the addressable size of the RAID volume; so despite now having over 20TB of RAID, the volume as addressed by Linux was still only about 7TB.  To top it off, now that I had a fresh OS install, I no longer had the software set up to communicate between the Synology box and the Linux system.  So I went to the Synology website and downloaded SynologyAssistant (global 64bit).  You then unzip the files into some directory (I used a subdirectory within Downloads) and run the install script.  However, the install script doesn't come executable, so you have to change this first (sudo chmod a+x install.sh).  Now you can run the script (sudo ./install.sh).  Do what the script tells you and, when finished, start SynologyAssistant from the command line (type SynologyAssistant).  It should automatically detect the active array (it did for me).  Click on it and then click "connect."  The web interface opens in a browser and you need to enter a username and password (if you ever lose these....).  From here I was able to administer the RAID to expand the existing volume to fill out the drives.  Unfortunately the first step is a parity check, and with so many large drives it took about 30 hours.  Once that is done it is a simple click job to expand the volume, and this step only takes a few minutes.  You next need to tell Linux how to communicate with the Synology RAID.  If you go to the Synology website, their instructions are very outdated (written for Ubuntu 10.04) and use cifs rather than nfs, plus some extra credential files that may represent a security risk.  Others have described the correct way to do this (this is a good post: http://www.ryananddebi.com/2013/01/15/linuxmint-or-ubuntu-how-to-automount-synology-shares/).  First you need to install the nfs software (sudo apt-get install nfs-common nfs-kernel-server -- not sure if you need both or not, but just to be safe).  I had a hard time with nfs at first until I realized that mounting is handled by a client service.  First, the directory the OS will use to address the external RAID needs to be identified in the /etc/exports file.  I added the following two lines to the end of the exports file (use sudo nano /etc/exports):

#Synology nfs directory
/home/enggen/external ip.address.here.from.synology.interface(rw,sync,no_root_squash,no_subtree_check)

You can then start the nfs service (sudo service nfs-kernel-server start).
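
If you want to confirm the export took effect, exportfs (installed with nfs-kernel-server) can re-read and list the entries; this is just an optional sanity check, not part of the original setup:

sudo exportfs -ra    # re-read /etc/exports
sudo exportfs -v     # list active exports; /home/enggen/external should appear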

Next you need to add a line to /etc/fstab to tell the computer how to mount your external RAID so add the following to the end of fstab:

# Automount Synology RAID device
synology.ip.address.here:/volume1/homes /home/enggen/external nfs rw,hard,intr,nolock 0 0

To make things go you can either reboot or type: sudo mount -a
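
A quick way to verify the mount afterward (standard tools, nothing Synology-specific):

df -h /home/enggen/external    # should report the Synology volume's full capacity
mount | grep nfs               # should show the share mounted with the options from fstab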

Now I had a functioning RAID with over 20TB of space, and I archived any folders from my desktop or a few other database locations to a backup directory, using the tar command to gzip everything (tar -czvf /path/to/external/RAID/device/archivename.tar.gz /path/to/folder/to/archive).  I opened multiple terminals and set everything to archiving simultaneously, which can slow performance, but it was now Sunday and I didn't plan to be in the lab all day long.  Once that was going I left, and all my data was safe and backed up by Monday morning.
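
In practice the commands looked roughly like the following (the paths here are placeholders for illustration; each archive can run in its own terminal, or be backgrounded with &, to get everything going at once):

# one archive per folder; -c create, -z gzip, -v verbose, -f output file
tar -czvf /home/enggen/external/backup/project1.tar.gz /home/enggen/Desktop/project1
tar -czvf /home/enggen/external/backup/project2.tar.gz /home/enggen/Desktop/project2 &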

However, all was not well.  I spent the next day or two trying to get the old drives prepared for a clean install.  I found that the remnant file systems were causing me problems, so I used gparted to eliminate the various partitions from each drive until I was left with a complete volume of unallocated space.  One drive was less cooperative, and it wasn't until I connected it to a Windows computer through an external adapter and tried to format it from the command line with all zeros (some windows code here) that I learned why: Windows told me it had over 2000 bad sectors, which means this piece of hardware was probably responsible for my non-booting status, and even if I had managed to recover it, it likely would have failed again soon after.
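
I didn't record the exact Windows command, but a typical way to zero a volume from the Windows command line is the format utility's /P switch; the drive letter below is a placeholder and this is only an illustration of the kind of command involved, not necessarily what I ran:

REM zero every sector of the volume; format reports bad sectors when it finishes (E: is a placeholder letter)
format E: /FS:NTFS /P:1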

So, I was down to only three 3TB drives, which sounds like a lot, but I have one project, for instance, that quickly bloated to nearly 6TB in size.  So I need space.  I looked around the existing lab computers for a drive to harvest and found a 500GB drive.  As a bonus it also spins at 7200rpm, so it should be ideal for running the Ubuntu OS.  Next I plan to establish a RAID0 (striped RAID) with the existing, healthy 3TB drives, which will offer 9TB of capacity and speed benefits from spreading the data across three physical volumes.  To protect this data I will set up daily backups to the external RAID using cron.  To make this easy, I will do this via a web-based administration tool called Webmin.  More on that once it is set up; a rough sketch of the plan is below.
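
The plan would look something like this (device names and the /data mount point are hypothetical, and the cron/rsync line is just one way to express the daily backup, whether or not Webmin ends up generating it):

# stripe the three healthy 3TB drives into one ~9TB RAID0 array and format it
sudo mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
sudo mkfs.ext4 /dev/md0

# nightly 2am backup of the striped array to the Synology mount (a root crontab entry)
0 2 * * * rsync -a --delete /data/ /home/enggen/external/backup/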

Wednesday, April 2, 2014

Reliable Sanger sequencing with 0.2uL BigDye

     Sanger sequencing might be a technology of the past, but it remains an essential tool for many applications.  Researchers can submit samples for processing as raw DNA (requiring PCR amplification), plasmid, or PCR product for sequencing.  However, as costs for pretty much everything continue to rise, the only way you can control your Sanger costs is to become proficient at this technique and roll back the amount of BigDye that you use per reaction.  List price on the Life Tech website is now about $1100 for 800uL of the stuff, and that doesn't include tax, handling, or dry ice charges.  But we need our sequences, and some of us just don't have the budget to produce sequence according to the ABI protocol.  Here I present a brief protocol for producing high quality sequences using just 0.2uL of BigDye per reaction.


1) PCR a clean product (no extra bands)

2) ExoSAP your reactions:
     Combine (adjust volumes to maintain ratios) 50uL H2O, 5uL SAP (1U/ul), and 0.5uL ExoI (10U/ul).  Add 2uL to each reaction per ~5uL of reaction volume, mix and spin down, and run the cycler program (37C 40min, 80C 20min, 10C forever).  Alternatively, you can add less ExoSAP (1uL perhaps), mix and spin down, then let the reactions sit on the bench overnight.  In the morning, kill the enzymes with 20 min at 80C.

3) Prepare sequencing reactions:
     First, I do this in 384well plates.  This means there is very little headspace into which to evaporate any sample volume, and this presumably keeps the chemistry much more stable than it would be in a 96well plate.  That said, I have done many, many 5uL reactions in 96well plates with no problems, but I very much prefer 384well plates these days.
     Start with a high concentration primer working dilution (20uM is good, 15uM is easier for fewer reactions).  For the following calculations, I use these solutions: BigDye v3.1, 5X BigDye sequencing buffer, 50mM MgCl2, 20uM primer.
     Each reaction contains:
0.2uL BigDye
1uL Sequencing buffer (final at 1X)
0.15uL MgCl2 (final at 1.5mM extra)
0.75uL primer (final at 3uM)
2uL template
0.9uL H2O

Multiply by the number of samples you have and add 10% for pipetting error.  Distribute 3uL of this mixture to each well, and follow with 2uL of template.  I prefer to seal PCR plates for any thermal cycling applications with reusable silicone mats (http://www.phenixresearch.com/products/smx-pcr384-sealing-mat.asp; http://www.phenixresearch.com/products/mpcs-3510-sealing-mat-pressure-fit-lid.asp) since microseals gave me some grief many years ago (mostly edge evaporation).  You just need to wash these with water.  Making yourself crazy with bleach and autoclaving will shorten their life substantially, plus it's pretty much a waste of time.  Run the following thermal cycle: 95C 2min; 60 cycles of 95C 10s, 50C 10s, 60C 2min; 10C forever.
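
To put numbers on the master mix calculation above (a hypothetical batch size, just for illustration): for 96 samples, multiply each volume by 96 x 1.1 ≈ 106 reactions:

21.2uL BigDye (0.2 x 106)
106uL Sequencing buffer (1 x 106)
15.9uL MgCl2 (0.15 x 106)
79.5uL primer (0.75 x 106)
95.4uL H2O (0.9 x 106)

That comes to ~318uL of master mix, or 3uL per well, with the 2uL of template added separately to each well.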

4) Retrieve your plate and get ready for cleanup.  For 384well plates there is not enough space for an ethanol cleanup, so I use a modification of the Rohland and Reich bead cleanup (http://enggen-nau.blogspot.com/2013/03/bead-cleanups.html).  Make a higher percentage PEG solution (25% instead of 18%) with this recipe (see other post for part numbers):

2650uL H2O
50uL 10% Tween-20
100uL 1M Tris (pH 7)
2000uL 5M NaCl
5000uL 50% PEG8000
200uL carboxylated beads
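
As a quick check on the final concentrations (counting the bead volume in the 10mL total): 5000uL of 50% PEG8000 in 10mL gives 25% PEG, 2000uL of 5M NaCl gives 1M NaCl, 100uL of 1M Tris gives 10mM Tris, and 50uL of 10% Tween-20 gives 0.05% Tween-20.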


Mix the solution very well, and be careful pipetting the PEG as it is like honey.  Add 15uL to each sequencing reaction.  Seal thoroughly with adhesive foil and mix by inversion.  Spin the solution down gently, just fast/long enough to get it into the bottom of the wells.  If you see pelleted beads, you need to mix again and spin down more gently.  This may take some experimenting with your centrifuge.  I use an Eppendorf 5804R with an A-2-MTP rotor, and I let it spin up to about 1000 rpm and hit stop to get things into the wells.  Let stand for ~45 min.  The precipitation is somewhat time-dependent as well as concentration-dependent, so the longer you wait (to a point), the more sequence you will see close to the primer.  When your timer goes off, or you think you have waited long enough, apply your plate to a magnet stand (http://www.alpaqua.com/Products/MagnetPlates/384PostMagnetPlate.aspx).  Tape it in place on either end to keep it from moving.  Separation should take about 5 min, but waiting another 5 min doesn't hurt.  Now, you can pipette the waste volume out, or you can do some inverted centrifuging and save a lot of time and tips in the process.  With my centrifuge, 1 min inverted spins on 3 folded paper towels (if the plate is full) at 400rpm work well.  It is very important that acceleration and deceleration are set to 1.  Any brown you see on the paper towel afterward is usually residual beads that didn't make it to the magnet.  I like to "lube" each well by adding 5uL 70% ethanol before the first spin.  This reduces the viscosity and eases the solution from each well.  Multichannel repeat-dispensing electronic pipettes are very useful here.  After the first inverted spin, take the magnet/plate back to your bench.  Add 25uL 70% EtOH to each well.  No need to wait; take the plate right to the centrifuge and spin inverted again.  Repeat the 70% wash twice more.  After the third wash, allow the beads to dry (~30 min at room temp, or 3 min in a vacuum centrifuge at 60C, mode D-AL so the rotor does not turn).  Note that over-dried beads can be very hard to resuspend, and this translates into samples where the DNA doesn't want to go back into solution.  Once dry, resuspend samples in 20uL sterile water.  It helps to seal with foil so you can vortex the plate.  Samples should look like mud during resuspension.  If you see any that look clear with brown flakes, keep vortexing.  Once samples have had the appearance of mud for ~2-5 min, place the plate back on the magnet and transfer 10uL to a 96 well plate for sequencing.  A little bead carry-over will make no difference.  Just spin the plate down hard to pellet the beads before submission.  If you need to reinject any samples, you still have 10uL of backup sequencing product.  Also note that you do NOT need to denature.  Cycle sequencing produces only single-stranded products.  Put them right on the instrument.

5) Enjoy your data!!

A note on sequencer usage: our lab has both a 3130 (4 capillaries) and a 3730xl (96 capillaries).  They do the same thing, and yet their stock protocols were not equal.  I wondered at first if this had to do with something else in the instrument, but then I noticed I got crappy, low-signal peaks on the 3130 when I ran the same product on both instruments.  I checked, and it injected samples for less time and at a lower voltage.  Further, it cut off sequences after about 600 bases.  Ask your sequencing lab about the module they use.  If at all possible, have them set the injection voltage to 1.5 kV and the injection time to 20 sec, as this fixed all my problems.  I also extended the run time on the 3130 from 1200 to 1800s, and now I get 1000 bases of sequence.  We have many, many runs on both instruments (we just replaced the array on the 3130 after 1200+ injections), so this protocol adjustment isn't going to shorten instrument life at all.

And finally, some notes on my sequencing recipe.  We had some difficult samples last year and did a lot of troubleshooting.  First we did a MgCl2 gradient (on ABI control plasmid) to see how this affected results.  Addition of 1.5mM MgCl2 seemed to give an extra 20-40 bases of high quality data.  However, it was the addition of copious amounts of primer that made the real difference.  This allowed us to sequence very dilute samples with high success and get nice long read lengths (900+ bases).  We experimented with lower volumes of BigDye and got nice data with as little as 0.05uL/reaction, though the signal dropped off after a few hundred bases.  Perhaps more cycles would improve this, but evaporation can become a challenge when you start doing 100 cycles, and then the cycler runs forever and you can't get anything else done.  I have also found that faster cycling works OK with BigDye (try cutting the extension time to 1 min and raising the temperature to 68C), but I am not confident enough to use it as a general protocol yet.

Wednesday, January 22, 2014

My electronic lab book

You may or may not have any experience with electronic lab books.  Many of the "better" ones are meant to be integrated with some sort of LIMS (Laboratory Information Management System/Software), which may or may not cost a lot of money and may or may not be useful to your specific needs (for a real joke, check out LabBook Beta on the Play Store).  Personally, I have tried to digitize various parts of my lab life for years, but I always come back to paper and pen, and securely taping important items (product inserts, gel photos, etc.) into my notebook.  As a result, I now have numerous notebooks that span all the way back to 2001.  Since my notes are organized by date, I can usually recall approximately when something was done that I need to reference, but it can take some time to go through everything to find what I need.  I have also witnessed several other people do one task or another on the computer and find their lab notes scattered among Excel files, Google Docs, and the traditional lab book.  So I have been looking for an electronic notebook that is as similar to paper and pen as possible, and may allow for better organization.  Most importantly, it has to feel natural.  If I am forcing myself into the e-notebook exercise, it isn't going to work well and I will be back to paper pretty soon.

I've had a smartphone for about a year now, so I am familiar with the Android OS.  I also have an iPod video that ran faithfully from 2006 until recently, and I occasionally help people out who prefer Mac OS.  Given the various issues getting the iPod to play nice with Windows and Linux, and my recent positive experience with Android, I was pretty sure I should go for Android.  Also, it hurts the pocketbook less.

The tablet.  Elegant-looking Samsung hardware.
I settled on a Samsung Galaxy Tab 3 10.1.  I got a refurbished device off Newegg for about $300 with shipping, and simultaneously purchased a cover, stylus, and screen protector.  The cover was another $15, the stylus $25, and the screen protector (pack of 3) was $6.  I had played around with some software using my phone, and planned to use the popular free app Papyrus (looks like a paper notebook) to test drive the new tablet.

Then everything arrived, and I learned a few things...  First, the tablet I purchased has only a capacitive screen.  These are far better than their resistive predecessors, but they do not have the stylus functionality of a Galaxy Note series tablet (a few other manufacturers offer this as well).  The Note has an integrated stylus called an S-pen, which is a digitizing device.  When you enable S-pen functionality in your handwriting software, the screen no longer responds to your finger touch as a means of "palm rejection."  Unfortunately, I had purchased an S-pen stylus that was totally incompatible with my capacitive screen.  And how was I going to make this thing work anyway?  I went to Staples and picked up a Wacom Bamboo Alpha stylus for $15, which seemed to have a finer point than most other capacitive styluses, was thick like a pen, and had decent user feedback online.

The cover.  Wakes and sleeps your device when opened or closed.
Unfortunately, I could use my chosen app (Papyrus) for writing only if I also kept a small piece of bubble wrap present to insulate my hand from the screen.  As I wrote across the screen I would have to stop and adjust the position of the bubble wrap.  This is not practical, and I was doubting myself already.  So I went through the Google Play store and downloaded free versions of other possibly useful handwriting apps with decent reviews.  If they didn't have a free version to test, I just ignored them since I can't spend university money on apps that might be completely useless (hint hint, Devs).  I tested Papyrus, FreeNote, INKredible, and LectureNotes.  As I mentioned previously, Papyrus lacked palm rejection for a capacitive screen.  Same with FreeNote and INKredible, although INKredible definitely felt really nice when writing.  Hard to explain, but you need an app that lets your brain respond like it would to the physical act and immediate feedback (seeing your written strokes) of writing on paper.  The ONLY app I tested that has a useful palm rejection function is LectureNotes.  Luckily it writes well also.  There are a lot of people online disparaging the use of a capacitive screen, or even the functionality of palm rejection in LectureNotes, but I tell you it works very well.  Many people online suggested downloading a free app called TouchscreenTune to adjust the sensitivity of the screen to improve the palm rejection, but all this app does for me is open briefly before crashing, so it was no help whatsoever.  I did need to go out and purchase another stylus.  For $30, I picked up a Jot Pro by Adonit.  This is the only capacitive stylus you will find that has a usefully small tip.  The tip is embedded in a plastic disc that allows you to see your writing and not damage your screen.  A little strange at first, but you forget it's there pretty fast.  Adonit has a new stylus called the Touch which has Bluetooth functionality and an onboard accelerometer to yield pressure sensitivity and better palm rejection, but the advanced functions don't work for Android (yet), only for iOS (iPad).  It is unclear if the company (or other app Devs) has any intention to port these functions to Android.

The stylus.  It's magnetic and sticks to the back of the cover.
Almost all the pieces were in place, but I still didn't have a completely functional electronic lab book.  I do DNA work, so I run a lot of gels that I am accustomed to taping into my notebooks.  Also, I wanted the ability to export my notes to the cloud so that I could share specific notebooks with collaborators.  This turned out to be pretty easy.  LectureNotes ($4.50 or so for the full version) has a splendid amount of available customizations.  I can export each notebook as a pdf, and specify the destination folder in my directory structure.  Then I use a second app called FolderSync ($2.50 or so to get rid of ads) to sync the contents of that directory with a cloud service.  I chose Dropbox since I got 50GB free for purchasing the tablet, but I would probably use my Ubuntu One or Google Drive account instead if I hadn't had that resource.  FolderSync can use each of these services and many more.  After adding the computer I use to take gel photos to Dropbox, I can now import gel photos by telling LectureNotes to import a photo from...  Then I choose Dropbox and browse to the new photo, resize, and move it to the position on the page I want, and done!!  In order to upload my notebook to the cloud, I still have to manually choose "export" in LectureNotes, but this goes pretty fast.

And now I have something that is working.  Certainly a Note series tablet (or other device with active stylus capability) would be better suited to my needs, but they are still pretty expensive.  I find myself already coveting the 12" Note that Samsung recently announced for release in the next few months, both for the increased real estate as well as the active stylus functionality (S-pen), but I expect this device to cost at least $700.  So to recap, you absolutely can use a capacitive screen and stylus for your lab book (detractors, please sit down!).  The tablet hardware may be important to my success, so I wouldn't count on a much cheaper device functioning as well.  I am using a Samsung Galaxy Tab 3 10.1 with LectureNotes (using heuristic palm rejection at 6000ms delay) and an Adonit Jot Pro stylus.  With FolderSync my notes are synced as pdf files to my Dropbox account for sharing with collaborators.  Happy e-notebooking, scientists!!

A shot of some notes I took today in LectureNotes, complete with gel photo.  My handwriting isn't much worse on the tablet than on paper.  I was also able to import the pages I had managed to produce previously in Papyrus by exporting them as .jpg and importing them as images to new pages I placed before my existing page.

A shot of my old lab book for comparison.