Wednesday, August 6, 2014

EnGGen bioinformatics computer

Bioinformatic processing is an essential part of any lab that produces or simply utilizes NGS data.  The grant that funded our MiSeq also funded the acquisition of a new computer to provide some bioinformatic power.  The process was a learning experience for me, albeit an enjoyable one, as I like playing with hardware and getting software to work even when it takes a lot of time.  I find that most of the time the computer is very reliable and provides the needed power for most questions we might have.  Occasionally something happens and then some time is needed to find the problem and work it out.  This can take 5 minutes or it can take a week, and you must be prepared to persevere so that the hardware you have invested money in can continue to function as desired.  Hopefully with every rebuild you will become more adept at setting up your system and see ways in which better performance can be achieved.  I am writing this blog post not only to illustrate to others how to set up a decent bioinformatic system but also for myself (or my successor), so that all the necessary details are in one place.

The system is a Mac Pro tower from late 2012.  It has dual Intel Xeon processors (E5645 @ 2.4 GHz, offering 24 virtual cores) and 64GB RAM.  It came with Mac OSX on a single 1TB drive, but we purchased four 3TB drives (the tower has four SATA bays) and I initially installed Ubuntu Linux 12.04 on it over a RAID 10 configuration.  This means the drives were paired up to create two 6TB volumes, with each 6TB volume mirroring the other (redundancy).  Externally we also have a Synology DS1813+ RAID enclosure containing eight 4TB NAS drives on Synology Hybrid RAID with two-drive fault tolerance.
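
I originally set the array up through the Ubuntu installer, but for reference, a comparable software RAID 10 (for a data-only array, not one holding the OS) could be built by hand with mdadm along these lines.  This is only a sketch; the device names, partition numbers, and ext4 filesystem are assumptions, not a record of what was actually run.

# combine four partitioned drives into one mirrored-and-striped RAID 10 array
sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# put a filesystem on the new array and mount it
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/raid
sudo mount /dev/md0 /mnt/raid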

Last week I came back from vacation and my external RAID had finished expanding (I had added drives to it just before leaving).  The computer also wanted to reboot to finish installing updates, so I figured it was an ideal time for a restart, after which I would finish administering the external RAID.  Unfortunately, the computer failed to reboot and refused every subsequent attempt to do so.  It could have been a number of things (I was immediately suspicious of the software updates), but I eventually learned that one of the drives had failed.  It turns out that my initial install on RAID 10 was a smart move, but I had to overcome some other problems as well, which I will detail here.

Desktop computers are moving toward something called EFI (Extensible Firmware Interface) or UEFI (Universal EFI).  It makes sense to update hardware standards sometimes, but we are in that painful period right now when most computers in use still rely on BIOS (Basic Input Output System) instead.  Our Mac is EFI while all the other computers in the lab are still BIOS-based.  Thus, every attempt to make a new bootable USB Ubuntu drive failed, since the Mac hardware was incompatible with the boot partition set up by the BIOS-based computers.  Luckily I still had the old 1TB OSX drive lying around, so I swapped it in to boot OSX and produced a compatible USB drive that way (Ubuntu 14.04 AMD64/Mac version).

That problem solved, I was finally able to get the computer functioning and attempted to examine the RAID array.  Since I had a RAID 10, I decided to simply install the OS over the first drive, though in retrospect this is a bit of Russian roulette and I should have simply worked from the USB install.  The first thing I did was install gparted (sudo apt-get install gparted).  Then I ran gparted to examine the state of each disk (sudo gparted).  This showed there was a problem with volume sdb, the second drive in the array.  Volumes sdc and sdd still had healthy RAID partitions on them.  To administer RAID in Linux, you should install something called mdadm (sudo apt-get install mdadm).  To get things going I first had to stop the RAID (puzzling, but nothing worked until this was done: sudo mdadm --stop /dev/md0); md0 is the name of the RAID array (multidisk array).  Then a simple command to assemble the RAID got things functioning immediately (sudo mdadm --assemble --scan).  This started up the remnant RAID 10 with two of the four original drives and mounted it automatically.  I was lucky that the drive used for the OS and the failed drive together constituted a RAID0 pair and not two identical components of the RAID1 (RAID 10 is sometimes called RAID1+0), or some data would have been lost.
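
For reference, the recovery boiled down to a handful of commands.  This is the sequence as described above, with two status checks (cat /proc/mdstat and mdadm --detail) added at the end that I would recommend; md0 is simply the name mdadm used for the array here and may differ on another system.

# install the RAID administration tool
sudo apt-get install mdadm
# stop the half-assembled array before trying again
sudo mdadm --stop /dev/md0
# scan for RAID superblocks and assemble whatever members remain
sudo mdadm --assemble --scan
# check the state of the array (active members, degraded or not)
cat /proc/mdstat
sudo mdadm --detail /dev/md0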

Before I could back up the contents of the remaining RAID I had some other things to do first.  I planned to back up the data to the external Synology RAID, but all it had done was expand the RAID array, which doesn't change the addressable size of the RAID volume; so despite now having over 20TB of RAID, the volume as addressed by Linux was still only about 7TB.  To top it off, now that I had a fresh OS install, I no longer had the software set up to communicate between the Synology box and the Linux system.

So I went to the Synology website and downloaded SynologyAssistant (global 64bit).  You then unzip the files into some directory (I used a subdirectory within Downloads) and run the install script.  However, the install script doesn't come executable, so you have to change this first (sudo chmod a+x install.sh).  Now you can run the script (sudo ./install.sh).  Do what the script tells you and, when finished, start SynologyAssistant from the command line (type SynologyAssistant).  It should automatically detect the active array (it did for me).  Click on it and then click "connect."  The web interface opens in a browser and you need to enter your username and password (if you ever lose these....).  From here I was able to administer the RAID and expand the existing volume to fill out the drives.  Unfortunately the first step is a parity check, and with so many large drives it took about 30 hours.  Once that is done, it is a simple click job to expand the volume and this step only takes a few minutes.

You next need to tell Linux how to communicate with the Synology RAID.  The instructions on the Synology website are very outdated (written for Ubuntu 10.04) and use cifs rather than nfs, plus some extra credential files that may represent a security risk.  Others have described the correct way to do this (this is a good post: http://www.ryananddebi.com/2013/01/15/linuxmint-or-ubuntu-how-to-automount-synology-shares/).  First you need to install the nfs software (sudo apt-get install nfs-common nfs-kernel-server -- not sure if you need both or not, but just to be safe).  I had a hard time with nfs at first until I realized it is a client service.  The directory the OS will use to address the external RAID needs to be identified in the /etc/exports file, so I added the following two lines to the end of the exports file (use sudo nano /etc/exports):

#Synology nfs directory
/home/enggen/external ip.address.here.from.synology.interface(rw,sync,no_root_squash,no_subtree_check)

You can then start the nfs service (sudo service nfs-kernel-server start).
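
Put together, the nfs setup looks roughly like this; the exportfs check at the end is an extra step I find useful for confirming the export took effect, and the mount point /home/enggen/external is of course specific to this machine.

# install the nfs client and server packages
sudo apt-get install nfs-common nfs-kernel-server
# create the mount point and add the export line shown above
sudo mkdir -p /home/enggen/external
sudo nano /etc/exports
# start the service and confirm the export is active
sudo service nfs-kernel-server start
sudo exportfs -v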

Next you need to add a line to /etc/fstab to tell the computer how to mount your external RAID, so add the following to the end of fstab:

# Automount Synology RAID device
synology.ip.address.here:/volume1/homes /home/enggen/external nfs rw,hard,intr,nolock 0 0

To make things go you can either reboot or type: sudo mount -a
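
To confirm the share actually mounted, df is a quick check (the path matches the fstab entry above):

# mount everything listed in /etc/fstab without rebooting
sudo mount -a
# the Synology volume should now show up at the mount point with its full size
df -h /home/enggen/external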

Now I had a functioning RAID with over 20TB of space, and I archived folders from my desktop and a few other database locations to a backup directory, using the tar command to gzip everything (tar -czvf /path/to/external/RAID/device/archivename.tar.gz /path/to/folder/to/archive).  I opened multiple terminals and set everything archiving simultaneously, which can slow performance, but it was now Sunday and I didn't plan to be in lab all day long.  Once that was going I left, and all my data was safe and backed up by Monday morning.
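
The archiving commands looked roughly like this; the paths and archive names are placeholders, and sending a job to the background with & is just one way to get several archives running at once (I simply used separate terminals).

# gzip one folder into an archive on the external RAID
tar -czvf /home/enggen/external/backup/project1.tar.gz /path/to/project1
# the same thing run in the background so another archive can be started right away
tar -czf /home/enggen/external/backup/project2.tar.gz /path/to/project2 &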

However, all was not well.  I spent the next day or two trying to get the old drives prepared for a clean install.  I found that the remnant file systems were causing me problems, so I used gparted to eliminate the various partitions from each drive until I was left with a complete volume of unallocated space.  One drive was less cooperative, and it wasn't until I connected it to a Windows computer through an external adapter and tried to format it from the command line with all zeros (some windows code here) that I got an answer: Windows told me it had over 2000 bad sectors, which means this piece of hardware was probably responsible for my non-booting status, and even if I had managed to recover it, it likely would have failed again soon after.
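
For the drives that did cooperate, the cleanup I did in gparted can also be done from the command line.  This is a sketch of the equivalent, not the exact steps I took; replace /dev/sdX with the correct device and triple-check it, since these commands are destructive.

# remove the old RAID superblock from each member partition
sudo mdadm --zero-superblock /dev/sdX1
# wipe any remaining filesystem and partition-table signatures
sudo wipefs -a /dev/sdX
# write a fresh, empty partition table so the drive is all unallocated space
sudo parted /dev/sdX mklabel gpt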

So, I was down to only three 3TB drives, which sounds like a lot, but I have one project, for instance, that quickly bloated to nearly 6TB in size.  So I need space.  I looked around the existing lab computers for a drive to harvest and found a 500GB drive.  As a bonus it also spins at 7200rpm, so it should be ideal for running the Ubuntu OS.  Next I plan to establish a RAID0 (striped RAID) with the existing, healthy 3TB drives, which will offer 9TB of capacity and speed benefits from spreading the data across three physical volumes.  To protect this data I will set up daily backups to the external RAID using cron.  To make this easy, I will do it via a service called webmin.  More on that once it is set up.
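
For what it's worth, the plan above would look something like this with mdadm and a nightly cron job.  The device names, mount points, and the use of rsync are my working assumptions until it is actually built; webmin will just provide a friendlier interface for managing the same kind of cron entry.

# stripe the three healthy 3TB drives into a single ~9TB RAID0 array
sudo mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
sudo mkfs.ext4 /dev/md0
# example crontab entry (added with crontab -e): copy the data to the Synology at 2am daily
0 2 * * * rsync -a /data/ /home/enggen/external/backup/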