Cloud-POA: A cloud-based map only implementation of PO-MSA on Amazon multi-node EC2 Hadoop Cluster

Neehal, Nafis; Karim, Dewan Ziaul; Islam, Ashraful

URI: http://hdl.handle.net/123456789/69

Date: 2018-02-08

Abstract:

Sequence alignment has always been a challenging task in bioinformatics and computational biology. With Next Generation Sequencing (NGS) techniques in hand, researchers can now study biological systems at a level that was never possible before. Scientists now have billions of bytes of biological data to work with and trillions of sequences to align. This comes at the cost of requiring machines with a tremendous amount of computational and analytical power. Purchasing that much hardware and setting up a standalone infrastructure would not only cost an unnecessarily large amount of money and labor but would also be troublesome to maintain. Moreover, aligning a huge number of DNA or protein sequences requires a scalable multiple sequence alignment (MSA) algorithm with decent accuracy. In this context, this paper presents a novel implementation of the Partial Order Alignment (POA) algorithm on a multi-node Hadoop cluster running the MapReduce framework. The implementation was done on the Amazon AWS platform with multiple EC2 instances. It is a map-only implementation built with Hadoop Streaming. The results show a drastic reduction in runtime with no degradation in accuracy.
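
To illustrate what a map-only Hadoop Streaming job can look like, the following is a minimal sketch of a streaming mapper in Python. It is not the authors' code. The input layout (one batch per line, written as `batch_id<TAB>seq1,seq2,...`) and the names `placeholder_align`, `map_line`, and `run` are assumptions made for illustration, and `placeholder_align` only pads sequences with gaps as a stand-in for a real POA call.

```python
import sys


def placeholder_align(seqs):
    """Hypothetical stand-in for a POA aligner (NOT real POA):
    right-pads every sequence with '-' to the longest length."""
    width = max((len(s) for s in seqs), default=0)
    return [s.ljust(width, "-") for s in seqs]


def map_line(line, align=placeholder_align):
    """Process one input record of the assumed form
    'batch_id<TAB>seq1,seq2,...' and return 'batch_id<TAB>aln1,aln2,...'."""
    batch_id, _, payload = line.rstrip("\n").partition("\t")
    seqs = [s for s in payload.split(",") if s]
    return batch_id + "\t" + ",".join(align(seqs))


def run(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming mapper loop: read records from stdin, write results to stdout.
    Each batch is handled independently, so no reduce step is needed."""
    for line in stdin:
        if line.strip():  # skip blank lines
            stdout.write(map_line(line) + "\n")


# Run as a script only when input is piped in, as Hadoop Streaming does.
if __name__ == "__main__" and not sys.stdin.isatty():
    run()
```

Because each batch is aligned independently, the mapper output can be the final job output. With Hadoop Streaming, a job of this kind is typically made map-only by setting the number of reduce tasks to zero (for example via the `mapreduce.job.reduces=0` property); the exact job configuration used in the paper is not stated in this abstract.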
