Current Projects
Galaxy CloudMan
I primarily work on the Galaxy CloudMan project. CloudMan is a manager for coordinating the cloud infrastructure required to run the Galaxy application. However, CloudMan is more than that: it provides (1) a framework that enables applications to run on the cloud and utilize the elasticity of the underlying infrastructure and (2) an easy-to-use, web-based interface that lets end users control their personal cloud compute cluster. Internally, CloudMan is designed to be a standalone application capable of orchestrating all the steps required to dynamically create and manage a compute cluster on a remote resource provider, while requiring no computational expertise and no infrastructure to set up.
Currently, CloudMan runs on Amazon's Elastic Compute Cloud (EC2) infrastructure. The source code is available from here and the instructions on how to use it are available here.
The research benefits realized through the development of CloudMan center on making a specific tool (currently, the entire Galaxy application) usable by and available to anyone in need of additional or periodic compute power. From a more technical standpoint, CloudMan exploits the scalability offered by cloud computing and will try to match the capabilities of the provisioned resources to the workload demand of the given Galaxy instance. This relieves users of CloudMan (and of the application it runs) from having to concern themselves with the details of the actual compute hardware. For example, rather than asking "How many compute nodes would you like to start for this cluster?", CloudMan could (a) suggest "This job (or workload) can be completed in X hours for Y dollars or in M hours for N dollars; which do you prefer?", or (b) let the user provide threshold values and then automatically scale the size of the cluster to match workload demand, taking tool requirements and resource capabilities into account.
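The two interaction styles described above can be illustrated with a minimal sketch. This is not CloudMan's actual code: the function names, the assumption of perfectly parallel work, the whole-hour billing model, and the `jobs_per_node` parameter are all hypothetical, chosen only to make the idea concrete.

```python
import math
from dataclasses import dataclass


@dataclass
class Option:
    nodes: int      # candidate cluster size
    hours: float    # estimated wall-clock time
    dollars: float  # estimated cost


def estimate_options(cpu_hours, node_counts, price_per_node_hour):
    """Return one (time, cost) option per candidate cluster size.

    Assumption: the workload is perfectly parallel and each node is
    billed for every started hour.
    """
    options = []
    for n in node_counts:
        hours = cpu_hours / n
        billed_hours = math.ceil(hours)
        options.append(Option(n, hours, billed_hours * n * price_per_node_hour))
    return options


def target_cluster_size(queued_jobs, min_nodes, max_nodes, jobs_per_node):
    """Pick a cluster size for the current queue, clamped to user thresholds."""
    needed = math.ceil(queued_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # Style (a): present time/cost choices instead of asking for a node count.
    for opt in estimate_options(30, [2, 8], 0.5):
        print(f"{opt.nodes} nodes: {opt.hours:.2f} hours for ${opt.dollars:.2f}")
    # Style (b): scale to the queue within the user's thresholds.
    print(target_cluster_size(queued_jobs=25, min_nodes=1, max_nodes=5, jobs_per_node=4))
```

A real scheduler would also need the tool requirements and resource capabilities mentioned above; the sketch leaves both out for brevity.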
Related Talks
- "Enabling NGS Analysis with(out) the Infrastructure", Bioinformatics Open Source Conference (BOSC), Vienna, July 15-16, 2011. [Abstract] [PDF]
- "CloudMan: Galaxy on the Cloud", Galaxy Community Conference, Lunteren, the Netherlands, May 26, 2011. [PDF]
- "Dynamically Scalable, Accessible Analysis for High-Throughput Sequence Data", Bio-IT World, Boston, MA, April 13, 2011. [PDF]
- "NGS Analyses with Galaxy on the Cloud", Intelligent Systems for Molecular Biology, Boston, MA, July 12, 2010. (live demo)
- "Deploying Galaxy on the Cloud", Bioinformatics Open Source Conference (BOSC), Boston, MA, July 9, 2010. [PDF]
- "Discovery of human heteroplasmic sites enabled by an accessible interface to Cloud-computing infrastructure", Biology of Genomes, Cold Spring Harbor, NY, May 11-15, 2010. (presented by James Taylor) [PDF]
Related Publications
- Afgan E., Baker D., Coraor N., Chapman B., Nekrutenko A., Taylor J., "Galaxy CloudMan: Delivering Cloud Compute Clusters," BMC Bioinformatics, Vol 11, Issue 12, 2010. [PDF]
- H. Goto, B. Dickins, E. Afgan, I. M. Paul, J. Taylor, K. D. Makova, and A. Nekrutenko, "Dynamics of mitochondrial heteroplasmy in three families investigated via a repeatable re-sequencing study," Genome Biology, vol. 12, issue R59, 2011. [Open Access]
- Afgan E., J. Goecks, D. Baker, N. Coraor, the Galaxy Team, A. Nekrutenko, and J. Taylor, "Galaxy - a Gateway to Tools in e-Science," in Guide to e-Science: Next Generation Scientific Research and Discovery, K. Yang, Ed. Springer, 2011, pp. 145-177. [PDF] [Book]
Posters
- Afgan E., Baker D., The Galaxy Team, Nekrutenko A., Taylor J., "The Elastic Analysis with Galaxy on the Cloud," Beyond the Genome, Boston, MA, Oct 11-13, 2010. [PDF]
Others' Tweets
GigaSciene EA: Includes >100 tools from cloud biolinux plus additional NGS tools from galaxy. 700GB reference genome data.
Jul 16th, 2011
GigaSciene EA: Cloudman can deploy a completely configured galaxy cluster in a matter of minutes. Simple wizard-guided setup.
Jul 16th, 2011
simon_andrews Blown away by Galaxy cloud demo. 20 mins to configure a cluster from scratch and run an analysis. #ISMB2010
5:05 PM Jul 12th, 2010 via Echofon
RT @32nm:
http://galaxy.psu.edu/dev2010/
07:35 Enis Afgan: "Deploying Galaxy on the Cloud"
8:38 PM May 15th, 2010 via API
Machine Image Deployment (mi-deployment)
As virtualization technologies continue to mature, there is an increasing need to streamline the configuration and deployment of machine images (MIs). Moreover, this process should be customizable and easy to carry out. This project provides a Python Fabric-based solution that tries to achieve these aims.
More specifically, mi-deployment provides a means to automate machine configuration, including deployment of the desired tools and data. Currently, mi-deployment is targeted at the Galaxy CloudMan project. The base operating system it expects is Ubuntu 10.04, but it should be easily portable to other similar systems. Over time, this project will evolve to support a wider variety of operating systems, with the ability to take any (virtualized) system and configure it on demand. The repository with the source code for this project is available here.
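As a rough illustration of what automating machine configuration can look like, the sketch below turns a declarative list of tools and data into the shell commands that would configure an Ubuntu machine. It is not taken from the mi-deployment repository: `build_commands`, the package names, the data URL, and the target directory are hypothetical. The commands are only generated, not executed, so the example runs without a remote host; in a Fabric script each command would instead be passed to one of Fabric's remote-execution calls.

```python
import shlex


def build_commands(packages, data_urls, data_dir):
    """Build the ordered shell commands for configuring one machine.

    Assumption: a Debian/Ubuntu system where tools are installed with
    apt-get and data files are fetched with wget.
    """
    commands = ["apt-get update"]
    if packages:
        quoted = " ".join(shlex.quote(p) for p in sorted(packages))
        commands.append("apt-get install -y " + quoted)
    if data_urls:
        commands.append("mkdir -p " + shlex.quote(data_dir))
    for url in data_urls:
        commands.append(
            "wget -P {} {}".format(shlex.quote(data_dir), shlex.quote(url))
        )
    return commands


if __name__ == "__main__":
    # Hypothetical tool and data selection for a single machine image.
    for cmd in build_commands(
        ["samtools", "bwa"], ["http://example.org/hg19.fa"], "/mnt/data"
    ):
        print(cmd)
```

Keeping the description of the machine (tools, data) separate from the code that applies it is one way to make such a process customizable, since a different image only needs a different input list.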
Previous Projects
A list of projects I have worked on in the past, with details, is available here.
