Heads in the Cloud: Harnessing the Healing Power of Bioinformatics

September 20, 2016

Drs. Elaine Gee and Frederick Strathmann are leading the charge in clinical bioinformatics encompassing molecular genetics, immunology, hematopathology, anatomic pathology, and infectious diseases.


Drs. Elaine Gee and Frederick Strathmann are leveraging the power of the cloud to bring next-generation sequencing (NGS) to a new frontier. Gee is an actual rocket scientist turned bioinformatician; Strathmann, known for his mastery in mass spectrometry and quality-control processes, is a medical director of toxicology.

Clinical bioinformatics at ARUP is transitioning to scalable pipelines, built on best-practice strategies and state-of-the-art technology, that perform bioinformatics in the cloud.

Since the first draft of the human genome was published in the journal Nature in 2001, DNA sequencing technologies such as NGS have ushered in a new era of medicine. A complex pipeline is required to transform raw sequencing data into clinically actionable results.

“It has been 15 years since the draft release of the human genome, and scientists do not have a full grasp on the genetic secrets hidden within the 3 billion base pairs of DNA sequence. However, great strides have been made to extract medical utility from a subset of genes.”

Elaine Gee, PhD
ARUP Director of Bioinformatics
 

ARUP Bioinformatics spans five areas of clinical focus—molecular genetics, immunology, hematopathology, anatomic pathology, and infectious diseases. Application of NGS technology to these areas has resulted in massively complex data needs with issues in high-throughput data storage, workflow management, and standardization of computation. These issues combined are driving high-performance computing to virtual technology solutions such as cloud computing.

Gee, director of Bioinformatics, has taken on the charge of growing clinical bioinformatics at ARUP from its early start as a successful boutique analysis shop into a large-scale analysis engine. She and Strathmann are leading a team of more than 14 members, including Brett Kennedy, PhD (associate director, Bioinformatics), and Mark Monroe (lead data engineer).

“Bioinformatics is truly at the heart of the successful application of NGS technology in medicine. Without the bioinformatics aspect, the sequencing sits in limbo and cannot be fully appreciated. It is an integrated process that involves numerous technologies and highly skilled people to provide actionable information to physicians.”

Frederick Strathmann
Medical Director, Toxicology; Director, High-Complexity Platforms—Mass Spectrometry; Interim Scientific Director, Biocomputing, ARUP Laboratories
 

Below, Drs. Gee (EG) and Strathmann (FS) discuss clinical bioinformatics at ARUP.

Why do we need clinical NGS testing?

EG: It has been 15 years since the draft release of the human genome, and scientists do not have a full grasp on the genetic secrets hidden within the 3 billion base pairs of DNA sequence. However, great strides have been made to extract medical utility from a subset of genes. Trials like NCI-MATCH help expand the national “cancer knowledge network” that enables labs like ARUP to provide state-of-the-art laboratory-developed tests that interrogate medically relevant genes.

FS: NGS is redefining approaches to medicine and is at the forefront of the precision medicine initiative. The amount of data generated per patient is unparalleled, complicating research, test development, and clinical interpretation. With significant efforts being focused on sequencing efficiency and bioinformatics, the need for highly trained sequence analysts and pathologists to provide actionable reports remains a tremendous focus in the field for continued scalable success.

What role does bioinformatics play in clinical NGS testing?

EG: Conversion of NGS data into interpretable data requires a complex analytical process. NGS is a massively parallel technique that generates short sequence “reads” from a library of sheared (fragmented) genomic DNA. The role of the bioinformatician is to reconstruct the genetic information by piecing together these short sequences using various analytical techniques and algorithms. This, in essence, is the first step of the NGS bioinformatics process. For a majority of our techniques, bioinformaticians build pipelines to convert data derived from genes buried within the 3 billion base pairs of DNA sequence into human-interpretable data through three major steps: sequence alignment and polishing, variant calling, and variant annotation.

FS: Bioinformatics is truly at the heart of the successful application of NGS technology in medicine. Without the bioinformatics aspect, the sequencing sits in limbo and cannot be fully appreciated. It is an integrated process that involves numerous technologies and highly skilled people to provide actionable information to physicians.
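To make the three steps Gee describes more concrete, here is a deliberately tiny sketch in Python. It is an illustration only, not ARUP's pipeline: the reference sequence, the reads, the region name GENE_A, and every function name are invented for the example, the "polishing" step is left out, and production pipelines rely on dedicated alignment and variant-calling software rather than the brute-force matching shown here.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: a 20-base "reference genome" and four short reads.
REFERENCE = "ACGTACGTTAGCCGATAGCT"
READS = ["ACGTACGT", "CGTTATCC", "TTATCCGA", "ATCCGATA"]
REGIONS = {"GENE_A": (8, 14)}  # hypothetical region, half-open interval [start, end)


def align(read, reference):
    """Step 1: return the 0-based offset with the fewest mismatches (brute force)."""
    best_offset, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset


def call_variants(reads, reference, min_depth=2):
    """Step 2: pile up aligned bases and report positions that differ from the reference."""
    pileup = defaultdict(Counter)
    for read in reads:
        offset = align(read, reference)
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    variants = []
    for pos in sorted(pileup):
        base, count = pileup[pos].most_common(1)[0]
        # Require at least min_depth supporting reads before calling a variant.
        if base != reference[pos] and count >= min_depth:
            variants.append((pos, reference[pos], base))
    return variants


def annotate(variants, regions):
    """Step 3: label each variant with the region it falls in, if any."""
    annotated = []
    for pos, ref, alt in variants:
        label = next(
            (name for name, (start, end) in regions.items() if start <= pos < end),
            "intergenic",
        )
        annotated.append({"pos": pos, "ref": ref, "alt": alt, "region": label})
    return annotated


if __name__ == "__main__":
    for record in annotate(call_variants(READS, REFERENCE), REGIONS):
        print(record)
```

In this toy data, three of the four reads carry a T where the reference has a G at position 10, so the sketch reports a single G-to-T change and labels it with the made-up region GENE_A.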

What does the future entail for bioinformatics at ARUP and possibly elsewhere?

EG: As the cost per genome continues to drop, the sheer volume of sequencing data generated will explode over time, as will the complexity of that data. For example, 60 exome datasets alone will generate approximately 1 TB of input data. Bioinformatics is turning to the cloud to relieve the bottleneck of limited compute capacity and data storage common to traditional enterprise compute systems. The underlying infrastructure we are building to transition to cloud computing is developed in-house using state-of-the-art components, including the Snakemake workflow management system, the Docker platform, SaltStack, RabbitMQ messaging, and the Celery distributed task queue, to create a coordinated system that is both standardized and modular for customizability.

FS: The need to scale all aspects of NGS testing is looming for everyone operating in this space. There are currently no generalizable best-in-practice guidelines, but partnerships are forming that will allow a better vision of how these complex data problems will be solved as a community. The infrastructure and modular approach being implemented at ARUP represent an exciting move towards robust bioinformatics solutions to support high-quality, high-volume NGS testing.
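For a rough sense of the scale Gee mentions, the short Python sketch below extrapolates from her figure of approximately 1 TB of input data per 60 exome datasets. It rests on two assumptions that are not in the interview: that input size grows linearly with the number of exomes, and that 1 TB means 1,000 GB. The function and constant names are made up for the example, and the results are back-of-the-envelope estimates rather than ARUP's actual storage numbers.

```python
# Figure quoted in the interview: roughly 1 TB of input data per 60 exomes.
TB_PER_BATCH = 1.0
EXOMES_PER_BATCH = 60
GB_PER_TB = 1000  # assumption: decimal units


def gb_per_exome():
    """Approximate input size of one exome dataset, in GB."""
    return TB_PER_BATCH * GB_PER_TB / EXOMES_PER_BATCH


def estimate_input_tb(n_exomes):
    """Approximate input size for n_exomes, assuming linear scaling."""
    return n_exomes * TB_PER_BATCH / EXOMES_PER_BATCH


if __name__ == "__main__":
    print(f"Per exome: about {gb_per_exome():.1f} GB")
    for n in (60, 600, 6000):
        print(f"{n} exomes: about {estimate_input_tb(n):.0f} TB of input data")
```

Under these assumptions, a single exome works out to roughly 17 GB of input, and 6,000 exomes to about 100 TB, before counting any intermediate or output files.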

Elaine Gee’s path to ARUP includes academic pit stops at Caltech and Harvard, as well as work with professors at MIT and the Wyss Institute, where she invented and patented novel peptides for inducing matrix assembly by combining computational biology with studies at the laboratory bench. Frederick Strathmann, the interim scientific director of Biocomputing, brings his expertise in IT, programming, quality improvement, and IT commercialization. He founded a start-up called 4DQC, a cloud-based quality control company that builds software for real-time presentation of quality metrics; he currently serves as its CTO, with Scott McClellan as CEO.

Peta Owens-Liston, ARUP Science Communications Writer