Algorithms for life

Contact: Diana Hearit
Aug. 1, 2017

KALAMAZOO, Mich.—What these computer algorithms uncover about genes and proteins may one day advance drug development and individualized medical treatment. As a computational scientist, Dr. Fahad Saeed’s research projects are all about bigness—big numbers, big data and big ambitions.
His latest endeavor, funded by a coveted National Science Foundation CAREER award, is no exception, as it builds on some of the biggest breakthroughs in the biological sciences.

But what role could a computational scientist play in biology?

First, about those breakthroughs.

In 2003, the Human Genome Project achieved its stunning goal of sequencing and mapping all of the roughly 20,000 genes in the human body.
Another spectacular, more recent advance came with the ability to map an organism’s proteome, or its full complement of proteins. Humans may have upwards of one million proteins.

Scientists say both achievements, separately and together, hold keys to how the human body fundamentally works: not just its natural, normal processes, but also how and why things go awry and result in disease and dysfunction. Recall that genes give instructions while proteins carry out those orders.
But the data produced by sequencing both these “omes” are massive and complex; it is not humanly possible to sift through and make sense of the multitude of interactions and pathways that control the thousands of genes and proteins.

“So, while it is very exciting that we have big data from the genome and that we have big data from the proteome, it’s also a challenge because the techniques that allow us to analyze those data sets are lagging behind,” Saeed says.

This is where biological science turns to experts with Saeed’s advanced skill set in computational science. They develop computer algorithms that analyze “big data” to eliminate irrelevant information, pinpoint biological interactions and, particularly in the case of proteomics and genomics, possibly decipher previously unknown or little-understood functions of proteins or genes.

“Without computational biologists, the vast amounts of raw data collected by bench scientists would remain meaningless,” says Dr. Jason Hoffert, a National Institutes of Health scientific review officer who has worked alongside Saeed on projects in the past.

“At the same time, bench scientists play a key role in interpreting the cleaned data sets in order to draw meaningful biological conclusions.”

Based in WMU’s College of Engineering and Applied Sciences, Saeed specializes in high-performance, high-speed computer algorithms designed to break down big data sets into discernible information.

He’s an assistant professor both in the department of computer science and in the department of electrical and computer engineering and directs the Parallel Computing and Data Science laboratory at the college.

He’s developing computer algorithms capable of analyzing massive amounts of genomic and proteomic data more efficiently than any previous technique, and he is designing architecture with the capacity to store, manage and transfer these data.

To give a sense of the “bigness” of the biological data this project will grapple with, Saeed explains that currently “some of the data sets we can produce are up to 10 terabytes (10,000 gigabytes), and that is just for one experiment for one species.”

“If you combine data sets (genomic plus proteomic), they get into the petabyte level (1,000 terabytes), and the computational challenges just get exponentially larger and more complex, and that is what the grant proposes to solve.”

The magnitude is so large that it is difficult to imagine. For perspective, one petabyte is the equivalent of 20 million four-drawer file cabinets filled with text, according to mozy.com.
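
For readers who want to check the arithmetic behind these figures, here is a minimal sketch in Python. It assumes decimal (SI) storage units, in which each step up is a factor of 1,000; the constant and function names are illustrative only and do not come from Saeed's work.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption;
# binary units such as tebibytes would give slightly different numbers).
GB = 10**9   # gigabyte
TB = 10**12  # terabyte
PB = 10**15  # petabyte


def convert(amount, from_unit, to_unit):
    """Convert a whole-number amount from one unit to another."""
    return amount * from_unit // to_unit


# "Up to 10 terabytes" for one experiment on one species, per the article.
experiment_tb = 10
experiment_gb = convert(experiment_tb, TB, GB)

# How many such 10-terabyte experiments would fill one petabyte.
experiments_per_pb = convert(1, PB, TB) // experiment_tb

print(experiment_gb, "gigabytes per experiment")
print(experiments_per_pb, "experiments per petabyte")
```

Under these assumptions, a single 10-terabyte experiment is 10,000 gigabytes, and it would take 100 experiments of that size to reach one petabyte.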

What Saeed’s algorithmic tools help life sciences researchers tease out about genes and proteins may lead to advances in drug development and individualized medical treatment.

Ultimately, Saeed says, “We want to take this genomic and proteomic science to a place where we are able to do genomic and proteomic profiling of each person who goes to a clinic. That is what we call personal or precision medicine.

“If you are able to profile genomes and proteomes at the individual level, we are able to very specifically know what diseases you might be prone to and what are the things we can do to make sure you do not get those diseases.”

However, he concedes that nature is very complex. “It will take a lot of time to really know in a very systemwide level what is going on with our bodies. But we will reach that.”

Saeed hopes the computational tools he is designing will help make “crucial steps toward understanding the genomic, proteomic and evolutionary aspects of species in the tree of life.”