computational biology

Collaborators Prof. Sorin Istrail and graduate student Ryan Tarpine: “Like a Google Earth map of the regulatory genome.”

Credit: Amy Tarbox

A ‘great leap forward’ in gene research

Grad student Ryan Tarpine has created a software program that will help scientists reach a better – and faster – understanding of why species evolved differently over time.

By Richard C. Lewis  |  March 24, 2010

The screen of computer science graduate student Ryan Tarpine’s laptop looks like a scene from the movie The Matrix. Columns of brightly colored dots and dashes cascade down the display, connecting at intervals with bands that run horizontally across it.

The horizontal lines represent the genomes of two sea urchin species, their DNA sequences much longer than what can be shown. The colored cascading columns – some narrow, others wider – are places where the genetic coding of the two species appears to match. More specifically, the fourth-year graduate student explains, the matched areas show where in their DNA the species share gene regulation: the DNA regions where genes are encoded with instructions to become an eye, a foot, or something else.

The colorful areas are where the two sea urchin species’ genetic coding matches.

Tarpine’s software system and a mapping tool in sequencing technology debuted in a paper published last month in the prestigious Proceedings of the National Academy of Sciences. Together, they mark a great leap forward for researchers seeking to determine what has been preserved in the genetic coding among species. Extrapolate further, and it’s an important computational step toward understanding how the genomes of species – including humans – have evolved over time, as well as which genes were preserved, and why.

The software at the heart of the computational system, called cisGRN browser, can be seen as a “genome GPS-like system,” says Professor of Computer Science Sorin Istrail, Tarpine’s Ph.D. adviser and head of the Center for Computational Molecular Biology at Brown. “If we think of genes as cities and the genome as the Earth, the cisGRN browser is like a Google Earth map of the regulatory genome.”

Tarpine’s tool, called the Solexa Mapper, is the key to catapulting genomics research from comparing one gene at a time between organisms’ genomes to “high throughput,” Istrail says. Istrail was a member of the consortium that mapped the genome of the purple sea urchin (Strongylocentrotus purpuratus) in 2006, a key moment in the genomics movement. He also worked for the private Celera Genomics when it mapped the human genome in 2000. Now, with Tarpine’s computational software and mapping tool, biologists can take the DNA from one species, compare it with the genome of another species, and discover where the genetic matches are.

“The challenge,” Tarpine says, “is to give biologists something that is not noise but is a statistical match.”

“The computer is now an analytical tool guiding the next experiments. That is the beauty of it,” Istrail adds. “It is showing the next piece of gold.”

In the PNAS paper, titled “Functional cis-regulatory genomics for systems biology,” Tarpine and Istrail join biologist Eric Davidson at the California Institute of Technology to analyze transcription factors, a class of proteins that bind to DNA and control which genes are expressed. The difficulty is figuring out which transcription factors are actually engaged in gene regulation. The scientists compared the DNA sequence of Lytechinus variegatus, a pale green urchin found off the southeastern coast of the United States and in the Caribbean, with that of the purple sea urchin, whose genome is fully mapped.

“For one regulatory region of a gene, it would take years of methodical work to reveal the code,” Istrail says. Tarpine’s cisGRN browser speeds up the analysis. Now, the Brown and Caltech researchers can take DNA segments of L. variegatus and overlay them on the fully mapped genome of the purple sea urchin to narrow where the species may match in gene regulation. “The browser gives you this zoom where you are most likely to have success in discovering the regulatory ‘gates,’” Istrail says.
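To make the idea of overlaying DNA segments on a mapped genome concrete, here is a minimal Python sketch of one generic way to find stretches that two sequences share: index every short "seed" of k letters in the reference, look up each seed of the query, and extend each hit to the right for as long as the letters keep matching. This is only an illustration of exact seed-and-extend matching, not the method used by the cisGRN browser or the Solexa Mapper, which the article does not describe. The function names, the seed length k, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-letter substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_shared_runs(query, reference, k=8):
    """Return (query_start, reference_start, length) for each maximal exact
    run of at least k letters shared by the two sequences, longest first.

    Hypothetical helper for illustration; real read mappers also tolerate
    mismatches and score matches statistically, which this sketch does not.
    """
    index = build_kmer_index(reference, k)
    runs = []
    for q in range(len(query) - k + 1):
        for r in index.get(query[q:q + k], ()):
            # If the preceding letters also match, this seed lies inside a
            # run that was already reported from an earlier start position.
            if q > 0 and r > 0 and query[q - 1] == reference[r - 1]:
                continue
            length = k
            # Extend to the right while both sequences keep agreeing.
            while (q + length < len(query)
                   and r + length < len(reference)
                   and query[q + length] == reference[r + length]):
                length += 1
            runs.append((q, r, length))
    return sorted(runs, key=lambda run: -run[2])


if __name__ == "__main__":
    # Invented toy sequences, not real sea urchin DNA.
    shared = "ACGTTGCAAGGCTA"
    reference = "CCCC" + shared + "GGGG"
    query = "TT" + shared + "AA"
    for q_start, r_start, length in find_shared_runs(query, reference):
        print(q_start, r_start, query[q_start:q_start + length])
```

Longer runs are less likely to arise by chance, which is why the results are sorted by length. A practical tool would go further and attach a statistical score to each candidate match.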

“When you make those matches, you can be confident that those genes have been passed on, and you can look more closely at the transcription factors,” adds Tarpine. “Ideally, we’ll find a thousand (DNA) letters all in a row that matches one species to another.”