Rodrigo Fonseca
Assistant Professor of Computer Science
Rodrigo Fonseca’s academic career has more or less paralleled the growth and development of the World Wide Web.
“In my first year as an undergraduate back in Brazil, one of my professors said, ‘There’s this thing called the Web. It’s growing rapidly and it’s raising lots of interesting problems.’ So I started reading and studying about it,” Fonseca said, “and my interest grew along with it. That was 1997.”
Little more than a decade later, the Web and the Internet have come to dominate daily life in ways no one could have fully predicted, from buying groceries to conducting video teleconferences to catalyzing an astonishing range of social exchanges. Fonseca has been studying both every step of the way.
He graduated at the top of his class (B.S., computer science, 2000) at Universidade Federal de Minas Gerais in Brazil, stayed on to earn his master’s in 2002, and then headed for the University of California–Berkeley. There he focused on the problems and potentials of massively distributed systems, also studied technology management at the Haas School of Business, and earned his Ph.D. in 2008. He followed that with a one-year postdoctoral program at Yahoo! Research, doing further work on distributed computing.
“One of the interesting problems in distributed computing is how to make use of commodity components working together, because that is the direction we are going,” he said. “Google, for example, grew so fast and so big because instead of buying really expensive large machines, they decided to buy thousands of cheaper machines and build very clever software to make them all work together and distribute information — even to work around their failures. They made sure that if one data center went down, users would not even notice.”
The trend toward distributed databases — “big data” is now the term of art, Fonseca said — means that several areas of computer science are converging. “To deal with datasets that Yahoo! and Google and Amazon have to deal with, you can’t rely on traditional big boxes running Oracle. You have to go to distributed systems,” he said. “You have to partition the data, run queries on many machines, and then aggregate the results. It starts to require lots of machinery, large networks, and distributed operating systems.”
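The partition, query, and aggregate pattern Fonseca describes can be sketched in a few lines. This is only an illustration, not how any of the companies named above build their systems: worker threads stand in for separate machines, and the function names (`partition`, `local_query`, `scatter_gather`) and the sample records are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split (key, value) records into partitions by hashing the key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        parts[index].append((key, value))
    return parts


def local_query(part):
    """Runs on one 'machine': sum the values for each key it holds."""
    totals = Counter()
    for key, value in part:
        totals[key] += value
    return totals


def scatter_gather(records, num_partitions=4):
    """Send the query to every partition, then merge the partial results."""
    parts = partition(records, num_partitions)
    # Threads are a stand-in for separate machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(local_query, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


if __name__ == "__main__":
    clicks = [("ads", 3), ("mail", 5), ("ads", 2), ("search", 7)]
    print(scatter_gather(clicks))
```

Because every record with the same key lands in the same partition, each worker can answer for its own keys without talking to the others, and the final merge is a simple sum.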
Many areas of study are now demanding that kind of database environment: the huge datasets of medical informatics, for example, or projects like the Large Hadron Collider. Adding smartphones, handhelds, and other devices to the Internet multiplies demand for data capacity and management. “You have to grow by adding pieces,” Fonseca said, “and that’s where parallel, distributed, and networked computers come in.”
Fonseca did his thesis work on X-Trace, a prototype framework for keeping track of what is going on in a massively distributed system. The simple act of clicking to see an e-mail message, for example, generates a small tag and may cause hundreds of machines to fetch account information, figure out appropriate ads, serve up the ads, manage routing, handle text, and so forth. All of those tasks are tagged to the original click, and all of them happen within milliseconds. A lag time greater than 200 milliseconds might qualify as an error.
“Google would want to know if the ads didn’t show up, but because the system is so large and has so many components, it’s hard to know what was happening,” Fonseca said. “X-Trace is an attempt to aggregate a map of all those bits of information so that it’s possible to see which machine may have been taking too long — and then maybe going back to a map of that machine to see what was going on.”
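The idea of tagging every task with the original click and then collecting the reports can be illustrated with a toy example. This is a minimal sketch of the general technique, not X-Trace's actual interface: the `Report` and `Collector` classes, the machine names, and the timings are all invented for illustration, and only the 200-millisecond threshold comes from the description above.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    task_id: str           # tag shared by every operation from one click
    op: str                # what this operation did
    parent: Optional[str]  # the operation that triggered it, if any
    machine: str           # hypothetical machine name
    millis: float          # how long the operation took


class Collector:
    """Gathers reports from all machines so one request can be pieced together."""

    def __init__(self):
        self.reports = []

    def record(self, task_id, op, parent, machine, millis):
        self.reports.append(Report(task_id, op, parent, machine, millis))

    def for_task(self, task_id):
        return [r for r in self.reports if r.task_id == task_id]

    def slow_ops(self, task_id, threshold_ms=200.0):
        """Operations in this request that exceeded the lag threshold."""
        return [r for r in self.for_task(task_id) if r.millis > threshold_ms]


def handle_click(collector):
    """Simulate one e-mail click fanning out to several machines."""
    task_id = uuid.uuid4().hex  # the small tag created for the click
    collector.record(task_id, "open_message", None, "web-03", 12.0)
    collector.record(task_id, "fetch_account", "open_message", "db-41", 35.0)
    collector.record(task_id, "choose_ads", "open_message", "ads-17", 240.0)
    collector.record(task_id, "serve_ads", "choose_ads", "ads-22", 18.0)
    return task_id


if __name__ == "__main__":
    collector = Collector()
    tag = handle_click(collector)
    for report in collector.slow_ops(tag):
        print(report.machine, report.op, report.millis)
```

Because every report carries the same tag and names its parent operation, the collected reports can be arranged into a map of the request, and the slow machine stands out.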
Could all that data and the ability to trace much of it create further problems? “There are many things society may have to change,” Fonseca said. “There is already a large generation gap about privacy. Teenagers don’t seem to care — they put everything online and don’t anticipate any consequences. But these are delicate questions. The laws are always lagging behind [technology]. One of the responsibilities of computer science is to mediate these discussions, to inform legislation, to protect liberties and rights people have had for so long.
“There is no shortage of interesting problems going forward.”
