Bob Nichol's father was a coal miner, as was his father before him and his father before him. But the youngest of the Nichol clan favored stardust over coal dust and left South Wales to study astronomy at the University of Edinburgh.
Photo: This large, bright galaxy, called NGC 6070, lies 100 million light-years away in the constellation of Serpens. Its image was captured last May by the Sloan Digital Sky Survey telescope at Apache Point Observatory in New Mexico. (SDSS Collaboration)
Nichol, 32, is now an astrophysicist at Carnegie Mellon University, yet it seems he can't escape his destiny.
He too has become a miner.
Nichol isn't getting his hands dirty, though, because the only thing he's mining is data - computer data generated from new surveys of the heavens. He and Andrew Moore, a Carnegie Mellon computer scientist and fellow Brit, are heading a new $1.6 million National Science Foundation project to create computer software that can sift through the mountains of data spilling out of digitized telescopes.
"Gone are the good old days when an astronomer could just look at the data and make sense of it," Nichol said.
Astronomers anticipate that new discoveries increasingly will come from using computers to look at existing observational data in new ways, rather than from a specific observation or set of observations.
This move toward "virtual astronomy" is about to take a big lurch forward. Just a week ago, the Sloan Digital Sky Survey telescope in the Sacramento Mountains of south-central New Mexico focused on Orion's belt and captured its first "survey-quality" images.
The telescope itself isn't huge by today's standards - its 8-foot-diameter mirror will be dwarfed by the 27-foot goliath sculpted in Wampum by Contraves Brashear Systems and now bound for the Subaru Telescope in Hawaii. But the Sloan is designed to systematically map the northern sky, a task that will begin in earnest in January.
Over the next five years, the Sloan is expected to catalog a staggering 200 million objects that fill 25 percent of the sky; the remainder of the sky is either obscured by the dusty plane of the Milky Way or lies to the south below the Sloan's horizon.
Hundreds of attributes, such as position, size, shape, age and color, will be recorded for each galaxy and quasar that illuminates the telescope's electronic eye.
"For at least 50 years, this will be the field guide to the heavens," said University of Chicago astronomer Michael Turner.
The $77 million project, sponsored largely by the Alfred P. Sloan Foundation and the National Science Foundation, will yield something like 30 trillion bytes of computer data, Turner said, roughly equivalent to the size of the Library of Congress.
"It will be the Sky in a Box," he said. "This will really challenge astronomers to think in a new way. What patterns, what correlations can you discover by looking at the entire sky? How do you rumble through the Library of Congress and make sense of it?"
"It would take scientists thousands of years to look at all the possibilities," Moore guessed. "It's just too much for anyone to comprehend."
That's where data mining comes in. The practice of combing through huge databases to extract particular kinds of information has already caught hold in the banking and telecommunications industries.
Credit card companies use data mining to detect fraud, looking for transactions that don't fit a particular cardholder's buying habits or that typify a crook. Financial institutions use it to manage investments. Marketers track visits to Internet Web sites to assess consumer interest in new products.
Moore already has established a start-up firm, called Schenley Park Research, to develop data mining tools for biotechnology and manufacturing problems.
Astronomers knew they would need similar tools for making sense of the Sloan data. Alex Szalay, an astronomer at Johns Hopkins University and the Sloan's archive director, has worked for five years to develop ways to quickly provide answers to questions that scientists - and eventually the general public - will be asking the database.
"There are something like 3,000 active U.S. astronomers," Szalay said, "and each will be using this data as part of their daily work. Some questions will be simple, others complex." The sheer volume of questions will make swift answers essential. "This is a rather demanding constraint on the performance."
One way to speed this up, Moore said, is by preprocessing the data. He and his research group have developed a technology, called condensed representations, that groups data in such a way that patterns can be rapidly discerned. It ends up consuming more computer memory than the original database, but yields answers almost immediately.
The Carnegie Mellon work is not a formal part of the Sloan project, but represents "some very hot computer science," Szalay said.
In a sense, Nichol explained, the technique involves representing the data as a vast series of histograms - bar charts that plot the frequency of a phenomenon. This yields answers quickly because the computer doesn't have to start counting anew every time a question is posed.
For instance, Nichol has a special interest in clusters of galaxies. He knows older galaxies tend to be found in the cores of these clusters, but he might turn to the Sloan database to quantify this phenomenon. "Even that trivial kind of march through the data can take days," he said. But with Moore's method, the answer will come a thousand times faster, within a matter of minutes.
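The idea of counting once and answering many times can be illustrated with a small Python sketch. It is not Moore's condensed-representation software, whose internals the article does not describe; it is only a toy version of the histogram idea Nichol outlines. The catalog layout (an age in billions of years plus a flag for whether the galaxy sits in a cluster core), the bin width and the function names `build_counts` and `core_fraction` are all assumptions made for illustration.

```python
from collections import Counter
import random


def build_counts(galaxies, age_bin_width=2.0):
    """Make one pass over the catalog and count galaxies per
    (age bin, in_core) cell. This is the preprocessing step."""
    counts = Counter()
    for age_gyr, in_core in galaxies:
        age_bin = int(age_gyr // age_bin_width)
        counts[(age_bin, in_core)] += 1
    return counts


def core_fraction(counts, min_age_bin):
    """Fraction of galaxies in age bins >= min_age_bin that lie in a
    cluster core, computed from the cached counts alone."""
    in_core = sum(n for (b, core), n in counts.items()
                  if b >= min_age_bin and core)
    total = sum(n for (b, _), n in counts.items() if b >= min_age_bin)
    return in_core / total if total else 0.0


# Hypothetical toy catalog: (age in billions of years, lies in a cluster core?)
random.seed(0)
catalog = [(random.uniform(0, 12), random.random() < 0.3)
           for _ in range(10_000)]

counts = build_counts(catalog)       # the slow pass happens once
for min_bin in (0, 2, 4):            # later questions reuse the counts
    print(min_bin, round(core_fraction(counts, min_bin), 3))
```

Each later question touches only the handful of stored cells rather than every galaxy, which is the source of the speedup. The price is the extra storage for the counts, and it grows as more attributes are binned together.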
The difference becomes even more significant when scientists start looking for correlations that might involve three, four or more attributes.
And things could get even more complex as additional databases grow. At the California Institute of Technology, for instance, astronomers are using data mining to search through a sky survey performed by the Palomar Observatory. New telescopes, such as the Carnegie Mellon-built Viper radio telescope now operating at the South Pole, are quickly compiling huge databases.
That's why Szalay has been organizing a grass-roots effort to get astronomers to use similar computer architectures to store future surveys. In the same way that the Internet allows computers to talk to each other, shared architectures would allow astronomical databases - say, an optical database such as the Sloan and a radio astronomy or infrared astronomy database - to be combined or searched in parallel.
"We are building a virtual national observatory," Szalay said, "and that will be more useful than one more telescope."
It's been decades since professional astronomers routinely hunched over telescope eyepieces. Even the practice of using photographic plates has largely gone by the wayside now that electronic cameras can transmit telescope observations around the world or from outer space to an astronomer's desktop computer.
Astronomers will increasingly mine data for hidden nuggets rather than just plan observations or build new telescopes, and that is a development Nichol's father, the miner, will appreciate.
"He'll enjoy the irony of this project," Nichol said.