Petascale data-intensive computing, that is ...
We recently received an NSF Major Research Instrumentation award to acquire and operate a Petascale Active Data Store. To quote from our press release (removing at least some of the fluff):
The Computation Institute, a joint effort of the University of Chicago and the U.S. Department of Energy's Argonne National Laboratory, has received a grant for a computer system that will enable researchers to store, access and analyze massive data sets.
The system is made possible through a $1.5 million National Science Foundation grant, which includes cost-sharing support from the University of Chicago. The new system, called the Petascale Active Data Store (PADS), has been optimized for rapid data transactions, both on campus and around the globe.
The PADS design resulted from a study of the storage and analysis requirements of groups in astronomy and astrophysics, computer science, economics, evolutionary and organismal biology, geosciences, high-energy physics, linguistics, materials science, neuroscience, psychology and sociology.
For these groups, according to the PADS team, PADS represents a significant opportunity to look at their data in new ways, enabling new scientific insights and collaborations across disciplines. PADS will also serve as a vehicle for computer science research into active data storage systems and will provide rich data for investigating new techniques.
Several NVIDIA Tesla graphics processing units (GPUs) will be integrated with traditional CPUs in the PADS system. These GPUs are capable of computing certain operations many times faster than general-purpose personal computers.
PADS will be a hybrid system with many layers of storage. These layers range from a large, tape-based system at Argonne to individual computers on campus and elsewhere. The intermediate layer is a rack of computer disks at Argonne containing duplicate data sets as insurance against hard-drive failure.
To University of Chicago scientists, PADS represents a dramatic improvement over current practice, which requires them to quickly analyze data and then remove it from the system to make room for new data sets. With the storage that PADS provides, groups will be able to keep data active for longer periods of analysis.