I'm at the kickoff meeting for the next phase of the Globus-based Earth System Grid (ESG), a U.S. Department of Energy project developing technology to manage and provide access to large quantities of climate simulation data. The two ESG portals provide access to more than 100 terabytes of output from U.S. and international climate models. The 4000 registered users have so far downloaded more than 130 terabytes of data as they ask questions such as "why are hurricane intensities increasing?" Just last year, these users produced more than 300 scientific papers based on ESG data.
In the next phase, we face big challenges as the quantity of data increases (new petaflop/s computers will generate 10-100 times more data), data becomes more distributed (it can't all be moved to a central location, as is done at present), the user population becomes larger and more diverse (including, e.g., policy analysts as well as climate scientists), and the sophistication of the data analyses to be performed increases.
One important trend will be increased focus on server-side analysis: as data volumes increase, users must be able to request that data be processed at the data location rather than downloaded to their local system. They need access to data analysis services as well as data download functions, so that they can ask "compare the power spectrum of sea surface temperature in the Nino-3 region from these 10 models" rather than "download ocean temperature data for those models for a 100-year simulation period." Needless to say, server-side analysis of petabytes of data is not easy. We'll be working in the coming months to add such capabilities to ESG.
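To make the idea concrete, here is a minimal sketch of the kind of server-side computation such a request implies. This is purely illustrative and not ESG's actual interface: the function name, the synthetic ENSO-like test signal, and the monthly sampling are my assumptions. The point is that a century of monthly data stays at the server, and only a few hundred spectral values travel back to the user.

```python
import numpy as np

def nino3_power_spectrum(sst_anomalies, dt_years=1.0 / 12.0):
    """Return frequencies (cycles/year) and power spectrum of a
    monthly SST-anomaly time series, e.g. a Nino-3 region mean.
    Hypothetical helper for illustration, not an ESG API."""
    n = len(sst_anomalies)
    detrended = sst_anomalies - np.mean(sst_anomalies)
    power = np.abs(np.fft.rfft(detrended)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=dt_years)
    return freqs, power

# Synthetic stand-in for server-resident data: 100 years of monthly
# SST anomalies with an ENSO-like ~4-year oscillation plus noise.
rng = np.random.default_rng(0)
t = np.arange(1200) / 12.0                                   # time in years
sst = np.sin(2 * np.pi * t / 4.0) + 0.3 * rng.standard_normal(1200)

freqs, power = nino3_power_spectrum(sst)
peak_freq = freqs[np.argmax(power[1:]) + 1]  # skip the zero-frequency bin
# The user receives ~600 spectral values instead of 1200 raw samples
# (or, in practice, gigabytes of gridded fields).
```

Run over ten models' Nino-3 series on the server, this returns ten small spectra for comparison, which is exactly the shift from "download the data" to "ask the question where the data lives."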
If you want to learn more, here is a fairly recent article on ESG architecture and implementation. Globus technology is used for data access, authentication and authorization, distributed system monitoring, and other purposes.
I see ESG as a premier example of service-oriented science--and also a success story for Grid technology.