I spent a wonderful week in Cetraro, Italy, at the 9th HPC Conference. A lovely location, exceptional colleagues, and fascinating presentations and discussions on various topics relating to HPC and Grid. Prof. Lucio Grandinetti organizes a wonderful event.
A focus this year on hardware--people showing off their impressive close-to-petascale computers and talking about plans for exascale. Lest people get too proud, Jack Dongarra reminded us that on average it takes 6-8 years for the number one system on the Top 500 list to fall off the bottom. Some discussion of GPGPUs (but does anyone believe that they won't be replaced by general-purpose multicores within a few years?). Also much discussion concerning power management--but not clear how serious anyone is about the topic. (E.g., no real discussion of what tradeoffs will be made to reduce power consumption, or mention of genuinely low-power systems, such as SiCortex.)
Considerable discussion of "clouds" (whatever they may be), much of it naive and ill-informed. Fascinating to hear the same expansive expectations expressed (without any apparent doubts) for "clouds" as we heard five years ago from the most fulsome proponents of grids. "Soon, applications of all sorts will be hosted in the cloud." Well, maybe, but big legal, business-model, and sociological barriers have so far hindered the hosting of core business applications as services, and those barriers seem slow to fall. "Clouds offer arbitrarily scalable computing." Scalable to infinity, for any value of infinity less than a few hundred (at least at present). "Clouds offer much simpler interfaces than grids." There isn't much to choose between, say, EC2 and the Globus Workspace Service from an interface perspective.
Perhaps the most thoughtful remarks on these topics were from Ignacio Llorente, who is working with relevant technologies via his Globus GridWay and OpenNebula projects. He characterized clouds as "a paradigm for the on-demand provision of virtualized resources as a service" (correctly identifying virtualization rather than interfaces as the key advance) and grids as "the technology that will allow for cloud interoperability." It will be interesting to see whether cloud interoperability emerges as an important requirement. In the grid space, a lack of interoperability has so far proved to be more of an irritant than a real obstacle to progress.
Discussion of the European Grid Initiative, a proposal to create a pan-European grid linking national grids within each EU state. A curious lack of discussion concerning the application drivers for this new infrastructure, or those applications' requirements. Is EGEE (focused on distributing jobs to federated clusters) really the right model? Why no discussion of data federation (a big driver for grid computing in the UK and US), services (at the heart of successful grids such as caBIG, BIRN, and the Earth System Grid), or the role of supercomputer centers (surely major "power plants"?). The software strategy was striking in its lack of vision: keep supporting three separate European middleware platforms (ARC, gLite, UNICORE), and attempt to integrate them (to what end?), while ignoring the Condor and Globus software used by so many Europeans. Sounds more like "jobs for the boys" than a strategy for supporting European eScience. Let us hope for an outbreak of vision among European grid leaders.
There are some new interfaces built on top of the workspace service that might warrant a fresh look:
http://workspace.globus.org/clouds/clusters.html
Posted by: timf | July 09, 2008 at 05:17 PM
Ian,
you don't mention the Open Science Grid which, as you know, uses both Condor and Globus as the core middleware for our distributed infrastructure in the US, now spanning more than 70 sites and integrated with 4 campus infrastructures.
We are indeed facing the challenges of robust, usable integration and co-scheduling of large data sets (LIGO, LHC) with large job-execution runs.
Added to this is the challenge of our commitment to dynamically enable both cycles and storage to be used by research communities who neither own the resources nor have a static allocation on them, as resources become available.
Beyond the data federation services for the projects you mention, those projects and other OSG communities are addressing integration across multiple security infrastructures, enabling implementations such as Shibboleth and OpenID to interface with the distributed infrastructure's use of X.509 certificates.
Again, the EGEE/EGI model seems to ignore the need to enable interoperation across heterogeneous systems, which is required to allow and promote advances in technologies and methods while maintaining a working system.
Posted by: Ruth | July 15, 2008 at 12:42 PM