I've recently taken on a new position at the University of Chicago and Argonne National Laboratory, as Director of the Computation Institute. (The Web site will improve soon.) A recent interview published in the Globus Consortium Journal describes what this is about.
A wonderful thing has been happening over the past year: many previously disparate and apparently incompatible threads (PKI, Grid Security Infrastructure, Shibboleth, SAML, etc.) have come together in a consistent "attribute-based access control" architecture, in which access control decisions can be made on the basis of various user attributes in addition to simple identity. Many people have contributed to making this happen, but Frank Siebenlist has been a major contributor on the architecture and standards.
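To make the idea concrete, here is a minimal sketch (in Python) of how an attribute-based access decision differs from a purely identity-based one. The attribute names and policy are invented for illustration; in the real architecture, attributes arrive as signed assertions (e.g., SAML) rather than plain dictionaries, but the decision logic is the same in spirit.

```python
# A minimal, hypothetical sketch of an attribute-based access decision.
# Attribute names ("affiliation", "role") and the policy are invented;
# in practice, attributes would be carried in signed SAML assertions.

def authorize(user_attributes, policy):
    """Grant access if the user's attributes satisfy every policy clause."""
    return all(
        user_attributes.get(attr) in allowed
        for attr, allowed in policy.items()
    )

# Identity-based control asks only "who is this?"; here the decision also
# depends on what the user *is* (affiliation, role), not just who they are.
policy = {
    "affiliation": {"uchicago.edu", "anl.gov"},
    "role": {"researcher", "student"},
}

alice = {
    "identity": "alice@uchicago.edu",
    "affiliation": "uchicago.edu",
    "role": "researcher",
}

print(authorize(alice, policy))  # True: Alice's attributes satisfy the policy
```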
If you want to learn more about this, one good starting point is a draft article that Von Welch, myself, and others have put together describing how this can work within the context of the TeraGrid cyberinfrastructure. See also a recent article by Bo Lang. It's all very exciting.
As someone who frequently works at the intersection of computer science (the "science of computation") and computational science (the "application of computation"), I often encounter confusion as to the relevance of computer science to projects that involve computation. Some users of computational technologies view computer scientists as theorists with little to offer to those working to solve "real problems." At the other extreme, some computer scientists assert that anything that does not involve theorems is "simple programming."
Like many prejudices, these opinions exaggerate a grain of truth. But what is interesting is that even as prejudices, these views are becoming increasingly untenable. As data volumes grow, as computation becomes ever more important in more domains of science and society, and as networks become more complex and far-reaching, both computer science and system engineering increase in importance.
I also quote George Djorgovski, who, in a wonderful article, wrote: "applied computer science is now playing the role which mathematics did from the seventeenth through the twentieth centuries: providing an orderly, formal framework and exploratory apparatus for other sciences." As a computer scientist, I like that thought (-:
The astronomy community has pioneered "service-oriented science" techniques for some time: see the nice article by Gray and Szalay for the basics. While the fact that their data is fairly simple and of no commercial value simplifies life relative to some other disciplines, it is still remarkable what they have achieved. Basically, they are developing services that provide access to a growing number of digital sky surveys at different wavelengths. Users can then access these services to look for (say) objects that are visible in the infrared but not the optical (i.e., brown dwarfs), to stack up multiple instances of the same sort of object (e.g., quasars) to improve signal-to-noise ratios, and so on, all without leaving their desks. Furthermore, someone who develops an interesting analysis technique can in turn publish that as a service.
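As a deliberately simplified illustration of what this looks like from the user's side, here is a sketch of a query against an IVOA-style cone search service, which returns catalog objects near a sky position as a VOTable (XML) document. The endpoint URL below is a placeholder, and a real client would parse the response with a library such as astropy; the point is simply that a few lines of code, rather than a trip to an archive, retrieve the data.

```python
# A sketch of querying an IVOA-style cone search service: given a sky
# position (RA, Dec in degrees) and a search radius, the service returns
# matching catalog objects as a VOTable (XML) document.
# SERVICE_URL is a hypothetical placeholder; substitute a real survey's endpoint.
from urllib.parse import urlencode
from urllib.request import urlopen

SERVICE_URL = "https://example.org/vo/conesearch"  # placeholder endpoint

def cone_search(ra_deg, dec_deg, radius_deg):
    """Fetch objects within radius_deg of (ra_deg, dec_deg)."""
    params = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    with urlopen(f"{SERVICE_URL}?{params}") as response:
        return response.read()  # VOTable XML; parse with, e.g., astropy.io.votable

# Cross-matching the results of an infrared survey against an optical survey,
# and keeping sources seen only in the infrared, is how one hunts for brown
# dwarfs without leaving one's desk.
```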
There are by now over a dozen VO projects around the world, and dozens of sky surveys are online. These sky surveys currently total tens of terabytes (10^12 bytes) of data; the next generation of instruments will generate petabytes (10^15 bytes) of data. These developments are rapidly transforming astronomy and have already led to new scientific discoveries.
What makes this all possible is a small set of relatively simple but very important conventions. The International Virtual Observatory Alliance (IVOA), formed in June 2002, has played an important role in developing these.
We should all be studying how this community works, and working to replicate their successes elsewhere.
Science ran a news article on Grid recently. It's a nice piece. My only criticism is its somewhat narrow focus on high energy physics: certainly understandable given the constraints of a short article, but unfortunate if it gives the impression of a narrow user base. Nothing could be further from the truth. There are certainly hundreds--probably thousands--of Grid projects that span a remarkable range of disciplines and countries. This very breadth makes it hard to get one's head around them all.
On August 21, 1996, we received the first funding for work on the open source Globus software, from what was then the Advanced Research Projects Agency (now DARPA). We'll be celebrating the 10th anniversary with a party at GlobusWORLD in Washington, DC, at 7pm on September 11. Gary Minden and Mike St. Johns were the enlightened program managers who provided that first support: if you see them, please invite them to the party! (I don't know where to reach them.)
It has been claimed that "IT doesn't matter," with the implication that IT is now so commoditized that it can no longer be a significant source of competitive advantage. Conversations with senior executives across many Fortune 2000 companies lead me to disagree with this assessment. True, companies are concerned with controlling IT costs. However, I also find a growing recognition that competitiveness depends on a company's ability to innovate. (Steve Jobs says simply: "Innovation distinguishes between a leader and a follower.") I argue here that there are important strategic opportunities in improving enterprise IT infrastructure to accelerate innovation.
These words are from an editorial I penned for LinuxWorld, ahead of a talk that I gave there today on the same topic. I think it's an important message, whether addressed to industry (as here) or to scientists, and also a message that speaks to a growing commonality of concerns between the worlds of science and industry.
For my first post, I'll start in on a topic that I find of great interest, namely "Science 2.0."
The term Web 2.0 is used widely to denote the technologies, applications, and business models that underlie success stories such as Google, Amazon, eBay, and Flickr. Powerful services (search, maps, product information, ...) accessible via simple network protocols allow clients to construct new services via composition (aka mashups), such as Declan Butler's avian flu map. Clients gain access to a powerful new programming platform (the ensemble of available services), a trend that is arguably revolutionary in terms of its impact on just about every aspect of the computer industry. Particularly impressive is how this development is enabled by massive infrastructure spending: $1.5B in 2006 by Google alone. Presumably all paid for by advertising.
By a very loose analogy, we may use the term "Science 2.0" to refer to new approaches to research enabled by a quasi-ubiquitous Internet and Internet-based protocols for discovering and accessing services. Pioneering communities such as astronomy have already demonstrated the potential of such approaches, via virtual observatories that provide online access to digital sky surveys and that have enabled both new discoveries and new approaches to education (it seems fun to be a kid today). The early lead of the astronomy community in this space may owe something to the fact that astronomical data is reasonably simple in structure and, as Jim Gray has observed, isn't worth anything! But other fields such as genomics and environmental sciences are not far behind.
What is exciting and empowering is not simply that data are online: after all, the Web has provided us with access to data for a while. What is new is that we now have enough uniformity in access protocols, and sufficient server-side computing power, to support access not just by people but by programs. Thus, we see an explosion in data access, as scientists write programs that process large quantities of data automatically. Increasingly, scientists are also publishing useful programs as services: service catalogs that list available services both document and encourage the resulting rapid expansion in the scope and power of computational tools.
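To illustrate the "publish your program as a service" pattern, here is a minimal sketch using only Python's standard library. The analysis routine and the query interface are invented stand-ins; a production service would add authentication, proper error handling, and the metadata needed for a service catalog.

```python
# A minimal sketch of publishing an analysis routine as a web service,
# so that other programs (not just people) can invoke it over HTTP.
# The analysis itself and the query interface are placeholders.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def analyze(values):
    """Stand-in analysis: return the mean of the supplied numbers."""
    return sum(values) / len(values)

class AnalysisHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        values = [float(v) for v in query.get("v", [])]
        if not values:
            self.send_response(400)  # no inputs supplied
            self.end_headers()
            return
        body = json.dumps({"result": analyze(values)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # A client program could now call, e.g., http://localhost:8000/?v=1&v=2&v=3
    HTTPServer(("localhost", 8000), AnalysisHandler).serve_forever()
```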
Science 2.0 raises many challenging methodological, sociological, and technical issues. How much trust can we place in remote services, and how do we validate (and document) a result based on such services? How do we motivate people to build such services, and how do we ensure that they are maintained? How do we build out the increasingly substantial IT infrastructure that will be needed to support thousands of users? (Unfortunately we can't rely on advertising ...)