August 31, 2006

Computation Institute

I've recently taken on a new position at the University of Chicago and Argonne National Laboratory, as Director of the Computation Institute. (The Web site will improve soon.) A recent interview published in the Globus Consortium Journal describes what this is about.

August 26, 2006

Attribute-based Authorization

A wonderful thing has been happening over the past year: many previously disparate and apparently incompatible threads (PKI, Grid Security Infrastructure, Shibboleth, SAML, etc.) have come together in a consistent "attribute-based access control" architecture, in which access control decisions can be made on the basis of various user attributes in addition to simple identity. Many people have contributed to making this happen, but Frank Siebenlist has been a major contributor on the architecture and standards.
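To make the idea concrete, here is a minimal sketch of an attribute-based decision, written in plain Python rather than against any real SAML, Shibboleth, or Grid Security Infrastructure API. The `User` class, the `is_authorized` function, and the attribute strings are all hypothetical names chosen for illustration; a real deployment would obtain attributes from signed assertions and evaluate a much richer policy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """A requester: an identity plus a set of asserted attributes (hypothetical)."""
    identity: str
    attributes: frozenset = field(default_factory=frozenset)


def is_authorized(user, allowed_identities, required_attributes):
    """Grant access by identity, or by holding every required attribute."""
    if user.identity in allowed_identities:
        return True  # classic identity-based rule
    # Attribute-based rule: all required attributes must be asserted.
    # An empty requirement set grants nothing, so the default is "deny".
    return bool(required_attributes) and required_attributes <= user.attributes


# Illustrative attribute names; not taken from any real attribute authority.
alice = User("alice", frozenset({"member:teragrid", "role:researcher"}))
bob = User("bob", frozenset({"role:student"}))
policy_attrs = frozenset({"member:teragrid", "role:researcher"})

print(is_authorized(alice, set(), policy_attrs))  # decided on attributes alone
print(is_authorized(bob, set(), policy_attrs))    # lacks the required attributes
```

The point of the sketch is that the resource owner never has to list alice by name: any user whose attributes satisfy the policy gets in.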

If you want to learn more about this, one good starting point is a draft article that Von Welch, I, and others have put together describing how this can work within the context of the TeraGrid cyberinfrastructure. See also a recent article by Bo Lang. It's all very exciting.

August 25, 2006

Computation and Computer Science: A Two Way Street

As someone who frequently works at the intersection of computer science (the "science of computation") and computational science (the "application of computation"), I often encounter confusion as to the relevance of computer science to projects that involve computation. Some users of computational technologies view computer scientists as theorists with little to offer to those working to solve "real problems." At the other extreme, some computer scientists assert that anything that does not involve theorems is "simple programming."

Like many prejudices, these opinions exaggerate a grain of truth. But what is interesting is that even as prejudices, these views are becoming increasingly untenable. As data volumes grow, as computation becomes ever more important in more domains of science and society, and as networks become more complex and far-reaching, both computer science and system engineering increase in importance.

I spoke to some of these issues in a recent article in Nature, "A Two-Way Street to Science's Future" (PDF). As I say there:

science is increasingly about information: its collection, organization and transformation. And if we view computer science as "the systematic study of algorithmic processes that describe and transform information", then computing underpins science in a far more fundamental way.

I also quote George Djorgovski, who, in a wonderful article, wrote: "applied computer science is now playing the role which mathematics did from the seventeenth through the twentieth centuries: providing an orderly, formal framework and exploratory apparatus for other sciences." As a computer scientist, I like that thought (-:

August 22, 2006

VO in Prague

I'm at the XXVIth Congress of the International Astronomical Union in Prague (a wonderful place), the triennial astronomy extravaganza. While the press coverage is all about whether Pluto gets to stay a planet (it seems that it will, sort of), a lot of the conference content is about virtual observatories (VOs). (I gave an invited talk on "Grid Technology and Multidisciplinary Science," which looked at connections between Grid and the VO world.)

The astronomy community has been pioneering "service-oriented science" techniques for some time: see the nice article by Gray and Szalay for the basics. While the fact that their data is fairly simple and of no commercial value simplifies life relative to some other disciplines, it is still remarkable what they have achieved. Basically, they are developing services that provide access to a growing number of digital sky surveys at different wavelengths. Users can then access these services to look for (say) objects that are visible in the infrared but not the optical (=brown dwarfs), to stack up multiple instances of the same sort of object (e.g., quasars) to improve signal-to-noise ratios, etc., all without leaving their desks. Furthermore, someone who develops an interesting analysis technique can in turn publish that as a service.
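For readers who like to see such things in code, the two analyses mentioned above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any VO service is actually implemented: the function names (`infrared_only`, `stack`, `noise_level`) are hypothetical, the "observations" are synthetic one-dimensional arrays with Gaussian noise, and real cross-matching works on sky positions rather than shared object identifiers.

```python
import random
import statistics


def infrared_only(infrared_ids, optical_ids):
    """Objects detected in the infrared survey but absent from the optical one."""
    return sorted(set(infrared_ids) - set(optical_ids))


def stack(observations):
    """Average equal-length observations pixel by pixel."""
    return [statistics.fmean(pixels) for pixels in zip(*observations)]


def noise_level(observation, signal):
    """Standard deviation of what is left after subtracting the true signal."""
    return statistics.pstdev([o - s for o, s in zip(observation, signal)])


# Toy catalogs keyed by made-up object identifiers.
candidates = infrared_only(["obj1", "obj2", "obj3"], ["obj1", "obj3"])
print(candidates)  # the infrared-only objects

# Synthetic data: a faint source buried in unit-variance Gaussian noise.
rng = random.Random(0)
signal = [0.0] * 20 + [1.0] * 5 + [0.0] * 20
observations = [[s + rng.gauss(0, 1.0) for s in signal] for _ in range(100)]

single = noise_level(observations[0], signal)
stacked = noise_level(stack(observations), signal)
print(f"noise in one observation: {single:.2f}")
print(f"noise after stacking 100: {stacked:.2f}")
```

For independent noise, averaging N observations reduces the noise by roughly a factor of the square root of N, which is why stacking many faint objects of the same kind can reveal a signal that no single observation shows.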

There are by now over a dozen VO projects around the world, and dozens of sky surveys are online. These sky surveys currently total tens of terabytes (10^12 bytes) of data; the next generation of instruments will generate petabytes (10^15 bytes) of data. These developments are rapidly transforming astronomy, and they have already led to new scientific discoveries.

What makes this all possible is a small set of relatively simple but very important conventions. The International Virtual Observatory Alliance (IVOA), formed in June 2002, has played an important role in developing these.

We should all be studying how this community works, and working to replicate their successes elsewhere.

August 21, 2006

Grid in Science Magazine

Science ran a news article on Grid recently. It's a nice piece. My only criticism is its somewhat narrow focus on high energy physics: certainly understandable given the constraints of a short article, but unfortunate if it gives the impression of a narrow user base. Nothing could be further from the truth. There are certainly hundreds--probably thousands--of Grid projects that span a remarkable range of disciplines and countries. This very breadth makes it hard to get one's head around them all.

Good examples of discipline-oriented projects from fields other than high energy physics include the National Cancer Institute's Cancer Bioinformatics Grid (caBIG), the NSF's Network for Earthquake Engineering Simulation (NEES), DOE's Earth System Grid (ESG), DOE's Fusion Collaboratory, and the French Grid5000 for computer science research. These are collectively delivering value to thousands of users. (ESG alone has 2000 registered users.)

We should also mention infrastructure projects like the U.S. TeraGrid and Open Science Grid, NAREGI in Japan, the UK National Grid Service, APAC in Australia, and the China National Grid, all of which are supporting large multidisciplinary communities.

These projects are all using Globus software, by the way ...

Globus is 10 Today!

On August 21, 1996, we received the first funding for work on the open source Globus software, from what was then the Advanced Research Projects Agency (now DARPA). We'll be celebrating the 10th anniversary with a party at GlobusWORLD in Washington, DC, at 7pm on September 11.

Gary Minden and Mike St Johns were the enlightened program managers who provided that first support: if you see them, please invite them to the party! (I don't know where to reach them.)

August 15, 2006

Information Technology Matters

It has been claimed that "IT doesn't matter," with the implication that IT is now so commoditized that it can no longer be a significant source of competitive advantage. Conversations with senior executives across many Fortune 2000 companies lead me to disagree with this assessment. True, companies are concerned with controlling IT costs. However, I also find a growing recognition that competitiveness depends on a company's ability to innovate. (Steve Jobs says simply: "Innovation distinguishes between a leader and a follower.") I argue here that there are important strategic opportunities in improving enterprise IT infrastructure to accelerate innovation.

These words are from an editorial I penned for LinuxWorld, ahead of a talk that I gave there today on the same topic. I think it's an important message, whether addressed to industry (as here) or to scientists, and also a message that speaks to a growing commonality of concerns between the worlds of science and industry.

August 14, 2006

Science 2.0

For my first post, I'll start in on a topic that I find of great interest, namely "Science 2.0."

The term Web 2.0 is used widely to denote the technologies, applications, and business models that underlie success stories such as Google, Amazon, eBay, and Flickr. Powerful services (search, maps, product information, ...) accessible via simple network protocols allow clients to construct new services via composition (aka mashups), such as Declan Butler's avian flu map. Clients gain access to a powerful new programming platform (the ensemble of available services), a trend that is arguably revolutionary in terms of its impact on just about every aspect of the computer industry. Particularly impressive is how this development is enabled by massive infrastructure spending: $1.5B in 2006 by Google alone. Presumably all paid for by advertising.

By a very loose analogy, we may use the term "Science 2.0" to refer to new approaches to research enabled by a quasi-ubiquitous Internet and Internet-based protocols for discovering and accessing services. Pioneering communities such as astronomy have already demonstrated the potential of such approaches, via virtual observatories that provide online access to digital sky surveys and that have enabled both new discoveries and new approaches to education (it seems fun to be a kid today). The early lead of the astronomy community in this space may owe something to the fact that astronomical data is reasonably simple in structure and, as Jim Gray has observed, isn't worth anything! But other fields such as genomics and environmental sciences are not far behind.

What is exciting and empowering is not simply that data are online: after all, the Web has provided us with access to data for a while. What is new is that we now have enough uniformity in access protocols, and sufficient server-side computing power, to support access not by people but by programs. Thus, we see an explosion in data access, as scientists write programs that process large quantities of data automatically. Increasingly, scientists are also publishing useful programs as services: service catalogs that list available services both document and encourage the resulting rapid expansion in the scope and power of computational tools.
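A small sketch may help show what "access by programs" and "service catalogs" mean in practice. Everything here is an in-process stand-in: the `catalog` dictionary, the `publish` decorator, and the two toy services are hypothetical names invented for illustration, and the distance test treats the sky as flat, which a real cone search would not. In reality each entry would be a network service reached through a standard protocol.

```python
# A toy service catalog: name -> description plus a callable.
catalog = {}


def publish(name, description):
    """Register a function in the catalog so other programs can discover it."""
    def register(func):
        catalog[name] = {"description": description, "call": func}
        return func
    return register


@publish("cone_search", "Objects within a radius (degrees) of a sky position")
def cone_search(survey, ra, dec, radius):
    # Flat-sky approximation, adequate only for this illustration.
    return [o for o in survey
            if (o["ra"] - ra) ** 2 + (o["dec"] - dec) ** 2 <= radius ** 2]


@publish("brightest", "The n brightest objects (smallest magnitude first)")
def brightest(objects, n):
    return sorted(objects, key=lambda o: o["mag"])[:n]


# Made-up survey rows standing in for an online sky survey.
survey = [
    {"id": "a", "ra": 10.1, "dec": 20.1, "mag": 18.2},
    {"id": "b", "ra": 10.2, "dec": 19.9, "mag": 15.4},
    {"id": "c", "ra": 50.0, "dec": -5.0, "mag": 12.0},
]

# A client program discovers services by name and composes them.
for name, entry in catalog.items():
    print(f"{name}: {entry['description']}")

nearby = catalog["cone_search"]["call"](survey, 10.0, 20.0, 0.5)
top = catalog["brightest"]["call"](nearby, 1)
print([o["id"] for o in top])
```

The composition at the end is the important part: the client never looks at the survey itself, only at what the published services return, and the result could in turn be published as a third service.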

Science 2.0 raises many challenging methodological, sociological, and technical issues. How much trust can we place in remote services, and how do we validate (and document) a result based on such services? How do we motivate people to build such services, and how do we ensure that they are maintained? How do we build out the increasingly substantial IT infrastructure that will be needed to support thousands of users? (Unfortunately we can't rely on advertising ...)

I've explored some of these questions in some recent talks and papers, e.g., my keynote at the 2006 Geoinformatics Conference, and in a 2005 article in Science magazine, "Service-Oriented Science." I'll also be exploring other aspects of Science 2.0 in future posts.