October 31, 2006

What Do You Do With a Million Books?

I am looking forward to an upcoming symposium at the University of Chicago: What Do You Do With a Million Books? (November 5 and 6: but why on a Sunday?):

In the wake of recent large-scale digitization projects aimed at providing universal access to the world's vast textual repositories, humanities scholars, librarians and computer scientists find themselves newly challenged to make such resources functional and meaningful.
Digitizing "a million books" ... poses far more than just technical challenges. Tomorrow, a million scholars will have to re-evaluate their notions of archive, textuality and materiality in the wake of these developments. How will humanities scholars, librarians and computer scientists find ways to collaborate in the "Age of Google?"

Speakers include John Unsworth, from the University of Illinois, a pioneer in digital humanities, and Gregory Crane, whose March 2006 article perhaps suggested the name for this symposium. (He discusses the challenges of scale, heterogeneity, granularity, noise, audience, and distributors.)

The Computation Institute is a sponsor. We already have a preliminary project underway applying machine learning technology to English-language texts, and I hope to see more such projects in the near future.

October 30, 2006

Instant Grid from Germany

I've been watching the Instant Grid web site develop for a while. Unfortunately, my German isn't too good; fortunately, there is more and more content in English, and from what I can see, it seems really nice.

Instant Grid combines Knoppix Linux, Globus 4.0.2 as Grid middleware, GridSphere (for portals), and some other technology to build a Grid-enabled cluster system that you can boot from CD-ROM onto a collection of PCs.

If anyone has experience using this, let me know.

October 29, 2006

Web 1.0 Revisited

With all of the hype^H^H^H^H excitement around Web 2.0, and the upcoming Web 2.0 conference, it is good to see some people prepared to celebrate the accomplishments of Web 1.0, such as the underappreciated blink tag and the ever-useful <BR>.

October 28, 2006

Durable Nonsense

Sally Floyd cites three principles of durable nonsense in a talk on network simulation:

  1. For every piece of durable nonsense, there is an irrelevant frame of reference in which it makes perfect sense.
  2. Rigorous reasoning from inapplicable assumptions yields the world's most durable nonsense.
  3. The roots of most nonsense are found in the fact that people are more specialized than problems.

This concept of "durable nonsense" is wonderful and I think very useful. I suspect that more computer science research than we would like can be categorized in such terms.

I think the source for these principles is R. A. Rosanoff, "A Survey of Modern Nonsense as Applied to Matrix Computations," April 1969, although Floyd also cites John Spragins, "Computer System Performance Modeling and Durable Nonsense", January 1979--perhaps that's where she saw Rosanoff quoted.

October 27, 2006

Connections Between Grid and P2P

What do Grid and P2P have to do with each other? Adriana Iamnitchi and I wrote a paper on that question a few years ago. The title is "On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing", which we explained as follows:

It has been reported that life holds but two certainties, death and taxes. And indeed, it does appear that any society--and in the context of this article, any large-scale distributed system--must address both death (failure) and the establishment and maintenance of infrastructure (which we assert is a major motivation for taxes, so as to justify our title!).

Continue reading "Connections Between Grid and P2P" »

October 26, 2006

Grimoires Service Registry

Luc Moreau writes to me:

We are happy to announce that Grimoires 1.2.0 and Grimoires-WSRF 0.9.0 (for GT4) have been released.

You can download the Grimoires releases from http://sourceforge.net/projects/grimoires/.

Continue reading "Grimoires Service Registry" »

October 25, 2006

Perspectives on Open Source

The topic of open source arose frequently at the recent GlobusWORLD conference. I find the variety of perspectives on this topic fascinating. I have heard various people opine that:

  1. Open source is diabolical, because it discourages innovation and/or is risky from a legal perspective. Shai Agassi of SAP expressed such views in a much-reported 2005 speech.
  2. Open source is angelic, because it ensures that "speech" (or at least coding) is "free." Richard Stallman is a well-known proponent of this view.
  3. Open source is inevitable, for economic reasons, and, as such, should be embraced as part of the IT ecosystem.

The first two views are familiar; the third is newer, and I think far more interesting, as it permits (at least in principle) a quantitative discussion about when and where it makes sense for software to be open vs. closed.

Underlying this third view is an evolving perspective on where value lies in software. For a long time, value was seen in the basic software itself, viewed as intellectual property. Now, the basic software is increasingly seen as a commodity. Of course, companies still need to ensure that the software functions on a daily basis, and they typically don't want to maintain the necessary expertise in-house. Thus, as Gartner wrote recently:

"open source software is a catalyst that will restructure the industry, producing higher-quality software at lower cost ... it will revolutionize software markets by moving revenue streams to services and support and away from license fees."

Vendors large and small are taking major positions on these views, betting their future on proprietary software (e.g., SAP, Microsoft), open source software (e.g., RedHat, Novell), or both (e.g., IBM, Oracle). It's a fascinating evolution.

Does this mean that IT itself is a commodity? Not exactly. I was talking to Reagan Moore last week, and he expressed the view that value is increasingly in the (proprietary) policies that govern how (open) software is used. That's a perspective that resonates with my experience.

October 24, 2006

Globus Takes Off in Europe

Google Analytics has been providing some nice perspectives on Globus downloads. It is an inspiring reminder of how small a planet we live on: every continent except Antarctica shows significant activity.

One thing that I find gratifying is how the release of Globus Toolkit version 4 (GT4) has spurred rapid growth in the Globus user and developer communities in Europe.

Major new deployments have emerged, such as D-Grid, and long-time deployments such as the U.K. National Grid Service and the LHC Computing Grid use many Globus components. There are also numerous neat applications: things like the Gridcast video delivery (for the BBC) and a weather forecasting site at the University of Naples. In the European Union's new BEinGRID project, focused on commercial applications of grid technology, five out of 18 pilots are based on GT4. In addition, OMII Europe has identified Globus as a target platform.

In addition, we see contributions not only from long-time Globus community members such as NeSC and EPCC (source of the OGSA-DAI data access and integration software) but also from new participants via the "dev.globus" incubation process. One nice example is the GridWay scheduler system from Madrid.

There are still some Europeans who think parochially of Globus as "U.S." software, but (if I may make a bad pun) more and more seem to realize it is "us" software, just like Linux, Apache, and other open source systems: an international community effort to develop and support the software, experiences, and operational procedures needed to create Grid systems and applications.

As you can tell, I love to hear about interesting Globus deployments, applications, and research. Please drop me a line to tell me about what you are doing (whether in Europe or otherwise!).

October 23, 2006

Irving, Lick, and Man-Computer Symbiosis

I have written about the (utopian or dystopian?) belief that the inevitable march of Moore's Law will result in computers overtaking us stick-in-the-mud humans in intelligence within a decade or two.

I'm a skeptic, not because I don't think computers are going to get faster and more capable, but because I think human intelligence has a fair bit more evolution to do itself. Human "intelligence" has long been more than simple biology: biology, culture, and technology have been co-evolving for a very long time, and there's no reason to think that culture and technology, at least, won't continue. (Perhaps biology, too, but that's another story.)

In this regard, I find a recent post by Irving Wladawsky-Berger refreshing. He writes about (among other things) how the goal of technology should not just be to automate the easy but also to assist people in doing the hard. To that end, we should be working not to remove people from the picture, but working to "integrate people into all aspects of our systems designs."

These sentiments remind me of J.C.R. Licklider's wonderful 1960 paper, "Man-Computer Symbiosis," in which he proposed "to enable men and computers to cooperate in making decisions and controlling complex situations without inflexible dependence on predetermined programs." (Of course, Doug Engelbart is always worth reading on these topics too.)

Some 50 years ago, Licklider studied his work habits, and noted that "my choices of what to attempt and what not to attempt were determined to an embarrassingly great extent by considerations of clerical feasibility, not intellectual capability." I suspect that this observation is still far more true than we would like to believe.

But while the problems and ideas may not be entirely new, we are in a far better position to pursue them, given quasi-ubiquitous personal computers, Internet, and innovative new technologies that build on those platforms. As Irving says, "By integrating people into our system designs, we can leverage these community-based, people-oriented technologies [like advanced collaboration environments] into our complex engineering systems ..." The consequences for the many complex activities that occupy our time nowadays could be very significant.

October 22, 2006

1,000,000 Random Digits

From Bruce Schneier's Cryptogram:

The Rand Corporation published A Million Random Digits with 100,000 Normal Deviates back in 1955, when generating random numbers was hard. I have a copy of the original book; it's one of my library's prize possessions. I had no idea that the book was reprinted in 2002; it's available on Amazon. But even if you don't buy it, go to the Amazon page and read the user reviews. They're hysterical.
For example:

This book does not even come close to delivering on its promise of one million random digits. My expectations were high after reading the first sentence, which contained ten unique digits. However, the author seems to have exhausted his creativity in this initial burst, because the other 99.999% of the book is filler in which those same ten digits are shamelessly reused! If you are looking for a larger offering of numerals in various bases, I highly recommend "Peter Rabbit's ABC and 123."

Thanks to Frank Siebenlist for the pointer.

October 21, 2006

GridFTP is 100x Faster

It's not often that you get to speed something up by a factor of 100: more often, we are working hard to get a 10% improvement. But my colleague John Bresnahan recently achieved that happy result with Globus GridFTP, the Grid data transfer workhorse.

The Globus implementation of the GridFTP protocol has always been fast for large files, achieving in some cases close to 30 gigabit/s over wide area networks. However, when data is partitioned into small files, GridFTP has historically suffered from low transfer rates due to the round-trip latency involved in successive transfer requests.

John and other members of the GridFTP team designed pipelining to solve this "lots of small files" (LOSF) problem. They modified GridFTP to allow many transfer requests to be outstanding at once. Thus, latency between requests is hidden in the time it takes to transfer previous files: by the time one file has completed, the next request is queued up in the server ready to start.
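To see why hiding latency matters so much for small files, here is a back-of-the-envelope model in Python. It is a rough sketch, not a description of the GridFTP implementation: it assumes each unpipelined request costs one full round trip before data flows, that pipelining exposes only the first round trip, and that the link runs at a constant rate. The function names and the numbers in the example are made up for illustration.

```python
def sequential_time(num_files, file_bytes, rtt_s, bandwidth_bps):
    """Estimated time when each request waits one round trip before its file moves."""
    per_file = rtt_s + (file_bytes * 8) / bandwidth_bps
    return num_files * per_file


def pipelined_time(num_files, file_bytes, rtt_s, bandwidth_bps):
    """Estimated time when requests are queued ahead, so only the first round trip shows."""
    return rtt_s + num_files * (file_bytes * 8) / bandwidth_bps


if __name__ == "__main__":
    # Hypothetical workload: 10,000 files of 10 kilobytes, 50 ms round trip, 1 gigabit/s link.
    n, size, rtt, bw = 10_000, 10_000, 0.05, 1e9
    slow = sequential_time(n, size, rtt, bw)
    fast = pipelined_time(n, size, rtt, bw)
    print(f"sequential: {slow:.1f} s, pipelined: {fast:.2f} s, ratio: {slow / fast:.0f}x")
```

In this simple model the gain shrinks as files get larger, because transfer time rather than latency starts to dominate.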

John finally had time to write a client that takes advantage of this. A set of graphs shows the performance improvement, which for "small" (10 kilobyte to 10 megabyte) files can be enormous.

John is now integrating these pipelining techniques into GridFTP clients, in particular RFT. Let us know if you're interested in trying this.

October 19, 2006

Experimenting with Networks

It is well known that naturally occurring networks can have different structures: for example, every node may be connected to a fixed number of neighbors, or additional connections to "distant" nodes may create a "small-world" structure, or some nodes may be connected to far more nodes than others, as in a "scale-free" or power-law structure (see figure).

Network structure is presumed to be important in determining behavior: for example, how fast information propagates and how readily a network can evolve. But how do you do controlled experiments on the properties of different structures? The problem is that networks like the Internet and ecological networks can't be easily changed.

In a beautiful series of experiments, Michael Kearns, Siddharth Suri, and Nick Montfort took a class of undergraduate students and asked them to solve various graph coloring problems. In each experiment, each of a set of students was connected (by computer) with other students in some network, and was then asked to select a color for his/her node that was different from those of his/her neighbors. This process continued until the graph coloring problem was "solved," meaning that a consistent set of colors was assigned.
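The "solved" condition is easy to state in code. The Python sketch below is my own illustration, not the authors' experimental software: it represents the network as a list of edges and a coloring as a dictionary from node to color, and treats the problem as solved when no edge joins two nodes of the same color. The ring helper is a hypothetical stand-in for the simple cyclical structures mentioned in the paper.

```python
def conflicts(edges, colors):
    """Return the edges whose two endpoints currently share a color."""
    return [(u, v) for u, v in edges if colors[u] == colors[v]]


def is_solved(edges, colors):
    """A coloring is consistent when no neighboring nodes have the same color."""
    return not conflicts(edges, colors)


def ring_edges(n):
    """Edges of a simple cycle on nodes 0..n-1 (each node has two neighbors)."""
    return [(i, (i + 1) % n) for i in range(n)]


if __name__ == "__main__":
    edges = ring_edges(6)
    alternating = {i: i % 2 for i in range(6)}  # colors 0,1,0,1,0,1
    all_same = {i: 0 for i in range(6)}
    print(is_solved(edges, alternating))    # no neighbors clash on an even ring
    print(len(conflicts(edges, all_same)))  # every edge clashes
```

Note that each participant saw only some neighborhood of the network, so no one could evaluate a global check like this directly; that is part of what makes the experiment interesting.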

The authors found that the time to solution varied greatly according to both the network structure and the amount of information provided about neighbors. The abstract:

Theoretical work suggests that structural properties of naturally occurring networks are important in shaping behavior and dynamics. However, the relationships between structure and behavior are difficult to establish through empirical studies, because the networks in such studies are typically fixed. We studied networks of human subjects attempting to solve the graph or network coloring problem, which models settings in which it is desirable to distinguish one's behavior from that of one's network neighbors. Networks generated by preferential attachment [i.e., scale-free] made solving the coloring problem more difficult than did networks based on cyclical structures, and "small worlds" networks were easier still. We also showed that providing more information can have opposite effects on performance, depending on network structure.

October 18, 2006

MEDICUS: Globus-powered medical imaging and computing

A really nice Globus-based application: MEDICUS (Medical Imaging and Computing for Unified Information Sharing). This software federates medical imaging and computing resources for clinical and research applications. Quoting the authors:

The objectives of the MEDICUS project are to promote transparent and non-proprietary solutions for medical image processing, and medical image and data sharing between health care providers, physicians, and researchers in the life sciences. The Globus Toolkit provides the necessary architecture platform and standards to engage in this diverse and difficult field. As such it provides a vendor independent solution to efficiently communicate medical images and image outcome at various levels in the healthcare enterprise.

Why I like it:

  1. It's the real deal--it already provides access to hundreds of thousands of images at some 40 sites across the U.S., including 27 from the Children's Oncology Group, allowing users to publish, search for, access, and process images in powerful ways.
  2. It is a lovely example of how Globus services (security, data, computing, etc.) can be used to develop secure, robust, and efficient distributed systems quickly and easily. (For details, see this slide set or the various papers on the MEDICUS web site.)
  3. It's been set up as an incubator on dev.globus, facilitating community access and contributions.

The responsible parties include Stephan Erberich, Ann Chervenak, Carl Kesselman, Manasee Bhandekar, and Marvin Nelson.

From the incubator web page, a list of features:

  • Vendor independent Grid Enterprise PACS (Picture Archiving and Communication System) deployment by vertically integrating Globus Toolkit Data services
  • DICOM legacy support to connect medical image modalities (MR, CT, X-Ray, etc.) to data Grids
  • Open-source image archiving and distributed warehousing for large-scale multi-center clinical trials
  • Medical image guided drug discovery in combination with remote processing (GRAM service)
  • Flexible fault-tolerant off-site DICOM Grid image storage for man-made or natural disaster recovery
  • Cost-efficient tele-radiology and image workflow between community care facilities and remote specialist using the Grid

October 17, 2006

More Utility Grid Weirdness

It seems that every day turns up some quirky new development in the rapidly developing utility grid market: sometimes even two.

Today, Sun announced "Project Blackbox", a shipping container filled with up to 240 rack-mounted Opterons (or Sparcs) and 1.4 Petabytes of storage.  The compute-hungry user leases or buys the box, plugs in a couple of fire hoses for cooling and a 500 kilowatt power cord, and it's up and running in five minutes. All this apparently designed by Danny Hillis, of all people. (Wasn't Google supposed to be doing this?)

This development emphasizes how hard it is to distinguish "insourcing" and "outsourcing." I need more computing power: do I buy new servers, lease a Blackbox from Sun, or rent time on Amazon EC2?  All three approaches can provide the same "power," but differ greatly in setup time, costs, quality of service, and flexibility.

And in a case of art imitating life (?), the world's largest grid facility operator, Google, announced that it will cover the roofs of its Mountain View offices with solar cells, a move that will provide 30% of the electricity used in those offices--and might sometimes result in Google selling power to that other grid, the old-fashioned electric one.

October 16, 2006

GridFTP (and GRAM) clients for .NET

Having mentioned GridFTP integration with Firefox, I should also point out the very nice work done at Virginia by Marty Humphrey, Glenn Wasson, and colleagues on .NET clients (and services) for GridFTP and GRAM. The clients interoperate with the GT4 GridFTP and GRAM services.

Thus, Globus users wanting a GRAM job submission client on Windows have a few options:

As do users wanting a GridFTP client on Windows:

  • Java RFT libraries and a Java command line program (rft), via the GT4 distribution.
  • Java libraries (GridFTPClient.java) and a Java command line program (globus-url-copy) that are part of the Java CoG distribution.
  • The .NET GUI from Virginia (requires .NET framework and WSRF.NET).

There are probably others, too, that I don't know about. (Please contact me if you know of any.)

October 15, 2006

The Early Steam Internet

Wandering the farther reaches of the net, I came across the Institute of Internet History (IOIH). Having learned recently of Otlet's early work on hypertext, I should perhaps not have been surprised to learn of this little-known precursor to today's Internet, namely, Beardie's pioneering work in the 1840s on a steam-powered Internet.

As the IOIH recounts, "[T]oday's Internet has its roots in the huge silk and cotton mills which grew up in the 19th century during the Industrial Revolution." Facing the need to control a growing number of industrial mills, Aldous B. Huxley proposed:

to record a single copy of each pattern on a stack of perforated rotating metal disks and then distribute the information contained on these disks via a series of pressure pulses through a system of steam pipes to each loom.

Then, in 1847, a brilliant engineer, E. H. Beardie (pictured):

presented a paper titled, "An International Industrial Network of Steam Gulleys and Mechanical Actuators" to the Royal Society of Industrialists. The paper described in some detail Beardie's vision for the phased building of a wide area network connecting mills.

This paper has apparently been lost, but the rest is history.

October 14, 2006

What do Cars and Software Have in Common?

An article by Damian Smith has some interesting things to say about service oriented architecture (SOA). He first compares the software industry today with the automobile industry in the 1980s: a few major players, all massively vertically integrated, little customer choice.

Then he notes that in the automobile industry, competitive pressures led to the definition of common platforms, disaggregation, offshoring, etc.--basically a move to a horizontally stratified market, in which (counterintuitively?) vendors differentiate by how they put together standard pieces:

Although components continue to be manufactured offshore by a wide range of component suppliers, the cars themselves are assembled onshore, close to the consumer, where they can be customized to their desires and needs.

He then argues that:

Over the next five to 10 years, SOA will facilitate developments in the software industry similar to those that have taken place in the auto industry.

Although services will predominantly be developed offshore, applications will be assembled onshore where they can be customized to client needs. Services will come from a variety of sources, including major software vendors, open source developers, and offshore niche vendors. If a suitable existing service is not available, new services will be home grown using custom development (SODA) technologies, Business Process Management (BPM) and/or Business Process Execution Language (BPEL) tools.

Applications and services will be deployed on both public and private open platforms. Organizations will provide private service platforms within their firewalls, probably using network devices, and will deploy services and assemble applications via those platforms. Public service platforms will be provided over the Internet and applications will be assembled and deployed using open source, home grown, and micro-charged Software as a Service (SaaS) offerings.

As integration will no longer be a barrier, assembled applications will be very specific to organizations’ needs and desires. In effect, we will be back to best-of-breed, but at the service level rather than the application level. As a result, all applications will be ‘custom’ to some degree and services will be added, removed, and replaced as business needs change—think plug and play concepts applied to applications.

He also has some interesting things to say about how this transformation is going to be achieved:

A cultural change to create and use reusable services will have to be facilitated. More formal methodology and tighter management and governance will have to be adopted. Carrots and sticks will need to be created to encourage and enforce reuse, and rules and guidelines regarding service ownership, sharing, and accountability will need to be developed.

We've been working for several years with Web Services in the Globus team, and overall this has been a positive experience. We're now starting to gain experience with service outsourcing and composition (e.g., with BPEL in caBIG).

October 13, 2006

Integrate GridFTP with Firefox browser

Some neat (and useful) work from a talented team at UTEP and SDSC: an integration of the GridFTP protocol into the Firefox browser (to be released in final form later this year, I believe), and a description of how this was done.

The GridFTP protocol extends the standard File Transfer Protocol (FTP) with various useful features such as Grid Security Infrastructure (GSI) security, increased reliability via restart markers, high-performance data transfer using striping and parallel streams, and support for third-party transfer between GridFTP servers. The Globus Toolkit provides one of several interoperable GridFTP implementations. GridFTP is used for all sorts of interesting things: e.g., to download Earth System Grid data, and to stage data to nodes on the Open Science Grid for execution.

The UTEP/SDSC work makes it straightforward to use Firefox to download data from remote GridFTP (and FTP) servers.

October 12, 2006

Experiences with Cyberinfrastructure

A meeting with social scientists led me to wonder: what "stories" should we be telling people embarking on cyberinfrastructure projects, to help them avoid mistakes and achieve success? A good story is presumably something subtle and clever: it entertains, and simultaneously conveys a message--but at a subconscious level.

A new report posted on the National Science Foundation (NSF) Office of Cyberinfrastructure web site, Cyberenvironment Project Management: Lessons Learned, summarizes lessons learned from the Network for Earthquake Engineering Simulation (NEES) project, a major NSF program both to upgrade equipment used for civil engineering research and to connect that equipment into a national "collaboratory." NEES was one of the first major projects of this sort, and it was certainly full of learning experiences. In my somewhat informed and certainly biased view (I was involved in NEES, and contributed to the report), the report captures some useful wisdom. Reading it, you may well think "management 101," and it is--but there are also some subtle points made.

As useful as it is, Lessons Learned doesn't tell stories, and perhaps it should. I might start with these:

  • At the first big meeting, all the civil engineers wore jacket and tie (yes, they were all men). None of the IT team wore a tie (or a jacket, I think), and many were wearing jeans.
  • The lights that went on, among even skeptical engineers, when the IT team first brought "Mini-MOST" (Multi-Site Online Simulation Test)--a simple and portable shake table--to a meeting, and demonstrated teleoperation and telepresence.

Both need some development, but perhaps there is some raw material for the Cyberinfrastructure Telenovela?

By the way, I note that there is now also a New Zealand NEES and a UK NEES. It's a great field for international cooperation.

October 11, 2006

IBM has a secret island headquarters hideaway inside a computer game

Sometime soon, I hope to say something intelligent about Second Life and what it means for numerous things, including science and innovation. First, though, I need to get a computer with a graphics card modern enough to run it myself.

In the meantime, you can read some interesting comments by Charlie Stross (who coined the title for this post, and fears his science fiction magnum opus is being overtaken by events) and Irving Wladawsky-Berger, the very smart IBM executive who has a history of getting IBM into wacky things like Linux, parallel computing, and ... well, now it seems, Second Life. If you prefer traditional dead-tree media, the October 2006 issue of Wired has a nice article.

October 10, 2006

Google Analytics

If you haven't come across Google Analytics, it's not to be missed. You register with www.google.com/analytics, they give you a JavaScript snippet that you include in your Web pages, and then you check in periodically with your Google Analytics account to learn who is accessing which parts of your Web site.

For example, the figure shows the geographical distribution of accesses to the Globus web site over a few days in early October. I am immediately intrigued to see that 18% of all accesses during that period came from Bangalore! And why wasn't anyone in New Zealand doing anything?

October 09, 2006

Writing Distributed Programs: Why Not Message Passing?

The question of how to write programs for distributed or "grid" environments has stimulated much debate. Some argue that this new environment demands new programming models and languages--and there is certainly merit in that view. However, we can also reuse well-understood models. For example, we can use the Message Passing Interface (MPI) standard to write message passing programs.

The MPI standard defines an API for sending and receiving messages, in both point-to-point and collective modes, and for such things as dynamic process creation. MPI is sometimes criticized as a low-level "assembly language," but it is more accurate to describe it as an abstract but precise notation for describing data exchanges among concurrently executing processes.

To run message passing programs on grids, consider MPICH-G2 (see paper), a grid-enabled MPI implementation developed by Nick Karonis and his colleagues. MPICH-G2 allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer. It extends the Argonne MPICH implementation of MPI to use Globus services for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance-critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, network topology. Thus, the user can variously ignore or exploit knowledge of critical aspects of the heterogeneous environment.
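To give a feel for why topology information helps collective operations, here is a small Python sketch that counts wide-area messages for a sum reduction done two ways. It is a conceptual illustration only, not MPICH-G2's actual algorithm and not MPI code: the site names, process counts, and function names are all hypothetical, and it assumes that each site can combine its own values locally before sending anything across the wide area.

```python
def flat_wan_messages(sites, root_site):
    """Topology-unaware: every process outside the root site sends its own value over the WAN."""
    return sum(count for site, count in sites.items() if site != root_site)


def hierarchical_wan_messages(sites, root_site):
    """Topology-aware: each remote site combines locally, then sends one partial result."""
    return sum(1 for site, count in sites.items() if site != root_site and count > 0)


def hierarchical_sum(values_by_site):
    """Two-level sum: reduce within each site first, then across the per-site partials."""
    partials = {site: sum(values) for site, values in values_by_site.items()}
    return sum(partials.values())


if __name__ == "__main__":
    # Hypothetical layout: number of processes at each of three sites.
    sites = {"site_a": 64, "site_b": 32, "site_c": 16}
    print(flat_wan_messages(sites, "site_a"))
    print(hierarchical_wan_messages(sites, "site_a"))
```

Both approaches produce the same sum; the difference lies in how many messages cross the slow links, which is the kind of cost a topology-aware collective tries to reduce.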

MPICH-G2 has been used to run scientifically important applications. One I like is a high-resolution study of blood flow in the human body: highly coupled 3-D simulations of blood flow in critical areas are placed on distinct clusters, and those simulations are coupled via a 1-D simulation of flow through the arterial system.

MPICH-G2 doesn't do everything: for example, it is not particularly fault tolerant. But if you want to run a program fast on a set of distributed computers (on a LAN, MAN, or WAN), and are prepared to accept failure of one component resulting in failure of the whole (as is often desirable, in fact), it's a powerful tool.

For more information, see: N.T. Karonis, B. Toonen, and I. Foster, "MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface," J. Parallel and Distributed Computing, vol. 63, no. 5, 2003, pp. 551–563. There are also a number of application papers available.

October 08, 2006

What do the US and Turkey have in common?

Hmmm ... both beautiful countries, full of wonderfully friendly people ... both have a somewhat problematic relationship with Europe ... but what else?

Well, I was surprised to read in an article in Science that the US and Turkey have (among 34 countries surveyed) the lowest percentage of citizens that accept as true that "Human beings, as we know them, developed from earlier species of animals." (See figure.) Canada and Russia weren't surveyed, but the rest of the G8 were in the top 11.

I also recall reading that the teaching of evolution is unpopular in theocratic Iran. So, while Iran wasn't surveyed either, it might well be down near the bottom as well.

I find these data fascinating. The life of a scientist in the US or Turkey is not curtailed as in Iran, but there is some commonality of perspective. An "Axis of AntiEvolution"?

October 07, 2006

Nicholas Carr, Amazon Web Services, and Globus

The massive infrastructure investments being made by companies like Google, Amazon, eBay, and Microsoft are having interesting consequences. First, we get 500,000 computers at Google indexing the Web for us, for free. Then, via Amazon Web Services, we get on-demand access to storage, computing, and (most recently) message delivery, all via simple Web service interfaces--not for free, but at a relatively low cost.

Nicholas "IT Doesn't Matter" Carr reported recently on a speech and an interview by Jeff Bezos on Amazon's forays into Web services. Carr's posting, and subsequent comments, raise some interesting questions about the economics of IT as a utility. If Amazon's current offering takes advantage of the fact that its computers are often idle, what happens as demand increases? When is it better to outsource vs. insource? We don't understand such issues yet, but I can't help suspecting that that crazy "grid" idea is becoming very real.

Carr perhaps views Amazon Web Services as support for his view that IT has become a commodity. But grid utilities providing computing, storage, and other basic functions seem more likely to spur explosive innovation in IT applications. What new applications will we see? (Those described for Amazon seem, so far, rather dull: e.g., backup and picture archiving.) Will those new applications demand new capabilities from utilities, spurring differentiation? And when will people want to deploy their own implementations of these services, rather than trusting others to provide them?

Our Globus software provides grid utility services such as GRAM (on-demand service deployment), Workspaces (virtual machines), and GridFTP (storage). These services have interfaces richer than those of similar Amazon Web Services, and the additional capabilities have proved important when deploying those services at remote locations and when using them to implement higher-level services such as policy-driven data delivery (e.g., DRS) and distributed computing (e.g., VDS). However, I believe that some fairly simple refactoring can allow those same higher-level services to drive operations on utilities provided by the likes of Amazon. That will make it feasible for users to mix outsourced and insourced IT functions in interesting ways.
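To make the refactoring idea a little more concrete, here is a minimal sketch in Python. It is not Globus or Amazon code: the `StorageUtility` interface, the `InMemoryStore` class, and the `replicate` function are all hypothetical names invented for illustration. The point is only that a higher-level service written against a small interface need not care whether the utility behind it is insourced or outsourced.

```python
from typing import Dict, Iterable, Protocol


class StorageUtility(Protocol):
    """Hypothetical minimal interface that a higher-level service relies on."""

    def put(self, name: str, data: bytes) -> None: ...

    def get(self, name: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for any storage utility, insourced or outsourced (assumption)."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._objects[name] = data

    def get(self, name: str) -> bytes:
        return self._objects[name]


def replicate(names: Iterable[str], source: StorageUtility,
              target: StorageUtility) -> int:
    """Copy the named objects from source to target; return how many were copied."""
    copied = 0
    for name in names:
        target.put(name, source.get(name))
        copied += 1
    return copied


# Usage: the same replicate() call works whichever store sits on either side.
local = InMemoryStore("insourced")
remote = InMemoryStore("outsourced")
local.put("run42.nc", b"climate data")
copied = replicate(["run42.nc"], local, remote)
```

A real utility would of course add authentication, error handling, and policy, which is where the richer interfaces mentioned above come in.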

October 06, 2006

Grid Fighting Cancer

The National Institutes of Health Cancer Biomedical Informatics Grid (caBIG) is one of the most exciting Grid deployments out there. There's a nice NIH Web site with background information. Quoting that site:

The National Cancer Institute (NCI) has launched the caBIG™ (cancer Biomedical Informatics Grid™) initiative to speed research discoveries and improve patient outcomes by linking researchers, physicians, and patients throughout the cancer community. caBIG™ is a voluntary network of infrastructure, tools, and ideas that enables the collection, analysis, and sharing of data and knowledge along the entire research pathway from laboratory bench to patient bedside.

I like caBIG for two reasons. The first is my family history of cancer )-:. The second is that they are one of the most ambitious and successful users of Globus software that I know. There are more than 800 people working on 70 projects within caBIG, and every caBIG Web Service is built on Globus technology. The caBIG software distribution uses just about every Globus component. In addition, caBIG has added some nice new functionality to Globus, including:

I'm as much a fan of the search for the Higgs boson as anyone, but there is something to be said for finding a cure for cancer!

October 05, 2006

Quantifying the Benefits of Cyberinfrastructure

We need to find a way of quantifying the benefits of "cyberinfrastructure"--the technology that underpins and enables eScience. We need this information if we are to justify spending on infrastructure (or not), decide what infrastructure to build, and understand how to improve the infrastructures that we have.

But quantifying benefits is hard.

An anecdote: In building the Globus-based Earth System Grid (ESG: see the picture for participating sites) we put a lot of effort into instrumentation and quantifying usage. Thus we can know that our more than 3000 registered users have downloaded more than 100 Terabytes of climate simulation data. Yet this data does not provide any real insight into whether the people downloading that data found it useful--or did anything useful with it. We did survey users, and got useful information, but response rates were low.

Fortunately, one of the two data collections made accessible via ESG was the Intergovernmental Panel on Climate Change (IPCC) assessment simulation data, and the IPCC team was able to document that over 300 scientific papers had been produced [by early 2006] from data downloaded from ESG.

However, we can't always get such nice data. Thus, we may ask: What metrics are important? What data do we need? What is feasible to get? How do we get it? What can it tell us (and what not)?

I think we need to learn how to build infrastructures that can collect this sort of information automatically. We should involve social scientists in designing such systems and in assessing their effectiveness.
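As a rough illustration of what automatic collection might look like, here is a minimal Python sketch. Everything in it is hypothetical: the `DownloadRecord` and `UsageLog` names, the fields, and the sample values are assumptions made up for this example, not a description of how ESG is instrumented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DownloadRecord:
    """One download event (hypothetical fields)."""
    user: str
    dataset: str
    num_bytes: int


class UsageLog:
    """Collects download events and answers simple usage questions."""

    def __init__(self) -> None:
        self._records: List[DownloadRecord] = []

    def record(self, user: str, dataset: str, num_bytes: int) -> None:
        self._records.append(DownloadRecord(user, dataset, num_bytes))

    def bytes_by_dataset(self) -> Dict[str, int]:
        """Total bytes downloaded, grouped by dataset."""
        totals: Dict[str, int] = defaultdict(int)
        for rec in self._records:
            totals[rec.dataset] += rec.num_bytes
        return dict(totals)

    def distinct_users(self) -> int:
        """Number of different users who downloaded anything."""
        return len({rec.user for rec in self._records})


# Usage with made-up sample events.
log = UsageLog()
log.record("alice", "ipcc", 100)
log.record("bob", "ipcc", 50)
log.record("alice", "ccsm", 25)
totals = log.bytes_by_dataset()
users = log.distinct_users()
```

Counts like these tell us who downloaded what, not whether the data proved useful. That second question is where surveys, publication tracking, and the social scientists come in.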

October 04, 2006

Argonne Named Postdoctoral Fellowship Program: Deadline Oct 13

My employer, Argonne National Laboratory, has a really nice postdoctoral fellowship program awarded internationally on an annual basis to outstanding doctoral scientists and engineers "at early points in promising careers." The pay isn't bad ($72,000 plus $20,000 for travel and equipment), but the opportunity to work in the Mathematics and Computer Science Division and Computation Institute is really something, in my humble opinion. And you get a fancy title. Perhaps "Metropolis Postdoctoral Fellow" is most appropriate for a computer scientist: Nicholas Metropolis founded the Institute for Computer Research at Chicago in 1957.

Applications are due by October 13. See http://www.anl.gov/Careers/namedpostdocs.html.

October 03, 2006

Who Invented Hypertext?

It's always fun to find that ideas we think are unique to our generation are in fact far older. For example, who invented hypertext?

Many might assert that it was Tim Berners-Lee, with his invention of the Web (1989). But while Sir Tim did (and continues to do) many wonderful things, the idea of hypertext greatly predates the Web.

Other common replies, at least among technologists, might be Ted Nelson, who in his book Literary Machines (1983) and his ambitious but ultimately unsuccessful Xanadu system pioneered many relevant ideas, and Doug Engelbart, who pioneered hypertext and many other things besides.

Historians of science are likely to cite Vannevar Bush's As We May Think (1945), which is notable as a description of a hypertext system that (essentially) predated computers, and influenced Nelson and Engelbart.

There are other precursors, but (getting to the punchline), I learned at a recent workshop of the work of the Belgian Paul Otlet, who from 1895 onwards described and built systems that (using cards, not computers) introduced ideas that (now quoting Wikipedia):

prefigured what ultimately became the World Wide Web. His vision of a great network of knowledge was centered on documents and included the notions of hyperlinks, search engines, remote access, and social networks. (Obviously these notions were described by different names.)

If he's in Wikipedia, he can't be that obscure (can he?), but this was all news to me.

If you want to learn more, there's a biography, written by W. Boyd Rayward, then at the University of Chicago. But Otlet's primary works remain untranslated.

October 02, 2006

History and Theory of Infrastructure

I'm just back from a workshop on "History and Theory of Infrastructure: Lessons for New Scientific Infrastructure" in Ann Arbor, Michigan, which brought together a fascinating group of social scientists and others to discuss "what practical lessons can the history, sociology, and experience of existing infrastructures offer to the imagination, implementation, and governance of cyberinfrastructure."

One delightful aspect of the meeting was meeting wonderful scholars that I had known previously only by reputation, such as Geoff Bowker, Leigh Star, Paul Duguid, and Christine Borgman, as well as some I already knew, such as Tom Finholt, Bob Kahn, Dan Atkins, and Bill Dutton, and others that I was glad to get to know.

There were many fascinating and wide-ranging discussions. My impressions:

  • Social scientists (or at least those at the University of Michigan's School of Information) organize great meetings. The organizers had clearly put a lot of thought into how to structure the meeting to ensure useful discussion, and they also had excellent social events!
  • The mode of discussion was quite different from what I expected. There were no formal presentations and little analysis, but many compelling anecdotes. At first, I found this strange, but then realized that "stories" are a compelling way of conveying insights. That got me thinking: what "stories" should we be telling people embarking on cyberinfrastructure projects, to help them avoid mistakes and achieve success?
  • Another thought that seemed interesting, at least to me: How about designing cyberinfrastructure to collect the information that social scientists require to evaluate its utility? Large systems like TeraGrid, Open Science Grid, Earth System Grid, caBIG, or GEON, and also smaller systems, could be viewed as experimental apparatus for social scientists. What instrumentation should we include in them to that end?

Overall, I didn't come away convinced that the history of existing infrastructures can help those building cyberinfrastructure: railroads and networks are very different things. But I became yet more convinced that social scientists have a lot to contribute to our understanding of how science and its tools will, and should, evolve in the 21st Century.

October 01, 2006

Rapture for Nerds

I have long been fascinated by apocalyptic and millennial thinking: belief systems in which the world is about to be changed in some fundamental way by a transformative event of an esoteric nature. Typically:

  • The transformation will usher in an era of prosperity, peace, and immortality.
  • Only a select few will get to participate.
  • The transformation will occur within a small number of years: certainly within the lifetime of those involved, and often on a specific date.

Throughout human history we find hundreds of examples of groups who have believed that they possessed information regarding such an imminent transformation. The recurrence of this idea surely tells us something profound about the human spirit.

I was reminded of this topic by "Radical Evolution," Joel Garreau's interesting book about potential futures. The book presents the views of those who predict a potential "singularity": a time at which, due to continued exponential growth in computer power, we obtain computers able to design yet more powerful computers, and thus enter into an era of essentially infinitely rapid change in technological capability. These developments also enable superhuman intelligence, medical advances, thus eternal life, etc., etc.--but only for those prepared to take advantage of these advances.

I've always found the similarities between the "singularity" and millennial ideas intriguing. Others have apparently thought the same, and furthermore coined the beautiful put-down "Rapture for Nerds." Now of course either the singularity or the rapture (or both) may turn out to be quite real, but the similarities between the two concepts are certainly cause for thought.