
January 08, 2008

There's Grid in them thar Clouds*

You’ve probably seen the recent flurry of news concerning “Cloud computing.” Business Week had a long article on it (with an amusing and pointed critique here). Nick Carr has even written a book about it. So what is it about, what is new, and what does it mean for information technology?

The basic idea seems to be that in the future, we won’t compute on local computers, we will compute in centralized facilities operated by third-party compute and storage utilities. To which I say, Hallelujah, assuming that it means no more shrink-wrapped software to unwrap and install.

Needless to say, this is not a new idea. In fact, back in 1960, computing pioneer John McCarthy predicted that “computation may someday be organized as a public utility”—and went on to speculate how this might occur.

In the mid 1990s, the term grid was coined to describe technologies that would allow consumers to obtain computing power on demand. I and others posited that by standardizing the protocols used to request computing power, we could spur the creation of a computing grid, analogous in form and utility to the electric power grid. Researchers subsequently developed these ideas in many exciting ways, producing for example large-scale federated systems (TeraGrid, Open Science Grid, caBIG, EGEE, Earth System Grid, …) that provide not just computing power, but also data and software, on demand. Standards organizations (e.g., OGF, OASIS) defined relevant standards. More prosaically, the term was also co-opted by industry as a marketing term for clusters. But no viable commercial grid computing providers emerged, at least not until recently.

So is “cloud computing” just a new name for grid? In information technology, where technology scales by an order of magnitude, and in the process reinvents itself, every five years, there is no straightforward answer to such questions.

Yes: the vision is the same—to reduce the cost of computing, increase reliability, and increase flexibility by transforming computers from something that we buy and operate ourselves to something that is operated by a third party.

But no: things are different now than they were 10 years ago. We have a new need to analyze massive data, thus motivating greatly increased demand for computing. Having realized the benefits of moving from mainframes to commodity clusters, we find that those clusters are darn expensive to operate. We have low-cost virtualization. And, above all, we have multiple billions of dollars being spent by the likes of Amazon, Google, and Microsoft to create real commercial grids containing hundreds of thousands of computers. The prospect of needing only a credit card to get on-demand access to 100,000+ computers in tens of data centers distributed throughout the world (resources that can be applied to problems with massive, potentially distributed data) is exciting! So we’re operating at a different scale, and operating at these new, more massive scales can demand fundamentally different approaches to tackling problems. It also enables (indeed is often only applicable to) entirely new problems.

Nevertheless, yes: the problems are mostly the same in cloud and grid. There is a common need to be able to manage large facilities; to define methods by which consumers discover, request, and use resources provided by the central facilities; and to implement the often highly parallel computations that execute on those resources. Details differ, but the two communities are struggling with many of the same issues.

Unfortunately, at least to date, the methods used to achieve these goals in today’s commercial clouds have not been open and general purpose, but have instead been mostly proprietary and specialized for the specific internal uses (e.g., large-scale data analysis) of the companies that developed them. The idea that we might want to enable interoperability between providers (as in the electric power grid) has not yet surfaced. Grid technologies and protocols speak precisely to these issues, and should be considered.

A final point of commonality: we seem to be seeing the same marketing. The first “cloud computing clusters”—remarkably similar to the “grid clusters” of a few years ago—are appearing. Perhaps Oracle 11c is on the horizon?

What does the future hold? I will hazard a few predictions, based on my belief that the economics of computing will look more and more like those of energy. Neither the energy nor the computing grids of tomorrow will look like yesterday’s electric power grid. Both will move towards a mix of microproduction and large utilities, with increasing numbers of small-scale producers (wind, solar, biomass, etc., for energy; for computing, local clusters and embedded processors—in shoes and walls?) co-existing with large-scale regional producers, and load being distributed among them dynamically. Yes, I know that computing isn’t really like electricity, but I do believe that we will nevertheless see parallel evolution, driven by similar forces.

In building this distributed “cloud” or “grid” (“groud”?), we will need to support on-demand provisioning and configuration of integrated “virtual systems” providing the precise capabilities needed by an end-user. We will need to define protocols that allow users and service providers to discover and hand off demands to other providers, to monitor and manage their reservations, and arrange payment. We will need tools for managing both the underlying resources and the resulting distributed computations. We will need the centralized scale of today’s cloud utilities, and the distribution and interoperability of today’s grid facilities.

Some of the required protocols and tools will come from the smart people at Amazon and Google. Others will come from the smart people working on grid. Others will come from those creating whatever we call this stuff after grid and cloud. It will be interesting to see to what extent these different communities manage to find common cause, or instead proceed along parallel paths.

*An obscure cultural reference: the phrase “There’s gold in them thar hills” was first uttered, according to some, by an old prospector in the 1948 movie “Treasure of the Sierra Madre”, starring Humphrey Bogart.

December 14, 2007

Celebrating Licklider: human-computer symbiosis 50 years on

I recently completed an article for the HPC conference held in Cetraro, Italy, in 2006. (Ok, I was a little late.) I took the opportunity to talk about a topic that I find fascinating, namely the vision and legacy of JCR Licklider. The abstract:

Licklider advocated in 1960 the construction of computers capable of working symbiotically with humans to address problems not easily addressed by humans working alone. Since that time, many of the advances that he envisioned have been achieved, yet the time spent by human problem solvers in mundane activities remains large. I propose here four areas in which improved tools can further advance the goal of enhancing human intellect: services, provenance, knowledge communities, and automation of problem-solving protocols.

Needless to say, I could hardly do justice to such a grand topic, but perhaps my comments will spur some interesting thoughts and responses.

December 13, 2007

Red Shift Meets Event Horizon

Sun's CTO Greg Papadopoulos coined the term "red shift" to denote the massive IT buildout that is occurring at the likes of Google, Amazon, and eBay as they provide ever-more-sophisticated services to ever-larger numbers of customers. He posits that this trend will continue, ultimately resulting in a "neutron star collapse of datacenters" to a small number of massive, centralized, highly efficient providers. It's a bizarre choice of term--doesn't a bigger red shift mean that the star we are seeing is further away and thus older (and probably already extinct)?--but it's certainly a compelling analysis.

I liked an article by Phil Wainewright (riffing on an article by Dan Farber about red shift theory) about the pros and cons of ultra-large data centers. As others have commented to me in the past, having all of your data and computing in a single location is not necessarily a good idea.

December 10, 2007

Condor will be open source?

An interesting article on Red Hat's plans for its "Enterprise MRG" distribution. MRG stands for Messaging (an implementation of the Advanced Message Queuing Protocol), Real-time (real-time Linux kernel), and Grid (which for Red Hat, means virtualization support and Condor support for application deployment).

And for those who have always wanted to see the supposedly open source, but never really accessible Condor software, some good news:

As part of the agreement between Red Hat and the University of Wisconsin, the Condor software will seek an OSI-compliant software license that will allow the code to be distributed as part of a RHEL stack.

December 04, 2007

MapReduce info

Rick Stevens pointed me at this post describing Yahoo!'s work with Hadoop, the open source implementation of MapReduce. Of course, that is old news, but some navigation led me to this nice MapReduce cookbook for machine learning.

Many many years ago I wrote a book on parallel programming. My initial plan was to show how just about everything could be expressed using map reduce, but I couldn't quite see how to package things in a way that made sense. (Probably I wanted to do it all via language constructs.) So I ended up covering a lot of other material instead. It's neat to see how with the right abstractions things turn out so nicely.
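To make the "right abstractions" point concrete, here is a minimal single-process sketch of the map/shuffle/reduce pattern, using word counting as the example. It is not Hadoop's or Google's API; the function names (map_phase, shuffle, reduce_phase, word_mapper) are made up for illustration, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(documents, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for doc in documents:
        yield from mapper(doc)


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Fold each key's list of values into a single result."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


def word_mapper(text):
    """Hypothetical mapper: emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def word_count(documents):
    pairs = map_phase(documents, word_mapper)
    return reduce_phase(shuffle(pairs), lambda a, b: a + b)


counts = word_count(["the grid and the cloud", "the cloud"])
```

The application-specific parts are just the mapper and the reducer; everything else is generic plumbing, which is what makes the pattern easy to hand to a framework.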

November 15, 2007

Earth to be paradise; distance to lose enchantment

I recently had the occasion to present a talk on "virtual environments and knowledge production." This got me thinking about just how virtual environments can help with the important tasks of enhancing both individual and collective human creativity. My talk tries to highlight some of what has been accomplished and some of the challenges that lie ahead. (However, as I continue to strive to remove words from slides [recalling Beckett's admonition that "every word is an unnecessary stain on silence and nothingness"], the slides themselves may not make much sense!) I mentioned the following as some of the issues that must be addressed for virtual worlds (a subset of virtual environments) to be useful for "knowledge production":

  • Integration with the physical world, e.g., sensors and instrumentation
  • Integration with the rest of the cyberworld
  • Integration with simulation
  • Security and trust, in their many forms
  • Abstractions, metaphors, interfaces
  • Scale (data volumes, simulation fidelity)

During my reading, I came across a nice article by Gary Olson and Judith Olson, Distance Matters, which included a delightful quote from Arthur Mee:

If, as it is said to be not unlikely in the near future, the principle of sight is applied to the telephone as well as that of sound, earth will be in truth a paradise, and distance will lose its enchantment by being abolished altogether.

(Arthur Mee was best known for his Children's Encyclopedia, but here he is writing in 1898 about the impact of the telephone.)

November 12, 2007

Relative debugging with Guard

When modifying software, or porting it from one system to another, one often ends up checking for correctness (and attempting to diagnose errors) by comparing the execution of a new program with a reference version that is known to work. My Australian friend and colleague David Abramson developed a tool called Guard to automate this process. With Guard, you specify the variables you want to monitor, and when, and then fire up the new and reference versions of your software. Guard monitors the specified variables, and notifies you when their values differ.

John Michalakes and I had the occasion to work with Guard back in 1996 when we applied it in a project developing a parallel mesoscale weather model, MM5. It was spookily wonderful to see differences between the parallel and sequential implementations become visible in real-time in a 3-D visualization. (The figure shows a 2-D plot of differences, which is also useful but less beautiful.) It also then became extremely easy to fix those problems. We wrote a paper together on this work, which won a best paper award at SC'96, due I think to David's presentation skills--and the videos.

This technology has now been licensed by Cray for use in their new Cascade program. It's exciting to see the technology making its way into mainstream use.
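The core idea of relative debugging (watch chosen variables in a reference run and a new run, and report where they first differ) can be sketched in a few lines. This is not Guard's interface, which is not described here; the helper names, the tolerance parameter, and the toy "model" steps are all assumptions made for illustration.

```python
import math


def trace(step_fn, state, steps, watch):
    """Run step_fn repeatedly, recording the watched variables after each step."""
    history = []
    for _ in range(steps):
        state = step_fn(state)
        history.append({name: state[name] for name in watch})
    return history


def first_divergence(reference, candidate, tol=1e-9):
    """Return (step, variable, ref_value, new_value) for the first mismatch, or None."""
    for step, (ref, new) in enumerate(zip(reference, candidate)):
        for name in ref:
            if not math.isclose(ref[name], new[name], rel_tol=tol, abs_tol=tol):
                return step, name, ref[name], new[name]
    return None


# Toy "reference" and "ported" versions of one time step (both hypothetical).
def reference_step(s):
    return {"t": s["t"] + 1, "temp": s["temp"] * 0.5 + 1.0}


def ported_step(s):
    # Deliberate bug: a small bias creeps in from the third step onward.
    bias = 0.01 if s["t"] >= 2 else 0.0
    return {"t": s["t"] + 1, "temp": s["temp"] * 0.5 + 1.0 + bias}


start = {"t": 0, "temp": 10.0}
ref_run = trace(reference_step, start, 5, ["temp"])
new_run = trace(ported_step, start, 5, ["temp"])
report = first_divergence(ref_run, new_run)
```

Here report identifies the step index and variable at which the two versions first disagree. A real tool would compare whole arrays (and cope with differing data decompositions between sequential and parallel codes), but the comparison loop is the same in spirit.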

October 23, 2007

The end of Grid computing?

Guy Tel-Zur announces the end of grid computing, based on Google trends data for "grid computing" vs. "virtualization."

The data are fun, but I'm not convinced. Certainly the Google Trends data captures the fact that "virtualization" has replaced "grid computing" as the most popular industry buzzword. But given that industry has used "grid computing" mostly to mean "cluster computing" (e.g., Oracle 10-G, SGE), that doesn't say too much about grid per se.

Measuring adoption and impact is nevertheless an important goal. Thus we have integrated usage reporting mechanisms into our Globus software. We see continued growth in use, as captured by metrics such as service deployments. We're now trying to understand the underlying usage modalities. We believe that many are concerned with "eResearch" functions other than "federating computers"--e.g., on-demand access to computing [on HPC systems and/or EC2], data distribution, service publication and composition, etc. Do these functions count as "grid"? They do according to our article "The Anatomy of the Grid"--and if you look at the goals of projects such as D-Grid.

It would be interesting to see Google Trends data for just "grid." However, that word alone has too many different uses.

June 13, 2007

Digital Rome

I like this story (and see images and video)--June 11, 2007 -- ROME -- Rome's Mayor Walter Veltroni will officiate at the first public viewing of "Rome Reborn 1.0," a 10-year project based at the University of Virginia and begun at the University of California, Los Angeles (UCLA) to use advanced technology to digitally rebuild ancient Rome. The event will take place at 2 p.m. in the Palazzo Senatorio on the Campidoglio. An international team of archaeologists, architects and computer specialists from Italy, the United States, Britain and Germany employed the same high-tech tools used for simulating contemporary cities such as laser scanners and virtual reality to build the biggest, most complete simulation of an historic city ever created. "Rome Reborn 1.0" shows almost the entire city within the 13-mile-long Aurelian Walls as it appeared in A.D. 320. At that time Rome was the multicultural capital of the western world and had reached the peak of its development with an estimated population of one million.


April 23, 2007

Carr on Grid

An interesting article from GridToday reporting on a talk by Nicholas Carr, included below in full.


March 15, 2007

Sun Blackbox video

A nice video (1 minute into Greg Papadopoulos' talk) on Sun's "cluster in a shipping container" product, BlackBox, which I wrote about a while back.

March 07, 2007

Swift takes wing ...


[Update: see also a later post.]

We unveiled this week the first release of Swift, a system for the specification, execution, and management of applications comprising many tasks coupled by disk-resident datasets. Such applications are common when analyzing large quantities of data, performing parameter studies, and/or executing ensemble simulations. (The word "workflow" is often used for such applications, but it doesn't sound right to me.) The open source Swift software combines:

  • A simple scripting language, SwiftScript, for the concise, high-level specification of such computations (without regard to data layout or location), and
  • An execution engine for the rapid and reliable dispatch of many tasks to many processors, whether on parallel computers, campus grids, or multi-site grids.
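For readers who have not met this style of application, here is a rough Python stand-in for "many tasks coupled by disk-resident datasets": a small parameter study in which independent tasks each write a file, and a final task reads those files. This is not SwiftScript and says nothing about how Swift itself is implemented; the task functions, file names, and thread-pool dispatch are all assumptions chosen to keep the sketch self-contained.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def simulate(param, out_path):
    """One task: compute a result for one parameter and leave it on disk."""
    out_path.write_text(str(param * param))
    return out_path


def summarize(in_paths, out_path):
    """A downstream task that consumes the files produced by the tasks above."""
    total = sum(int(p.read_text()) for p in in_paths)
    out_path.write_text(str(total))
    return out_path


def run_study(params, workdir, max_workers=4):
    workdir = Path(workdir)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Independent tasks are dispatched concurrently; each is coupled to
        # the next stage only through the file it writes.
        outputs = list(pool.map(
            lambda p: simulate(p, workdir / f"sim_{p}.txt"), params))
    return summarize(outputs, workdir / "summary.txt")


with tempfile.TemporaryDirectory() as tmp:
    summary = run_study(range(1, 5), tmp).read_text()
```

The bookkeeping in run_study (which task produces which file, and which tasks may run at once) is exactly the sort of detail that a higher-level scripting language and execution engine are meant to take off the user's hands.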

Swift users in the physical, biological, and social sciences; the humanities; computer science; and education have achieved multiple-order-of-magnitude savings (!) in program development and execution time, relative to approaches based on shell scripts and other ad hoc technologies.

Swift builds on work performed with the National Science Foundation's Grid Physics Network (GriPhyN) project on the Virtual Data System (VDS). Work on another VDS component, Pegasus, continues at USC/ISI.

The Swift team comprises Yong Zhao (imminent PhD, already interviewing), Mike Wilde, Mihael Hategan, Tibi Stef-Praun, Ben Clifford, and Nika Nefedova. Gregor von Laszewski architected (and Mihael Hategan built) the Karajan runtime system on which Swift is built. Globus services are used to access remote computers and to move data.

January 29, 2007

To Stand the Test of Time?

A recent SDSC press release describes a "groundbreaking workshop on digital data stewardship." The topic: the long-term preservation of vital data--"the single most prevalent driver for new discoveries in the 21st century."

The press release also points the interested reader to the workshop report, entitled "To Stand the Test of Time." However, when I tried that URL, I received "Error 404: Document Not Found." (Same thing for another URL discovered by a Web search.) A careless Web reorg at the Association of Research Libraries, or an ironic postmodern commentary on the difficulties of preservation?

I've been sent a copy of the report, and it looks interesting. Hopefully it will be re-preserved soon for the rest of you.

January 25, 2007

The world's biggest grid computer ...

Having recovered from the holidays and several proposals and reports, it is time for me to get back to blogging. Certainly lots of interesting things to write about.

Where better to start than a report of the world's biggest grid computer. Or maybe not ... I read a report from the Davos Forum that quotes Vint Cerf as saying that between 100 and 150 million of the world's 600 million computers are (unwittingly) engaged in botnets. The report notes that "a single botnet at one point used up about 15% of Yahoo's search capacity [retrieving] random text snippets to camouflage messages so that its spam e-mail could get past spam filters." Scary.

January 19, 2007

80 core processor

Says the EE Times:

Intel's researchers have produced an 80-core chip that uses less energy than a quad-core processor and has teraflop performance capabilities.

I know that (super)computing is getting weird, what with GPUs and Cell and the like. But it seems to be getting weirder faster than expected. I still don't understand: will the commodity processor of 2010-2015 be a homogeneous, multi-core, general purpose system, or a heterogeneous system with GPUs, CPUs, etc.?

January 08, 2007

Gradatim Ferociter


You're not going to see any mention of the iPhone here. Not a word.

Instead, a pointer to a video of the recent Blue Origin test flight. Their motto: Gradatim Ferociter, which apparently means "step-by-step--courageously." More details here. Some people think it is a prototype for a single stage to orbit launcher. Well, they have 30-40 years to get it right, from my perspective, assuming there is no age limit on passengers.

January 05, 2007

Blog on Virtualization and Grid

I discovered today Tim Freeman's interesting blog on virtualization and grid. Well worth monitoring, even though I was sad to find a pointer to a debunking of the "blue pill myth." If we can distinguish virtuality from reality on computers, what's next? Is The Matrix a myth also?

December 19, 2006

Wikisauri: Thesauri from Wikipedia

David Milne, Olena Medelyan and Ian Witten have a nice paper at this conference I am attending in Hong Kong, on mining domain-specific thesauri from Wikipedia. As they say:

How can you obtain a thesaurus to support a library of documents in a particular domain? Manual construction is prohibitively expensive; automatic generation is woefully inaccurate. General thesauri do not incorporate the specialist terminology that pervades our professions, nor can they keep pace with the deluge of new topics and concepts that arrive each day. Yet a contemporary resource that incorporates expertise in all fields of human endeavour already exists: the widely known Wikipedia.

Basically, they mine the structure of Wikipedia (its redirects, hierarchy, and hyperlinks) to infer the equivalence, hierarchical, and associative relations needed to build a thesaurus. Comparison with a professionally prepared thesaurus (from agriculture) shows that this approach can be effective. Another example of crowdsourcing, based on a rather nonobvious use of the work of its contributors.
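As a toy illustration of that mapping (redirects to equivalence, category hierarchy to broader/narrower terms, hyperlinks to associations), here is a small sketch over hand-made data. It is not the authors' algorithm: the input dictionaries are invented, and treating only mutual links as "related" is an assumption made purely for this example.

```python
from collections import defaultdict


def build_thesaurus(redirects, parents, links):
    """Derive thesaurus relations from wiki-style structure.

    redirects: {alternative title: article title}  -> equivalence
    parents:   {article: [broader category, ...]}  -> hierarchy
    links:     {article: set of linked articles}   -> association
    """
    entries = defaultdict(lambda: {"use_for": set(), "broader": set(),
                                   "narrower": set(), "related": set()})
    for alias, target in redirects.items():
        entries[target]["use_for"].add(alias)
    for child, broader_terms in parents.items():
        for broader in broader_terms:
            entries[child]["broader"].add(broader)
            entries[broader]["narrower"].add(child)
    for source, targets in links.items():
        for target in targets:
            # Assumption for this sketch: only mutual links count as "related".
            if source in links.get(target, set()):
                entries[source]["related"].add(target)
    return dict(entries)


toy = build_thesaurus(
    redirects={"Maize": "Corn", "Zea mays": "Corn"},
    parents={"Corn": ["Cereals"], "Wheat": ["Cereals"]},
    links={"Corn": {"Wheat", "Ethanol"}, "Wheat": {"Corn"}},
)
```

In this toy data, "Maize" and "Zea mays" become alternative labels for "Corn", "Cereals" becomes the broader term for both grains, and "Corn" and "Wheat" end up related because each links to the other.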

December 18, 2006

The flattening of supercomputers

Matei Ripeanu has an interesting brief article in IEEE Distributed Systems Online in which he analyzes the shape of the by-now-(in)famous Top 500 list of supercomputers, released every six months since 1993.

He notes first that a plot of performance vs. rank gives a power law. Not in itself surprising. But then he notes that the power law coefficient is getting smaller over time: in other words, the bigger machines are, on average, getting faster more slowly than the slower machines. Thus, for example, the bottom 25 machines in the Top500, if aggregated together, would match only the #30 machine in 1993, but match the #5 machine in 2005.

Why this change? Alex Szalay attributes it to the Top500 spurring people to buy bigger computers. (I.e., the act of measuring supercomputer evolution perturbs that evolution!) A provocative thesis, but hard to evaluate. Matei attributes it simply to the increasing ease with which one can aggregate systems.
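The effect of a shrinking power-law exponent is easy to reproduce with synthetic numbers. The sketch below fits performance = c * rank^(-alpha) by least squares in log-log space, then asks which rank the bottom 25 machines would match if aggregated. The two lists are made up (they are not Top500 data), and the function names are hypothetical.

```python
import math


def fit_power_law(perf):
    """Least-squares fit of log(perf) = log(c) - alpha * log(rank).

    perf[0] is the rank-1 machine. Returns (c, alpha).
    """
    xs = [math.log(rank) for rank in range(1, len(perf) + 1)]
    ys = [math.log(p) for p in perf]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return math.exp(mean_y - slope * mean_x), -slope


def rank_matched_by_bottom(perf, k=25):
    """Best (lowest) rank whose performance the bottom k machines together reach."""
    total = sum(perf[-k:])
    for rank, p in enumerate(perf, start=1):
        if total >= p:
            return rank
    return None


# Synthetic 500-entry lists: same top machine, different exponents.
steep = [1000.0 * rank ** -1.0 for rank in range(1, 501)]
flat = [1000.0 * rank ** -0.6 for rank in range(1, 501)]

steep_alpha = fit_power_law(steep)[1]
flat_alpha = fit_power_law(flat)[1]
steep_rank = rank_matched_by_bottom(steep)
flat_rank = rank_matched_by_bottom(flat)
```

With the flatter list, the aggregated bottom 25 reach a machine much nearer the top than with the steeper list, which is the qualitative shift the article describes between 1993 and 2005.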

November 14, 2006

Hadoop on EC2

Here's something neat (and details here).

Hadoop, an open source clone of Google FS and MapReduce, can be run on top of Amazon EC2, a hosting service that allows leasing servers on an hourly basis.

As Greg Linden goes on to say:

Developers may now be able to rapidly bring up hundreds of servers, run a massive parallel computation on them using Hadoop's MapReduce implementation, and then shut down all the instances, all with low effort and at low cost. Very cool.

My colleague Tim Freeman points out that you can run those same VMs on your own resources using the Globus Workspace service.

November 13, 2006

Grid in Spanish/La Grid en Español

A recent article on "la Computacion Grid" in a Spanish newspaper, by Borja Sotomayor. He concludes:

Grid computing, despite being a fairly mature technology, continues to evolve constantly. Many computational grids exist today (such as EGEE and TeraGrid), built mainly to address scientific problems, but "The Grid" does not yet exist. Just as the Internet was born in the scientific sphere and later reached the general public, the same can be expected, in the long term, of grid computing. When the technology matures sufficiently, any user will be able to submit complex computational jobs to "The Grid" from a home PC, as if we had a supercomputer in the living room.

Borja asked me to write a few words, too, which I did. I didn't realize my Spanish was so fluent ...


October 27, 2006

Connections Between Grid and P2P

What do Grid and P2P have to do with each other? Adriana Iamnitchi and I wrote a paper on that question a few years ago. The title is "On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing", which we explained as follows:

It has been reported that life holds but two certainties, death and taxes. And indeed, it does appear that any society--and in the context of this article, any large-scale distributed system--must address both death (failure) and the establishment and maintenance of infrastructure (which we assert is a major motivation for taxes, so as to justify our title!).


October 11, 2006

IBM has a secret island headquarters hideaway inside a computer game

Sometime soon, I hope to say something intelligent about Second Life and what it means for numerous things, including science and innovation. First, though, I need to get a computer with a graphics card modern enough to run it myself.

In the meantime, you can read some interesting comments by Charlie Stross (who coined the title for this post, and fears his science fiction magnum opus is being overtaken by events) and Irving Wladawsky-Berger, the very smart IBM executive who has a history of getting IBM into wacky things like Linux, parallel computing, and ... well, now it seems, Second Life. If you prefer traditional dead-tree media, the October 2006 issue of Wired has a nice article.