
December 21, 2006

caBIG releases caGrid

I've written previously about the cancer Biomedical Informatics Grid, caBIG, a national-scale network linking research laboratories, cancer centers, and investigator projects to accelerate the development of effective patient therapies for cancer. They just released the (Globus-based) caGrid version 1.0, which implements the core Grid architecture of caBIG to support scientific use cases from the cancer research community. A nice way to end the year. 

December 20, 2006

Posner on Second Life

Richard Posner, a US judge well known for his books on various topics (and also a lecturer at the University of Chicago Law School, I discover), appeared in Second Life to promote his new book, Not a Suicide Pact. (I haven't read the book, but it is supposedly "controversial.")

I signed up to participate (there are limits on how many people can be in one place in SL), but couldn't make it. However, I read the transcript. What did I learn? Not that much:

  • There is a limited selection of suits available for avatars.
  • Not that many people turned up--the event was perhaps more PR than communication.
  • There are people starting to think about the legal implications of virtual worlds.

But I'm sorry I missed it.

December 19, 2006

Wikisauri: Thesauri from Wikipedia

At the conference I am attending in Hong Kong, David Milne, Olena Medelyan, and Ian Witten have a nice paper on mining domain-specific thesauri from Wikipedia. As they say:

How can you obtain a thesaurus to support a library of documents in a particular domain? Manual construction is prohibitively expensive; automatic generation is woefully inaccurate. General thesauri do not incorporate the specialist terminology that pervades our professions, nor can they keep pace with the deluge of new topics and concepts that arrive each day. Yet a contemporary resource that incorporates expertise in all fields of human endeavour already exists: the widely known Wikipedia.

Basically, they mine the structure of Wikipedia (its redirects, hierarchy, and hyperlinks) to infer the equivalence, hierarchical, and associative relations needed to build a thesaurus. Comparison with a professionally prepared thesaurus (from agriculture) shows that this approach can be effective. Another example of crowdsourcing, based on a rather nonobvious use of the work of Wikipedia's contributors.
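To make the three mappings concrete, here is a minimal Python sketch on invented toy data. It is a simplification, not the authors' method: I assume redirect titles become non-preferred terms (UF), category parents become broader terms (BT, with the inverse NT), and, as a stand-in for whatever relatedness test the paper actually uses, two articles that link to each other become related terms (RT). The function name `build_thesaurus` and all the data are hypothetical.

```python
from collections import defaultdict


def build_thesaurus(redirects, categories, links):
    """Derive thesaurus relations from (toy) Wikipedia structure.

    redirects:  {alias_title: target_article}
    categories: {article: [parent_category, ...]}
    links:      {article: {linked_article, ...}}
    Returns {term: {"UF": set, "BT": set, "NT": set, "RT": set}}.
    """
    thes = defaultdict(lambda: {"UF": set(), "BT": set(), "NT": set(), "RT": set()})
    # Equivalence: each redirect title is a non-preferred term "used for" the target.
    for alias, target in redirects.items():
        thes[target]["UF"].add(alias)
    # Hierarchy: a parent category is a broader term; the inverse is a narrower term.
    for article, parents in categories.items():
        for parent in parents:
            thes[article]["BT"].add(parent)
            thes[parent]["NT"].add(article)
    # Association (simplifying assumption): a link in both directions means "related".
    for a, targets in links.items():
        for b in targets:
            if a != b and a in links.get(b, set()):
                thes[a]["RT"].add(b)
    return dict(thes)


# Hypothetical agricultural toy data.
redirects = {"Maize": "Corn", "Zea mays": "Corn"}
categories = {"Corn": ["Cereals"], "Wheat": ["Cereals"]}
links = {"Corn": {"Wheat", "Irrigation"}, "Wheat": {"Corn"}, "Irrigation": set()}

t = build_thesaurus(redirects, categories, links)
print(sorted(t["Corn"]["UF"]))  # ['Maize', 'Zea mays']
print(sorted(t["Cereals"]["NT"]))  # ['Corn', 'Wheat']
```

Note that the one-way link from Corn to Irrigation produces no RT pair here; a real system would presumably use a graded relatedness measure rather than this all-or-nothing rule.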

December 18, 2006

The flattening of supercomputers

Matei Ripeanu has an interesting brief article in IEEE Distributed Systems Online in which he analyzes the shape of the by-now-(in)famous Top 500 list of supercomputers, released every six months since 1993.

He notes first that a plot of performance vs. rank follows a power law. Not in itself surprising. But then he notes that the power-law exponent is getting smaller over time: in other words, the fastest machines are, on average, gaining performance more slowly than those further down the list. Thus, for example, the bottom 25 machines in the Top500, if aggregated together, would have matched only the #30 machine in 1993, but would have matched the #5 machine in 2005.

Why this change? Alex Szalay attributes it to the Top500 spurring people to buy bigger computers. (I.e., the act of measuring supercomputer evolution perturbs that evolution!) A provocative thesis, but hard to evaluate. Matei attributes it simply to the increasing ease with which one can aggregate systems.
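The link between a shrinking exponent and the "bottom 25" comparison can be checked with a little arithmetic. Assume an idealized list where the machine at rank r delivers P(r) = C * r^(-alpha). The bottom 25 of 500 then sum to S = C * (sum of r^(-alpha) for r = 476..500), and the rank whose single machine equals that sum is r* = (S/C)^(-1/alpha); C cancels. The minimal Python sketch below fits alpha by least squares on log-log axes and computes r*. The function names and the synthetic data are mine, not Matei's, and this does not reproduce his 1993/2005 figures.

```python
import math


def fit_power_law(ranks, perf):
    """Fit perf ~ C * rank**(-alpha) by least squares on log-log axes; return (C, alpha)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(p) for p in perf]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def aggregate_match_rank(alpha, n=500, k=25):
    """Rank whose modeled performance equals the summed performance of the bottom k of n.

    Solves C * r**(-alpha) = sum of C * q**(-alpha) over q = n-k+1..n; C cancels.
    """
    s = sum(q ** (-alpha) for q in range(n - k + 1, n + 1))
    return s ** (-1.0 / alpha)


# Synthetic, exactly power-law data (not real Top500 numbers).
ranks = list(range(1, 501))
perf = [1000.0 * r ** -0.8 for r in ranks]
C, alpha = fit_power_law(ranks, perf)
print(round(alpha, 3))  # 0.8

# A smaller exponent makes the aggregated bottom 25 match a better (smaller) rank.
for a in (1.0, 0.7):
    print(a, round(aggregate_match_rank(a), 1))
```

In this idealized model, a flatter curve (smaller alpha) is exactly what moves the aggregated tail of the list up toward the top, consistent with the trend Matei reports.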

December 17, 2006

Travel to Hong Kong

I flew to Hong Kong today. It is strange: while one might expect to cross my favorite ocean to get from the US to China, in practice we never flew over open water. Instead, we had beautiful views of Canada, Siberia, Mongolia, and China.

December 13, 2006

Energy

What is the most important problem that one can work on? That is a question that we all should ask ourselves from time to time.

A compelling answer to that question is "energy." Without inexpensive, nonpolluting, carbon-neutral energy, many other things that we may think are important--health, longevity, environment, prosperity, freedom from conflict--are likely to remain elusive for many, and indeed become inaccessible for an increasing number.

Of course the energy problem is not simply a question of supply: we must also address demand. But as Pacala and Socolow argue, any complete solution must be multifaceted.

Sustained improvements in demand and supply will require significant advances in science and engineering. It so happens that I work at a Department of Energy laboratory, which is devoted to producing those advances. The joke used to be that the "E" in DOE stood for "everything." But the E in DOE, and thus the DOE laboratories, seems likely to become increasingly important.

December 12, 2006

NEON and the Earth System Grid

Information on the Earth System Grid is featured (for a few weeks) on the NEON project's web site, along with the latest NEON planning documents. (NEON, the National Ecological Observatory Network, is a major US initiative to collect unprecedented amounts of ecological data, assuming a budget is ever allocated.)

Arguably even more cool than ESG is the plan to enlist hikers to collect ecological data along the Appalachian Trail, also featured on the NEON web site (-;

December 10, 2006

The S stands for?

My tongue-in-cheek post a while back on Web Fundamentalism generated lots of interesting traffic and pointers. At some point I must internalize and summarize it all, but for now I am just reading (some of) it. The best thing I've seen so far is Peter Lacey's The S stands for Simple, a hilarious and very relevant Socratic dialog.

Continue reading "The S stands for?" »

December 09, 2006

System-Level Science and Systems Biology

Our recent article on system-level science in IEEE Computer generated an interesting email from Peter Saffrey, who pointed me to the Beacon project, which aims to "build a model of the human liver by composing models of biological entities down to the level of cells."

The project has produced several articles. So far I've read just one, "Computational Challenges of Systems Biology," by Anthony Finkelstein*, Peter Saffrey, and others, which provides a nice introduction to the field, written by computer scientists (and one biologist) for computer scientists.

*I remember Anthony from my time at Imperial College, when he impressed me with (among other things) his aphorism that "inheritance turns all programming into maintenance--which programmers are particularly bad at."

December 08, 2006

The Nature of eScience

A talk by Tiejien Luo at CANS reminded me of Jim Gray's nice formulation of the evolution of science methodologies:

  • Thousand years ago: science was empirical, describing natural phenomena
  • Last few hundred years: theoretical branch, using models, generalizations
  • Last few decades: a computational branch, simulating complex phenomena
  • Today: data exploration (eScience)--unify theory, experiment, and simulation. (Data captured by instruments, or generated by simulator; processed by software; information/knowledge stored in computer; scientist analyzes database/files, using data management and statistics.)

Jim's equating of "eScience" with "data exploration" seems a little too narrow. (John Taylor, who coined the term, had a somewhat broader definition: "e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet.") However, the growing importance of data can hardly be overstated, and Jim's perspectives are worthy of careful consideration, especially by those who think of "computation and science" as being entirely about simulation.

December 07, 2006

Grid in China

I'm participating today and tomorrow in the China-America Networking Symposium (CANS), which this year has a particular focus on grid. It's good to see friends from China, although several are not here because of visa problems. (A familiar, and painful, story.)

I've had the good fortune to visit China several times in recent years. In addition, we have hosted several visitors from China, and I also have some wonderful Chinese students. So I know a little about Chinese grid activities, which include several major deployments:

Continue reading "Grid in China" »

December 06, 2006

Integrate GridFTP with Jakarta Commons Virtual File System

Sounds like neat stuff ... anyone tried it? --

The Jakarta Project Commons Virtual File System (VFS) provides a single application programming interface (API) for accessing various different file systems. Commons VFS presents a uniform view of the files from various sources, such as local files, FTP servers, SSH, WebDAV, HTTP, HTTPS, Windows® shares, and others. VFS supports a wide range of file systems. However, grid computing protocols, such as GridFTP, are missing. Check out an implementation of a GridFTP provider for use within Commons VFS.
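The core idea in that abstract is one API in front of many file systems, with each protocol supplied by a pluggable provider chosen by URI scheme, so adding GridFTP means adding one more provider. Here is a minimal Python sketch of that provider pattern, for illustration only: `VirtualFS`, `LocalProvider`, and `InMemoryProvider` are hypothetical names and are not the Commons VFS Java API, and the in-memory provider merely stands in for a real GridFTP provider, which would actually talk to a GridFTP server.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileProvider(ABC):
    """One provider per protocol; callers never talk to providers directly."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class LocalProvider(FileProvider):
    def read(self, path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()


class InMemoryProvider(FileProvider):
    """Stand-in for a remote provider (e.g. a hypothetical GridFTP one)."""

    def __init__(self, files: dict):
        self.files = files

    def read(self, path: str) -> bytes:
        return self.files[path]


class VirtualFS:
    """Uniform front end: dispatch each URI to the provider registered for its scheme."""

    def __init__(self):
        self._providers = {"file": LocalProvider()}

    def register(self, scheme: str, provider: FileProvider) -> None:
        self._providers[scheme] = provider

    def read(self, uri: str) -> bytes:
        parsed = urlparse(uri)
        scheme = parsed.scheme or "file"  # bare paths are treated as local files
        provider = self._providers.get(scheme)
        if provider is None:
            raise ValueError(f"no provider registered for scheme {scheme!r}")
        return provider.read(parsed.path)


vfs = VirtualFS()
vfs.register("gsiftp", InMemoryProvider({"/data/run1.dat": b"grid bytes"}))
print(vfs.read("gsiftp://gridftp.example.org/data/run1.dat"))  # b'grid bytes'
```

The application code calling `vfs.read` is the same whether the URI names a local file or a grid resource; only the registry changes.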

December 05, 2006

The brain and computer science (etc.)

A dense but interesting report from the National Science Foundation, Brain Science as a Mutual Opportunity for the Physical and Mathematical Sciences, Computer Science, and Engineering, talks about the state of the art in our understanding of Woody Allen's "second favorite organ," and opportunities for the physical and mathematical sciences, computer science, and engineering to contribute to progress. The abstract follows.

Humankind now stands at a special moment in its long history of thinking about the brain, a moment of revolutionary change in the kinds of questions that can be asked and the kinds of answers that can be achieved. Fundamental shifts include:

Continue reading "The brain and computer science (etc.)" »

December 04, 2006

In search of lost time

A fascinating article and associated commentary in this week's Nature on the Antikythera mechanism: a spookily amazing mechanical analog device for predicting the future positions of astronomical objects--built in the 2nd century BC, in Greece. The picture is a reconstruction (see also supporting material).

The abstract from the main article explains the new work, which sheds a lot of new light on the nature of this mechanism:

Continue reading "In search of lost time" »

December 03, 2006

Web 4.0

Web 2.0 was first a noun, then a conference. Next it was an adjective. Perhaps soon it will be a verb, adverb, or expletive. Regardless of its grammatical status, it already has an entry in the Devil's Dictionary:

Web 2.0 Proper noun. The name given to the social and technical sophistication and maturity that mark the— Oh, screw it. Money! Money money money! Money! The money’s back! Ha ha! Money!

It has inspired people to write about Science 2.0, Bubble 2.0, and many other 2.0s as well. (It has also inspired some definitions cleverer than those attempted by Tim O'Reilly, e.g., "Web 2.0 = chmod 777.")

Continue reading "Web 4.0" »

December 02, 2006

Open source problem solving in science

Linus' Law according to Eric S. Raymond: "given enough eyeballs, all bugs are shallow." In other words, if a large enough community of users and developers has access to (and is using) your source code, even subtle problems will be identified and resolved quickly.

The use of the Internet to create a "massively parallel human problem-solving system" is a powerful concept, as evidenced by such phenomena as the blogger as a source of news, Wikipedia as a source of information, and advertising campaigns that solicit user-generated spots. (For more examples, see Jeff Howe's writings on crowdsourcing.)

Now Karim Lakhani of Harvard Business School is looking into whether such techniques can be applied to scientific problems. From a recent article (and interview):

Continue reading "Open source problem solving in science" »

December 01, 2006

Einstein's Clocks, Poincare's Maps

I always like discovering that things are perhaps not quite as they seem. Peter Galison's lovely book Einstein's Clocks, Poincare's Maps has that flavor. I, at least, had always pictured Einstein as an isolated genius, thinking great thoughts in the obscurity of the Swiss Patent Office. But what were those patents he was reviewing in his day job? Apparently many had to do with time synchronization, a topic of great interest in the late 19th and early 20th centuries, as telegraphs spread across Europe.

Continue reading "Einstein's Clocks, Poincare's Maps" »