June 13, 2008

BEVO Report on Virtual Organizations Released

The report from the two workshops that Jonathan Dugan, Tom Finholt, Carl Kesselman, Katherine Lawrence, and I ran (with much help from Diana Rhoten) on "Building Effective Virtual Organizations" is now available on the BEVO website. The title is "Beyond Being There: A Blueprint for Advancing the Design, Development, and Evaluation of Virtual Organizations"--and that is what the report provides.

We lead off with the delightful quote from Arthur Mee, writing in the Strand Magazine in 1898:

If as it is said to be not unlikely in the near future, the principle of sight is applied to the telephone as well as that of sound, earth will be in truth a paradise, and distance will lose its enchantment by being abolished altogether.

Of course things are not quite so simple (fortunately!)--but it is certainly true that the Internet has transformed the practice of science. This report explains why that is so, reviews exemplar projects that illustrate the nature of this transformation, and explains what we need to do next to take virtual organizations to the next level.

April 03, 2008

Nice article on caBIG cancer biomedical informatics grid

I keep mentioning caBIG. But here is a nice profile in ComputerWorld that describes the project's goals and status.

March 31, 2008

Big enhancements to caGrid federation infrastructure

I've mentioned the cancer Biomedical Informatics Grid (caBIG) project before. Its goal is "to develop applications and the underlying systems architecture that connects together data, tools, scientists and organizations in an open federated environment." Underpinning caBIG is a service-oriented infrastructure, caGrid, which is in turn built on Globus software.

The caBIG project announced today the release of version 1.2 of the caGrid core infrastructure, with enhancements such as

  • A simplified Grid Transfer Service to handle large scale data transfer for grid-based queries
  • Integration with the latest releases of Software Development Kit (SDK) and Common Security Module (CSM)
  • Availability of a new Web based Single Sign On Framework, called WebSSO, for providing single sign on capability for web based applications integrated with caGrid GAARDs security infrastructure, based on collaboration with the caBIG Clinical Trials Suite (CCTS) team from the Clinical Trials Management Systems Workspace
  • Substantial improvements to usability of tools such as Introduce Toolkit based on user feedback
  • Integration with Apache Ivy build system for improved dependency management of sub components within the caGrid core infrastructure
  • Early preview to integration with the Taverna Workbench for developing workflows, based on collaboration with the Integrative Cancer Research Workspace
  • Backward compatible with the caGrid 1.0 and caGrid 1.1

My colleagues Ravi Madduri and Wei Tan at the University of Chicago and Argonne National Laboratory have been very involved in this work, in particular the integration with Taverna.

November 18, 2007

On Interdisciplinary Work

I recently came across this quote from Théophile Gautier (1811–1872)'s poem L'Art:

Oui, l'oeuvre sort plus belle
D'une forme au travail
Rebelle,
Vers, marbre, onyx, émail.

[Yes, the work comes out more beautiful from a material that resists the process, verse, marble, onyx, or enamel (not email!)]

I think this summarizes well the merits of interdisciplinary work and the challenges and joys of system-level science.

October 25, 2007

Michael Turner on the National Laboratories

Our esteemed Chief Scientist has some interesting and inspiring things to say about the US Department of Energy's National Laboratories. Michael Turner led development of the report Connecting Quarks with the Cosmos, which is still a good read, at least for this computer scientist.

October 22, 2007

Fred Brooks on "computer science"

I recently came across the speech given by Fred Brooks (author of The Mythical Man Month and other wonderful works) on receiving the ACM Allen Newell Award in 1996. In this speech, titled "The Computer Scientist as Toolsmith," he says many interesting things. I'll quote a couple. First, he writes that "computer science" is above all an engineering discipline, concerned with "systems design problems characterized by arbitrary complexity":

Examples are the intricate demands upon operating systems, or knowledge webs, or computer networks. The arbitrariness is inherent—the requirements and constraints spring from a host of independent minds.

These problems scandalize and discourage those who approach them from backgrounds of mathematics and natural science, and for different reasons. Mathematicians are scandalized by the complexity—they like problems which can be simply formulated and readily abstracted, however difficult the solution. The four-color problem is a perfect example.

Physicists or biologists, on the other hand, are scandalized by the arbitrariness. Complexity is no stranger to them. The deeper the physicists dig, the more subtle and complex the structure of the “elementary” particles they find. But they keep digging, in full faith that the natural world is not arbitrary, that there is a unified and consistent underlying law if they can but find it.

No such assurance comforts the computer scientist. Arbitrary complexity is our lot, and here more than anywhere else we need the best minds of our discipline fashioning more powerful attacks on such problems.

It's a useful reminder that "computer science" is not [just] mathematics or physics, and that there are many challenging things to be done in computing that do not involve theorems or physical laws.

Second, he challenges what he saw as the goals of AI research to replace human intelligence:

If indeed our objective is to build computer systems that solve very challenging problems, my thesis is that
                                IA > AI
that is, that intelligence amplifying systems can, at any given level of available systems technology, beat AI systems. That is, a machine and a mind can beat a mind-imitating machine working by itself.

Someday a computer may beat the world champion in chess. When that day comes, I should like to see the world champion equipped with a powerful and suitable IA chess tool, and then play against the AI system. I’ll bet on the IA team.

Amen.

October 18, 2007

caGrid article

An article on iSGTW describes the National Institutes of Health's cancer Biomedical Informatics Grid (caBIG) project and its caGrid infrastructure. Not too much new information relative to previous posts, but a good reminder of the nice work that this group is doing. The event that spurred this article is the recent release of caGrid 1.1.

Interestingly, recent articles note that cancer rates continue to decline in the US, but increase worldwide--to an estimated 15 million new cases by 2020. That's a lot of people, emphasizing the importance of this work.

From a technology perspective, caBIG and caGrid are exciting because of the extensive and powerful use they make of the Web Services infrastructure developed over the past several years. In particular, I can't resist pointing out that the entire infrastructure is based on Globus software, and in particular its implementation of the WSRF, WS-Notification, WS-Addressing, security, and related specifications.

October 13, 2007

Nobel Peace Prize and the Grid

The Intergovernmental Panel on Climate Change (IPCC) and Al Gore have been awarded the Nobel Peace Prize.

A perhaps underappreciated aspect of this award is that (as I have noted previously) the climate model simulation data that underpinned the IPCC analysis was made available to the international community via the Earth System Grid, which itself uses (among other things) Globus software. Thus the Grid and Globus communities could (if they were less modest) claim a tiny little bit of credit for that prize.

The next phase of IPCC will require the analysis of far more data than in the current round, as models become yet more sophisticated and more scenarios are run. The next phase of Earth System Grid will feature a more decentralized, federated structure to enable this analysis.

October 12, 2007

Globus and Swift at eSocial Science Conference

My colleagues Lee Liming and Tibi Stef-Praun were at the eSocial Science conference in Ann Arbor last week.

Lee presented a nice tutorial, "Service Oriented Science: Globus Software in Action" that examined how to use Globus software to address different application problems. Take a look at the slides.

Tibi presented recent work in the Computation Institute involving the use of Swift to accelerate the solution of problems in computational economics. This preliminary work is part of a larger effort with Rob Townsend and other economists focused on the development of cybertools for problems in economics.

May 10, 2007

Fundamentalist physics: why Dark Energy is bad for Astronomy

While visiting Alex Szalay in Munich recently, I spent some time talking with Simon White, director of the Max Planck Institute for Astrophysics. Alex later pointed me at an article that Simon authored recently, entitled Fundamentalist physics: Why Dark Energy is bad for Astronomy. It's a fascinating commentary on the goals and cultures of two scientific communities that on the surface might appear to have much in common. The abstract:

Continue reading "Fundamentalist physics: why Dark Energy is bad for Astronomy" »

May 01, 2007

Forest Observation via GT4

It seems that every day I learn about a neat new GT4-based cyberinfrastructure project that I had previously not heard of. This week, Bill St Arnaud writes about the SAFORAH forest observation system:

The Canadian SAFORAH has many objectives - of which one is to measure the amount of carbon dioxide absorbed by Canadian forests. This cyber-infrastructure project also supports studies in bird habitat across Canada. It uses Globus Toolkit v.4 at all of the SAFORAH participating sites. Currently, four Canadian Forestry Centres located in Victoria British Columbia, Cornerbrook Newfoundland, Edmonton, Alberta and Laurentian Québec are operationally connected to the SAFORAH data grid. SAFORAH offers Grid-enabled OGC services which are used to increase interoperability of EO data between SAFORAH and other geospatial information systems. The Grid-enabled OGC services consist of the following main components: Grid-enabled Web Map Service (GWMS), Grid-enabled Web Coverage Service (GWCS), Grid-enabled Catalog Service for Web (GCSW), Grid-enabled Catalog Service Federation (GCSF), Control Grid Service (CGS) and the Standard Grid Service Interfaces and OGC Standard User Interfaces.

I want to learn more!

April 26, 2007

Globus MEDICUS wins award

I wrote a while back about the Globus MEDICUS work being done in LA. They've now won an Internet2 IDEA Award. Stephan Erberich, project leader and Director of Functional Imaging and Biomedical Informatics at the University of Southern California, writes:

Today we routinely expect information to be available on the Internet, but this is still not the case with medical information. We believe that making it available, in a secure fashion, is crucial: it has the potential to deliver better, more informed care at reduced cost. We believe that our Globus MEDICUS project takes important first steps toward this goal. Our system lets doctors and patients utilize the power of high-speed Internet to easily and securely share information. Much remains to be done, but we are gratified by the benefits that are already apparent.

March 10, 2007

Warren Washington on climate change

I attended a talk by the distinguished climate modeler Warren Washington on Thursday: "Climate Modeling of the 20th and 21st Centuries." He spoke on the state of the art in climate modeling, the evidence for warming, and the likely impacts of future warming. The scientific consensus is that the planet has warmed 0.7°C since the beginning of the 20th century. January 2007 was the warmest January in recorded history. It's easy to see why people are so worried.

The talk also featured a long Q&A. One Q: how well can models predict catastrophic change? A: Not very well, as the relevant physics (e.g., Antarctic ice sheet collapse, methane hydrate emissions) is not well understood. From a climate change skeptic: might observed warming not be due to some other factor? (A: The physics of greenhouse forcing is well understood, no other mechanism is known, and the rate of change is unprecedented in the record.)

Warren Washington co-authored An Introduction to Three-Dimensional Climate Modeling, from which I learned much of what I know of geophysical dynamics. He was also chair of the National Science Board for a while. A great scientist.

February 22, 2007

Cyber-enabled Discovery and Innovation

The National Science Foundation's Information Technology Research program was (IMHO) one of the most successful efforts to support interdisciplinary research in computational science. (The Department of Energy's SciDAC program does well too.) Thus it has been a great concern that this program was terminated with no clear next step.

The new Cyber-enabled Discovery and Innovation (CDI) program, announced as beginning in 2008, seems to be a worthy successor. Five key elements:

  • Knowledge extraction
  • Interacting elements
  • Computational experimentation
  • Virtual environments
  • Educating researchers and students in computational discovery

February 15, 2007

Microfluidic Bubble Logic

A wonderful article in Science (press release and article): Manu Prakash and Neil Gershenfeld describe how to use bubbles in a microfluidic device to carry on-chip process control information, while also performing chemical reactions.

Microfluidics is a fascinating technology that deals with the control and manipulation of microliter and nanoliter volumes of fluids. Fluids flow through microfabricated channels (see figure), allowing the delivery of precise quantities of reactants and the precise control of chemical reactions.

Previously, control has been achieved via external valves and control systems. The authors present channel geometries that exploit nonlinear behavior of bubbles in microfluidic flows to perform logic operations (e.g., "a bubble has arrived on channel A AND B") and to store bubbles. (E.g., the figure shows three AND-gates connected in a ring oscillator. A bubble flows clockwise around the ring until it joins a stream.)

Quoting the press release: "Controlling chemical reactions will likely be a primary application for the chips. It will be possible to create large-scale microfluidic systems such as chemical memories, which store thousands of reagents on a chip (similar to data storage), using counters to dispense exact amounts and logic circuits to deliver them to specific destinations."

January 27, 2007

Charles Falco on Optics

I had the opportunity this week to listen to a wonderful talk by Charles Falco, a physicist from the University of Arizona who has done fascinating work in recent years with the painter David Hockney on the use of optics in early Renaissance painting. As he mentioned at the beginning of his beautifully constructed and entertaining talk, his ambition used to be to have his name in the index of a physics textbook; instead, he now encounters PhD theses on the "Hockney-Falco thesis."

As recounted in an article and his FAQ, Falco and Hockney showed via careful analysis that a number of famous Renaissance painters (e.g., van Eyck) made use of lenses and mirrors in their paintings. They do this not (only) by showing that the paintings are unreasonably accurate, but also (in some cases) by showing that they are inaccurate in unexpected ways.

This work is interesting for several reasons: first, it was generally assumed that optics were far less advanced at that time, and second it provides interesting insights into how van Eyck and others worked. It is also a fascinating example of how even the most studied objects can have surprises to reveal.

Falco observed that many art historians responded to this analysis by claiming (often stridently) that it was "wrong, irrelevant, or obvious--and sometimes all three." That a more nuanced response is possible was evidenced by Barbara Stafford, an art historian at UChicago, who observed after the talk that precisely because optics distort in various ways, their use represents an aesthetic choice, not (just) a labor-saving device.

January 11, 2007

Tapping Private Sector Innovation?

Charlie Catlett writes about recent partnerships between federal science projects and Google and Amazon, in which corporate services are used to host (and/or process) data from science experiments.

I wonder: are these projects being undertaken as philanthropic efforts by Amazon and Google? (They can certainly afford it, and it's good PR--and maybe good experience.) Or have the science projects determined that these services make good economic sense, and so are paying for them? I hope the latter, as it means these approaches are replicable. If so, I'd love to see the analysis.

December 21, 2006

caBIG releases caGrid

I've written previously about the cancer Biomedical Informatics Grid, caBIG, a national-scale network linking research laboratories, cancer centers, and investigator projects to accelerate the development of effective patient therapies for cancer. They just released the (Globus-based) caGrid version 1.0, which implements the core Grid architecture of caBIG to support scientific use cases from the cancer research community. A nice way to end the year. 

December 13, 2006

Energy

What is the most important problem that one can work on? That is a question that we all should ask ourselves from time to time.

A compelling answer to that question is "energy." Without inexpensive, nonpolluting, carbon-neutral energy, many other things that we may think are important--health, longevity, environment, prosperity, freedom from conflict--are likely to remain elusive for many, and indeed become inaccessible for an increasing number.

Of course the energy problem is not simply a question of supply: we must also address demand. But as Pacala and Socolow argue, any complete solution must be multifaceted.

Sustained improvements in demand and supply will require significant advances in science and engineering. It so happens that I work at a Department of Energy laboratory, which is devoted to producing those advances. The joke used to be that the "E" in DOE stood for "everything." But the E in DOE, and thus the DOE laboratories, seems likely to become increasingly important.

December 12, 2006

NEON and the Earth System Grid

Information on the Earth System Grid is featured (for a few weeks) on the NEON project's web site, along with the latest NEON planning documents. (NEON=National Ecological Observatory Network, major US initiative to collect unprecedented amounts of ecological data, assuming budget is ever allocated.)

Arguably even more cool than ESG is the plan to enlist hikers to collect ecological data along the Appalachian Trail, also featured on the NEON web site (-;

December 09, 2006

System-Level Science and Systems Biology

Our recent article on system science in IEEE Computer generated an interesting email from Peter Saffrey, who pointed me at the Beacon project, which aims to "build a model of the human liver by composing models of biological entities down to the level of cells."

The project has produced several articles. I've just read one so far, "Computational Challenges of Systems Biology," by Anthony Finkelstein*, Peter Saffrey, and others, which provides a nice introduction to the field, written by computer scientists (and one biologist) for computer scientists.

*I remember Anthony from my time at Imperial College, when he impressed me with (among other things) his aphorism that "inheritance turns all programming into maintenance--which programmers are particularly bad at."

December 08, 2006

The Nature of eScience

A talk by Tiejien Luo at CANS reminded me of Jim Gray's nice formulation of the evolution of science methodologies:

Thousand years ago: science was empirical, describing natural phenomena

Last few hundred years: theoretical branch, using models, generalizations

Last few decades: a computational branch, simulating complex phenomena

Today: data exploration (eScience)--unify theory, experiment, and simulation. (Data captured by instruments, or generated by simulator; processed by software; information/knowledge stored in computer; scientist analyzes database/files, using data management and statistics.)

Jim's equating of "eScience" with "data exploration" seems a little too narrow. (John Taylor, who coined the term, had a somewhat broader definition: "e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet.") However, the growing importance of data can hardly be overstated, and Jim's perspectives are worthy of careful consideration, especially by those who think of "computation and science" as being entirely about simulation.

December 05, 2006

The brain and computer science (etc.)

A dense but interesting report from the National Science Foundation, Brain Science as a Mutual Opportunity for the Physical and Mathematical Sciences, Computer Science, and Engineering, talks about the state of the art in our understanding of Woody Allen's "second favorite organ," and opportunities for the physical and mathematical sciences, computer science, and engineering to contribute to progress. The abstract follows.

Humankind now stands at a special moment in its long history of thinking about the brain, a moment of revolutionary change in the kinds of questions that can be asked and the kinds of answers that can be achieved. Fundamental shifts include:

Continue reading "The brain and computer science (etc.)" »

December 04, 2006

In search of lost time

A fascinating article and associated commentary in this week's Nature on the Antikythera mechanism: a spookily amazing mechanical analog device for predicting the future positions of astronomical objects--built in the 2nd century BC, in Greece. The picture is a reconstruction (see also supporting material).

The abstract from the main article explains the new work, which sheds a lot of new light on the nature of this mechanism:

Continue reading "In search of lost time" »

December 02, 2006

Open source problem solving in science

Linus' Law according to Eric S. Raymond: "given enough eyeballs, all bugs are shallow." In other words, if a large enough community of users and developers has access to (and is using) your source code, even subtle problems will be identified and resolved quickly.

The use of the Internet to create a "massively parallel human problem-solving system" is a powerful concept, as evidenced by such phenomena as the blogger as a source of news, Wikipedia as a source of information, and advertising campaigns that solicit user-generated spots. (For more examples, see Jeff Howe's writings on crowdsourcing.)

Now Karim Lakhani of Harvard Business School is looking into whether such techniques can be applied to scientific problems. From a recent article (and interview):

Continue reading "Open source problem solving in science" »

November 23, 2006

Breaking the medical image communication barrier

A press release from USC describes the Globus-based MEDICUS system, to be demonstrated at the Radiological Society of North America (RSNA) meeting in Chicago next week. They claim that "doctors [are] finally able to share digital medical images instantly, nationwide, with full patient privacy protection." It is a press release of course, but as I've commented before, it's a pretty neat system. I'll be at RSNA to see it.

Meanwhile, we have two projects underway in the Computation Institute applying the Virtual Data System to medical problems. In one, we are processing tens of thousands of mammograms, and in the other, hundreds of functional MRIs (e.g., see this article). I'll write more about these projects as we get results.

November 22, 2006

GapMinder: Myths about the developing world

From Dan Atkins, a pointer to a wonderful source of animated information on world development trends. "Making sense of the world by having fun with statistics." You can also see Hans Rosling's talk on this work at TED. I'm not sure which is more amazing: the ways in which modern visualization techniques can be used to bring dry economic data to life, or the misconceptions that many of us (well, myself, certainly) have about the state of our highly dynamic world.

November 21, 2006

System-level Science

This month's issue of IEEE Computer includes four articles on system-level science: the integration of diverse sources of knowledge about the constituent parts of a complex system with the goal of obtaining an understanding of the system's properties as a whole. This being IEEE Computer, they focus in particular on information technology (IT) issues involved in achieving scientific goals:

[S]ystem-level science integrates not only different disciplines but also, typically, software systems, data, computing resources, and people. System-level science is usually a team pursuit. Data comes from different sources, different groups develop component models, team members provide specialized expertise, and the often substantial computing and data resources required for success are themselves diverse and distributed. Thus, system-level science itself requires the creation of yet another sort of system that may combine large numbers of both physical and human components.

Continue reading "System-level Science" »

November 18, 2006

Computational Social Sciences

I've been spending a lot of time recently talking with economists--of which the University of Chicago has quite a few. We're running a "Disciplinary Deep Dive" (3-D) look at computational economics this quarter, with lectures and discussions on a wide range of relevant topics.

Continue reading "Computational Social Sciences" »

November 16, 2006

Science Grid This Week

Science Grid This Week (SGTW) is now International Science Grid This Week (iSGTW). This weekly (duh) newsletter features news and stories on grid technology, grid deployments, and scientific applications enabled by grid. It's easy to subscribe. You should also send it stories on your work. There are many smaller projects that don't get the exposure of the big grid deployments, but they are often just as interesting.

November 08, 2006

Virtual Organizations at NSF

Dan Atkins is the author of the much-discussed "Atkins report" advocating and defining a US national strategy for cyberinfrastructure. Now he runs the National Science Foundation's new Office of Cyberinfrastructure, and in that position has the opportunity to execute on that report's recommendations--modulo the fact that he doesn't have the $1B budget that his report advocated.

Atkins has a new weblog, CI Topics, with pointers to many interesting documents. I like a recent talk he gave to the NSF Education and Human Resources Directorate. In particular, slides 17 onwards talk about the importance of "VOs"--the technologies and processes that enable communities to form and operate efficiently. I think this is just the sort of work required to scale the benefits of cyberinfrastructure to reach millions of researchers and students. Let's hope Dan can make it happen.

October 31, 2006

What Do You Do With a Million Books?

I am looking forward to an upcoming symposium at the University of Chicago: What Do You Do With a Million Books? (November 5 and 6: but why on a Sunday?):

In the wake of recent large-scale digitization projects aimed at providing universal access to the world's vast textual repositories, humanities scholars, librarians and computer scientists find themselves newly challenged to make such resources functional and meaningful.
Digitizing "a million books" ... poses far more than just technical challenges. Tomorrow, a million scholars will have to re-evaluate their notions of archive, textuality and materiality in the wake of these developments. How will humanities scholars, librarians and computer scientists find ways to collaborate in the "Age of Google?"

Speakers include John Unsworth, from the University of Illinois, a pioneer in digital humanities, and Gregory Crane, whose March 2006 article perhaps suggested the name for this symposium. (He discusses the challenges of scale, heterogeneity, granularity, noise, audience, and distributors.)

The Computation Institute is a sponsor. We already have a preliminary project underway applying machine learning technology to English-language texts, and I hope to see more such projects in the near future.

October 28, 2006

Durable Nonsense

Sally Floyd cites three principles of durable nonsense in a talk on network simulation:

  1. For every piece of durable nonsense, there is an irrelevant frame of reference in which it makes perfect sense.
  2. Rigorous reasoning from inapplicable assumptions yields the world's most durable nonsense.
  3. The roots of most nonsense are found in the fact that people are more specialized than problems.

This concept of "durable nonsense" is wonderful and I think very useful. I suspect that more computer science research than we would like can be categorized in such terms.

I think the source for these principles is R. A. Rosanoff, "A Survey of Modern Nonsense as Applied to Matrix Computations," April 1969, although Floyd also cites John Spragins, "Computer System Performance Modeling and Durable Nonsense", January 1979--perhaps that's where she saw Rosanoff quoted.

October 19, 2006

Experimenting with Networks

It is well known that naturally occurring networks can have different structures: for example, every node may be connected to a fixed number of neighbors, or additional connections to "distant" nodes may create a "small-world" structure, or some nodes may be connected to far more nodes than others, as in a "scale-free" or power-law structure (see figure).

Network structure is presumed to be important in determining behavior: for example, how fast information propagates and how readily the network can evolve. But how do you do controlled experiments on the properties of different structures? The problem is that networks like the Internet and ecological networks can't be easily changed.
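To make the three structures mentioned above concrete, here is a minimal Python sketch that builds a toy version of each using only the standard library. The function names (`ring_lattice`, `small_world`, `preferential_attachment`, `degrees`) and all parameter values are my own illustrative choices, not anything taken from the experiments discussed in this post, and the generators are deliberately simplified.

```python
import random
from collections import Counter


def ring_lattice(n, k=2):
    """Every node is linked to its k nearest neighbors (k/2 on each side)."""
    edges = set()
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            edges.add((min(i, j), max(i, j)))
    return edges


def small_world(n, k=2, shortcuts=3, seed=0):
    """A ring lattice plus a few random links to "distant" nodes."""
    rng = random.Random(seed)
    edges = ring_lattice(n, k)
    target = len(edges) + shortcuts
    while len(edges) < target:
        a, b = rng.sample(range(n), 2)
        edges.add((min(a, b), max(a, b)))
    return edges


def preferential_attachment(n, m=2, seed=0):
    """Grow a network in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = {(a, b) for a in range(m + 1) for b in range(a + 1, m + 1)}
    # Each node appears in `stubs` once per link endpoint, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.add((t, new))
            stubs.extend([t, new])
    return edges


def degrees(n, edges):
    """Count how many links touch each node."""
    deg = Counter({v: 0 for v in range(n)})
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
```

Comparing `max(degrees(n, edges).values())` across the three generators shows the qualitative difference: the ring has uniform degree, the small-world variant adds only a few extra links, and the preferential-attachment network tends to grow a handful of highly connected hubs.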

In a beautiful series of experiments, Michael Kearns, Siddharth Suri, and Nick Montfort took a class of undergraduate students and asked them to solve various graph coloring problems. In each experiment, each of a set of students was connected (by computer) with other students in some network, and was then asked to select a color for his/her node that was different from those of his/her neighbors. This process continued until the graph coloring problem was "solved," meaning that a consistent set of colors was assigned.

The authors found that the time to solution varied greatly according to both the network structure and the amount of information provided about neighbors. The abstract:

Theoretical work suggests that structural properties of naturally occurring networks are important in shaping behavior and dynamics. However, the relationships between structure and behavior are difficult to establish through empirical studies, because the networks in such studies are typically fixed. We studied networks of human subjects attempting to solve the graph or network coloring problem, which models settings in which it is desirable to distinguish one's behavior from that of one's network neighbors. Networks generated by preferential attachment [i.e., scale-free] made solving the coloring problem more difficult than did networks based on cyclical structures, and "small worlds" networks were easier still. We also showed that providing more information can have opposite effects on performance, depending on network structure.
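For readers who want to play with the idea, here is a rough Python sketch of a coloring "game" of this kind. To be clear about the assumptions: the real experiments used human subjects making their own choices, whereas this sketch substitutes a simple random rule (a node in conflict switches to a color none of its neighbors is using, if one exists). The names `conflicts` and `simulate`, the update rule, and the step limit are all hypothetical illustrations, not the authors' protocol.

```python
import random


def conflicts(edges, colors):
    """Return the edges whose two endpoints currently share a color."""
    return [(a, b) for a, b in edges if colors[a] == colors[b]]


def simulate(n, edges, num_colors, max_steps=10_000, seed=0):
    """Repeatedly let one conflicted node recolor itself.

    Returns (steps_taken, colors); steps_taken is None if the coloring
    was still inconsistent after max_steps updates.
    """
    rng = random.Random(seed)
    neighbors = {v: set() for v in range(n)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Everyone starts with a random color.
    colors = [rng.randrange(num_colors) for _ in range(n)]

    for step in range(max_steps):
        bad = conflicts(edges, colors)
        if not bad:
            return step, colors
        # Pick one endpoint of a randomly chosen conflicting edge.
        node = rng.choice(rng.choice(bad))
        used = {colors[u] for u in neighbors[node]}
        free = [c for c in range(num_colors) if c not in used]
        # Prefer a color no neighbor is using; otherwise guess at random.
        colors[node] = rng.choice(free) if free else rng.randrange(num_colors)

    solved = not conflicts(edges, colors)
    return (max_steps if solved else None), colors


# Example: six "students" arranged in a ring, three colors available.
ring = [(i, (i + 1) % 6) for i in range(6)]
steps, final_colors = simulate(6, ring, num_colors=3)
```

Running `simulate` on different edge lists with the same number of colors gives a crude feel for how structure affects time to solution, though nothing here should be read as reproducing the paper's results.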

October 18, 2006

MEDICUS: Globus-powered medical imaging and computing

A really nice Globus-based application: MEDICUS (Medical Imaging and Computing for Unified Information Sharing). This software federates medical imaging and computing resources for clinical and research applications. Quoting the authors:

The objectives of the MEDICUS project are to promote transparent and non-proprietary solutions for medical image processing, and medical image and data sharing between health care providers, physicians, and researchers in the life sciences. The Globus Toolkit provides the necessary architecture platform and standards to engage in this diverse and difficult field. As such it provides a vendor independent solution to efficiently communicate medical images and image outcome at various levels in the healthcare enterprise.

Why I like it:

  1. It's the real deal--it already provides access to hundreds of thousands of images at some 40 sites across the U.S., including 27 from the Children's Oncology Group, allowing users to publish, search for, access, and process images in powerful ways.
  2. It is a lovely example of how Globus services (security, data, computing, etc.) can be used to develop secure, robust, and efficient distributed systems quickly and easily. (For details, see this slide set or the various papers on the MEDICUS web site.)
  3. It's been set up as an incubator on dev.globus, facilitating community access and contributions.

The responsible parties include Stephan Erberich, Ann Chervenak, Carl Kesselman, Manasee Bhandekar, and Marvin Nelson.

From the incubator web page, a list of features:

  • Vendor independent Grid Enterprise PACS (Picture Archiving and Communication System) deployment by vertically integrating Globus Toolkit Data services
  • DICOM legacy support to connect medical image modalities (MR, CT, X-Ray, etc.) to data Grids
  • Open-source image archiving and distributed warehousing for large-scale multi-center clinical trials
  • Medical image guided drug discovery in combination with remote processing (GRAM service)
  • Flexible fault-tolerant off-site DICOM Grid image storage for man-made or natural disaster recovery
  • Cost-efficient tele-radiology and image workflow between community care facilities and remote specialist using the Grid

October 06, 2006

Grid Fighting Cancer

The National Institutes of Health Cancer Biomedical Informatics Grid (caBIG) is one of the most exciting Grid deployments out there. There's a nice NIH Web site with background information. Quoting that site:

The National Cancer Institute (NCI) has launched the caBIG™ (cancer Biomedical Informatics Grid™) initiative to speed research discoveries and improve patient outcomes by linking researchers, physicians, and patients throughout the cancer community. caBIG™ is a voluntary network of infrastructure, tools, and ideas that enables the collection, analysis, and sharing of data and knowledge along the entire research pathway from laboratory bench to patient bedside.

I like caBIG for two reasons. The first is my family history of cancer )-:. The second is that they are one of the most ambitious and successful users of Globus software that I know. There are more than 800 people working on 70 projects within caBIG, and every caBIG Web Service is built on Globus technology. The caBIG software distribution uses just about every Globus component. In addition, caBIG has contributed some nice new functionality to Globus, including:

I'm as much a fan of the search for the Higgs boson as anyone, but there is something to be said for finding a cure for cancer!

October 05, 2006

Quantifying the Benefits of Cyberinfrastructure

We need to find a way of quantifying the benefits of "cyberinfrastructure"--the technology that underpins and enables eScience. We need this information if we are to justify spending on infrastructure (or not), decide what infrastructure to build, and understand how to improve the infrastructures that we have.

But quantifying benefits is hard.

An anecdote: In building the Globus-based Earth System Grid (ESG: see the picture for participating sites) we put a lot of effort into instrumentation and quantifying usage. Thus we know that our more than 3000 registered users have downloaded more than 100 terabytes of climate simulation data. Yet these numbers do not provide any real insight into whether the people downloading that data found it useful--or did anything useful with it. We did survey users, and got useful information, but response rates were low.

Fortunately, one of the two data collections made accessible via ESG was the Intergovernmental Panel on Climate Change (IPCC) assessment simulation data, and the IPCC team was able to document that over 300 scientific papers had been produced [by early 2006] from data downloaded from ESG.

However, we can't always get such nice data. Thus, we may ask: What metrics are important? What data do we need? What is feasible to get? How do we get it? What can it tell us (and what not)?

I think we need to learn how to build infrastructures that can collect this sort of information automatically. We should involve social scientists in designing such systems and in assessing their effectiveness.

October 02, 2006

History and Theory of Infrastructure

I'm just back from a workshop on "History and Theory of Infrastructure: Lessons for New Scientific Infrastructure" in Ann Arbor, Michigan, which brought together a fascinating group of social scientists and others to discuss "what practical lessons can the history, sociology, and experience of existing infrastructures offer to the imagination, implementation, and governance of cyberinfrastructure."

One delightful aspect of the meeting was meeting wonderful scholars that I had known previously only by reputation, such as Geoff Bowker, Leigh Star, Paul Duguid, and Christine Borgman, as well as some I already knew, such as Tom Finholt, Bob Kahn, Dan Atkins, and Bill Dutton, and others that I was glad to get to know.

There were many fascinating and wide-ranging discussions. My impressions:

  • Social scientists (or at least those at the University of Michigan's School of Information) organize great meetings. The organizers had clearly put a lot of thought into how to structure the meeting to ensure useful discussion, and they also had excellent social events!
  • The mode of discussion was quite different from what I expected. There were no formal presentations and little analysis, but many compelling anecdotes. At first, I found this strange, but then realized that "stories" are a compelling way of conveying insights. That got me thinking: what "stories" should we be telling people embarking on cyberinfrastructure projects, to help them avoid mistakes and achieve success?
  • Another thought that seemed interesting, at least to me: How about designing cyberinfrastructure to collect the information that social scientists require to evaluate its utility? Large systems like TeraGrid, Open Science Grid, Earth System Grid, caBIG, or GEON, and also smaller systems, could be viewed as experimental apparatus for social scientists. What instrumentation should we include in them to that end?

Overall, I didn't come away convinced that the history of existing infrastructures can help those building cyberinfrastructure: railroads and networks are very different things. But I became yet more convinced that social scientists have a lot to contribute to our understanding of how science and its tools will, and should, evolve in the 21st Century.

September 30, 2006

Mapping in Time and Place

I had an interesting conversation today with Michael Buckland about the importance of mapping historical cultural data to time and place. Most documents refer to place names, which may be ambiguous (e.g., country names come and go, town names change or are reused), and refer to time in similarly ambiguous ways (e.g., "last year", "during the summer", "when I was 10", "after the war"). If such references can be disambiguated, then it becomes possible to see connections that might not otherwise be visible.

Michael Buckland directs the Electronic Cultural Atlas Initiative (ECAI), an international project to develop and distribute digital data on historical and archaeological resources. To this end, they are working to "create digital maps that display a wide range of cultural material by using place and time as a common element."

Apparently current Geographical Information System (GIS) tools just don't deal with time in an adequate way. One exception is the University of Sydney's TimeMap system, which ECAI uses.

I've always loved maps, and we are seeing from recent innovations such as Google Maps just how powerful it can be to enable easy mapping of diverse data to geographical space. But I had never thought about the temporal dimension.
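
As a rough illustration of what disambiguating a place name by time can look like, here is a minimal Python sketch of a gazetteer whose entries carry validity dates. The schema (`PlaceRecord`, `resolve`) and the identifier `"spb"` are hypothetical and are not the data model used by ECAI or TimeMap; the sample rows use the well-known renamings of one Russian city, with approximate coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlaceRecord:
    place_id: str            # stable identifier shared by all names of one place
    name: str                # name as it appears in documents
    start_year: int          # first year the name was in use
    end_year: Optional[int]  # last year in use; None means still current
    latitude: float          # approximate
    longitude: float         # approximate


# Illustrative sample data only: one city, four naming periods.
GAZETTEER = [
    PlaceRecord("spb", "Saint Petersburg", 1703, 1914, 59.94, 30.31),
    PlaceRecord("spb", "Petrograd", 1914, 1924, 59.94, 30.31),
    PlaceRecord("spb", "Leningrad", 1924, 1991, 59.94, 30.31),
    PlaceRecord("spb", "Saint Petersburg", 1991, None, 59.94, 30.31),
]


def resolve(name: str, year: int, gazetteer: List[PlaceRecord]) -> List[PlaceRecord]:
    """Return every record whose name matches and whose date range covers the year."""
    wanted = name.strip().lower()
    return [
        r for r in gazetteer
        if r.name.lower() == wanted
        and r.start_year <= year
        and (r.end_year is None or year <= r.end_year)
    ]


if __name__ == "__main__":
    for hit in resolve("Leningrad", 1950, GAZETTEER):
        print(hit.place_id, hit.latitude, hit.longitude)
```

Because every name resolves to the same `place_id`, documents that mention "Petrograd" in 1920 and "Leningrad" in 1950 can be linked to one location. Vague time expressions such as "after the war" would need a similar lookup from phrase to date range, which this sketch does not attempt.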

September 24, 2006

$60M per Year for Scientific Discovery through Advanced Computing

The U.S. Department of Energy's Office of Science announced on September 7th its awards for the next phase of its "Scientific Discovery through Advanced Computing" (SciDAC) program. This is the major DOE program that funds research in computational science and tools, and is by several measures the most significant program in the world focused on high end computing for science.

This new program will spend $60M per year over the next three to five years on "projects aimed at accelerating research in designing new materials, developing future energy sources, studying global climate change, improving environmental cleanup methods and understanding physics from the tiniest particles to the massive explosions of supernovae." These projects will make use of amazing new computational facilities at Argonne, Oak Ridge, and Lawrence Berkeley National Laboratories, capable of computational rates of 100s of teraflop/s. The scientific goals of these projects are truly remarkable in their ambitions and implications: it's well worth browsing the list to see what they are up to. It's also interesting to see where SciDAC researchers are located (see figure).

SciDAC emphasizes numerical simulation and supercomputers, but there is clearly also a growing recognition of the importance of linking both supercomputers and experimental facilities with the communities of scientists that must ultimately make sense of the petabytes of data produced by simulations and experiments. Thus, SciDAC-2 includes three projects focused on distributed data:

My colleagues and I in the Computation Institute at Argonne National Laboratory and the University of Chicago are involved in all three of these projects.

It's sobering to see that DOE funded only 30 out of 240 proposals. Given the exceptional quality of the people and ideas in many of the 210 proposals that were not funded, one is left keenly aware of the tremendous potential that remains untapped. Let's hope those ideas can be supported by other programs.

September 20, 2006

Earth System Grid

I'm at the kickoff meeting for the next phase of the Globus-based Earth System Grid (ESG), a U.S. Department of Energy project developing technology to manage and provide access to large quantities of climate simulation data. The two ESG portals provide access to more than 100 terabytes of output from U.S. and international climate models. The 4000 registered users have so far download