
November 11, 2008

Research in Paradise

This was the title of a talk I gave in New Zealand a couple of weeks ago, at Running Hot, a conference for young New Zealand scientists. (The slides are mostly images, so they may not be very illuminating.) It was a great meeting. Participants from every field of science, and speakers ranging from the Chief Scientific Advisor for Scotland to the Dean of Holy Trinity Cathedral, Auckland, discussed what it means to be a working scientist, and to support science, in a country of 4 million people.

In my talk I discussed the impact of information technology on the practice of science:

Impressed with the telephone, Arthur Mee* predicted in 1898 that if videoconferencing could be developed, 'earth will be in truth a paradise.' Since his time, rapid technological change, in particular in telecommunications, has transformed the scientific playing field in ways that, while not entirely paradisiacal, certainly have profound implications for New Zealand scientists. The Internet has abolished distance, as Mee also predicted: a New Zealand scientist can participate as fully in online discussions as anyone else, and their blog can be every bit as influential. Exponential improvements in networks, computing, sensors, and data storage are also profoundly transforming the practice of science in many disciplines. But those seeking to leverage these advances become painfully familiar with the 'dirty underbelly' of exponentials: if you don't constantly innovate, you can fall behind exponentially fast. Such considerations pose big challenges for individual scientists and for institutions, for researchers and educators, and for research funders. Some of the old ways of researching and educating need to be preserved; others need to be replaced to take advantage of new methods. But what should we preserve? What should we seek to change?
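To make the 'dirty underbelly' concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the state of the art doubles every 18 months (a Moore's-law-style figure, not one taken from the talk), and it compares that frontier with a group that stops upgrading at year 0.

```python
def capability(years: float, doubling_time_years: float = 1.5) -> float:
    """Relative capability after `years`, assuming steady exponential
    growth with a (hypothetical) doubling time of 1.5 years."""
    return 2 ** (years / doubling_time_years)


def gap_if_standing_still(years: float, doubling_time_years: float = 1.5) -> float:
    """How many times more capable the frontier is than a group that
    froze its technology at year 0 (whose capability stays at 1)."""
    return capability(years, doubling_time_years) / capability(0, doubling_time_years)


for years in (1.5, 3, 6, 9):
    print(f"after {years:>4} years: {gap_if_standing_still(years):5.0f}x behind")
```

Under these assumptions the gap itself grows exponentially: standing still for three years leaves a group 4 times behind, and for nine years, 64 times behind.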

(*) Arthur Mee was a remarkable fellow, known to many in the British Commonwealth (but not, I think, in the US) through his wonderful Children's Encyclopedia.


November 09, 2008

Scientific Collaboration on the Internet

I'm looking forward to receiving my copy of Scientific Collaboration on the Internet. I have an article in it on lessons learned from the NEESgrid project (an earlier version is here; I think it's a good read, especially between the lines), but the other articles are probably far more interesting.


September 19, 2008

Argonne named postdoc positions

The Argonne Named Postdoctoral Fellowship Program is a great opportunity for a recent or imminent PhD looking to work at the cutting edge of computing. You also get a fancy title, like "Arthur Holly Compton Fellow" or similar. (There are a few to choose from.)

The application deadline is November 5. If you are interested, drop me a line. More details below:

The Director's Office initiated these special postdoctoral fellowships at Argonne, to be awarded internationally on an annual basis to outstanding doctoral scientists and engineers who are at early points in promising careers. The fellowships are named after scientific and technical luminaries who have been associated with Argonne and its predecessors, and the University of Chicago, since the 1940s.

Candidates for these fellowships must display superb ability in scientific or engineering research, and must show definite promise of becoming outstanding leaders in the research they pursue. Fellowships are awarded for a two-year term, with a possible renewal for a third year, and carry a stipend of $76,000 per annum with an additional allocation of up to $20,000 per annum for research support and travel.

Requirements for applying for an Argonne Named Postdoctoral Fellowship:

The following documents must be sent via e-mail to Named-Postdoc@anl.gov by November 5, 2008. In the subject line, please include the name of the candidate.

* Nomination memo (~2 pages) from ANL sponsor
* Research proposal (~2 pages)
* Three letters of recommendation from people other than Argonne staff
* CV
* List of publications, abstracts and significant presentations
* Graduate school and undergraduate transcripts

The sponsor could be someone who is already familiar with your research work and accomplishments through previous collaborations or professional societies. If you have not yet identified an ANL sponsor, visit the detailed websites of the various Research Programs and Research Divisions at www.anl.gov.

All correspondence should be addressed to Argonne Named Postdoctoral Fellowship Program. One application is sufficient to be considered for all named fellowships. For additional details, visit the Argonne web site at http://www.dep.anl.gov/postdocs/

September 12, 2008

The real difference between grid and cloud

Tim Freeman pointed me to this video, which reveals the real difference between grid and cloud: automatic weapons.

September 11, 2008

A critique of "Using Clouds to Provide Grids..."

The authors of a recent OGF document, "Using Clouds to Provide Grids Higher Levels of Abstractions and Explicit Usage Modes," make several assertions to which I take exception:

1) "There is a level of agreement that computational Grids have not been able to deliver on the promise of better applications and usage scenarios."

It is fascinating to watch the Gartner hype cycle in action, if sad to see people stuck in the trough of disillusionment. Fortunately, however, many large grid projects and applications are having substantial success. Ones that come immediately to mind are the Earth System Grid, the cancer Biomedical Informatics Grid, and the LIGO Scientific Collaboration; and since the LHC was switched on only yesterday, we should also recall the remarkable successes of the LHC Computing Grid and its partner projects, such as Open Science Grid. At a different level, Globus people will be happy to talk about the millions of files moved via GridFTP every day, and Miron Livny will be happy to talk at length about how many millions of CPU hours are delivered every day via Condor.

2) To address this purported lack of success, "there is a need to expose less detail and provide functionality in a simplified way. If there is a lesson to be learned from Grids it is that the abstractions that Grids expose – to the end-user, to the deployers and to application developers – are inappropriate and they need to be higher level."

No evidence is provided for this assertion that complex interfaces are the reason for the difficulties people have with grids. I argue that the issues are more complex.

First, the interfaces themselves are not, in my view, a significant issue. We can argue whether we prefer REST or Web Services, or say Nimbus (a grid virtualization interface) or EC2 (a cloud virtualization interface), but the differences among these alternatives are not great.

On the other hand, the economic systems that apply in the two cases are extremely different:

  • Amazon services are designed to support the masses: Amazon faces no political constraints on whom it can serve, and its charging model provides strong returns to scale. Thus, Amazon can focus on, and succeed in, providing modest-scale, reliable, on-demand service to many.
  • TeraGrid (to use a US example) is designed to support a small number of extreme computing users, with negative returns to scale (the more users, the more work for a fixed budget). Thus, its operators are not motivated to provide virtualization solutions or to operate highly reliable remote access interfaces.
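A toy model makes the contrast between these two economic systems explicit. All numbers here are hypothetical: a pay-per-use provider is assumed to earn a fixed amount per user, while an allocation-based center is assumed to divide a fixed annual budget among all of its users.

```python
def pay_per_use_revenue(users: int, revenue_per_user: float = 100.0) -> float:
    """Commercial model (hypothetical figures): each additional user
    brings in more revenue, so growth is rewarded."""
    return users * revenue_per_user


def fixed_budget_per_user(users: int, annual_budget: float = 1_000_000.0) -> float:
    """Allocation model (hypothetical figures): a fixed budget is shared
    by all users, so each new user leaves less support for everyone."""
    if users <= 0:
        raise ValueError("users must be positive")
    return annual_budget / users


for n in (10, 100, 1000):
    print(f"{n:>5} users: revenue {pay_per_use_revenue(n):>9,.0f}, "
          f"budget per user {fixed_budget_per_user(n):>9,.0f}")
```

In the first model the provider's income rises with every user it attracts; in the second, every new user dilutes the resources available to the others, which is one way to see why the incentives to offer easy, on-demand access differ so sharply.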

The implications of these different foci for users are tremendous. On EC2, I provide my credit card and start a VM: a matter of seconds. On TeraGrid, I request an allocation (which may not be granted!), get an account, submit a request to run a job (they won't allow me to start a VM), and wait in the queue: a process of many weeks. Furthermore, I sometimes find that the remote access interfaces fail because keeping them running is not a high priority.

This alternative perspective is, I think, more revealing about the sources of the differences and the ways we might address them. If we want on-demand, high-quality compute and storage services, then we need either to create an economic system in which academic providers are motivated to provide such services, or to decide to outsource to industry.

The importance of higher-level interfaces is a separate issue. Yes, tools like Hadoop and Swift for data analysis, Introduce for service authoring, and Taverna for service composition are important and necessary. Yes, we should hope to leverage, and influence, work done in the far larger corporate market to our advantage. (This is a focus of the upcoming CCA workshop.)

3) "Grids as currently designed and implemented are difficult to interoperate." The authors make a big deal of this point, but it is not clear to what purpose.

It is true that interoperation is not automatic. [If only everyone used Globus software, then all would be well :) Although, of course, the policy issues would remain!] But I am not sure that this is a significant problem for users, or that it is hard to achieve when it is needed. For example, the caBIG team recently demonstrated a gateway to TeraGrid, and the LHC Computing Grid integrates resources worldwide. In my experience, most users never ask about interoperability.

September 10, 2008

World does not come to an end!

The Large Hadron Collider turned on today, and the world did not come to an end. Phew ...

August 13, 2008

Personal genome

I'm fascinated by George Church's Personal Genome Project (the name recalls another PGP), in which volunteers have their genomes sequenced and made available for research. Personally, I think it's a great thing and I would be delighted to participate. I can't imagine caring if I (or the world) know that I have the gene for obsessive-compulsive disorder or whatever I may turn out to have. The one thing that gives me pause is wondering whether my children or siblings should have a say in whether I publish my genome, given that they share a fair bit of it. (I don't think my parents would mind.) The PGP web site doesn't discuss that issue, although it does point out other dangers that hadn't occurred to me, e.g.:

[someone might] make synthetic DNA corresponding to the participant and plant it at a crime scene

A quick search didn't reveal many profound thoughts on this topic, just some recognition that it is an issue. For example, from Baylor College of Medicine:

With [personal genomes] will come a host of legal and ethical issues, said Amy McGuire, J.D., Ph.D., Assistant Professor of Medicine in BCM's Center for Medical Ethics and Health Policy.

"Sequencing a personal genome possibly will reveal information about children, parents and siblings," she said. At present, there are no real standards as to what control family members can have over sequencing of an individual's genome or its release.

And some thoughts on the opposite dimension: a physician's "duty to warn."

August 12, 2008

Midwest Grid School @ Chicago, September 17-19

Alina Bejan just posted the following announcement:

MidWest Grid School (MWGS'08) -- Call for Participation

Please JOIN US for an exciting 3-day course in large-scale and high-performance grid computing to take place Sep 17-19, 2008, at University of Chicago, Chicago IL.

The Open Science Grid (OSG), a major national grid infrastructure, provides scientists with more than 70 production sites offering over 20,000 CPUs and 4 Petabytes of storage to advance their research. This organization includes members from particle and nuclear physics, astrophysics, bioinformatics, gravitational-wave science and computer science collaborations, all contributing to the development of the OSG and benefiting from advances in grid technology. Applications in other areas of science, such as mathematics, medical imaging and nanotechnology can also gain from the interactions with OSG through its partnership with local and regional grids or their communities’ use of the Virtual Data Toolkit software stack.

We invite you to learn more about grid and high throughput computing and its implications in various research areas through this intensive OSG course that introduces the techniques of grid and distributed computing for science and engineering with hands-on training in the use of large-scale grid computing resources.

The workshop will focus on enabling the use of OSG and TeraGrid cyberinfrastructure to perform large-scale computations and data-intensive processing in different application domains. Participants will learn how to use grids of thousands of processors and will be able to continue to use these resources for their research after the course completion.

The workshop will cover:

* Overview of distributed computing concepts and tools
* Concepts, tools, and techniques of grid computing
* Discovering and using grid resources
* Grid scheduling and distributed data management
* Techniques for workflow and collaboration

Target audience:
Undergraduate and graduate students, researchers, educators and professionals in engineering, computer science, or any scientific, data- or computing-intensive discipline may apply.

Important deadlines:

Application Deadline: Aug 30, 2008 (now OPEN; please visit the website to apply)
Notification Deadline: Sep 5, 2008
Registration Deadline: Sep 15, 2008

For more information and to apply, please visit www.opensciencegrid.org/workshop

August 10, 2008

Large Hadron Collider explained

I always wondered what my physicist friends at CERN were really up to. Now all is explained in this amusing video. These beautiful pictures are also worth a look.

August 08, 2008

Petascale computing at the Computation Institute

Petascale data-intensive computing, that is ...

We recently received an NSF Major Research Instrumentation award to acquire and operate a Petascale Active Data Store. To quote from our press release (removing at least some of the fluff):

The Computation Institute, a joint effort of the University of Chicago and the U.S. Department of Energy's Argonne National Laboratory, has received a grant for a computer system that will enable researchers to store, access and analyze massive data sets.

The system is made possible through a $1.5 million National Science Foundation grant, which includes cost-sharing support from the University of Chicago. The new system is called the Petascale Active Data Store (PADS), which has been optimized for rapid data transactions, both on campus and around the globe.

The PADS design resulted from a study of the storage and analysis requirements of groups in astronomy and astrophysics, computer science, economics, evolutionary and organismal biology, geosciences, high-energy physics, linguistics, materials science, neuroscience, psychology and sociology.

For these groups, according to the PADS team, PADS represents a significant opportunity to look at their data in new ways, enabling new scientific insights and collaborations across disciplines. PADS also will serve as a vehicle for computer science research into active data storage systems and will provide rich data to investigate new techniques.

Several nVidia Tesla graphics processing units (GPUs) will be integrated with traditional CPUs in the PADS system. These GPUs are capable of computing certain operations many times faster than general-purpose personal computers.

PADS will be a hybrid system with many layers of storage. These layers range from a large, tape-based system at Argonne to individual computers on campus and elsewhere. The intermediate layer is a rack of computer disks at Argonne containing duplicate data sets as insurance against hard-drive failure.
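The value of keeping duplicate data sets can be estimated with simple probability. The sketch below assumes, for illustration only, that drives fail independently, that each has an annual failure probability p (the 3% used here is a hypothetical figure, not a PADS specification), and that failed copies are never replaced during the year. Under those assumptions a data set stored in k copies is lost only if all k drives fail, with probability p**k; prompt replacement of failed drives would only lower it.

```python
def loss_probability(p_drive_failure: float, copies: int) -> float:
    """Probability that every copy of a data set is lost within the
    period, assuming independent drive failures and no repair."""
    if not 0.0 <= p_drive_failure <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_drive_failure ** copies


p = 0.03  # hypothetical annual failure probability of a single drive
for k in (1, 2, 3):
    print(f"{k} copies: {loss_probability(p, k):.6f}")
```

With these assumed numbers, a second copy turns a 3-in-100 risk into roughly 9-in-10,000. Real drive failures are often correlated (same batch, same rack), so the actual benefit of duplication is smaller than this idealized estimate.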

To University of Chicago scientists, PADS represents a dramatic improvement over current practice, which requires them to quickly analyze data and then remove it from the system to make room for new data sets. With the storage that PADS provides, groups will be able to keep data active for longer periods of analysis.