The question of how to write programs for distributed or "grid" environments has stimulated much debate. Some argue that this new environment demands new programming models and languages, and there is certainly merit in that view. However, we can also reuse well-understood models. For example, we can use the Message Passing Interface (MPI) standard to write message-passing programs.
The MPI standard defines an API for sending and receiving messages, in both point-to-point and collective modes, and for such things as dynamic process creation. MPI is sometimes criticized as a low-level "assembly language," but it is more accurate to describe it as an abstract but precise notation for describing data exchanges among concurrently executing processes.
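To make the two modes concrete, here is a toy sketch in Python. It is not MPI and uses no MPI library: threads stand in for processes, a per-rank queue stands in for the network, and the names `run_ranks`, `send`, `recv`, and `allreduce_sum` are made up for illustration. It builds a collective sum out of point-to-point exchanges, which is one simple way to picture how the two modes relate, not a claim about how any real MPI implementation does it.

```python
import queue
import threading


def run_ranks(values):
    """Run one thread per 'rank'; each contributes values[rank] to a global sum."""
    size = len(values)
    inboxes = [queue.Queue() for _ in range(size)]  # one mailbox per rank
    results = [None] * size

    def send(dst, msg):
        # Point-to-point send: deliver msg to the destination rank's mailbox.
        inboxes[dst].put(msg)

    def recv(rank):
        # Point-to-point receive: block until a message arrives for this rank.
        return inboxes[rank].get()

    def allreduce_sum(rank, value):
        # Collective built from point-to-point messages (hypothetical scheme):
        # every rank sends to rank 0, which sums and sends the total back.
        if rank == 0:
            total = value
            for _ in range(size - 1):
                total += recv(0)
            for dst in range(1, size):
                send(dst, total)
            return total
        send(0, value)
        return recv(rank)

    def worker(rank):
        results[rank] = allreduce_sum(rank, values[rank])

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_ranks([1, 2, 3, 4]))  # every rank ends up holding the same total
```

In a real MPI program the collective would be a single library call, and the implementation would be free to choose a smarter communication pattern than this gather-then-scatter through rank 0.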
To run message-passing programs on grids, consider MPICH-G2 (see the reference below), a grid-enabled MPI implementation developed by Nick Karonis and his colleagues. MPICH-G2 allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer. It extends the Argonne MPICH implementation of MPI to use Globus services for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance-critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, network topology. Thus, the user can either ignore or exploit knowledge of critical aspects of the heterogeneous environment.
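Why topology matters for a collective operation can be seen in a small sketch. The Python below is an illustration under stated assumptions, not MPICH-G2's actual algorithm: the function names and the site labels are hypothetical, and the fan-out is deliberately linear where a real library would likely use trees. `split_by_color` groups ranks the way MPI's `MPI_Comm_split` groups processes that pass the same color (here with all keys treated as equal, so original rank order is kept), and `two_level_broadcast` plans a broadcast that crosses the wide-area network once per remote site instead of once per remote process.

```python
from collections import defaultdict


def split_by_color(colors):
    """Group ranks by color; ranks with the same color keep their original order."""
    groups = defaultdict(list)
    for rank, color in enumerate(colors):
        groups[color].append(rank)
    return dict(groups)


def two_level_broadcast(sites, root=0):
    """Return (src, dst, link) messages for a site-aware broadcast from root.

    sites[rank] is a hypothetical label for the site where that rank runs.
    """
    groups = split_by_color(sites)
    # Pick one leader per site; the root leads its own site.
    leaders = {
        site: (root if root in ranks else ranks[0])
        for site, ranks in groups.items()
    }
    messages = []
    # Stage 1: one wide-area message from the root to each remote site leader.
    for leader in leaders.values():
        if leader != root:
            messages.append((root, leader, "wide-area"))
    # Stage 2: each leader forwards to the other ranks at its own site.
    for site, ranks in groups.items():
        for rank in ranks:
            if rank != leaders[site]:
                messages.append((leaders[site], rank, "local"))
    return messages


if __name__ == "__main__":
    # Six ranks: three at site "A", three at site "B" (made-up layout).
    for message in two_level_broadcast(["A", "A", "A", "B", "B", "B"]):
        print(message)
```

With this made-up layout, a naive broadcast in which rank 0 sends directly to every other rank would put three messages on the wide-area link; the two-level plan puts one there and keeps the rest local.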
MPICH-G2 has been used to run scientifically important applications. One I like is a high-resolution study of blood flow in the human body: highly coupled 3-D simulations of blood flow in critical areas are placed on distinct clusters, and those simulations are coupled via a 1-D simulation of flow through the arterial system.
MPICH-G2 doesn't do everything: for example, it is not particularly fault tolerant. But if you want to run a program fast on a set of distributed computers (on a LAN, MAN, or WAN), and are prepared to accept failure of one component resulting in failure of the whole (as is often desirable, in fact), it's a powerful tool.
For more information, see: N.T. Karonis, B. Toonen, and I. Foster, "MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface," J. Parallel and Distributed Computing, vol. 63, no. 5, 2003, pp. 551–563. There are also a number of application papers available.