MPICH-G2


What is MPICH-G2?

MPICH-G2 is a grid-enabled implementation of the MPI v1.1 standard. That is, using services from the Globus Toolkit® (e.g., job startup, security), MPICH-G2 allows you to couple multiple machines, potentially of different architectures, to run MPI applications. MPICH-G2 automatically converts data in messages sent between machines of different architectures and supports multiprotocol communication by automatically selecting TCP for intermachine messaging and (where available) vendor-supplied MPI for intramachine messaging.

MPICH-G2 is a complete redesign and implementation of our previous implementation MPICH-G (see How does MPICH-G differ from MPICH-G2?). It is implemented as one of the devices (called the globus2 device) of the popular MPICH library, which in turn was developed and is distributed by the MPICH group led by Bill Gropp and Ewing (Rusty) Lusk of the Mathematics and Computer Science Division at Argonne National Laboratory.

Should I use MPICH-G2?

One important class of problems is those that are distributed by nature, that is, problems whose solutions are inherently distributed. An example is remote visualization, in which computationally intensive work producing visualization output is performed at one location, perhaps as an MPI application running on some massively parallel processor (MPP), while the images are displayed on a remote high-end device (e.g., an IDesk or CAVE). For such problems, MPICH-G2 allows you to use MPI as your programming model.

A second class of problems is those that are distributed by design, in which you have access to multiple computers, perhaps at multiple sites connected across a WAN, and you wish to couple these computers into a computational grid, or simply grid. Here MPICH-G2 can be used to run your application using (where available) vendor-supplied implementations of MPI for intramachine communication and TCP for intermachine communication.

In one scenario illustrating this second class of problems, you have a cluster of workstations. Here Globus services made available through MPICH-G2 provide an environment in which you can conveniently launch your MPI application. An example of this scenario is the Grid Application Development Software (GrADS) Project.

In another scenario you have an MPI application that runs on a single MPP but have problem sizes that are too large for any single machine you have access to. In this situation a wide-area implementation of MPI like MPICH-G2 may help by enabling you to couple multiple MPPs in a single execution. Making efficient use of the additional CPUs that are distributed across a LAN and/or WAN typically requires modifying the application to adjust to the relatively poor latency and bandwidth introduced by the intermachine communication. Two example applications are Cactus (winner of the Gordon Bell Award at SuperComputing 2001, see MPI-Related Papers) and Overflow(D2) from Information Power Grid (IPG).

MPICH-G2 feature/release history

New MPICH-G2 features in:

How do I acquire and install MPICH-G2?

In this section we discuss issues that pertain directly to the configuration and installation of MPICH with the globus2 device. This section is not intended to replace the MPICH Installation Manual distributed with MPICH. You should read that manual before installing and configuring MPICH and should use the information in this section to augment the instructions found in the manual.

Before installing MPICH-G2 you must have already installed Globus. The MPICH-G2 installation steps are slightly different for machines equipped with Globus v1.1.4 and those equipped with Globus v2.0 or later.


Once it's installed, how do I use MPICH-G2?

Before using MPICH-G2 you must have already acquired your Globus security credentials and completed the required setup on each machine on which you intend to run your MPI application. Once these are done, you are ready to compile and execute your MPI application using MPICH-G2 by following these steps:
  1. Compile your application on each machine you intend to run using one of the MPICH-G2 compilers:
    C Compiler <MPICH_INSTALL_PATH>/bin/mpicc
    C++ Compiler <MPICH_INSTALL_PATH>/bin/mpiCC
    Fortran77 Compiler <MPICH_INSTALL_PATH>/bin/mpif77
    Fortran90 Compiler <MPICH_INSTALL_PATH>/bin/mpif90
    Of course, if you are planning to run only on a cluster of binary-compatible workstations that share a filesystem, it suffices to compile your program only once.

  2. Launch your application using MPICH-G2's mpirun. Every mpirun command under the globus2 device submits a Globus Resource Specification Language script, or simply RSL script, to a Globus-enabled grid of computers. Each RSL script is composed of one or more RSL subjobs, typically one subjob for each machine in the computation. You may supply your own RSL script to mpirun, or you may have mpirun construct an RSL script for you based on the arguments you pass to mpirun and the contents of your machines file (discussed below). In either case, it is important to remember that communication between nodes in different subjobs is always done over TCP/IP, while the more efficient vendor-supplied MPI is used only among nodes within the same subjob.
    You may terminate the entire job by hitting Ctrl-C in your mpirun window. Be careful to hit Ctrl-C only once; hitting it multiple times will foil clean termination. Be patient; terminating all the processes on all the machines cleanly can sometimes take a few minutes.
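To make the subjob structure concrete, here is a sketch of a hand-written RSL script for a two-machine run. The hostnames, process counts, and executable path are illustrative only, not values from any real installation:

```
+
( &(resourceManagerContact="m1.utech.edu")
   (count=4)
   (jobtype=mpi)
   (label="subjob 0")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0))
   (executable=/home/smith/ring)
)
( &(resourceManagerContact="m2.utech.edu")
   (count=4)
   (jobtype=mpi)
   (label="subjob 1")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 1))
   (executable=/home/smith/ring)
)
```

In a script like this, the four processes within each subjob would communicate via the vendor-supplied MPI (because of jobtype=mpi), while any message between the two subjobs would travel over TCP.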

An example: Your first MPICH-G2 application

Here is an example MPI application, ring.c, and its associated Makefile. The files are shown here for your review; however, if you want to download them, do not cut and paste them from the text below. That will not work, because cutting and pasting converts tab characters into multiple spaces, which breaks make. Instead, right-click on each file link below and choose 'Save Link As ...'. After downloading the files, edit the Makefile, changing MPICH_INSTALL_PATH to your MPICH-G2 installation directory. After editing the Makefile and following the steps in the preceding section Once it's installed, how do I use MPICH-G2?, type the following:
% make ring
% <MPICH_INSTALL_PATH>/bin/mpirun -np 4 ring
You should see the following output:
Master: end of trip 1 of 1: after receiving passed_num=4 (should be =trip*numprocs=4) from source=3
Here are the contents of ring.c and Makefile for your review. Remember, please do not cut and paste this text; download using the links above.

ring.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

/* command line configurables */
int Ntrips;  /* -t <ntrips> */
int Verbose; /* -v */

int parse_command_line_args(int argc, char **argv, int my_id)
{

    int i;
    int error;

    /* default values */
    Ntrips = 1;
    Verbose = 0;

    for (i = 1, error = 0; !error && i < argc; i ++)
    {
        if (!strcmp(argv[i], "-t"))
        {
            if (i + 1 < argc && (Ntrips = atoi(argv[i+1])) > 0)
                i ++;
            else
                error = 1;
        }
        else if (!strcmp(argv[i], "-v"))
            Verbose = 1;
        else
            error = 1;

    } /* endfor */

    if (error && !my_id)
    {
        /* only Master prints usage message */
        fprintf(stderr, "\n\tusage: %s {-t <ntrips>} {-v}\n\n", argv[0]);
        fprintf(stderr, "where\n\n");
        fprintf(stderr,
	    "\t-t <ntrips>\t- Number of trips around the ring.  "
	    "Default value 1.\n");
        fprintf(stderr,
            "\t-v\t\t- Verbose.  Master and all slaves log each step. \n");
        fprintf(stderr, "\t\t\t  Default value is FALSE.\n\n");
    } /* endif */

    return error;

} /* end parse_command_line_args() */

int main(int argc, char **argv)
{

    int numprocs, my_id, passed_num;
    int trip;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    if (parse_command_line_args(argc, argv, my_id))
    {
        MPI_Finalize();
        exit(1);
    } /* endif */

    if (Verbose)
        printf("my_id %d numprocs %d\n", my_id, numprocs);

    if (numprocs > 1)
    {
        if (my_id == 0)
        {
            /* I am the Master */

            passed_num = 0;

            for (trip = 1; trip <= Ntrips; trip ++)
            {
                passed_num ++;

                if (Verbose)
                    printf("Master: starting trip %d of %d: "
			"before sending num=%d to dest=%d\n", 
			trip, Ntrips, passed_num, 1);


                MPI_Send(&passed_num,    /* buff  */
                        1,               /* count */
                        MPI_INT,         /* type  */
                        1,               /* dest  */
                        0,               /* tag   */
                        MPI_COMM_WORLD); /* comm  */
     
                if (Verbose)
		    printf("Master: inside trip %d of %d: "
			"before receiving from source=%d\n", 
			trip, Ntrips, numprocs-1);

                MPI_Recv(&passed_num,    /* buff   */
                        1,               /* count  */
                        MPI_INT,         /* type   */
                        numprocs-1,      /* source */
                        0,               /* tag    */
                        MPI_COMM_WORLD,  /* comm   */
                        &status);        /* status */

                printf("Master: end of trip %d of %d: "
		    "after receiving passed_num=%d "
		    "(should be =trip*numprocs=%d) from source=%d\n", 
		    trip, Ntrips, passed_num, trip*numprocs, numprocs-1);
            } /* endfor */
        }
        else
        {
            /* I am a Slave */

            for (trip = 1; trip <= Ntrips; trip ++)
            {
                if (Verbose)
                    printf("Slave %d: top of trip %d of %d: "
			"before receiving from source=%d\n", 
			my_id, trip, Ntrips, my_id-1);

                MPI_Recv(&passed_num,    /* buff   */
                        1,               /* count  */
                        MPI_INT,         /* type   */
                        my_id-1,         /* source */
                        0,               /* tag    */
                        MPI_COMM_WORLD,  /* comm   */
                        &status);        /* status */

                if (Verbose)
                    printf("Slave %d: inside trip %d of %d: "
			"after receiving passed_num=%d from source=%d\n", 
			my_id, trip, Ntrips, passed_num, my_id-1);

                passed_num ++;

                if (Verbose)
                    printf("Slave %d: inside trip %d of %d: "
			"before sending passed_num=%d to dest=%d\n", 
			my_id, trip, Ntrips, passed_num, (my_id+1)%numprocs);

                MPI_Send(&passed_num,       /* buff  */
                        1,                  /* count */
                        MPI_INT,            /* type  */
                        (my_id+1)%numprocs, /* dest  */
                        0,                  /* tag   */
                        MPI_COMM_WORLD);    /* comm  */

                if (Verbose)
		    printf("Slave %d: bottom of trip %d of %d: "
			"after send to dest=%d\n",
			my_id, trip, Ntrips, (my_id+1)%numprocs);
            } /* endfor */
        } /* endif */
    } 
    else
        printf("numprocs = %d, should be run with numprocs > 1\n", numprocs);

    MPI_Finalize();

    exit(0);

} /* end main() */
Makefile
#
# assumes MPICH-G2 was installed in /usr/local/mpich
#

MPICH_INSTALL_PATH    = /usr/local/mpich

ring: force
        $(MPICH_INSTALL_PATH)/bin/mpicc -o ring ring.c

force:

clean:
        /bin/rm -rf *.o ring

Firewalls

You can use MPICH-G2 to run applications in which all the processes are on the same side of a firewall. However, if you want to run your MPICH-G2 application with processes on opposite sides of a firewall, you will need to make some special accommodations.

The two issues that arise in the presence of firewalls are job control (e.g., start-up, monitoring, and termination) and TCP messaging during execution. MPICH-G2 uses Globus for both of these, so using MPICH-G2 through firewalls is really an issue of using Globus through firewalls. We therefore refer MPICH-G2 users who need to run their applications through firewalls to the Globus web page on firewalls, which provides an excellent description of the problem and offers a number of solutions.

Described briefly here, the basic strategy behind the solution is to have your system administrators open a small range of port numbers in the firewall (what are called controllable ephemeral ports on the Globus web page on firewalls) and to specify that port range with the environment variable GLOBUS_TCP_PORT_RANGE in your RSL.
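For example, assuming your administrators opened ports 40000 through 40100 (an illustrative range, not a recommendation), each subjob in your RSL would carry an environment clause such as:

```
(environment=(GLOBUS_TCP_PORT_RANGE "40000 40100"))
```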

Setting environment variables is described in Using mpirun by supplying your own RSL script in Once it's installed, how do I use MPICH-G2? and setting the GLOBUS_TCP_PORT_RANGE is described in Setting port range in new MPICH-G2 features in MPICH v1.2.2.3. Finally, there is a small Perl-based connection test found in Troubleshooting section to help quickly determine if processes of your MPICH-G2 application are sitting on opposite sides of a firewall.

Troubleshooting

If you did not encounter any problems in running the ring program from An example: Your first MPICH-G2 application, you may skip this section and proceed directly to the next section, How does MPICH-G2 work?. On the other hand, if you did have some trouble, we provide here some small test programs that strip away all of MPICH-G2 and focus on the individual steps involved in using it.

These small test programs are intended to be run in the order specified immediately below. If a particular test fails, you should stop the testing sequence and contact the group (e.g., Globus developers or MPICH developers) identified in that test's section.

  1. A Globus-based "hello, world"
  2. A Perl-based connection test
  3. Testing vendor-supplied MPI mpirun's ability to export environment
  4. What to try if you get a failed globusrun pr_tcp assertion when trying mpirun

  1. A Globus-based "hello, world"

    This test is limited only to Globus-related issues of launching a job.

    Below is our Globus version of Kernighan and Ritchie's "hello, world" program, accompanied by instructions to make and run it. In the same spirit in which K&R presented their program, we offer ours as a very small (minimal?) program designed to flush out all the details of installing and deploying Globus, acquiring Globus security credentials, registering yourself as a Globus user on each machine, etc.

    The instructions below are intended to test one machine at a time. If you are planning to run your MPICH-G2 application on many different machines, follow the instructions on each machine in turn.

    Here is a link to hello.c. Its contents are shown below for your review; however, if you want to download the file, do not cut and paste it from the text below. That will not work, because cutting and pasting converts tab characters into multiple spaces, which breaks make. To download the file, right-click on the link and choose 'Save Link As ...'.

    hello.c

    #include <globus_duroc_runtime.h>
    #include <stdio.h>
    
    int main(int argc, char **argv)
    {
    
    #if defined(GLOBUS_CALLBACK_GLOBAL_SPACE)
        globus_module_set_args(&argc, &argv);
    #endif
    
        globus_module_activate(GLOBUS_DUROC_RUNTIME_MODULE);
        globus_duroc_runtime_barrier();
        globus_module_deactivate(GLOBUS_DUROC_RUNTIME_MODULE);
    
        printf("hello, world\n");
    
        return 0;
    }
    The instructions for making and running "hello, world" depend on the version of Globus that you are testing. Select a link from the list below based on your version of Globus.


  2. A Perl-based connection test

    This test is limited only to the ability of one machine to establish a socket connection to another. It is good for detecting problems often introduced by firewalls. It is a Perl program (requiring Perl 5) donated to this page by Brian Toonen of the Mathematics and Computer Science Division (MCS) at Argonne National Laboratory.

    Here is a link to a small Perl program, perl_connect, which is shown below for your review; however, if you want to download the file, do not cut and paste it from the text below, because cutting and pasting mangles the text. To download the file, right-click on the link and choose 'Save Link As ...'.


    Here are the contents of perl_connect for your review. Remember, please do not cut and paste this text; download using the link above.

    perl_connect
    #!/usr/bin/perl -w
    
    # Perl script to test TCP connection establishment and communication.
    # This code is based on the examples in 'man perlipc' with vastly
    # improved error checking and a few bug fixes.
    
    use strict;
    use Getopt::Long;
    use IO::Socket;
    use Sys::Hostname;
    
    my $N_MSGS = 1024;
    
    my $rc = 0;
    
    sub usage
    {
        print "usage $0 <-server | -client host:port>\n";
        exit 1;
    }
    
    my $server=0;
    my $client=0;
    GetOptions('s|server' => \$server,
               'c|client' => \$client);
    
    &usage if ($client && $server || !$client && !$server);
    &usage if ($server && $#ARGV > -1);
    &usage if ($client && $#ARGV != 0);
    
    my $EOL = "\015\012";
    
    sub logmsg 
    {
        print "$0 $$: @_ at ", scalar localtime, "\n";
    }
    
    sub s_catch_int
    {
        close Server;
        logmsg "caught Ctrl-C...terminating server";
        exit 0;
    }
    
    sub errnoprn
    {
        printf "errno=%d, %s\n", $!, $! if ($! != 0);
    }
    
    sub dieprn
    {
        print "@_\n";
        &errnoprn;
        exit 1;
    }
    
    if ($server)
    {
        my ($tcp_proto, $s_sockaddr, $s_addr, $s_host, $s_port,
            $c_sockaddr, $c_addr, $c_host, $c_port);
    
        print "$0: establishing server...";
        ($tcp_proto = getprotobyname "tcp")
            || &dieprn("failed protocol name lookup");
        (socket Server, PF_INET, SOCK_STREAM, $tcp_proto) 
            || &dieprn("failed to obtain a socket");
        $SIG{INT} = \&s_catch_int;
        (bind Server, (sockaddr_in 0, INADDR_ANY))
            || &dieprn("failed to bind socket to port");
        ($s_sockaddr = getsockname Server)
            || &dieprn("unable to obtain socket address");
        (($s_port, $s_addr) = sockaddr_in $s_sockaddr)
            || &dieprn("unable to obtain port number");
        ($s_host = gethostbyaddr $s_addr, AF_INET)
            || ($s_host = hostname)
                || &dieprn("unable to get hostname");
        (listen Server, SOMAXCONN)
            || &dieprn("error establishing listener on socket");
        print "established on $s_host:$s_port\n";
    
        logmsg "server started on port $s_port";
    
        while (1)
        {
            if ($c_sockaddr = accept Client, Server)
            {
                ($c_port,$c_addr) = sockaddr_in $c_sockaddr;
                ($c_host = gethostbyaddr $c_addr, AF_INET)
                    || ($c_host = inet_ntoa $c_addr) ;
                logmsg "connection established from $c_host:$c_port";
    
                for (my $i = 0; $i < $N_MSGS; $i++)
                {
                    if (!(print Client "Hello there, $c_host, it's now ",
                          scalar localtime, $EOL))
                    {
                        my $msg;
    
                        if ($! != 0)
                        {
                            $msg = sprintf "ERROR sending message to " .
                                "$c_host:$c_port (errno=%d, %s)", $!, $!;
                        }
                        else
                        {
                            $msg = "ERROR sending message to $c_host:$c_port";
                        }
                        logmsg $msg;
                        last;
                    }
                }
                logmsg "messages successfully sent to $c_host:$c_port";
    
                if (close Client)
                {
                    logmsg "connection to $c_host:$c_port successfully closed";
                }
                else
                {
                    my $msg;
    
                    if ($! != 0)
                    {
                        $msg = sprintf "ERROR closing connection to " .
                            "$c_host:$c_port (errno=%d, %s)",
                            $!, $!;
                    }
                    else
                    {
                        $msg = "ERROR closing connection to $c_host:$c_port";
                    }
                    logmsg $msg;
                }
            }
        }
    }
    else
    {
        my ($tcp_proto, $sockaddr, $addr, $host, $port);
    
        &usage if (!($ARGV[0] =~ /^([^:]+):(\d+)$/));
        $host = $1; $port = $2;
    
        print "$0: attempting to connect to $host:$port...";
        ($tcp_proto = getprotobyname "tcp")
            || &dieprn("failed protocol name lookup");
        ($addr = inet_aton($host))
            || &dieprn("name lookup failed");
        ($sockaddr = sockaddr_in($port, $addr))
            || &dieprn("sockaddr failed");
        (socket Sock, PF_INET, SOCK_STREAM, $tcp_proto) 
            || &dieprn("failed to obtain a socket");
        (connect Sock, $sockaddr)
            || &dieprn("connection failure");
        print "connection established\n";
    
        my $n = 0;
        $! = 0;
        while(<Sock>)
        {
            $n++;
            if ($! != 0)
            {
                print "Error reading messages from the connection\n";
                &errnoprn;
                exit 1;
            }
        }
        if ($n < $N_MSGS)
        {
            print "ERROR: fewer messages received ($n) than expected ($N_MSGS)\n";
            $rc = 1;
        }
        else
        {
            print "All messages received.\n";
        }
    
        if (close Sock)
        {
            print "Connection with $host:$port successfully closed.\n";
        }
        else
        {
            print "ERROR closing the connection.\n";
            &errnoprn;
            exit 1;
        }
    }
    
    exit $rc;
    Here is how to run this test. The instructions below are intended to test two machines at a time.

    1. Launch the server on the first machine,
      % perl perl_connect -server
      perl_connect: establishing server...established on pitcairn.mcs.anl.gov:62574
      perl_connect 27936: server started on port 62574 at Fri Jan 25 11:24:28 2002
      
      You will see something other than pitcairn.mcs.anl.gov:62574 at the end of the first output line. You should see the host:port of your first machine.
    2. While the server is still running on the first machine, launch the client on the second machine,
      % perl perl_connect -client pitcairn.mcs.anl.gov:62574
      perl_connect: attempting to connect to pitcairn.mcs.anl.gov:62574...connection established
      All messages received.
      Connection with pitcairn.mcs.anl.gov:62574 successfully closed.
      %
      Of course, you should replace pitcairn.mcs.anl.gov:62574 with the host:port that appeared at the end of the first line of output when you started the server in step 1 above.

      If the client prints both the All messages received. and the Connection with ... successfully closed. messages, then the test ran successfully.
    3. Kill the server by typing control-c.
    4. Repeat steps 1-3 this time running the server on the machine you had just run the client on and vice versa.

    If the perl_connect test above did not run correctly then the problem is not rooted in Globus, MPICH, or MPICH-G2. It may be a problem with firewall(s). If you know that one or both of the machines are sitting behind a firewall, or you suspect that there may be a problem with firewalls, then try reading Notes on getting MPICH running on RH 7.2 written by Rob Ross of the Mathematics and Computer Science Division (MCS) at Argonne National Laboratory.

    If after reading that documentation you still believe that you are having firewall problems, you should pursue the problem. Start by contacting your local Globus administrator and, if necessary, continue by checking the Globus Toolkit Error FAQ.

    If you still don't know what the problem is, contact the Globus developers by submitting the Globus problem report form found there, specifying GlobusIO as the "product" that you are having trouble with. Do not specify MPICH-G2; doing so will only slow down our response time by forcing us to route the problem away from MPICH-G2.

    Back to Troubleshooting

  3. Testing vendor-supplied MPI mpirun's ability to export environment

    This is a test to determine whether environment variables are passed to an application when it is launched using mpirun. It should be used to test the vendor-supplied MPI that was used to build the MPI flavor of Globus (i.e., not to test MPICH-G2). For MPICH-G2 to successfully use a vendor-supplied MPI, that vendor-supplied MPI's mpirun must pass environment variables to the MPI application.

    Here is a link to a small program, tenv.c, which is shown below for your review. You may cut and paste this program, or right-click on the file link below and choose 'Save Link As ...'.


    tenv.c
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char **argv)
    {
     
        char *value;
     
        MPI_Init(&argc, &argv);
    
        if ((value = getenv("FOO")) != NULL)
        {
    	printf("the value for env var FOO=%s\n", value);
        }
        else
        {
    	printf("env var FOO is not defined\n");
        }
    
        MPI_Finalize();
    
        return 0;
    
    } /* end main() */
    Here is how to run this test.

    1. Compile and link tenv.c using the vendor-supplied MPI C compiler (i.e., not MPICH-G2's mpicc). This must be the same MPI that was used to create the MPI flavor of Globus that was, in turn, used to configure MPICH-G2.
      % mpicc -o tenv tenv.c
    2. Unset the environment variable FOO.
      % unsetenv FOO
    3. Using the vendor-supplied mpirun launch the program.
      % mpirun tenv
      env var FOO is not defined
      %
    4. Set the environment variable FOO.
      % setenv FOO bar
    5. Run the program again using the vendor-supplied mpirun.
      % mpirun tenv
      the value for env var FOO=bar
      %
    If the tenv test did not run as shown above, then MPICH-G2 cannot be configured using that MPI flavor of Globus. You will need to contact the authors of the underlying vendor MPI or possibly have your Globus system administrator modify the Globus job manager at your site to "push" the environment variables into your application.

    Back to Troubleshooting

  4. What to try if you get a failed globusrun pr_tcp assertion when trying mpirun

    If, after you type your mpirun command, you see an error message similar to this:
    globusrun: pr_tcp.c:1548: outgoing_open: Assertion `rc == 0' failed.
    
    It is most likely caused by some or all of your compute nodes failing to return fully qualified domain names (FQDNs) in response to a call to gethostname(). This can easily be tested with the following program, phost.c, and its associated Makefile. The files are shown here for your review; however, if you want to download them, do not cut and paste them from the text below. That will not work, because cutting and pasting converts tab characters into multiple spaces, which breaks make. To download the files, right-click on each file link below and choose 'Save Link As ...'.

    phost.c
    #include <globus_common.h>
    #include <stdlib.h>
    
    int main(int argc, char **argv)
    {
        char hostname[1024];
        if (globus_libc_gethostname(hostname, 1024))
        {
    	globus_libc_fprintf(stderr,
    	    "ERROR: failed globus_libc_gethostname()");
    	exit(1);
        } /* endif */
        globus_libc_fprintf(stdout, "hostname >%s<\n", hostname);
    
        return 0;
    
    } /* end main() */
    Makefile
    include makefile_header
    
    phost:
    	$(GLOBUS_CC) $(GLOBUS_CFLAGS) $(GLOBUS_INCLUDES) -c phost.c
    	$(GLOBUS_LD) -o phost phost.o \
    	$(GLOBUS_LDFLAGS) \
    	$(GLOBUS_PKG_LIBS) \
    	$(GLOBUS_LIBS)
    Before using the Makefile you must create a file called makefile_header using the Globus tool globus-makefile-header, specifying one of the Globus flavors at your installation. You should select the same Globus flavor you intend to use when configuring MPICH-G2. Here is an example of how to use globus-makefile-header to create makefile_header, specifying gcc32dbg as the flavor:
     
    % $GLOBUS_LOCATION/sbin/globus-makefile-header -flavor=gcc32dbg \
    globus_common globus_gram_client globus_io globus_data_conversion \
    globus_duroc_runtime globus_duroc_bootstrap > makefile_header
    1. Download phost.c and Makefile using the links above.
    2. Use globus-makefile-header to create the file makefile_header as described above.
    3. Compile phost.c. NOTE: You are not using MPICH-G2's mpicc.
      	% make phost
    4. Run your phost.
      	% grid-proxy-init
      	% globusrun -o -r "m1.utech.edu" \
      	'&(count=1)(executable=/home/smith/phost)'
    If the hostname that gets printed is not a fully qualified domain name, then that is the problem (globusrun requires the FQDNs of all compute nodes). There are two possible solutions: you can re-configure your compute nodes so that they return FQDNs in response to gethostname(), or you can specify the domain name in the environment variable GLOBUS_DOMAIN_NAME in your RSL, like this:
    	% globusrun -o -r "m1.utech.edu" \
    	'&(count=1)(environment=(GLOBUS_DOMAIN_NAME "utech.edu"))(executable=/home/smith/phost)'

    This should cause a FQDN to be returned, so if you do not re-configure your compute nodes, you need only specify a value for the environment variable GLOBUS_DOMAIN_NAME in every RSL that runs on that machine.

    Back to Troubleshooting

How does MPICH-G2 work?

Here we provide an overview of how MPICH-G2 works: how it interfaces with the vendor's MPI, how it uses Globus services, and so on. We included this section for the curious; it may be skipped by the casual reader. Our intention in providing this overview is to enhance the reader's understanding of MPICH-G2 and, hence, its strengths and weaknesses.

  1. Compiling MPICH-G2 and your application, and linking with the vendor's MPI

    Configuring MPICH-G2 with an "mpi" flavor of Globus (see How do I acquire and install MPICH-G2?) implicitly declares that all programs linked with MPICH-G2 will include the vendor's implementation of MPI (i.e., the vendor's MPI library). This, of course, presents an immediate linking problem in that MPICH-G2 is itself an MPI library.

    Our solution preprocesses MPICH-G2 source code (with the exception of one file) and all C/C++ application source code, renaming all C-binding MPI symbols from {P}MPI_xxx to {P}MPQ_xxx. The one MPICH-G2 file spared preprocessing is the "wrapper file," which houses all of MPICH-G2's calls to the vendor's MPI. The C-binding MPI symbols are renamed in C/C++ source files by the C preprocessor using a sequence of #define statements. Fortran{77,90} source files are not preprocessed, and C++-binding MPI symbols are likewise left untouched; Fortran{77,90} and C++ symbols are instead resolved by making sure that the MPICH libraries appear before the vendor's MPI libraries.

    This preprocessing is presumably a "safe" practice in that according to Sections 2.5 "Language Binding" and 2.5.2 "C Binding Issues" of the MPI v1.1 standard:

    "Programs must not declare variables or functions with name beginning with the prefix, MPI_."

  2. Launching your application with MPICH-G2's mpirun

    You can use MPICH-G2's mpirun in two ways to launch your application (see Once it's installed, how do I use MPICH-G2?): you may write your own Globus RSL script and submit it directly to mpirun, which in turn passes it to globusrun (a Globus utility), or you may use mpirun with its arguments as they are explained in the MPICH User's Manual, in which case mpirun writes its own Globus RSL script and submits it to globusrun. Either way, MPICH-G2 jobs are launched by passing a Globus RSL script to globusrun.
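    For illustration, an RSL script of the kind described above might look like the following two-subjob multirequest (the machine names and paths are the same hypothetical examples used elsewhere in this document; the exact attributes mpirun emits may differ):

```
+
( &(resourceManagerContact="m1.utech.edu")
   (count=4)
   (jobtype=mpi)
   (executable=/home/smith/myapp)
)
( &(resourceManagerContact="m2.utech.edu")
   (count=8)
   (jobtype=mpi)
   (executable=/home/smith/myapp)
)
```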

  3. During execution

    MPI_Init has a Globus-enforced (DUROC) barrier that waits for all processes, across all machines, to be loaded and start execution before proceeding. Thereafter MPICH's design distills all communication (including collective operations) into its constituent point-to-point components before passing them on to the lower-level device, so the globus2 device is presented with only point-to-point communication requests. The choice of protocol (TCP or vendor-supplied MPI) is based on the source/destination: vendor-supplied MPI for intramachine messaging (assuming MPICH-G2 was configured with an "mpi" flavor of Globus) and TCP for all other messaging. In situations where MPI_ANY_SOURCE is specified on a receive, both TCP and the vendor's MPI are polled for incoming messages until the receive is satisfied.

Things that don't work or are missing in MPICH-G2

Problem: MPICH-G2 does not work with GT 3.2 or GT 3.2.1
GT 3.2 moved its implementation of GlobusIO (something MPICH-G2 relies on very heavily) atop the new Globus XIO. This exposed a bug that caused MPICH-G2 to "hang" when configured with either GT 3.2 or GT 3.2.1.
Solution: The problem has been fixed, and the revised code will be distributed with GT 3.2.2 and later. In the meantime, if you have GT 3.2 you need to upgrade to GT 3.2.1. You can then apply two update packages, found on www-unix.globus.org/toolkit/advisories.html, to your GT 3.2.1 installation. Follow the instructions on that page to acquire and apply the two packages identified as "globus_io-5.5" and "globus_nexus-6.5" (both dated 2004-08-12 and both saying "for mpich-g2" in their descriptions).
 
Problem: MPI_PACKED
According to Sections 3.3.1 "Type Matching Rules" and 3.13 "Pack and Unpack" of the MPI v1.1 standard, type MPI_PACKED matches any other type. That is, a message sent with any type (including MPI_PACKED) can be received using the type MPI_PACKED, and a message sent as MPI_PACKED can be received as the message's constituent type.

We assume that a vendor's implementation of MPI_Pack essentially performs a memcpy. Under that assumption, we can meet the standard as stated above for both inter- and intramachine messaging.

Under the standard it is possible to send data as MPI_PACKED, receive it as MPI_PACKED, and then forward the packed data to a third process by sending it as MPI_PACKED. This will not always work in MPICH-G2. Forwarding MPI_PACKED data in this manner will work as long as the protocol is homogeneous throughout the forwarding chain (e.g., all TCP or all vendor-MPI). Forwarding MPI_PACKED data will definitely fail in a heterogeneous protocol forwarding chain; for example, it will fail if process 0 sends MPI_PACKED data to process 1 over vendor-MPI and then process 1 sends the same buffer also as MPI_PACKED data to process 2 over TCP.
Solution: None.
 
Problem: MPI_{Cancel,Wait}
MPICH-G2, like many other MPI libraries, uses an "eager" protocol (data is transferred to the receiver before a matching receive is posted) for TCP messaging. Under an eager protocol, cancelling a send (MPI_Cancel) requires communication with the intended receiver in order to free allocated buffers. The following is a quote from MPI v1.1 standard, section "3.8. Probe and Cancel", about MPI_{Cancel,Wait}:

"If a communication is marked for cancellation, then a MPI_WAIT call for that communication is guaranteed to return, irrespective of the activities of other processes (i.e., MPI_WAIT behaves as a local function). ..."

Under an eager protocol, satisfying the statement above (in particular, having MPI_Wait return "irrespective of the activities of other processes") on most systems requires interrupting the intended receiver, i.e., an asynchronous listener on all machines. MPICH-G2's implementation of MPI_{Cancel,Wait} (like most other MPICH devices and many other MPI implementations) is not compliant with this when waiting for the cancellation of TCP-sent messages. In MPICH-G2, cancelling a send (MPI_Cancel) marks the request for cancellation and returns immediately, but MPI_Wait on a cancelled TCP send might wait for the intended receiver, which may be in a deep computational loop, to make its next MPI call.
Solution: None (or maybe relax the standard?). In the future (see Future Work) we plan to make the globus2 device thread safe, which will allow users to configure MPICH-G2 with a "threaded" flavor of Globus. MPICH-G2 will then comply with the standard in that the MPI_Wait that follows an MPI_Cancel will return immediately, "irrespective of the activities of other processes."
 
Problem: MPI_LONG_DOUBLE
Section 3.2.2 "Message Data" of the MPI v1.1 standard lists C type MPI_LONG_DOUBLE as a required datatype. MPICH-G2 does not support MPI_LONG_DOUBLE for TCP messages. Although "long double" is part of the ANSI C standard it has not been added to the Globus data conversion library. Intramachine messages over vendor-supplied MPI are passed directly to the vendor's MPI, so if they support MPI_LONG_DOUBLE, then so too does MPICH-G2.
Solution: When "long double" support is added to the Globus data conversion library, then MPICH-G2 will support MPI_LONG_DOUBLE for TCP messages.
 
Problem: stdout/stderr on MPI_Abort
When calling MPI_Abort, stdout/stderr are not always flushed unless the user explicitly flushes (fflush) both streams prior to calling MPI_Abort, and even then, the stdout/stderr data of the other processes may not be delivered.
Solution: This is a bug in one of the Globus services (GASS) used by MPICH-G2. We are aware of the problem, and a future Globus patch should fix it. In the meantime, writing your own RSL (see Once it's installed, how do I use MPICH-G2?) and specifying "(stdout=...)" and "(stderr=...)" in each subjob tends to alleviate (not eliminate) the problem by getting more of the data out.
 
Problem: exit code on MPI_Abort
The exit code passed to MPI_Abort does not get propagated back to mpirun.
Solution: This is a limitation of the Globus job startup mechanisms. We are working on those portions of Globus that would enable the exit code to be propagated back to mpirun.
 
Problem: MPICH test suite fails
Some of the tests in the test suite distributed with MPICH fail.
Solution: If you have configured MPICH-G2 with an "mpi" flavor of Globus and have written a <MPICH_INSTALL_PATH>/bin/machines file that induces vendor-supplied MPI communication for intramachine messaging (see How do I acquire and install MPICH-G2), then there is a good chance that the test is failing because the underlying vendor-supplied implementation of MPI is incorrect. To test this, make and run the failing test using the vendor's implementation of MPI (i.e., not MPICH-G2). If the test fails using the vendor's MPI, then that is likely the reason why it is failing under MPICH-G2.
 
Problem: silent loss of information
MPICH-G2 automatically converts data in messages passed between machines with different data representations (e.g., big- vs. little-endian). This data conversion can sometimes result in a loss of information. For example, an "unsigned long" may be 64 bits on one machine and only 32 bits on another. A message sent from the 64-bit to the 32-bit machine containing an "unsigned long" whose value is >= 2^32 will lose information as a result of data conversion. Further, this loss of information occurs silently (i.e., there are no error/warning messages).
Solution: Of course, there is nothing that MPICH-G2 can do about this loss of information. However, in the future (see Future Work) we plan to provide optional mechanisms by which users will be notified (e.g., warning messages to stderr) when information is lost.
 
Problem: mpirun under Linux
When using MPICH-G2's mpirun on a Linux platform such that mpirun is constructing an RSL script for you (see Once it's installed, how do I use MPICH-G2?), you may find that mpirun does not work and reports many syntax error messages of the form "integer expression expected before -eq". This is due to a bug in the Linux shell. As described above, each line of the machines file must name a Globus jobmanager service and may optionally end with an integer value (default value of 1). When you omit the optional integer in your machines file, Linux shells cannot correctly parse the machines file and you get the errors above.
Solution: Locate the machines file your mpirun command is using. In most cases this will be <MPICH_INSTALL_PATH>/bin/machines, but it could be a file named "machines" in the directory in which you typed mpirun or a file that you explicitly named with -machinefile on your mpirun command line. In any case, locate that file and edit it by explicitly ending each line of the file with an integer. If there was no integer at the end of the line, then placing a 1 there is semantically equivalent. Always be sure to surround each Globus job manager service (typically a machine name) in "double quotes", like this:
"m1.utech.edu" 10
"m2.utech.edu" 5
 

How does MPICH-G2 differ from MPICH-G?

MPICH-G2, like MPICH-G, still uses many Globus services (e.g., job startup, security, data conversion). The major difference between MPICH-G and MPICH-G2 is that we have removed Nexus (the Globus component that was used for all communication in MPICH-G) from MPICH-G2. While Nexus provided the communication infrastructure for MPICH-G for many years and had many attractive features (e.g., multiprotocol support with highly tuned TCP support and automatic data conversion), there were other attributes of Nexus that could be improved. MPICH-G2 now handles all communication directly, reimplementing Nexus's strengths and improving on its weaknesses. For a quantitative comparison of MPICH-G2 and MPICH-G see MPICH-G2 Performance Evaluation. Here is a summary of what the changes bring.

MPICH-G2 Performance Evaluation

We evaluated MPICH-G2 using the performance tool mpptest (distributed with MPICH in examples/perftest) on an SGI Origin 2000 (denali) and an IBM SP (quad), both at Argonne National Laboratory's Center for Computational Science and Technology (CCST), and, for LAN TCP/IP evaluation, on a pair of SUN workstations in Argonne's Mathematics and Computer Science Division. In all experiments MPICH-G2 and MPICH-G were configured with non-threaded, no-debug flavors of Globus v1.1.4, and unless otherwise noted, MPICH-G2 was configured using "mpi" flavors of Globus.

On the SGI and IBM machines we conducted three separate sets of experiments, each exercising a different MPICH-G2 receive behavior (see How does MPICH-G2 work?). For each set of experiments we present the results as a sequence of 7 graphs, each successive graph extending the range of the independent variable, message size, to 1KB, 2KB, 16KB, 32KB, 64KB, 512KB, and finally 1MB. We chose to present the results in this way to provide a better, more detailed view at various message sizes. Here are convenience links to our graphs:

SGI Experiments - MPICH-G2, MPICH-G, and SGI-MPI - Specified

SGI Experiments - MPICH-G2, MPICH-G, and SGI-MPI - Specified-pending

SGI Experiments - MPICH-G2, MPICH-G, and SGI-MPI - Non-specified (MPI_ANY_SOURCE)



IBM Experiments - MPICH-G2 and IBM-MPI - Specified

IBM Experiments - MPICH-G2 and IBM-MPI - Specified-pending

IBM Experiments - MPICH-G2 and IBM-MPI - Non-specified (MPI_ANY_SOURCE)



LAN Experiments - TCP/IP - MPICH-G2, MPICH-G, and MPICH with p4

Future Work

MPI-Related Papers

MPICH's list of MPI-related papers

Our MPI-related papers:

Our Application papers:

How to contact us

Related Globus topics

Project Sponsors

MPICH-G2 is supported by the following government agencies.

The National Science Foundation, through the Division of Advanced Networking Infrastructure and Research of the Directorate for Computer and Information Science and Engineering.

The Mathematical, Information, and Computational Sciences Division of the Office of Science, U.S. Department of Energy.

Acknowledgements

Many of the MPICH-G2 precompile ideas that enabled the use of vendor-supplied MPI for intramachine messaging came from discussions among Olle Mulmo, Warren Smith, Nick Karonis, and Brian Toonen. In fact, the precompile idea was prototyped by Olle Mulmo and first used by Warren Smith on a Cray T3E.

MPICH-G2 is the successor of MPICH-G, which was originally designed and implemented by Jonathan Geisler while he was in MCS at Argonne. MPICH-G was later adapted and further developed by George Thiruvathukal (also while he was in MCS at Argonne). Finally, MPICH-G was passed on to Nick Karonis.

We thank Bill Gropp, Ewing (Rusty) Lusk, Rajeev Thakur, Debbie Swider, David Ashton, and Anthony Chan of the MPICH group at ANL for their guidance, assistance, insight, and many discussions. Their input was a valuable contribution to this work.

We thank Sebastien Lacour for many of the additions in the MPICH v1.2.2.3 release. In particular, we thank him for implementing the topology-aware collective operations, the topology discovery mechanisms, MPICH-G2 over MPICH-based vMPI, and the perftest collective operations. His insight and understanding of these issues coupled with his ingenuity proved invaluable in the implementation of these additions.

We also thank Sebastien Lacour for his efforts in conducting the performance evaluation and preparing all the graphs. We thank the MCS Division at Argonne for the use of the resources in their Center for Computational Science and Technology (CCST) in conducting those experiments. In particular we thank Sandra Bittner and the rest of the MCS Systems Group for their cooperation and patience in providing us exclusive access to CCST resources so that we may collect our performance data.