charm_mpi#
Module to work with CHarm on distributed memory systems.
The routines are split into three categories: functions related to spherical harmonic coefficients (charm_shc), evaluation points (charm_point) and error handling (charm_err).
Except for routines from this module, structures charm_shc, charm_point and charm_err distributed among MPI processes should only enter the following functions:
- charm_shc_free() (to free charm_shc returned by functions from this module),
- charm_crd_point_free() (to free charm_point returned by functions from this module),
- charm_err_free(), charm_err_reset(), charm_err_handler() (to free, reset and handle errors in charm_err),
- charm_shs_point() (to perform distributed spherical harmonic synthesis),
- charm_sha_point() (to perform distributed spherical harmonic analysis).
The rest of CHarm's routines require non-distributed charm_shc, charm_point and charm_err structures.
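To make the intended workflow concrete, here is a minimal, hedged sketch of a distributed synthesis. It assumes that shcs is a distributed charm_shc, glg is a distributed Gauss-Legendre grid and err is a distributed charm_err, all created as shown later on this page, that nmax is the maximum degree of shcs, and that the local output array of charm_shs_point() needs glg->local_nlat * glg->nlon elements; check the charm_shs documentation for the authoritative requirements.

// A hedged sketch only: "shcs", "glg" and "err" are distributed
// structures created by "charm_mpi_shc_init()",
// "charm_mpi_crd_point_gl()" and "charm_mpi_err_init()" as shown later
// on this page.
//
// The local output array is assumed to need
// "glg->local_nlat * glg->nlon" elements (verify against the charm_shs
// documentation).
double *f = (double *)malloc(glg->local_nlat * glg->nlon * sizeof(double));
if (f == NULL)
    exit(1);

// Distributed synthesis; this is a collective call, so all processes
// in "comm" must execute it
charm_shs_point(glg, shcs, nmax, f, err);
charm_err_handler(err, 1);

// Distributed structures are freed by the usual non-MPI routines
charm_shc_free(shcs);
charm_crd_point_free(glg);
charm_err_free(err);
free(f);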
For each function in this module, an example usage is provided. Note that we often use if statements to feed the MPI processes with data. While perfectly valid, this is not how it is usually done in practice. Still, this approach is hopefully beneficial for learning purposes.
Note
All routines from this module are collective communication functions. They must therefore always be called by all participating MPI processes. Furthermore, using the structures charm_shc, charm_point and charm_err returned by the routines from this module as inputs to functions from other modules (e.g., charm_shs_point()) makes those particular routines collective communication functions as well.
Note
The performance of charm_sha_point() and charm_shs_point() depends enormously on charm_glob_shc_block_nmax_multiplier, charm_glob_sha_block_lat_multiplier and charm_glob_shs_block_lat_multiplier. To properly tune these parameters, see the documentation and the cookbook.
Note
If CHarm is compiled with MPI support (--enable-mpi), spherical harmonic transforms on shared-memory systems will generally be slightly slower compared to when the MPI support is disabled. This is true even if none of the data are actually distributed. This is the price to be paid for having the charm_glob_sha_block_lat_multiplier and charm_glob_shs_block_lat_multiplier variables, which require dynamic memory allocations in the computationally most expensive parts of CHarm.
Note
An MPI implementation is required to support the MPI standard version 3.0 (September 2012) or newer.
Note
This documentation is written for the double-precision version of CHarm.
Spherical harmonic coefficients
-
charm_shc *charm_mpi_shc_init(unsigned long nmax, double mu, double r, double *c, double *s, size_t local_nchunk, const unsigned long *local_order, MPI_Comm comm, charm_err *err)#
The same as charm_shc_init() but for charm_shc distributed among MPI processes in comm. Function parameters that are new with respect to charm_shc_init() are explained below.

The c and s pointers must have access to the number of elements returned by charm_mpi_shc_local_ncs(). The ordering of spherical harmonic coefficients in c and s depends on local_order (see charm_shc.local_order).

The example below illustrates how to create a distributed charm_shc structure with this function. We assume that rank represents the rank of the MPI processes and that the size of the MPI communicator comm is 3.

// The maximum harmonic degree of spherical harmonic coefficients
unsigned long nmax = 10;

// The number of chunks of spherical harmonic coefficients for
// individual MPI processes
size_t local_nchunk;
if (rank == 2)
    local_nchunk = 1;
else
    local_nchunk = 2;

// Now define the chunks of spherical harmonic coefficients by
// specifying their first and last spherical harmonic order
unsigned long local_order[4];
if (rank == 0)  // Process with rank "0"
{
    // The first chunk will consist of all spherical harmonic
    // coefficients of order "0"
    local_order[0] = 0;
    local_order[1] = 0;

    // The second chunk will hold all coefficients of orders from
    // "1" to "2"
    local_order[2] = 1;
    local_order[3] = 2;
}
else if (rank == 1)  // Process with rank "1"
{
    // The first chunk at another MPI process will consist of all
    // coefficients of orders from "3" to "5"
    local_order[0] = 3;
    local_order[1] = 5;

    // The second chunk consists of all coefficients of order "6"
    local_order[2] = 6;
    local_order[3] = 6;
}
else if (rank == 2)  // Process with rank "2"
{
    // This process has one chunk only, starting at order "7" and
    // ending at "10"
    local_order[0] = 7;
    local_order[1] = 10;
}

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// "local_ncs" is the total amount of spherical harmonic
// coefficients that is stored locally by the individual MPI
// processes
size_t local_ncs = charm_mpi_shc_local_ncs(nmax, local_nchunk,
                                           local_order, err);
charm_err_handler(err, 1);

// Allocate arrays that will store spherical harmonic coefficients
double *c = (double *)malloc(local_ncs * sizeof(double));
if (c == NULL)
    exit(1);
double *s = (double *)malloc(local_ncs * sizeof(double));
if (s == NULL)
    exit(1);

// You can now fill "c" and "s" with some reasonable data.
// Alternatively, you can do this with "shcs->c" and "shcs->s" after
// "shcs" is created below.  It doesn't matter, as "c" points to the
// same location in the memory as "shcs->c".  The same is also true
// for "s" and "shcs->s".

// Create a distributed "charm_shc" structure
charm_shc *shcs = charm_mpi_shc_init(nmax, 1.0, 1.0, c, s, local_nchunk,
                                     local_order, comm, err);
charm_err_handler(err, 1);

// Now is the time to do some heavy lifting with "shcs"

// Once you no longer need "shcs", free it as usual
charm_shc_free(shcs);
free(c);
free(s);
charm_err_free(err);
- Parameters:
  - local_nchunk – [in] See charm_shc.local_nchunk.
  - local_order – [in] See charm_shc.local_order.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_shc structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
-
charm_shc *charm_mpi_shc_malloc(unsigned long nmax, double mu, double r, size_t local_nchunk, const unsigned long *local_order, MPI_Comm comm, charm_err *err)#
The same as charm_shc_malloc() but for charm_shc distributed among MPI processes in comm.

For an example usage, see charm_mpi_shc_init(). All you need to change is to replace charm_mpi_shc_init() by charm_mpi_shc_malloc() and not to pass c and s as input parameters to charm_mpi_shc_malloc(). A short sketch is also given after the parameter list.

Function parameters that are new with respect to charm_shc_malloc() are explained below.

- Parameters:
  - local_nchunk – [in] See charm_shc.local_nchunk.
  - local_order – [in] See charm_shc.local_order.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_shc structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
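A minimal sketch, assuming rank, comm, local_nchunk and local_order are prepared exactly as in the charm_mpi_shc_init() example above:

// A minimal sketch, assuming "rank", "comm", "local_nchunk" and
// "local_order" are prepared as in the "charm_mpi_shc_init()" example

// Initialize a distributed error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// Allocate a distributed "charm_shc" structure; its coefficients are
// left uninitialized
charm_shc *shcs = charm_mpi_shc_malloc(10, 1.0, 1.0, local_nchunk,
                                       local_order, comm, err);
charm_err_handler(err, 1);

// "shcs->c" and "shcs->s" now have room for
// "charm_mpi_shc_local_ncs(10, local_nchunk, local_order, err)"
// elements each, so fill them with your coefficients here

charm_shc_free(shcs);
charm_err_free(err);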
-
charm_shc *charm_mpi_shc_calloc(unsigned long nmax, double mu, double r, size_t local_nchunk, const unsigned long *local_order, MPI_Comm comm, charm_err *err)#
The same as charm_mpi_shc_malloc() but all spherical harmonic coefficients are initialized to zero.
-
size_t charm_mpi_shc_local_ncs(unsigned long nmax, size_t local_nchunk, const unsigned long *local_order, charm_err *err)#
Returns the number of \(\bar{C}_{nm}\) coefficients that are stored locally by the charm_shc structure when its maximum harmonic degree is nmax and there are local_nchunk local chunks given by the harmonic orders local_order. The same number of coefficients applies to \(\bar{S}_{nm}\), too.

Tip

There are two use cases of this routine.

- Use it before calling charm_mpi_shc_init() to get the number of elements that the c and s input pointers must have access to.
- Use it before calling charm_mpi_shc_malloc() or charm_mpi_shc_calloc() to get the number of elements that will be locally allocated to charm_shc.c and charm_shc.s.

- Parameters:
  - nmax – [in] Maximum harmonic degree of the charm_shc structure.
  - local_nchunk – [in] See charm_shc.local_nchunk.
  - local_order – [in] See charm_shc.local_order.
  - err – [out] Error reported by the function (if any).
- Returns:
  On success, returned is the total number of \(\bar{C}_{nm}\) and \(\bar{S}_{nm}\) coefficients that CHarm associates with the local portion of charm_shc.c and charm_shc.s for the given chunk(s). On error, 0 is returned and the error message is written to err.
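A short sketch of the first use case from the Tip, assuming two MPI processes in comm, with rank holding the process rank and one chunk per process:

// A minimal sketch with two MPI processes in "comm": the process of
// rank "0" stores all coefficients of orders "0" to "5", the process
// of rank "1" stores orders "6" to "10" (one chunk per process)
unsigned long nmax = 10;
size_t local_nchunk = 1;
unsigned long local_order[2];
if (rank == 0)
{
    local_order[0] = 0;
    local_order[1] = 5;
}
else if (rank == 1)
{
    local_order[0] = 6;
    local_order[1] = 10;
}

charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// The number of "C" (and also "S") coefficients stored locally; this
// is how many elements the "c" and "s" arrays passed to
// "charm_mpi_shc_init()" must provide on this process
size_t local_ncs = charm_mpi_shc_local_ncs(nmax, local_nchunk,
                                           local_order, err);
charm_err_handler(err, 1);

charm_err_free(err);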
Evaluation points
-
charm_point *charm_mpi_crd_point_init(int type, size_t local_nlat, size_t local_nlon, size_t local_0_start, double *lat, double *lon, double *r, MPI_Comm comm, charm_err *err)#
The same as charm_crd_point_init() but for charm_point distributed among MPI processes in comm. Function parameters that are new with respect to charm_crd_point_init() are explained below. The lat and r pointers must have access to local_nlat elements. The lon pointer requires local_nlon elements.

Examples:

Scattered points:

Assume that you have 6 scattered points that you want to distribute among 3 MPI processes. Let the latitudes of the points be lat0, lat1, ..., lat5. Assume that you want to have, say, the first three latitudes at the MPI process with rank 0, then two points at rank 1 and the last point at rank 2. Finally, let's assume that rank denotes the rank of the MPI processes and comm is the MPI communicator. Then, you could write:

size_t local_nlat, local_0_start;
if (rank == 0)
{
    local_nlat    = 3;
    local_0_start = 0;
}
else if (rank == 1)
{
    local_nlat    = 2;
    local_0_start = 3;
}
else if (rank == 2)
{
    local_nlat    = 1;
    local_0_start = 5;
}

double *lat = (double *)malloc(local_nlat * sizeof(double));
if (lat == NULL)
    exit(1);

size_t local_nlon = local_nlat;
double *lon = (double *)malloc(local_nlon * sizeof(double));
if (lon == NULL)
    exit(1);

double *r = (double *)malloc(local_nlat * sizeof(double));
if (r == NULL)
    exit(1);

// At the process with "rank == 0", set "lat[0]", "lat[1]" and
// "lat[2]" to the first three latitudes of your six points, "lat0",
// "lat1", "lat2".
//
// At the process with "rank == 1", set "lat[0]" and "lat[1]" to the
// next two latitudes of your six points, "lat3" and "lat4".
//
// At the process with "rank == 2", set "lat[0]" to the last
// latitude, "lat5".
//
// Similarly, set the longitudes and spherical radii.

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

charm_point *pnt = charm_mpi_crd_point_init(CHARM_CRD_POINT_SCATTERED,
                                            local_nlat, local_nlon,
                                            local_0_start, lat, lon, r,
                                            comm, err);
charm_err_handler(err, 1);

// Once you no longer need "pnt", free it as usual
charm_crd_point_free(pnt);
charm_err_free(err);
Non-symmetric point grids:

Assume that you have the same six latitudes as in the previous example with scattered points and you also have ten longitudes, so that your grid has 6 times 10 points. Finally, assume that your grid is non-symmetric with respect to the equator (details on symmetric and non-symmetric grids here).

You can initialize the charm_point structure in the same way as in the example with scattered points with two differences. First, your value of local_nlon will now be 10 and you must, of course, set all ten elements of lon to your longitudes. All ten longitudes must be equal at all MPI processes. Second, you need to set the first input parameter type in charm_mpi_crd_point_init() to CHARM_CRD_POINT_GRID. A short sketch of these two changes is given below.
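The sketch below only highlights the parts that differ from the scattered-points example; local_nlat, local_0_start, lat, r, comm and err are assumed to be prepared exactly as shown there, and the longitude values are illustrative only.

// A minimal sketch, assuming "local_nlat", "local_0_start", "lat",
// "r", "comm" and "err" are prepared as in the scattered-points
// example above

// Ten grid longitudes, identical at all MPI processes
size_t local_nlon = 10;
double *lon = (double *)malloc(local_nlon * sizeof(double));
if (lon == NULL)
    exit(1);
for (size_t i = 0; i < local_nlon; i++)
    lon[i] = (double)i / 10.0;  // illustrative values only

// The only other change is the "CHARM_CRD_POINT_GRID" type
charm_point *grd = charm_mpi_crd_point_init(CHARM_CRD_POINT_GRID,
                                            local_nlat, local_nlon,
                                            local_0_start, lat, lon, r,
                                            comm, err);
charm_err_handler(err, 1);

charm_crd_point_free(grd);
free(lon);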
Symmetric point grids:

There is a special way to tell CHarm that your grid is symmetric with respect to the equator. As a general rule, a symmetric grid must be distributed across MPI processes in such a way that the local portions of the grid at all processes are symmetric (details on symmetric and non-symmetric grids here).

Let's say that you have three MPI processes in comm with ranks rank. You have 7 grid latitudes: 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5. You also have 10 longitudes and 7 spherical radii (their values are irrelevant here). Then, you can write (observe carefully the variable local_0_start):

size_t local_nlat, local_0_start;
if (rank == 0)
{
    // This process will hold latitudes "1.5" and "-1.5"
    local_nlat    = 2;
    local_0_start = 0;  // "1.5" is the first grid latitude
}
else if (rank == 1)
{
    // Latitudes "1.0", "-1.0"
    local_nlat    = 2;
    local_0_start = 1;  // "1.0" is the second grid latitude
}
else if (rank == 2)
{
    // Latitudes "0.5", "0.0", "-0.5"
    local_nlat    = 3;
    local_0_start = 2;  // "0.5" is the third grid latitude

    // Should "rank == 1" have, say, four latitudes ("1.0", "0.9",
    // "-0.9", "-1.0"), then "local_0_start" would be "3" here.
}

double *lat = (double *)malloc(local_nlat * sizeof(double));
if (lat == NULL)
    exit(1);

if (rank == 0)
{
    lat[0] =  1.5;
    lat[1] = -1.5;
}
else if (rank == 1)
{
    lat[0] =  1.0;
    lat[1] = -1.0;
}
else if (rank == 2)
{
    lat[0] =  0.5;
    lat[1] =  0.0;
    lat[2] = -0.5;
}

// Note that the latitudes are symmetric with respect to the equator
// at *each* MPI process.  This is very important.
//
// Also note that "local_0_start" is approached differently with
// symmetric grids.

// Some longitudes
size_t local_nlon = 10;
double *lon = (double *)malloc(local_nlon * sizeof(double));
if (lon == NULL)
    exit(1);
for (size_t i = 0; i < local_nlon; i++)
    lon[i] = (double)i / 10.0;

// Some spherical radii
double *r = (double *)malloc(local_nlat * sizeof(double));
if (r == NULL)
    exit(1);
for (size_t i = 0; i < local_nlat; i++)
    r[i] = 1.0;

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

charm_point *pnt = charm_mpi_crd_point_init(CHARM_CRD_POINT_GRID,
                                            local_nlat, local_nlon,
                                            local_0_start, lat, lon, r,
                                            comm, err);
charm_err_handler(err, 1);

charm_crd_point_free(pnt);
charm_err_free(err);
free(lat);
free(lon);
free(r);
- Parameters:
  - local_nlat – [in] The same as charm_point.nlat, but for the latitudes stored locally.
  - local_0_start – [in] See charm_point.local_0_start.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_point structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
-
charm_point *charm_mpi_crd_point_malloc(int type, size_t local_nlat, size_t local_nlon, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_crd_point_malloc() but for charm_point distributed among MPI processes in comm. Function parameters that are new with respect to charm_crd_point_malloc() are explained below.

For a use case, see the examples with scattered points and non-symmetric point grids in charm_mpi_crd_point_init(). You only have to replace charm_mpi_crd_point_init() by charm_mpi_crd_point_malloc() and not to pass lat, lon and r as input parameters to charm_mpi_crd_point_malloc(). A short sketch is given after the parameter list.

Note

This routine is capable of returning a non-symmetric point grid only. The input parameter local_0_start must therefore always be approached as shown for non-symmetric point grids in charm_mpi_crd_point_init(). This is because this function does not initialize the memory for charm_point.lat, so the latitudes are specified only after this function has been called. There is therefore no way for CHarm to know which latitudes the structure will be fed with by the user. If you want to be sure that your distributed charm_point structure will be considered as symmetric, use charm_mpi_crd_point_init() instead.

- Parameters:
  - local_nlat – [in] See charm_point.local_nlat.
  - local_0_start – [in] See charm_point.local_0_start.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_point structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
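A minimal sketch, assuming local_nlat, local_nlon, local_0_start, comm and err are prepared as in the scattered-points example of charm_mpi_crd_point_init():

// A minimal sketch, assuming "local_nlat", "local_nlon",
// "local_0_start", "comm" and "err" are prepared as in the
// scattered-points example of "charm_mpi_crd_point_init()"

// Allocate a distributed "charm_point" structure for scattered
// points; the coordinates are left uninitialized
charm_point *pnt = charm_mpi_crd_point_malloc(CHARM_CRD_POINT_SCATTERED,
                                              local_nlat, local_nlon,
                                              local_0_start, comm, err);
charm_err_handler(err, 1);

// Fill "pnt->lat", "pnt->lon" and "pnt->r" with your local
// coordinates here

charm_crd_point_free(pnt);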
-
charm_point *charm_mpi_crd_point_calloc(int type, size_t local_nlat, size_t local_nlon, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_mpi_crd_point_malloc() but all array elements are initialized to zero.
-
charm_point *charm_mpi_crd_point_gl(unsigned long nmax, double r, size_t local_nlat, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_crd_point_gl() but for charm_point distributed among MPI processes in comm. Function parameters that are new with respect to charm_crd_point_gl() are explained below.

Building a distributed charm_point structure for Gauss-Legendre grids depends on the parity of nmax.

- If nmax is even, local_nlat must be odd at one process and even at all the remaining processes in comm. The process with odd local_nlat must include the equator.
- If nmax is odd, local_nlat must be even at all processes within comm.

The code example below shows how to compute a distributed Gauss-Legendre grid. We assume that rank denotes the rank of the MPI processes and that there are 3 MPI processes within the communicator comm.

// Maximum harmonic degree
unsigned long nmax = 10;

// For our "nmax = 10", "nlat" will be "11" (you can check this with
// "charm_crd_point_gl_shape").  We will thus have the following
// latitudes in the Gauss--Legendre grid: "lat0, lat1, lat2, ...,
// lat10".

size_t local_nlat, local_0_start;
if (rank == 0)
{
    // This MPI process will hold 2 latitudes, "lat1" and "lat9".
    local_nlat    = 2;
    local_0_start = 1;
}
else if (rank == 1)
{
    // This process also holds two latitudes, "lat0" and "lat10".
    // We intentionally put "lat0" to the process of rank "1" to
    // demonstrate that CHarm does not care about which chunk is
    // stored by which MPI process.
    local_nlat    = 2;
    local_0_start = 0;
}
else if (rank == 2)
{
    // "nmax" is even, so the process holding the equator must have
    // an odd value of "local_nlat".
    //
    // This process will hold seven latitudes, "lat2, lat3, ...,
    // lat8".
    local_nlat    = 7;
    local_0_start = 2;
}

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// Compute the Gauss--Legendre grid
charm_point *glg = charm_mpi_crd_point_gl(nmax, 1.0, local_nlat,
                                          local_0_start, comm, err);
charm_err_handler(err, 1);

// Once you no longer need "glg", free it as usual
charm_crd_point_free(glg);
charm_err_free(err);
- Parameters:
  - local_nlat – [in] See charm_point.local_nlat.
  - local_0_start – [in] See charm_point.local_0_start.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_point structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
-
charm_point *charm_mpi_crd_point_dh1(unsigned long nmax, double r, size_t local_nlat, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_crd_point_dh1() but for charm_point distributed among MPI processes in comm. Function parameters that are new with respect to charm_crd_point_dh1() are explained below.

There are some restrictions on local_nlat and local_0_start.

- Two processes must have an odd value of local_nlat and all the remaining processes must have an even local_nlat. As an exception, if one process holds all latitudes of the Driscoll-Healy grid, then local_nlat is always even.
- One process with odd local_nlat must have local_0_start set to 0. In other words, the process holding the north pole must have odd local_nlat.
- The other process with odd local_nlat must contain the equator.

The code example below shows how to compute a distributed Driscoll-Healy grid. We assume that rank denotes the rank of the MPI processes and that there are 3 MPI processes within the communicator comm.

// Maximum harmonic degree
unsigned long nmax = 10;

// For our "nmax = 10", "nlat" will be "22" (you can check this with
// "charm_crd_point_dh1_shape").  We will thus have the following
// latitudes in the Driscoll--Healy grid: "lat0, lat1, lat2, ...,
// lat21".

size_t local_nlat, local_0_start;
if (rank == 0)
{
    // This process holds the north pole, hence odd "local_nlat".
    // The latitudes available to this process are "lat0", "lat1",
    // "lat21".  Remember that "lat21" is not the south pole.
    local_nlat    = 3;
    local_0_start = 0;
}
else if (rank == 1)
{
    // This process holds the equator "lat11", hence odd
    // "local_nlat"
    local_nlat    = 1;
    local_0_start = 11;
}
else if (rank == 2)
{
    // All the remaining latitudes of the Driscoll--Healy grid:
    // "lat2", "lat3", ..., "lat10", "lat12", "lat13", ..., "lat20".
    local_nlat    = 18;
    local_0_start = 2;
}

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// Compute the Driscoll--Healy grid
charm_point *dh1 = charm_mpi_crd_point_dh1(nmax, 1.0, local_nlat,
                                           local_0_start, comm, err);
charm_err_handler(err, 1);

// Once you no longer need "dh1", free it as usual
charm_crd_point_free(dh1);
charm_err_free(err);
- Parameters:
  - local_nlat – [in] See charm_point.local_nlat.
  - local_0_start – [in] See charm_point.local_0_start.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_point structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
-
charm_point *charm_mpi_crd_point_dh2(unsigned long nmax, double r, size_t local_nlat, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_mpi_crd_point_dh1() but for the equiangular modification of the Driscoll-Healy grid after Wieczorek and Meschede (2018).
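Since this routine mirrors charm_mpi_crd_point_dh1(), a hedged sketch can simply reuse the latitude distribution from the Driscoll-Healy example above; whether the equiangular grid has the same number of latitudes should be verified against the charm_crd documentation for your CHarm version.

// A minimal sketch, assuming "rank", "comm", "local_nlat" and
// "local_0_start" are set up as in the "charm_mpi_crd_point_dh1()"
// example above and that the latitudes of the equiangular grid can be
// distributed in the same way (verify against the charm_crd
// documentation)
unsigned long nmax = 10;

charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

charm_point *dh2 = charm_mpi_crd_point_dh2(nmax, 1.0, local_nlat,
                                           local_0_start, comm, err);
charm_err_handler(err, 1);

charm_crd_point_free(dh2);
charm_err_free(err);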
Error handling
-
charm_err *charm_mpi_err_init(MPI_Comm comm)#
The same as charm_err_init() but for charm_err distributed among processes in the MPI communicator comm.

- Returns:
  If all processes succeeded, returned is a pointer to the charm_err structure. If one or more processes failed, all processes return a NULL pointer.
-
_Bool charm_mpi_err_isempty(const charm_err *err)#
The same as charm_err_isempty() but for both distributed (err->distributed == 1) and non-distributed (err->distributed == 0) charm_err structures.

- Returns:
  Returns boolean 1 if err is empty at all processes in err->comm. If err is not empty at one or more processes, 0 is returned.
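A minimal sketch of the error-handling routines from this module, assuming comm is a valid MPI communicator in which all participating processes make the calls:

// A minimal sketch: create a distributed error structure, check that
// it is empty on all processes in "comm" and release it
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// Collective check; "1" means "err" is empty at every process in
// "comm"
if (!charm_mpi_err_isempty(err))
{
    // Print the error (if any) and abort, as in the other examples
    charm_err_handler(err, 1);
}

charm_err_free(err);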