charm_mpi#
Module to work with CHarm on distributed memory systems.
The routines are split into three categories: functions related to spherical harmonic coefficients (charm_shc), evaluation points (charm_point) and error handling (charm_err).
Except for routines from this module, the charm_shc, charm_point and charm_err structures distributed among MPI processes should only enter the following functions:

- charm_shc_free() (to free charm_shc returned by functions from this module),
- charm_crd_point_free() (to free charm_point returned by functions from this module),
- charm_err_free(), charm_err_reset(), charm_err_handler() (to free, reset and handle errors in charm_err),
- charm_shs_point() (to perform distributed spherical harmonic synthesis),
- charm_sha_point() (to perform distributed spherical harmonic analysis).

The rest of CHarm's routines require non-distributed charm_shc, charm_point and charm_err structures.
For each function in this module, an example usage is provided. Note that we often use if statements to feed data to the individual MPI processes. While this is perfectly valid, it is not usually done in practice. Still, this approach is hopefully beneficial for learning purposes.
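Before turning to the individual functions, the sketch below outlines the typical workflow with distributed structures: create a distributed charm_err, charm_shc and charm_point, run a distributed synthesis with charm_shs_point() and free everything. It is a minimal sketch only, assuming that comm (here MPI_COMM_WORLD) holds exactly two MPI processes and that the output array f of charm_shs_point() must provide room for all points stored locally, that is, local_nlat * glg->nlon elements for a point grid. The chunking rules used below are explained with the individual functions further down.

#include <stdlib.h>
#include <mpi.h>
#include <charm/charm.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;
    int rank;
    MPI_Comm_rank(comm, &rank);

    unsigned long nmax = 4;

    // Distributed error structure
    charm_err *err = charm_mpi_err_init(comm);
    if (err == NULL)
        exit(1);

    // Distributed spherical harmonic coefficients, one chunk per process
    size_t local_nchunk = 1;
    unsigned long local_order[2];
    if (rank == 0)
    {
        local_order[0] = 0;
        local_order[1] = 2;
    }
    else
    {
        local_order[0] = 3;
        local_order[1] = 4;
    }
    charm_shc *shcs = charm_mpi_shc_calloc(nmax, 1.0, 1.0, local_nchunk,
                                           local_order, comm, err);
    charm_err_handler(err, 1);

    // ...fill "shcs->c" and "shcs->s" with some coefficients here...

    // Distributed Gauss--Legendre grid; for "nmax = 4", it has 5 latitudes
    size_t local_nlat, local_0_start;
    if (rank == 0)
    {
        local_nlat = 2;       // "lat0" and "lat4"
        local_0_start = 0;
    }
    else
    {
        local_nlat = 3;       // "lat1", "lat2" (equator), "lat3"
        local_0_start = 1;
    }
    charm_point *glg = charm_mpi_crd_point_gl(nmax, 1.0, local_nlat,
                                              local_0_start, comm, err);
    charm_err_handler(err, 1);

    // Output array for the locally stored grid points (assumption:
    // "local_nlat * glg->nlon" elements are needed at each process)
    double *f = (double *)malloc(local_nlat * glg->nlon * sizeof(double));
    if (f == NULL)
        exit(1);

    // Distributed spherical harmonic synthesis
    charm_shs_point(glg, shcs, nmax, f, err);
    charm_err_handler(err, 1);

    free(f);
    charm_crd_point_free(glg);
    charm_shc_free(shcs);
    charm_err_free(err);

    MPI_Finalize();
    return 0;
}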
Note
All routines from this module are collective communication functions. They must therefore always be called by all participating MPI processes. Furthermore, using the charm_shc, charm_point and charm_err structures returned by the routines from this module as inputs to functions from other modules (e.g., charm_shs_point()) makes those particular routines collective communication functions, too.
Note
The performance of charm_sha_point() and charm_shs_point() depends enormously on charm_glob_shc_block_nmax_multiplier, charm_glob_sha_block_lat_multiplier and charm_glob_shs_block_lat_multiplier. To properly tune these parameters, see the documentation and the cookbook.
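As an illustration only, such a tuning might look like the snippet below. The concrete values are arbitrary and we assume here that the three tuning parameters are assignable global variables of an unsigned integer type declared in charm/charm.h; consult the documentation of the charm_glob_* variables for their actual types and recommended values.

#include <charm/charm.h>

void tune_block_sizes(void)
{
    // Hypothetical values; the optimal numbers depend on your hardware
    // and problem size (see the cookbook)
    charm_glob_shc_block_nmax_multiplier = 2;
    charm_glob_sha_block_lat_multiplier  = 4;
    charm_glob_shs_block_lat_multiplier  = 4;

    // ...now call "charm_sha_point()" or "charm_shs_point()"...
}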
Note
If CHarm is compiled with MPI support (--enable-mpi), spherical harmonic transforms on shared-memory systems will generally be slightly slower than when the MPI support is disabled. This is true even if none of the data are actually distributed. This is the price to be paid for having the charm_glob_sha_block_lat_multiplier and charm_glob_shs_block_lat_multiplier variables, which require dynamic memory allocations in the computationally most expensive parts of CHarm.
Note
An MPI implementation is required to support the MPI standard version 3.0 (September 2012) or newer.
Note
This documentation is written for the double-precision version of CHarm.
Spherical harmonic coefficients
-
charm_shc *charm_mpi_shc_init(unsigned long nmax, double mu, double r, double *c, double *s, size_t local_nchunk, const unsigned long *local_order, MPI_Comm comm, charm_err *err)#
The same as charm_shc_init() but for charm_shc distributed among MPI processes in comm.

Function parameters that are new with respect to charm_shc_init() are explained below.

The c and s pointers must have access to the number of elements returned by charm_mpi_shc_local_ncs(). The ordering of spherical harmonic coefficients in c and s depends on local_order (see charm_shc.local_order).

The example below illustrates how to create a distributed charm_shc structure with this function. We assume that rank represents the rank of the MPI processes and that the size of the MPI communicator comm is 3.

// The maximum harmonic degree of spherical harmonic coefficients
unsigned long nmax = 10;

// The number of chunks of spherical harmonic coefficients for
// individual MPI processes
size_t local_nchunk;
if (rank == 2)
    local_nchunk = 1;
else
    local_nchunk = 2;

// Now define the chunks of spherical harmonic coefficients by
// specifying their first and last spherical harmonic order
unsigned long local_order[4];
if (rank == 0)  // Process with rank "0"
{
    // The first chunk will consist of all spherical harmonic
    // coefficients of order "0"
    local_order[0] = 0;
    local_order[1] = 0;

    // The second chunk will hold all coefficients of orders from
    // "1" to "2"
    local_order[2] = 1;
    local_order[3] = 2;
}
else if (rank == 1)  // Process with rank "1"
{
    // The first chunk at another MPI process will consist of all
    // coefficients of orders from "3" to "5"
    local_order[0] = 3;
    local_order[1] = 5;

    // The second chunk consists of all coefficients of order "6"
    local_order[2] = 6;
    local_order[3] = 6;
}
else if (rank == 2)  // Process with rank "2"
{
    // This process has one chunk only, starting at order "7" and
    // ending at "10"
    local_order[0] = 7;
    local_order[1] = 10;
}

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// "local_ncs" is the total number of spherical harmonic
// coefficients that is stored locally by the individual MPI
// processes
size_t local_ncs = charm_mpi_shc_local_ncs(nmax, local_nchunk,
                                           local_order, err);
charm_err_handler(err, 1);

// Allocate arrays that will store spherical harmonic coefficients
double *c = (double *)malloc(local_ncs * sizeof(double));
if (c == NULL)
    exit(1);
double *s = (double *)malloc(local_ncs * sizeof(double));
if (s == NULL)
    exit(1);

// You can now fill "c" and "s" with some reasonable data.
// Alternatively, you can do this with "shcs->c" and "shcs->s"
// after "shcs" is created below.  It doesn't matter, as "c" points
// to the same location in memory as "shcs->c".  The same is
// also true for "s" and "shcs->s".

// Create a distributed "charm_shc" structure
charm_shc *shcs = charm_mpi_shc_init(nmax, 1.0, 1.0, c, s, local_nchunk,
                                     local_order, comm, err);
charm_err_handler(err, 1);

// Now is the time to do some heavy lifting with "shcs"

// Once you no longer need "shcs", free it as usual
charm_shc_free(shcs);
free(c);
free(s);
charm_err_free(err);
- Parameters:
  - local_nchunk – [in] See charm_shc.local_nchunk.
  - local_order – [in] See charm_shc.local_order.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_shc structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
-
charm_shc *charm_mpi_shc_malloc(unsigned long nmax, double mu, double r, size_t local_nchunk, const unsigned long *local_order, MPI_Comm comm, charm_err *err)#
The same as charm_shc_malloc() but for charm_shc distributed among MPI processes in comm.

For an example usage, see charm_mpi_shc_init(). All you need to change is to replace charm_mpi_shc_init() with charm_mpi_shc_malloc() and to omit the c and s input parameters, which charm_mpi_shc_malloc() does not take (see the sketch below).
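A minimal sketch of that change, reusing the nmax, local_nchunk, local_order, comm and err variables from the charm_mpi_shc_init() example; the coefficients are filled in through shcs->c and shcs->s after the allocation:

// Allocate a distributed "charm_shc" structure without initializing the
// coefficients; "shcs->c" and "shcs->s" are allocated locally with
// "charm_mpi_shc_local_ncs(nmax, local_nchunk, local_order, err)" elements
charm_shc *shcs = charm_mpi_shc_malloc(nmax, 1.0, 1.0, local_nchunk,
                                       local_order, comm, err);
charm_err_handler(err, 1);

// Fill "shcs->c" and "shcs->s" with your coefficients here

// Free the structure as usual
charm_shc_free(shcs);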
Function parameters that are new with respect to charm_shc_malloc() are explained below.
- Parameters:
  - local_nchunk – [in] See charm_shc.local_nchunk.
  - local_order – [in] See charm_shc.local_order.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_shc structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
-
charm_shc *charm_mpi_shc_calloc(unsigned long nmax, double mu, double r, size_t local_nchunk, const unsigned long *local_order, MPI_Comm comm, charm_err *err)#
The same as charm_mpi_shc_malloc() but all spherical harmonic coefficients are initialized to zero.
-
size_t charm_mpi_shc_local_ncs(unsigned long nmax, size_t local_nchunk, const unsigned long *local_order, charm_err *err)#
Returns the number of \(\bar{C}_{nm}\) coefficients that are stored locally by the charm_shc structure when its maximum harmonic degree is nmax and there are local_nchunk local chunks given by the harmonic orders in local_order. The same number of coefficients applies to \(\bar{S}_{nm}\), too.

Tip
There are two use cases of this routine (see also the snippet below).

- Use it before calling charm_mpi_shc_init() to get the number of elements that the c and s input pointers must have access to.
- Use it before calling charm_mpi_shc_malloc() or charm_mpi_shc_calloc() to get the number of elements that will be locally allocated to charm_shc.c and charm_shc.s.
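For instance, the first use case might look as follows; nmax, local_nchunk, local_order and err are assumed to be prepared as in the charm_mpi_shc_init() example:

// Number of "C" (and also "S") coefficients stored locally at this process
size_t local_ncs = charm_mpi_shc_local_ncs(nmax, local_nchunk, local_order,
                                           err);
charm_err_handler(err, 1);

// The "c" and "s" arrays passed later to "charm_mpi_shc_init()" must
// provide access to "local_ncs" elements each
double *c = (double *)malloc(local_ncs * sizeof(double));
double *s = (double *)malloc(local_ncs * sizeof(double));
if ((c == NULL) || (s == NULL))
    exit(1);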
- Parameters:
  - nmax – [in] Maximum harmonic degree of the charm_shc structure.
  - local_nchunk – [in] See charm_shc.local_nchunk.
  - local_order – [in] See charm_shc.local_order.
  - err – [out] Error reported by the function (if any).
- Returns:
  On success, returned is the total number of \(\bar{C}_{nm}\) and \(\bar{S}_{nm}\) coefficients that CHarm associates with the local portion of charm_shc.c and charm_shc.s for the given chunk(s). On error, 0 is returned and the error message is written to err.
Evaluation points
-
charm_point *charm_mpi_crd_point_init(int type, size_t local_nlat, size_t local_nlon, size_t local_0_start, double *lat, double *lon, double *r, MPI_Comm comm, charm_err *err)#
The same as charm_crd_point_init() but for charm_point distributed among MPI processes in comm.

Function parameters that are new with respect to charm_crd_point_init() are explained below. The lat and r pointers must have access to local_nlat elements. The lon pointer requires local_nlon elements.

Examples:

Scattered points:

Assume that you have 6 scattered points that you want to distribute among 3 MPI processes. Let the latitudes of the points be lat0, lat1, ..., lat5. Assume that you want to have, say, the first three latitudes at the MPI process with rank 0, then two points at rank 1 and the last point at rank 2. Finally, let's assume that rank denotes the rank of the MPI processes and comm is the MPI communicator. Then, you could write:

size_t local_nlat, local_0_start;
if (rank == 0)
{
    local_nlat = 3;
    local_0_start = 0;
}
else if (rank == 1)
{
    local_nlat = 2;
    local_0_start = 3;
}
else if (rank == 2)
{
    local_nlat = 1;
    local_0_start = 5;
}

double *lat = (double *)malloc(local_nlat * sizeof(double));
if (lat == NULL)
    exit(1);
size_t local_nlon = local_nlat;
double *lon = (double *)malloc(local_nlon * sizeof(double));
if (lon == NULL)
    exit(1);
double *r = (double *)malloc(local_nlat * sizeof(double));
if (r == NULL)
    exit(1);

// At the process with "rank == 0", set "lat[0]", "lat[1]" and
// "lat[2]" to the first three latitudes of your six points, "lat0",
// "lat1", "lat2".

// At the process with "rank == 1", set "lat[0]" and "lat[1]" to
// the next two latitudes of your six points, "lat3" and "lat4".

// At the process with "rank == 2", set "lat[0]" to
// the last latitude, "lat5".

// Similarly, set the longitudes and spherical radii.

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

charm_point *pnt = charm_mpi_crd_point_init(CHARM_CRD_POINT_SCATTERED,
                                            local_nlat, local_nlon,
                                            local_0_start, lat, lon, r,
                                            comm, err);
charm_err_handler(err, 1);

// Once you no longer need "pnt", free it as usual
charm_crd_point_free(pnt);
charm_err_free(err);
free(lat);
free(lon);
free(r);
Non-symmetric point grids:

Assume that you have the same six latitudes as in the previous example with scattered points and that you also have ten longitudes, so that your grid has 6 times 10 points. Finally, assume that your grid is non-symmetric with respect to the equator (details on symmetric and non-symmetric grids here).

You can initialize the charm_point structure in the same way as in the example with scattered points, with two differences. First, your value of local_nlon will now be 10 and you must, of course, set all ten elements of lon to your longitudes. All ten longitudes must be equal at all MPI processes. Second, you need to set the first input parameter type in charm_mpi_crd_point_init() to CHARM_CRD_POINT_GRID (see the sketch below).
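A condensed sketch of those two changes; local_nlat, local_0_start, lat, r, comm and err are assumed to be prepared exactly as in the scattered-points example above, and the longitude values below are made up for illustration:

// Ten grid longitudes, identical at all MPI processes
size_t local_nlon = 10;
double *lon = (double *)malloc(local_nlon * sizeof(double));
if (lon == NULL)
    exit(1);
for (size_t i = 0; i < local_nlon; i++)
    lon[i] = (double)i / 10.0;

// Note the "CHARM_CRD_POINT_GRID" value of the first input parameter
charm_point *grd = charm_mpi_crd_point_init(CHARM_CRD_POINT_GRID,
                                            local_nlat, local_nlon,
                                            local_0_start, lat, lon, r,
                                            comm, err);
charm_err_handler(err, 1);

charm_crd_point_free(grd);
free(lon);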
Symmetric point grids:
There is a special way to tell CHarm that your grid is symmetric with respect to the equator. As a general rule, a symmetric grid must be distributed across MPI processes in such a way that the local portions of the grid at all processes are symmetric (details on symmetric and non-symmetric grids here).

Let's say that you have three MPI processes in comm with ranks rank. You have 7 grid latitudes: 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5. You also have 10 longitudes and 7 spherical radii (their values are irrelevant here). Then, you can write (observe carefully the variable local_0_start):

size_t local_nlat, local_0_start;
if (rank == 0)
{
    // This process will hold latitudes "1.5" and "-1.5"
    local_nlat = 2;
    local_0_start = 0;  // "1.5" is the first grid latitude
}
else if (rank == 1)
{
    // Latitudes "1.0", "-1.0"
    local_nlat = 2;
    local_0_start = 1;  // "1.0" is the second grid latitude
}
else if (rank == 2)
{
    // Latitudes "0.5", "0.0", "-0.5"
    local_nlat = 3;
    local_0_start = 2;  // "0.5" is the third grid latitude

    // Should "rank == 1" have, say, four
    // latitudes ("1.0", "0.9", "-0.9", "-1.0"),
    // then "local_0_start" would be "3" here.
}

double *lat = (double *)malloc(local_nlat * sizeof(double));
if (lat == NULL)
    exit(1);
if (rank == 0)
{
    lat[0] =  1.5;
    lat[1] = -1.5;
}
else if (rank == 1)
{
    lat[0] =  1.0;
    lat[1] = -1.0;
}
else if (rank == 2)
{
    lat[0] =  0.5;
    lat[1] =  0.0;
    lat[2] = -0.5;
}

// Note that the latitudes are symmetric with respect to the equator at
// *each* MPI process.  This is very important.

// Also note that "local_0_start" is approached differently with
// symmetric grids.

// Some longitudes
size_t local_nlon = 10;
double *lon = (double *)malloc(local_nlon * sizeof(double));
if (lon == NULL)
    exit(1);
for (size_t i = 0; i < local_nlon; i++)
    lon[i] = (double)i / 10.0;

// Some spherical radii
double *r = (double *)malloc(local_nlat * sizeof(double));
if (r == NULL)
    exit(1);
for (size_t i = 0; i < local_nlat; i++)
    r[i] = 1.0;

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

charm_point *pnt = charm_mpi_crd_point_init(CHARM_CRD_POINT_GRID,
                                            local_nlat, local_nlon,
                                            local_0_start, lat, lon, r,
                                            comm, err);
charm_err_handler(err, 1);

charm_crd_point_free(pnt);
charm_err_free(err);
free(lat);
free(lon);
free(r);
- Parameters:
  - local_nlat – [in] The same as charm_point.nlat, but for the latitudes stored locally.
  - local_0_start – [in] See charm_point.local_0_start.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_point structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
-
charm_point *charm_mpi_crd_point_malloc(int type, size_t local_nlat, size_t local_nlon, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_crd_point_malloc() but for charm_point distributed among MPI processes in comm.

Function parameters that are new with respect to charm_crd_point_malloc() are explained below.

For a use case, see the examples with scattered points and non-symmetric point grids in charm_mpi_crd_point_init(). You only have to replace charm_mpi_crd_point_init() with charm_mpi_crd_point_malloc() and to omit the lat, lon and r input parameters, which charm_mpi_crd_point_malloc() does not take (a sketch is given after the note below).

Note
This routine is capable of returning a non-symmetric point grid only. The input parameter local_0_start must therefore always be approached as shown for non-symmetric point grids in charm_mpi_crd_point_init(). This is because this function does not initialize the memory for charm_point.lat, so the latitudes are specified only after this function has been called. There is therefore no way for CHarm to know which latitudes the user will later store in the structure. If you want your distributed charm_point structure to be treated as symmetric, use charm_mpi_crd_point_init() instead.
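A minimal sketch, with local_nlat, local_nlon, local_0_start, comm and err assumed to be prepared as in the scattered-points example of charm_mpi_crd_point_init():

// Allocate a distributed "charm_point" structure for scattered points
// without initializing "pnt->lat", "pnt->lon" and "pnt->r"
charm_point *pnt = charm_mpi_crd_point_malloc(CHARM_CRD_POINT_SCATTERED,
                                              local_nlat, local_nlon,
                                              local_0_start, comm, err);
charm_err_handler(err, 1);

// Now fill "pnt->lat", "pnt->lon" and "pnt->r" with your coordinates

charm_crd_point_free(pnt);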
- Parameters:
  - local_nlat – [in] See charm_point.local_nlat.
  - local_0_start – [in] See charm_point.local_0_start.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_point structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
-
charm_point *charm_mpi_crd_point_calloc(int type, size_t local_nlat, size_t local_nlon, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_mpi_crd_point_malloc() but all array elements are initialized to zero.
-
charm_point *charm_mpi_crd_point_gl(unsigned long nmax, double r, size_t local_nlat, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_crd_point_gl() but for charm_point distributed among MPI processes in comm.

Function parameters that are new with respect to charm_crd_point_gl() are explained below.

Building a distributed charm_point structure for Gauss-Legendre grids depends on the parity of nmax.

- If nmax is even, local_nlat must be odd at one process and even at all the remaining processes in comm. The process with odd local_nlat must include the equator.
- If nmax is odd, local_nlat must be even at all processes within comm.

The code example below shows how to compute a distributed Gauss-Legendre grid. We assume that rank denotes the rank of the MPI processes and that there are 3 MPI processes within the communicator comm.

// Maximum harmonic degree
unsigned long nmax = 10;

// For our "nmax = 10", "nlat" will be "11" (you can check this
// with "charm_crd_point_gl_shape").  We will thus have the
// following latitudes in the Gauss--Legendre grid: "lat0, lat1,
// lat2, ..., lat10".

size_t local_nlat, local_0_start;
if (rank == 0)
{
    // This MPI process will hold 2 latitudes, "lat1" and "lat9".
    local_nlat = 2;
    local_0_start = 1;
}
else if (rank == 1)
{
    // This process also holds two latitudes, "lat0" and "lat10".
    // We intentionally put "lat0" to the process of rank "1" to
    // demonstrate that CHarm does not care about which chunk is
    // stored by which MPI process.
    local_nlat = 2;
    local_0_start = 0;
}
else if (rank == 2)
{
    // "nmax" is even, so the process holding the equator must have
    // an odd value of "local_nlat".
    // This process will hold seven latitudes, "lat2, lat3, ...,
    // lat8".
    local_nlat = 7;
    local_0_start = 2;
}

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// Compute the Gauss--Legendre grid
charm_point *glg = charm_mpi_crd_point_gl(nmax, 1.0, local_nlat,
                                          local_0_start, comm, err);
charm_err_handler(err, 1);

// Once you no longer need "glg", free it as usual
charm_crd_point_free(glg);
charm_err_free(err);
- Parameters:
local_nlat – [in] See
charm_point.local_nlat
.local_0_start – [in] See
charm_point.local_0_start
.comm – [in] MPI communicator.
err – [out] Error reported by the function (if any).
- Returns:
If all processes succeeded, returned is a pointer to the
charm_point
structure. If one or more processes failed, all processes return aNULL
pointer and an error message is written toerr
.
-
charm_point *charm_mpi_crd_point_dh1(unsigned long nmax, double r, size_t local_nlat, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_crd_point_dh1() but for charm_point distributed among MPI processes in comm.

Function parameters that are new with respect to charm_crd_point_dh1() are explained below.

There are some restrictions on local_nlat and local_0_start.

- Two processes must have an odd value of local_nlat and all the remaining processes must have an even local_nlat. As an exception, if one process holds all latitudes of the Driscoll-Healy grid, then local_nlat is always even.
- One process with odd local_nlat must have local_0_start set to 0. In other words, the process holding the north pole must have odd local_nlat.
- The other process with odd local_nlat must contain the equator.

The code example below shows how to compute a distributed Driscoll-Healy grid. We assume that rank denotes the rank of the MPI processes and that there are 3 MPI processes within the communicator comm.

// Maximum harmonic degree
unsigned long nmax = 10;

// For our "nmax = 10", "nlat" will be "22" (you can check this
// with "charm_crd_point_dh1_shape").  We will thus have the
// following latitudes in the Driscoll--Healy grid: "lat0, lat1,
// lat2, ..., lat21".

size_t local_nlat, local_0_start;
if (rank == 0)
{
    // This process holds the north pole, hence the odd
    // "local_nlat".  The latitudes available to this process
    // are "lat0", "lat1" and "lat21".  Remember that "lat21" is not
    // the south pole.
    local_nlat = 3;
    local_0_start = 0;
}
else if (rank == 1)
{
    // This process holds the equator "lat11", hence the odd
    // "local_nlat"
    local_nlat = 1;
    local_0_start = 11;
}
else if (rank == 2)
{
    // All the remaining latitudes of the Driscoll--Healy grid:
    // "lat2", "lat3", ..., "lat10", "lat12", "lat13", ..., "lat20".
    local_nlat = 18;
    local_0_start = 2;
}

// Initialize an error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// Compute the Driscoll--Healy grid
charm_point *dh1 = charm_mpi_crd_point_dh1(nmax, 1.0, local_nlat,
                                           local_0_start, comm, err);
charm_err_handler(err, 1);

// Once you no longer need "dh1", free it as usual
charm_crd_point_free(dh1);
charm_err_free(err);
- Parameters:
  - local_nlat – [in] See charm_point.local_nlat.
  - local_0_start – [in] See charm_point.local_0_start.
  - comm – [in] MPI communicator.
  - err – [out] Error reported by the function (if any).
- Returns:
  If all processes succeeded, returned is a pointer to the charm_point structure. If one or more processes failed, all processes return a NULL pointer and an error message is written to err.
-
charm_point *charm_mpi_crd_point_dh2(unsigned long nmax, double r, size_t local_nlat, size_t local_0_start, MPI_Comm comm, charm_err *err)#
The same as charm_mpi_crd_point_dh1() but for the equiangular modification of the Driscoll-Healy grid after Wieczorek and Meschede (2018).
Error handling
-
charm_err *charm_mpi_err_init(MPI_Comm comm)#
The same as charm_err_init() but for charm_err distributed among the processes in the MPI communicator comm.
- Returns:
  If all processes succeeded, returned is a pointer to the charm_err structure. If one or more processes failed, all processes return a NULL pointer.
-
_Bool charm_mpi_err_isempty(const charm_err *err)#
The same as charm_err_isempty() but for both distributed (err->distributed == 1) and non-distributed (err->distributed == 0) charm_err structures.
- Returns:
  Returns boolean 1 if err is empty at all processes in err->comm. If err is not empty at one or more processes, 0 is returned.
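To round off the error-handling routines, here is one possible usage pattern for a distributed charm_err, sketched under the assumption that comm holds exactly two MPI processes and that rank is the rank of the calling process. The charm_mpi_shc_calloc() call merely stands for any distributed CHarm routine that may report an error.

// Create a distributed error structure
charm_err *err = charm_mpi_err_init(comm);
if (err == NULL)
    exit(1);

// Some distributed CHarm call that may fail; the chunking assumes two
// processes in "comm"
unsigned long nmax = 10;
size_t local_nchunk = 1;
unsigned long local_order[2];
if (rank == 0)
{
    local_order[0] = 0;
    local_order[1] = 5;
}
else
{
    local_order[0] = 6;
    local_order[1] = 10;
}
charm_shc *shcs = charm_mpi_shc_calloc(nmax, 1.0, 1.0, local_nchunk,
                                       local_order, comm, err);

// Collective check whether the call succeeded at all processes; the
// return value is the same at every process in "err->comm"
if (!charm_mpi_err_isempty(err))
    // Print the error message(s) and, since the second argument is "1",
    // terminate the program
    charm_err_handler(err, 1);

charm_shc_free(shcs);
charm_err_free(err);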