charm_glob#

Module defining global CHarm variables.

The global variables are declared in charm_glob.h and are initialized to safe values. Should you need to modify them for some reason, simply include charm_glob.h (or charm.h) in your code and set (without definition) the variables to whatever values you prefer. After that, whenever you call a CHarm routine that uses the global variables, the routine will use your own values instead of the default ones.

Warning

This module is for experienced users only. Most users should not interact with it.

Note

This documentation is written for double precision version of CHarm.

Thresholds used to compare floating point numbers

double charm_glob_threshold#

Threshold to judge whether two floating point numbers are equal, 100.0 * EPS, where EPS is the machine epsilon of the floating point data type (float, double or __float128). Experienced users may (under some very specific circumstances) need the possibility to tighten or relax the threshold.

double charm_glob_threshold2#

A more relaxed threshold to judge whether two floating point numbers are equal, 100000.0 * EPS. It is used to check longitudes of custom user-defined grids for a constant longitudinal step (required by the FFT-based algorithms). To this end, we use double differences. For high-resolution grids, the longitudinal step may be very small, so that two consecutive longitudes are very similar. Since double differences increase numerical errors, the threshold needs to be sufficiently large in such situations (yet narrow enough). The current value was found to be sufficient for quadrature grids associated with degrees as high as 70,000.

Parameters defining the polar optimization

CHarm can be told to skip the computation of all spherical harmonics during spherical harmonic synthesis/analysis for which

\[m - n_{\max} \, \cos\varphi > \max(a_1, a_2 \, n_{\max}){.}\]

If the parameters a1 and a2 (see charm_glob_polar_optimization_a1 and charm_glob_polar_optimization_a2) are tuned reasonably, the polar optimization can improve the computation speed while not deteriorating the output accuracy. This is because the inequality filters out spherical harmonics that are of such small magnitudes in the polar areas that they do not contribute to the result (within the numerical precision). However, if the parameters are set unwisely, the accuracy can be compromised.

If the second polar optimization parameter charm_glob_polar_optimization_a2 is negative (which is the default), no polar optimization is applied, that is, all spherical harmonics are evaluated, regardless of their smallness in magnitude. If it is non-negative, the polar optimization is applied.

In double precision, the following values of the tuning parameters improve the computation speed and, in most cases, can safely be used without deteriorating the accuracy (Reinecke and Seljebotn, 2013):

In single and quadruple precision, different values of the tuning parameters are needed. Currently, we do not provide any recommendations, though.

References:

  • Reinecke, M., Seljebotn, D. S. (2013) Libsharp - spherical harmonic transforms revisited. Astronomy and Astrophysics 554, A112, doi: 10.1051/0004-6361/201321494.

unsigned long charm_glob_polar_optimization_a1#

Polar optimization parameter (default is 100)

double charm_glob_polar_optimization_a2#

Polar optimization parameter (default is -1.0, that is, no polar optimization)

MPI specific global variables

Note

The variables that follow are available only when CHarm is compiled with the MPI support (--enable-mpi, refer to charm_mpi for further details).

unsigned long charm_glob_shc_block_nmax_multiplier#

Multiplying this variable by the maximum harmonic degree of spherical harmonic coefficients enlarged by 1 specifies the maximum number of coefficients that can be exchanged between MPI processes during spherical harmonic synthesis/analysis with distributed charm_shc.

Assume that charm_shc.distributed = 1. During the spherical harmonic synthesis/analysis, (charm_shc.nmax + 1) * charm_glob_shc_block_nmax_multiplier spherical harmonic coefficients \(\bar{C}_{nm}\) and the same amount of \(\bar{S}_{nm}\) coefficients will be sent between MPI processes at most. Too small value of charm_glob_shc_block_nmax_multiplier will cause too large overhead due to MPI calls. On the other hand, too large value will consume too much RAM on each MPI process.

Default value is 1000. The value must be larger than zero. If it is zero, CHarm will use the value 1.

size_t charm_glob_sha_block_lat_multiplier#

This variable helps to control the number of for loop iterations over latitudes in charm_sha_point(). It has enormous impact on the performance of charm_sha_point() when CHarm is compiled with the MPI support.

Let x by the number of latitudes, for which the for loop over latitudes in charm_sha_point() runs. Usually, this number is about half of the number of latitudes of quadrature grids due to their equatorial symmetry. Furthermore, let b be the value of charm_glob_sha_block_lat_multiplier. Finally, let s be the size of SIMD vectors (can be determined by charm_misc_buildopt_simd_vector_size()) and o be the number of OpenMP processes (if OpenMP parallelization is disabled, then o = 1). The for loop in charm_sha_point() will run roughly ceil(x / (b * s * o)) times. Thus, by increasing b (and optionally also o), the number of loop runs can be decreased.

The optimal value depends on your hardware (e.g., the number of shared-memory computing nodes, the network connection speed between the nodes, etc.) and also on the number of latitudes. Both too low and too high values may drastically deteriorate the performance.

  • It the value is too low, the for loop will run many times. This is a serious problem, because with each loop iteration, all spherical harmonic coefficients need to be distributed among all MPI processes within an MPI communicator. So by increasing charm_glob_sha_block_lat_multiplier, you can reduce the number of times the spherical harmonic coefficients will be sent between MPI processes which is desired in general.

  • If the value is too high, the CPU caching within a single computing node may be significantly deteriorated, thereby decreasing the performance.

The optimum value should be determined by the trial and error method.

The parameter affects charm_sha_point() regardless of whether or not charm_shc and charm_point are distributed. The value must be larger than 0. If it is 0, CHarm will use the value 1.

Default value is 4.

size_t charm_glob_shs_block_lat_multiplier#

The same as charm_glob_sha_block_lat_multiplier but for spherical harmonic synthesis in charm_shs_point().

Default value is 8.