DKRZ projects: Issues
https://swprojects.dkrz.de/redmine/
2020-08-13T18:32:36Z
ScalES-PPM - Bug #350 (Resolved): pkg-config file libraries results in overlinking
https://swprojects.dkrz.de/redmine/issues/350
2020-08-13T18:32:36Z, Matthew Krupcale
<p>The <code>scales-ppm{,-core}.pc.in</code> pkg-config files specify several libraries in the <code>Libs:</code> field which are actually internal, private library dependencies of scales-ppm. That is, they are not part of the public interface/API but merely implementation details. These should instead be specified in <code>Libs.private:</code> so that they are only added to the link when linking statically against the library.</p>
<p>Similarly, the MPI includes are part of the public scales-ppm-core API, so we need to include those, but we don't need parmetis or metis includes here (in any case, they should have been in <code>scales-ppm.pc.in</code>, not <code>scales-ppm-core.pc.in</code>, but they're not needed there either since they're again not part of the public API).</p>
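<p>As an illustration, the split would look roughly like the following (a minimal sketch, not the attached patch; paths, version, and dependency names are placeholders):</p>

```
# Hypothetical scales-ppm.pc sketch: only the library itself stays in Libs;
# internal dependencies move to Libs.private, which pkg-config emits only
# for static linking (pkg-config --static --libs).
prefix=/usr
libdir=${prefix}/lib
includedir=${prefix}/include

Name: scales-ppm
Description: Scalable Performance Model library (illustrative)
Version: 1.0.4
Libs: -L${libdir} -lscalesppm
Libs.private: -lparmetis -lmetis
Cflags: -I${includedir}
```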
<p>See attached patch and [1-2] for how libraries should be specified for linking.</p>
<p>[1] <a class="external" href="https://people.freedesktop.org/~dbn/pkg-config-guide.html">https://people.freedesktop.org/~dbn/pkg-config-guide.html</a><br />[2] <a class="external" href="https://cmake.org/pipermail/cmake/2016-May/063400.html">https://cmake.org/pipermail/cmake/2016-May/063400.html</a></p>
ScalES-PPM - Bug #349 (Resolved): Missing MPI include file for non-MPI builds
https://swprojects.dkrz.de/redmine/issues/349
2020-08-13T16:37:05Z, Matthew Krupcale
<p>The include file <code>mpi_fc_conf.inc</code> is generated only for MPI builds. This causes a problem when configuring with <code>--disable-MPI</code> and attempting to build the doxygen documentation: processing <code>src/core/ppm_std_type_kinds_mp.f90</code> causes a fatal error due to the missing include file.</p>
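<p>The error can be avoided with a preprocessor guard of this shape (a sketch of the idea, not necessarily the exact attached patch):</p>

```fortran
! Only pull in the MPI-specific include when building with MPI support,
! i.e. when the preprocessor macro USE_MPI is defined.
#ifdef USE_MPI
      INCLUDE 'mpi_fc_conf.inc'
#endif
```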
<p>The attached patch fixes this by only including <code>mpi_fc_conf.inc</code> when <code>USE_MPI</code> is defined.</p>
ScalES-PPM - Feature #347 (Feedback): Add support for METIS v4-5 and ParMETIS v3-4
https://swprojects.dkrz.de/redmine/issues/347
2017-09-26T01:13:39Z, Matthew Krupcale
<a name="Introduction"></a>
<h2 >Introduction<a href="#Introduction" class="wiki-anchor">¶</a></h2>
<p>The current ScalES-PPM API appears to be designed around the METIS v4 and ParMETIS v3 APIs. The attached patch aims to add support for both METIS v4 and v5 as well as ParMETIS v3 and v4.</p>
<a name="Summary-of-the-patched-files"></a>
<h2 >Summary of the patched files<a href="#Summary-of-the-patched-files" class="wiki-anchor">¶</a></h2>
<ul>
<li><code>configure.ac</code>: detect METIS and ParMETIS version; check <code>idxtype</code>/<code>idx_t</code> compatibility; check <code>real_t</code> compatibility</li>
<li><code>include/f77/ppm.inc.in</code>: define parameter <code>PPM_REAL</code> to be the width of <code>real_t</code></li>
<li><code>ppm.settings.in</code>: define ParMETIS/METIS <code>fcrealkind</code> to be the width of <code>real_t</code></li>
<li><code>src/Makefile.am</code>: define the <code>HAVE_METIS_V4</code> and <code>HAVE_PARMETIS_V3</code> preprocessor macros as needed; include <code>src/ppm/parmetis_wrap.c</code> in <code>libscalesppm</code> only when using ParMETIS v3</li>
<li><code>src/ppm/ppm_graph_partition_mpi.f90</code>: adjust <code>INTERFACE SUBROUTINE parmetis_v3_partkway</code> C-binding for particular version of ParMETIS; use <code>c_null_ptr</code> instead of all combinations of <code>dummy_{balance,weights}</code></li>
<li><code>ppm_graph_partition_serial.f90</code>: define <code>INTERFACE SUBROUTINE metis_mcpartgraphkway</code>, <code>INTERFACE SUBROUTINE metis_setdefaultoptions</code>, and <code>INTERFACE SUBROUTINE metis_partgraphkway</code> C-bindings for particular version of METIS; use <code>c_null_ptr</code> instead of all combinations of <code>{v,e}w_dummy</code></li>
</ul>
<a name="Detailed-changes"></a>
<h2 >Detailed changes<a href="#Detailed-changes" class="wiki-anchor">¶</a></h2>
<a name="configureac"></a>
<h3 ><code>configure.ac</code><a href="#configureac" class="wiki-anchor">¶</a></h3>
<p>Several changes between METIS v4 and v5 make the existing <code>configure.ac</code> and <code>ppm/ppm_graph_partition_serial.f90</code> inoperable with METIS v5. In particular:</p>
<ul>
<li>Changed <code>idxtype</code> -> <code>idx_t</code></li>
<li>Unified routines (<code>METIS_PartGraphKway</code>, <code>METIS_mCPartGraphKway</code>, <code>METIS_WPartGraphKway</code>, <code>METIS_PartGraphVKway</code>, <code>METIS_WPartGraphVKway</code>) -> <code>METIS_PartGraphKway</code> (with a change in API)</li>
</ul>
<p>ParMETIS v4 now also relies directly on METIS v5 and thus should not, in principle, have a different <code>idx_t</code> than METIS, unless the user explicitly builds METIS with one <code>IDXTYPEWIDTH</code> (e.g., 32) and then builds a modified ParMETIS v4 with another (e.g., 64). The ParMETIS v4 would have to be modified because it directly includes <code>metis.h</code>, where <code>idx_t</code> is <code>typedef</code>'d. Obviously this is not a supported configuration, and I would hope that no user attempts it. Nevertheless, if the user does this, the patched <code>configure.ac</code> still checks both <code>parmetis.h</code> and <code>metis.h</code> for the <code>idxtype</code>/<code>idx_t</code> and ensures that they are compatible.</p>
<p>Similarly to <code>idx_t</code>, METIS v5 now defines <code>real_t</code> rather than <code>float</code> for some of the balancing constraints. Thus, the patched <code>configure.ac</code> also checks for <code>real_t</code> (again in both <code>parmetis.h</code> and <code>metis.h</code>, in case the user tries to do something strange) and makes sure they are compatible. Note that this relies on the patch in <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: ACX_FORTRAN_RUN_CHECK_SIZEOF in acx_fc_real_size.m4 does not check size of its argument (Closed)" href="https://swprojects.dkrz.de/redmine/issues/343">#343</a>.</p>
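<p>The compatibility check can be sketched roughly as follows (an illustrative <code>configure.ac</code> fragment, not the attached patch; the variable names and error messages are placeholders):</p>

```
# Determine the width of idx_t as seen through each header and compare.
AC_COMPUTE_INT([metis_idx_width], [(int)(8 * sizeof (idx_t))],
  [[#include <metis.h>]],
  [AC_MSG_ERROR([cannot determine width of idx_t in metis.h])])
AC_COMPUTE_INT([parmetis_idx_width], [(int)(8 * sizeof (idx_t))],
  [[#include <parmetis.h>]],
  [AC_MSG_ERROR([cannot determine width of idx_t in parmetis.h])])
AS_IF([test "$metis_idx_width" != "$parmetis_idx_width"],
  [AC_MSG_ERROR([metis.h and parmetis.h disagree on the width of idx_t])])
```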
<p>These are the major changes to <code>configure.ac</code>. In addition, <code>METIS_HEADER='metis/metis.h'</code> was removed because I could find no reference to such a file in any METIS installation and assume it refers to some custom installation location. In that case, the <code>metis.h</code> header location should instead be specified in the <code>configure</code> <code>CFLAGS</code>. If desired, I can break this change out into a separate bug/patch, but it was causing issues when configuring my METIS/ParMETIS setups.</p>
<a name="srcMakefileam"></a>
<h3 ><code>src/Makefile.am</code><a href="#srcMakefileam" class="wiki-anchor">¶</a></h3>
<p>A Fortran-to-C wrapper for <code>ParMETIS_V3_PartKway</code> is included in <code>src/ppm/parmetis_wrap.c</code> because the ParMETIS v3 Fortran wrapper does not convert the Fortran MPI communicator to a C MPI communicator, whereas the v4 API does this properly (see the macro <code>FRENAME</code> in <code>parmetis-4.0.3/libparmetis/frename.c</code>). The wrapper is therefore only included in <code>libscalesppm</code> when using ParMETIS v3.</p>
<a name="srcppmppm_graph_partition_mpif90"></a>
<h3 ><code>src/ppm/ppm_graph_partition_mpi.f90</code><a href="#srcppmppm_graph_partition_mpif90" class="wiki-anchor">¶</a></h3>
<p>The <code>ParMETIS_V3_PartKway</code> function has the following prototypes in v3.2.0 and v4.0.3, respectively:</p>
<pre>
// ParMETIS v 3.2.0
void __cdecl ParMETIS_V3_PartKway(
idxtype *vtxdist, idxtype *xadj, idxtype *adjncy, idxtype *vwgt,
idxtype *adjwgt, int *wgtflag, int *numflag, int *ncon, int *nparts,
float *tpwgts, float *ubvec, int *options, int *edgecut, idxtype *part,
MPI_Comm *comm);
// ParMETIS v 4.0.3
int __cdecl ParMETIS_V3_PartKway(
idx_t *vtxdist, idx_t *xadj, idx_t *adjncy, idx_t *vwgt,
idx_t *adjwgt, idx_t *wgtflag, idx_t *numflag, idx_t *ncon, idx_t *nparts,
real_t *tpwgts, real_t *ubvec, idx_t *options, idx_t *edgecut, idx_t *part,
MPI_Comm *comm);
</pre>
<p>Thus, the types defined in the <code>INTERFACE SUBROUTINE parmetis_v3_partkway</code> had to be adjusted depending on the version of the library used.</p>
<p>The method for invoking <code>parmetis_v3_partkway</code> was also modified to require only one <code>CALL</code> site with the arguments passed depending on the presence of the <code>OPTIONAL</code> arguments to <code>graph_partition_parmetis</code>. In particular, several <code>TYPE(c_ptr)</code> were used and passed (by value) to the <code>parmetis_v3_partkway</code> Fortran C-binding rather than using an uninitialized dummy array. This scales much better with the number of optional arguments: for <em>N</em> optional arguments, there are 2^N combinations, while defining a pointer to either <code>NULL</code> or the location of the argument when present scales linearly in <em>N</em>.</p>
<p>Although <code>balance</code> is an <code>OPTIONAL</code> argument, <code>tpwgts</code> appears to be required in the ParMETIS v4 API (see <code>parmetis-4.0.3/libparmetis/weird.c:CheckInputsPartKway</code>), and thus <code>balance</code> is constructed with equal weights for each sub-domain/partition when it is not present.</p>
<a name="ppm_graph_partition_serialf90"></a>
<h3 ><code>ppm_graph_partition_serial.f90</code><a href="#ppm_graph_partition_serialf90" class="wiki-anchor">¶</a></h3>
<p>METIS v4 has separate routines for multi-constraint partitioning (<code>METIS_mCPartGraphKway</code>) and single-constraint partitioning (<code>METIS_PartGraphKway</code>); in METIS v5, these two partitioning methods are unified into <code>METIS_PartGraphKway</code> (albeit with a different API than in v4). METIS v5 also has a larger set of <code>options</code> and a corresponding function <code>METIS_SetDefaultOptions</code> to initialize them with defaults. Only <code>METIS_OPTION_NUMBERING</code> (corresponding to <code>numflag</code> in the v4 API) is set, to indicate Fortran-style numbering.</p>
<p>The <code>METIS_PartGraphKway</code> function has the following prototypes in v4.0.3 and v5.1.0, respectively:</p>
<pre>
// METIS v 4.0.3
void METIS_PartGraphKway(int *nvtxs, idxtype *xadj, idxtype *adjncy, idxtype *vwgt,
idxtype *adjwgt, int *wgtflag, int *numflag, int *nparts,
int *options, int *edgecut, idxtype *part);
// METIS v 5.1.0
METIS_API(int) METIS_PartGraphKway(idx_t *nvtxs, idx_t *ncon, idx_t *xadj,
idx_t *adjncy, idx_t *vwgt, idx_t *vsize, idx_t *adjwgt,
idx_t *nparts, real_t *tpwgts, real_t *ubvec, idx_t *options,
idx_t *edgecut, idx_t *part);
</pre>
<p>Thus, as in <code>src/ppm/ppm_graph_partition_mpi.f90</code>, the types and parameters defined in the <code>INTERFACE SUBROUTINE metis_partgraphkway</code> had to be adjusted depending on the version of the library used. Likewise, <code>c_null_ptr</code> was used to indicate the absence of <code>OPTIONAL</code> arguments, and a single <code>CALL</code> site serves all combinations of <code>OPTIONAL</code> arguments.</p>
<a name="Supported-combinations-of-METIS-and-ParMETIS"></a>
<h2 >Supported combinations of METIS and ParMETIS<a href="#Supported-combinations-of-METIS-and-ParMETIS" class="wiki-anchor">¶</a></h2>
<p>While this patch is designed for use with arbitrary combinations of METIS v4/v5 and ParMETIS v3/v4, in practice, when using ParMETIS, I expect this to only work when building ParMETIS with the internally-bundled version of METIS or with ParMETIS v4.0.3 and an internal or external METIS v5.</p>
<p>By default, it is not possible to build binaries linked against ParMETIS v3+ and an external METIS v4 package, because the ParMETIS v3 version of <code>parmetis.c</code> (modified from the METIS version) defines the functions <code>METIS_NodeRefine</code> (needed by ParMETIS v3.2+) and/or <code>METIS_mCPartGraphRecursive2</code> (needed by ParMETIS v3). So in order to use an external METIS v4 with ParMETIS v3, the METIS v4 <code>parmetis.c</code> would have to be similarly modified to define these functions. It is, however, possible to build binaries linked against ParMETIS v4 and an external METIS v5, since METIS v5 defines <code>METIS_NodeRefine</code>. This should also be clear from the fact that ParMETIS v4 uses and bundles METIS v5 directly.</p>
<p>Although METIS v4 and ParMETIS v3 are still technically supported, the only combination really recommended is METIS v5 and ParMETIS v4. In testing (using <code>example/graph_partition</code>), METIS v4 proved unreliable, aborting within <code>METIS_mCPartGraphKway</code> with <code>SIGABRT</code> (double free or corrupted doubly-linked list) or <code>SIGSEGV</code> (segmentation fault) errors. The multi-constraint procedure in METIS v4 thus seems altogether unusable. These errors do not appear with the METIS v5 version. Both of these codes are more than 5 years old at this point, so I would hope that this is not much of an issue.</p>
<a name="Testing"></a>
<h2 >Testing<a href="#Testing" class="wiki-anchor">¶</a></h2>
<p>Since my goal is to package ScalES-PPM for Fedora, I am working on this on Fedora, and I have some Bash scripts which set up a mock chroot, build METIS v4 and ParMETIS v3/v4, and build ScalES-PPM with various combinations of these. Let me know if you are interested in trying out these tests.</p>
ScalES-PPM - Bug #346 (Feedback): Failure to apply doxygen CSS patch when building HTML documentation
https://swprojects.dkrz.de/redmine/issues/346
2017-09-25T17:53:54Z, Matthew Krupcale
<p>After applying the patches in <a class="issue tracker-1 status-3 priority-4 priority-default" title="Bug: autoreconf fails due to missing file m4/ac_fc_module_output_flag.m4 (Resolved)" href="https://swprojects.dkrz.de/redmine/issues/342">#342</a> and <a class="issue tracker-1 status-3 priority-4 priority-default" title="Bug: No rule to make target 'dist_array.f90' when building HTML documentation (Resolved)" href="https://swprojects.dkrz.de/redmine/issues/345">#345</a>, running <code>autoreconf -vif</code> to rebuild <code>configure</code> and the <code>Makefiles</code>, configuring the project, and running</p>
<p><code>make -C doc/unitdoc html-local</code></p>
<p>from the top build directory to build the HTML documentation, <code>make</code> fails in the <code>html-local</code> target while applying the final patch to the CSS:</p>
<pre>
patch -p1 <../../../doc/unitdoc/doxygen.css.patch
patching file html/doxygen.css
Hunk #1 FAILED at 944.
1 out of 1 hunk FAILED -- saving rejects to file html/doxygen.css.rej
</pre>
<p>This might be due to a different version of <code>doxygen</code> being used and producing different CSS. I have fixed this issue by creating a new <code>doc/unitdoc/doxygen.css.patch</code> file; the attached patch fixes this patch file, at least when using <code>doxygen</code> version <code>1.8.13</code>.</p>
ScalES-PPM - Bug #345 (Resolved): No rule to make target 'dist_array.f90' when building HTML documentation
https://swprojects.dkrz.de/redmine/issues/345
2017-09-25T17:41:08Z, Matthew Krupcale
<p>After configuring the project and running</p>
<p><code>make -C doc/unitdoc html-local</code></p>
<p>from the top build directory to build the HTML documentation, <code>make</code> fails with the following error:</p>
<p><code>make[1]: *** No rule to make target 'dist_array.f90'. Stop.</code></p>
<p>This appears to be due to <code>doc/unitdoc/Makefile.am</code> attempting to build <code>src/ppm/dist_array.f90</code> using the wrong target: from <code>src</code>, it attempts to build the target <code>dist_array.f90</code>, but it should be <code>ppm/dist_array.f90</code> as defined in <code>src/Makefile.am</code>.</p>
<p>The attached patch fixes this issue.</p>
ScalES-PPM - Bug #342 (Resolved): autoreconf fails due to missing file m4/ac_fc_module_output_flag.m4
https://swprojects.dkrz.de/redmine/issues/342
2017-09-25T16:17:00Z, Matthew Krupcale
<p>When attempting to run <code>autoreconf</code> on <code>ppm-1.0.4</code>, <code>aclocal</code> fails with error:</p>
<p><code>aclocal: error: acinclude.m4:52: file 'm4/ac_fc_module_output_flag.m4' does not exist</code></p>
<p>Indeed, the file does not appear to be in the <code>m4</code> directory shipped with ScalES-PPM v1.0.4.</p>
<p>Furthermore, <code>acinclude.m4</code> attempts to include this file only if the <code>autoconf</code> version is less than <code>2.70</code> (which does not yet exist), but <code>AC_FC_MODULE_OUTPUT_FLAG</code> was introduced in version <code>2.69</code> (specifically commit <code>ac427166c5945445e307c82d44301da9480f017a</code>).</p>
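<p>The version guard can be written along these lines (an illustrative <code>acinclude.m4</code> fragment, not the exact attached patch):</p>

```
# Include the bundled fallback implementation only for Autoconf < 2.69,
# since AC_FC_MODULE_OUTPUT_FLAG became available in 2.69.
m4_version_prereq([2.69], [],
  [m4_include([m4/ac_fc_module_output_flag.m4])])
```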
<p>The attached patch fixes these two issues.</p>
YAXT - Feature #341 (New): Call MPI_Testany from time to time to improve performance
https://swprojects.dkrz.de/redmine/issues/341
2017-08-25T14:23:05Z, Moritz Hanke
<p>While doing some tests with YAC, I noticed that when setting up a list of MPI_Isends, performance improved if MPI_Testany was called from time to time (MPI_Testsome produced even better results).</p>
<p>We should check whether YAXT could also benefit from this.</p>
<p>The following pseudo code shows the idea, which is based on the routine psmile_bsend written by Hubert Ritzdorf for OASIS4.</p>
<pre>
/* C-like sketch; total_num_send_recv_msgs and the message setup are
   context-dependent. */
int num_open_requests = 0;
MPI_Request requests[total_num_send_recv_msgs];
for (int i = 0; i < total_num_send_recv_msgs; ++i) {
  MPI_Request *request = &requests[num_open_requests++];
  /* set up MPI_Isend/MPI_Irecv for message i using request */
  int flag = 1, idx;
  while (flag && (num_open_requests >= 64)) {
    MPI_Testany(num_open_requests, requests, &idx, &flag, MPI_STATUS_IGNORE);
    if (flag && (idx != MPI_UNDEFINED))
      requests[idx] = requests[--num_open_requests];
  }
}
MPI_Waitall(num_open_requests, requests, MPI_STATUSES_IGNORE);
</pre>
YAXT - Feature #340 (New): Improve message send order in Xt_exchanger
https://swprojects.dkrz.de/redmine/issues/340
2017-08-25T08:03:24Z, Moritz Hanke
<p>Under certain circumstances, the message order produced by the routine xt_exchanger_internal_optimize might not be optimal.<br />Example: a program with a total of 20 processes, in which each of the first 10 processes sends a message to the last 10 processes.</p>
<p>We might have to think about another way of improving the message order.<br />An evolutionary algorithm could be a solution.</p>
YAXT - Feature #338 (New): User-creatable index lists
https://swprojects.dkrz.de/redmine/issues/338
2016-02-05T16:23:47Z, Thomas Jahns (jahns@dkrz.de)
<p>Users should be able to create index list classes if they can describe their indices more succinctly this way. The following parts would be needed:</p>
<ul>
<li>Registering a pack tag and making sure it's identical on all ranks.</li>
<li>Registering an unpack function.</li>
<li>Optionally registering intersection function(s) for different (other) index lists and extending the table in xt_idxlist_intersection.c correspondingly.</li>
</ul>
<p>Anything I forgot?</p>
YAXT - Feature #337 (New): check possibility of usage of mpi_type_create_hvector in xt_redist_col...
https://swprojects.dkrz.de/redmine/issues/337
2015-04-27T12:34:37Z, Joerg Behrens (behrens@dkrz.de)
<p>If the sequence of redists given in the constructor always uses the same redist, then we should reduce the created datatype to a simpler form (using mpi_type_create_hvector instead of MPI_Type_create_struct).</p>
YAXT - Task #336 (New): Compare performance of YAXT and other coupling middleware
https://swprojects.dkrz.de/redmine/issues/336
2015-04-24T13:13:20Z, Thomas Jahns (jahns@dkrz.de)
<p>The following projects deliver partially similar functionality and should be investigated for relative performance:</p>
<ul>
<li>MCT</li>
<li>ESMF</li>
</ul>
YAXT - Bug #335 (New): passing pointers of zero-size arrays to redists
https://swprojects.dkrz.de/redmine/issues/335
2015-04-23T09:45:59Z, Moritz Hanke
<p>The addresses of arrays passed to xt_redist_s_exchange can be NULL when the size of the array is zero. This can cause problems, especially when redist collections are used.<br />For zero-sized arrays, a redist should not contain any message; when one does, we might have to consider applying some kind of special handling.<br />When the Fortran interface is used, no appropriate C_LOC pointer can be generated.</p>
Tasks:
<ul>
<li>reproduce problem with test (only occurs with certain MPIs)</li>
<li>fix the problem, or specify that NULL pointers are not allowed</li>
</ul>
YAXT - Feature #333 (Resolved): Caching of communicators
https://swprojects.dkrz.de/redmine/issues/333
2014-08-06T16:29:45Z, Thomas Jahns (jahns@dkrz.de)
<p>YAXT currently creates communicators internally to provide isolation from other parts of the system. This can be potentially costly when it introduces additional synchronization. For this reason it seems sensible to cache previously created communicators instead of destroying them immediately.</p>
<p>An alternative scheme requires managing tags in the library more closely and is potentially less resource-intensive (depending on how costly a communicator is).</p>
YAXT - Bug #331 (New): Make xt_xmap_distdir work when less data than expected gets packed
https://swprojects.dkrz.de/redmine/issues/331
2014-07-31T08:37:03Z, Thomas Jahns (jahns@dkrz.de)
<p>commit:daacfe17cb9124f4f0f3763858cc94ff666efb4a fixes a problem in xmap_all2all that is also present in the distributed directory variant: if the get_pack_size method of an index list returns a value larger than the actual advance of the position performed by the pack method, distdir fails.</p>
<p>Steps to reproduce: simply add one to the MPI_Pack_size count argument of e.g. source:src/xt_idxvec.c#L369</p>
libaec - Feature #330 (New): Come up with soname and soversion
https://swprojects.dkrz.de/redmine/issues/330
2014-05-21T21:35:12Z, Thomas Jahns (jahns@dkrz.de)
<p>To include the library in Debian, the Debian Science Maintainers ask us to provide sonames and versions for shared libraries. See <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=740613" class="external">the corresponding Debian bug report to include libaec in Debian</a>.</p>