Broadcasting and Paralel Prefix operations

Broadcasting Operations A variety of functions have been implemented on top of BSPlib (v1.4) and LAM-MPI (on top of MPI-2 primitives MPI_Put and MPI_Get). Named functions have been tested under both libraries.

  1. Binary and d-ary tree replication of scalars and arrays of arbitrary size.
  2. 2-phase algorithm arrays of arbitrary size. This algorithm is preferable on large arrays since it only requires two supersteps (BSP speaking) or two rounds of communication. In each round an n-relation is realized (each processor sends/receives n elements where n is the array size).
  3. Wrapper broadcasting functions that chooses the best ones (2-phase vs d-ary tree and in the latter case choosed the value of d). However at the moment the choices are too generic and unreliable as they depend on the bsp parameters of an individual cluster and the default values are obsolete.

Parallel Prefix Operations Functions for binary and d-ary parallel prefix, including an implementation of a generalization of the Ladner-Fisher algorithm (with binary and d-ary combining). An implementation of a generalization of the butterfly algorithm is also included.

  1. Binary(Ladner-Fisher oriented) and d-ary tree (Ladner Fisher variant) parallel prefix (scan) on arrays of arbitrary size (one/scalar or more). The scan operator needs to be associative. The implementation is ugly and requires for rounds of phases instead of the Ladner-Fisher 2-round algorithm (up and down the tree). However the algorithm is quite memory efficient. In the d-ary case each processor sends 1 word and receives at most d words per superstep.
  2. 2-phase algorithm on arrays of arbitrary size. This algorithm is preferable on large arrays since it only requires two supersteps (BSP speaking) or two rounds of communication. In each round an n-relation is realized (each processor sends/receives n elements where n is the array size).
  3. Binary (Butterfly algorithm oriented) and d-ary tree (butterfly algorithm oriented) parallel prefix (scan) on arrays of arbitrary size (one/scalar or more). The scan operator MUST BE COMMUTATIVE. The algorithm is based on the butterfly algorithm (embedding of a binary tree on the butterfly) and is generalized (in the d-tree case). The difference between this and the first algorithm is a reduction in the phases and rounds of communication (half as many). However each processor sends and receives d words per superstep.
  4. Faithful Ladner-Fisher binary and d-ary tree extension parallel prefix (scan) on arrays of arbitrary size (one/scalar or more). The scan operator needs to be associative. Two phases up and down the tree, but to achieve this a penalty is paid in memory efficiency terms. The algorithms seems to suffer because of this from significant performance degradation. We plan to clean it up in the future.
  5. Wrapper scan/prefix functions that chooses the best ones (2-phase vs d-ary tree and in the latter case choosed the value of d). However at the moment the choices are too generic and unreliable as they depend on the bsp parameters of an individual cluster and the default values are obsolete.

    tars/bp03v2.tar An earlier version is tars/bp03v1.tar

Last Update: April 24, 2003