A setup for an integer sorting study on multicores.

Integer sorting on multicores and GPUs can be realized by a variety of approaches that include variants of distribution-based methods such as radix-sort, comparison-oriented algorithms such as deterministic regular sampling and random sampling parallel sorting, and network-based algorithms such as Batcher's bitonic sorting algorithm. In this work we present an experimental study of integer sorting on multicore processors. Towards this we have implemented (a) serial and parallel radix-sort for various radices, named SR4 for the serial radix-256 version, and PR4 and PR2 for the parallel radix-256 and radix-65536 versions respectively, (b) previously little explored or unexplored variants of bitonic-sort, named BTN, and of odd-even transposition sort, named OET, (c) a variant of the random oversampling parallel sorting algorithm of Gerbessiotis & Valiant (JPDC 1994), named GVR, (d) the deterministic regular oversampling parallel sorting algorithm of Gerbessiotis & Siniolakis (ACM SPAA 1996 and PPL 1999), named GSD, and (e) a random oversampling parallel sorting algorithm, named GER, that differs from other random sample sorting approaches in that it instead follows the skeleton of deterministic regular sampling or oversampling methods. The study uses multithreading and multiprocessing parallel programming libraries, with the C language implementations working under Open MPI, MulticoreBSP, and BSPlib utilizing the same source code. A secondary objective is to attempt to model the performance of these algorithm implementations under the MBSP (Multi-memory BSP) model. We first provide some general high-level observations on the performance of these implementations. If anything can be concluded, it is that accurate prediction of performance by taking into consideration architecture-dependent features, such as the structure and characteristics of multiple memory hierarchies, is difficult and more often than not untenable.
To some degree this is affected by the overhead imposed by the high-level library used in the programming effort. We can, however, still draw some reliable conclusions and reason about the performance of these implementations using the MBSP model, thus making MBSP useful and usable.

The experimental results described and the algorithms implemented in the paper are available for download, installation, compilation, and execution through the following links. Some batch files (those whose names start with run) can automate the execution of several experiments.

  1. tars/18pisrt.tar Running tar xvf 18pisrt.tar will create a directory 18pisrt. Read the copyright notice copyag.txt first, and then the file 18pisrt.txt, which contains some instructions. Minimally, this involves editing avgaaih (to enable or disable BSP- or MPI-style execution) and, if you enable a BSP library execution, deciding whether BSPlib or MulticoreBSP is to be used. The file also provides instructions on how to install your own OpenMPI installation, BSPlib version 1.4 (editing of a Makefile is needed), and MulticoreBSP 1.2 in BSP compatibility mode. Beyond that, make -f Makefile.bsl clean; make -f Makefile.bsl all or make -f Makefile.bsp clean; make -f Makefile.bsp all allows you to compile under BSPlib and MulticoreBSP, provided you already have installations of the two libraries. For OpenMPI, make -f Makefile.mpi clean; make -f Makefile.mpi all works for me.

Last Update: Jul 13, 2018