REDUCED HYPERCUBE INTERCONNECTION NETWORKS FOR MASSIVELY PARALLEL COMPUTERS

Sotirios G. Ziavras

Several networks have been proposed for the interconnection of processors in parallel computers. The (direct binary) hypercube has been one of the most popular interconnection networks because it provides a small diameter and is so robust that it can simulate a wide variety of other frequently used interconnection networks very efficiently. Nevertheless, a major drawback of the hypercube is the dramatic increase in the total number of communication channels as the total number of processors in the system increases. This drawback has a significant impact on the VLSI (very large scale integration) complexity of hypercube systems. For this reason, existing commercial hypercube systems have either a large number of primitive processors (e.g., the Connection Machine system CM-2 has 65,536 simple processors) or a relatively small number of powerful processors (e.g., the nCUBE with 1,024 processors). These systems, however, cannot achieve 1 Teraflops (one trillion floating-point operations per second) performance; the development of technologies for the construction of massively parallel systems (i.e., systems with thousands of processors) that could achieve 1 Teraflops peak performance is a significant goal of the U.S. government's HPCC (High Performance Computing and Communications) program. Such performance is required by many important applications, such as accurate weather prediction, simulation of physical phenomena (e.g., molecular dynamics, high energy physics, plasma physics, astrophysics), aerodynamics (e.g., design of new aircraft), simulation of neural networks, simulation of chips, structural analysis, real-time image processing and robotics, artificial intelligence, seismology, animation, and real-time processing of large databases. This performance goal could be achieved in this decade only with systems containing thousands of powerful processors. Despite the hypercube's popularity, its high VLSI complexity proves to be a major drawback toward the goal of implementing high-performance, massively parallel systems.
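To make the channel-growth problem concrete, the short Python sketch below (an illustration only; it assumes nothing beyond the standard definition of the n-dimensional binary hypercube) tabulates the total number of bidirectional channels, n * 2^(n-1), as the machine size 2^n grows:

    # Channel count of an n-dimensional (direct binary) hypercube:
    # 2**n processors, each with n channels, giving n * 2**(n-1)
    # bidirectional channels in total.  Every doubling of the machine
    # adds one more channel to every processor.
    for n in (10, 12, 14, 16):
        processors = 2 ** n
        channels = n * 2 ** (n - 1)
        print(f"n={n:2d}  processors={processors:6d}  channels={channels:7d}")

For n = 16 (the 65,536-processor scale of the CM-2) the hypercube already requires 524,288 channels, and the per-processor channel count keeps rising with every doubling of the system; this is the VLSI-complexity growth discussed above.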

Several hypercube variations have been proposed in the literature. Many of them, however, were developed to improve the topological properties of the hypercube. It is well known that the hypercube already has very good topological properties; therefore, the viability of such variations is questionable, and in fact they often increase the hypercube's already high VLSI complexity. Other variations employ hypercube building blocks that contain a special communication processor, with the building blocks connected together through their communication processors. Not only are such structures irregular, but for large building blocks the communication processors become substantial bottlenecks.

In contrast, I have introduced the family of reduced hypercube (RH) interconnection networks, which have lower VLSI complexity than hypercubes of the same size (i.e., with the same number of processors). RHs are produced from regular hypercubes by a uniform reduction in the number of channels (edges) per processor (node).* This edge-reduction technique produces networks that have lower VLSI complexity than hypercubes yet maintain, to a large extent, the powerful hypercube properties. Because of their reduced VLSI complexity, RHs facilitate the construction of massively parallel systems with powerful processors.

Extensive comparison of RH interconnection networks with conventional hypercubes of the same size has shown that they achieve comparable performance. It has also been shown that any RH can simultaneously simulate, in an optimal manner, several popular cube-connected cycles (CCC) interconnection networks. Many algorithms have been developed for cube-connected cycles networks, and these algorithms can easily be adapted for execution on RH systems. The performance of RHs on important algorithms, such as data broadcasting/reduction, prefix computation, and sorting, has been investigated, and the results are very encouraging. Additionally, techniques have been developed for the efficient embedding of linear arrays, multidimensional meshes, and binary trees into RHs; these are structures commonly used in the development of parallel algorithms. Finally, generalized reduced hypercube interconnection networks have been introduced for even more versatility.
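As background for the embedding results, the sketch below shows the classic reflected Gray-code embedding of a linear array (or ring) into a full binary hypercube; it is standard hypercube material, not the published RH algorithm, but the RH embedding techniques mentioned above adapt ideas of this kind to the reduced topology:

    # Reflected Gray-code embedding of a 2**n-node linear array into an
    # n-cube: consecutive array positions map to hypercube addresses that
    # differ in exactly one bit, i.e. to directly connected processors.
    def gray(i):
        return i ^ (i >> 1)

    n = 4
    path = [gray(i) for i in range(2 ** n)]
    # dilation 1: every hop of the array crosses exactly one channel
    assert all(bin(a ^ b).count("1") == 1 for a, b in zip(path, path[1:]))
    # the first and last nodes are also adjacent, so the ring embeds too
    assert bin(path[0] ^ path[-1]).count("1") == 1
    print(path)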

* An RH can also be obtained by substituting a hypercube for each node of another hypercube; distinct subcubes in the former system are then used to implement connections in distinct dimensions of the latter hypercube.
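The sketch below gives one plausible reading of this footnoted construction, simplified for illustration (the published RH definition may assign outer dimensions to subcubes rather than to single nodes, and the names hierarchical_edges, m, and k are mine): each node of an "outer" m-cube is replaced by a 2^k-node "inner" hypercube, and inner node j of every module implements the outer channel of dimension j, so no processor needs more than k + 1 channels:

    # Hypothetical hierarchical reduction (an illustration, not the exact
    # published RH rule): replace each node of an m-cube with a 2**k-node
    # hypercube module; inner node j of a module carries the outer channel
    # in dimension j.  Uniform degree k + 1 results when m = 2**k.
    def hierarchical_edges(m, k):
        assert m <= 2 ** k          # enough inner nodes for all outer dimensions
        edges = set()
        for M in range(2 ** m):     # outer (module) address
            for j in range(2 ** k): # inner address within the module
                for d in range(k):  # inner hypercube channels
                    edges.add(frozenset([(M, j), (M, j ^ (1 << d))]))
                if j < m:           # one outer channel per inner node
                    edges.add(frozenset([(M, j), (M ^ (1 << j), j)]))
        return edges

    m, k = 8, 3                     # 2**(m+k) = 2,048 processors
    reduced = len(hierarchical_edges(m, k))
    full = (m + k) * 2 ** (m + k - 1)   # channels in the full 11-cube
    print(f"reduced channels: {reduced}, full hypercube channels: {full}")

Under these assumptions the 2,048-processor network needs 4,096 channels and maximum node degree 4, versus 11,264 channels and degree 11 for the full hypercube of the same size.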


Last updated 11/02/98, SGZ