Professor Sotirios G. Ziavras

A New-Generation Parallel Computer and its Performance Evaluation

S.G. Ziavras, H. Grebel, A.T. Chronopoulos, and F. Marcelli

ABSTRACT, Future Generation Computer Systems, 2000

An innovative design is proposed for an MIMD distributed shared-memory (DSM) parallel computer capable of

achieving gracious performance with technology expected to become feasible/viable in less than a decade.

This New Millennium Computing Point Design was chosen by NSF, DARPA, and NASA as having the potential to

deliver 100 TeraFLOPS and 1 PetaFLOPS performance by the years 2005 and 2007,

respectively. Its scalability guarantees a lifetime extending well into the next century.

Our design takes advantage of free-space optical technologies, with simple guided-wave concepts, to produce a 1-D

building block (BB) that implements efficiently a large, fully-connected system of processors. Designing

fully-connected, large systems of electronic processors could be a very beneficial impact of optics on

massively-parallel processing. A 2-D structure is proposed for the complete system, where the

aforementioned 1-D BB is extended into two dimensions. This architecture behaves like

a 2-D generalized hypercube, which is characterized by outstanding performance and extremely high wiring

complexity that prohibits its electronics-only implementation. With readily available technology, a mesh of clear

plastic/glass bars in our design facilitates point-to-point bit-parallel transmissions

that utilize wavelength-division multiplexing (WDM) and follow dedicated optical paths.
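As a minimal illustration of the 2-D generalized hypercube behavior mentioned above, the following Python sketch models a grid in which every node is directly linked to all other nodes in its row and in its column. The function names and the 4 x 4 grid size are hypothetical choices made for this example only; they are not taken from the design itself, and the sketch says nothing about the optical implementation.

```python
from itertools import product


def ghc2d_neighbors(node, rows, cols):
    """Return the nodes directly linked to `node` = (r, c).

    In a 2-D generalized hypercube, a node is linked to every other
    node that shares its row or its column.
    """
    r, c = node
    same_row = [(r, j) for j in range(cols) if j != c]
    same_col = [(i, c) for i in range(rows) if i != r]
    return same_row + same_col


def ghc2d_hops(a, b):
    """Minimum hop count between two nodes: one hop per differing coordinate."""
    return int(a[0] != b[0]) + int(a[1] != b[1])


def ghc2d_link_count(rows, cols):
    """Total number of bidirectional links in the rows x cols structure."""
    nodes = rows * cols
    degree = (rows - 1) + (cols - 1)
    return nodes * degree // 2  # each link is shared by two nodes


if __name__ == "__main__":
    rows, cols = 4, 4  # hypothetical grid size, for illustration only
    nodes = list(product(range(rows), range(cols)))
    print("degree:", len(ghc2d_neighbors((0, 0), rows, cols)))
    print("diameter:", max(ghc2d_hops(a, b) for a in nodes for b in nodes))
    print("links:", ghc2d_link_count(rows, cols))
```

The link count grows roughly with the number of nodes times the sum of the two grid dimensions, which gives a feel for the wiring complexity that the text says prohibits an electronics-only implementation.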

The processors are mounted on cards. Each card contains eight processors interconnected locally via an

electronic crossbar. Taking advantage of higher-speed optical technologies, all eight processors share the

same communications interface to the optical medium using time-division multiplexing (TDM). A case study for 100 TeraFLOPS

performance by the year 2005 is investigated in detail; the characteristics of chosen hardware components in the case study

conform to SIA (Semiconductor Industry Association) projections. An impressive property of our system is that its

bisection bandwidth matches, within an order of magnitude, the performance of its computation engine.
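The comparison between bisection bandwidth and computation rate can be sketched as follows. This is only an illustration under stated assumptions: the function names, the choice to treat bytes per second and FLOP per second as a unitless ratio, and all numeric values are hypothetical placeholders, not figures from the case study. The cut counted here is one that splits the columns of a 2-D generalized hypercube into two equal halves.

```python
def ghc2d_cut_links(rows, cols):
    """Count links crossing a cut that splits the columns into two equal halves.

    Each row is fully connected, so within a row every node in the left
    half is linked to every node in the right half. Column links join
    nodes on the same side of the cut and therefore do not cross it.
    """
    if cols % 2:
        raise ValueError("cols must be even for an equal split")
    half = cols // 2
    return rows * half * half


def bandwidth_to_compute_ratio(cut_links, link_bytes_per_s, peak_flops):
    """Aggregate bandwidth across the cut divided by the peak compute rate.

    Assumption: the two quantities are compared as plain numbers.
    """
    return cut_links * link_bytes_per_s / peak_flops


def within_order_of_magnitude(ratio):
    """True when the ratio lies between 1/10 and 10."""
    return 0.1 <= ratio <= 10.0


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the functions.
    links = ghc2d_cut_links(rows=4, cols=4)
    ratio = bandwidth_to_compute_ratio(links, link_bytes_per_s=1e9, peak_flops=4e10)
    print(links, ratio, within_order_of_magnitude(ratio))
```

Because the number of crossing links grows with the square of the half-width of each row, a fully connected dimension keeps the cut bandwidth growing faster than the node count, which is one way to read the property claimed in the text.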

Performance results based on the implementation of various important algorithmic kernels show that

our design could have a tremendous, positive impact on massively-parallel computing.

2-D and 3-D implementations of our design could achieve gracious (i.e., sustained) PetaFLOPS performance before the end of the next decade.