A New-Generation Parallel Computer and its Performance Evaluation

S.G. Ziavras, H. Grebel, A.T. Chronopoulos, and F. Marcelli

ABSTRACT, Future Generation Computer Systems, 2000

An innovative design is proposed for an MIMD distributed shared-memory (DSM) parallel computer capable of 
achieving gracious performance with technology expected to become feasible/viable in less than a decade. 
This New Millennium Computing Point Design was chosen by NSF, DARPA, and NASA as having the potential to
deliver 100 TeraFLOPS and 1 PetaFLOPS performance by the years 2005 and 2007,
respectively. Its scalability guarantees a lifetime extending well into the next century.
Our design takes advantage of free-space optical technologies, with simple guided-wave concepts, to produce a 1-D 
building block (BB) that implements efficiently a large, fully-connected system of processors. Designing
fully-connected, large systems of electronic processors could be a very beneficial impact of optics on 
massively-parallel processing. A 2-D structure is proposed for the complete system, where the
aforementioned 1-D BB is extended into two dimensions. This architecture behaves like 
a 2-D generalized hypercube, which is characterized by outstanding performance and extremely high wiring
complexity that prohibits its electronics-only implementation. With readily available technology, a mesh of clear 
plastic/glass bars in our design facilitates point-to-point bit-parallel transmissions 
that utilize wavelength-division multiplexing (WDM) and follow dedicated optical paths.
Each processor is mounted on a card. Each card contains eight processors interconnected locally via an 
electronic crossbar. Taking advantage of higher-speed optical technologies, all eight processors share the
same communications interface to the optical medium using time-division multiplexing (TDM). A case study for 100 TeraFLOPS
performance by the year 2005 is investigated in detail; the characteristics of chosen hardware components in the case study
conform to SIA (Semiconductor Industry Association) projections. An impressive property of our system is that its
bisection bandwidth matches, within an order of magnitude, the performance of its computation engine.
Performance results based on the implementation of various important algorithmic kernels show that
our design could have a tremendous, positive impact on massively-parallel computing.
2-D and 3-D implementations of our design could achieve gracious (i.e., sustained) PetaFLOPS performance before the end of the next decade.
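
The wiring complexity attributed above to the 2-D generalized hypercube can be illustrated with a minimal sketch. It assumes the standard definition of that topology, in which every node is linked directly to all other nodes in its row and in its column. The function names and the grid dimensions are hypothetical and are not taken from the paper; the 2-D mesh is included only as a familiar baseline for comparison.

```python
def ghc_neighbors(node, rows, cols):
    """Direct neighbors of `node` in a rows x cols 2-D generalized hypercube.

    Assumption: a node is linked to every other node in its own row
    and to every other node in its own column.
    """
    r, c = node
    same_row = [(r, j) for j in range(cols) if j != c]
    same_col = [(i, c) for i in range(rows) if i != r]
    return same_row + same_col


def ghc_link_count(rows, cols):
    """Point-to-point links: each node has (rows - 1) + (cols - 1) neighbors,
    and every link is shared by two nodes."""
    return rows * cols * (rows + cols - 2) // 2


def mesh_link_count(rows, cols):
    """Links in an ordinary 2-D mesh without wraparound (baseline only)."""
    return rows * (cols - 1) + cols * (rows - 1)


def ghc_hops(src, dst):
    """Minimum hop count: one hop per coordinate that differs, so at most 2."""
    return int(src[0] != dst[0]) + int(src[1] != dst[1])


if __name__ == "__main__":
    rows, cols = 32, 32  # hypothetical grid size, chosen for illustration
    print("generalized hypercube links:", ghc_link_count(rows, cols))
    print("2-D mesh links:             ", mesh_link_count(rows, cols))
    print("worst-case hops:            ", ghc_hops((0, 0), (rows - 1, cols - 1)))
```

For the hypothetical 32 x 32 grid, the sketch counts 31744 links for the generalized hypercube against 1984 for the mesh, while any two nodes remain at most two hops apart. This is the trade-off between performance and wiring complexity that the abstract describes.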

 

