A New-Generation Parallel Computer and its Performance Evaluation
S.G. Ziavras, H. Grebel, A.T. Chronopoulos, and F. Marcelli
ABSTRACT, Future Generation Computer Systems, 2000
An innovative design is proposed for an MIMD distributed shared-memory (DSM) parallel computer capable of achieving high sustained performance with technology expected to become feasible in less than a decade. This New Millennium Computing Point Design was chosen by NSF, DARPA, and NASA as having the potential to deliver 100 TeraFLOPS and 1 PetaFLOPS performance by 2005 and 2007, respectively. Its scalability guarantees a useful lifetime extending well into the next century.
Our design takes advantage of free-space optical technologies, combined with simple guided-wave concepts, to produce a one-dimensional (1-D) building block (BB) that efficiently implements a large, fully connected system of processors. Making large, fully connected systems of electronic processors practical could be one of the most beneficial impacts of optics on massively parallel processing. The complete system has a 2-D structure in which the 1-D BB is extended into two dimensions. This architecture behaves like a 2-D generalized hypercube, a topology known for outstanding performance but also for wiring complexity so high that it prohibits an electronics-only implementation. Using readily available technology, our design employs a mesh of clear plastic or glass bars that supports point-to-point, bit-parallel transmissions; these use wavelength-division multiplexing (WDM) and follow dedicated optical paths.
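The 2-D generalized hypercube can be made concrete: each node (x, y) is linked directly to every node that differs from it in exactly one coordinate, so every row and every column is a fully connected group. The minimal sketch below assumes a hypothetical 8×8 system; the function names are illustrative and not taken from the paper. It counts links per node, total links, and the links crossing a cut that halves one dimension, which illustrates why the wiring is hard to build with electronics alone.

```python
def gh2_neighbors(node, k0, k1):
    """Direct neighbors of node (x, y) in a k0 x k1 2-D generalized hypercube.

    A node is linked to every node that differs from it in exactly one
    coordinate, so each row and each column is a fully connected group.
    """
    x, y = node
    same_x = [(x, j) for j in range(k1) if j != y]  # links along dimension 1
    same_y = [(i, y) for i in range(k0) if i != x]  # links along dimension 0
    return same_x + same_y


def gh2_links(k0, k1):
    """Set of undirected links; each link is a frozenset of two nodes."""
    links = set()
    for x in range(k0):
        for y in range(k1):
            for v in gh2_neighbors((x, y), k0, k1):
                links.add(frozenset(((x, y), v)))
    return links


def gh2_distance(u, v):
    """Hops between two nodes: one hop per differing coordinate (at most 2 in 2-D)."""
    return sum(a != b for a, b in zip(u, v))


def links_across_half_cut(links, k0):
    """Links crossing the cut that splits dimension 0 into x < k0//2 and x >= k0//2."""
    half = k0 // 2
    # A link crosses the cut when its two endpoints fall on opposite sides.
    return sum(1 for link in links if len({x < half for x, _ in link}) == 2)


if __name__ == "__main__":
    k = 8  # hypothetical 8 x 8 system, chosen only for illustration
    links = gh2_links(k, k)
    print("links per node:", len(gh2_neighbors((0, 0), k, k)))
    print("total links:", len(links))
    print("links across half cut:", links_across_half_cut(links, k))
    print("max hops:", max(gh2_distance((0, 0), (x, y)) for x in range(k) for y in range(k)))
```

For k = 8 this gives 14 links per node, 448 links in total, and 128 links across the half cut (k³/4 for even k), with any node reachable in at most two hops. For comparison, an 8×8 2-D mesh has only 8 links crossing the same cut.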
Processors are mounted on cards, each containing eight processors interconnected locally via an electronic crossbar. Taking advantage of higher-speed optical technologies, all eight processors on a card share the same communications interface to the optical medium using time-division multiplexing (TDM). A case study for 100 TeraFLOPS performance by 2005 is investigated in detail; the characteristics of the hardware components chosen for the case study conform to SIA (Semiconductor Industry Association) projections. A notable property of our system is that its bisection bandwidth matches the performance of its computation engine to within an order of magnitude.
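The abstract states that the eight processors on a card share one optical interface via TDM but does not specify the slot policy. The sketch below assumes the simplest case, static round-robin slots of equal length; the function names, message labels, and the 80 Gb/s interface rate are hypothetical and used only for illustration.

```python
from collections import deque

PROCESSORS_PER_CARD = 8  # per the design: eight processors share one optical interface


def tdm_schedule(queues, num_slots):
    """Static round-robin TDM over one shared interface (assumed policy).

    Slot t belongs to processor t % n. If the owner has nothing queued,
    the slot goes out empty (None); static TDM does not lend idle slots.
    Returns a list of (slot, owner, message_or_None).
    """
    n = len(queues)
    sent = []
    for t in range(num_slots):
        owner = t % n
        msg = queues[owner].popleft() if queues[owner] else None
        sent.append((t, owner, msg))
    return sent


def per_processor_rate(interface_rate, n=PROCESSORS_PER_CARD):
    """Guaranteed share of the interface rate for each processor under equal slots."""
    return interface_rate / n


if __name__ == "__main__":
    # Hypothetical workload: each of the eight processors queues two messages.
    queues = [deque(f"p{i}m{j}" for j in range(2)) for i in range(PROCESSORS_PER_CARD)]
    for slot, owner, msg in tdm_schedule(queues, 2 * PROCESSORS_PER_CARD):
        print(slot, owner, msg)
    # Hypothetical 80 Gb/s interface: each processor is guaranteed 1/8 of it.
    print(per_processor_rate(80.0), "Gb/s per processor")
```

Under equal slots each processor is guaranteed one eighth of the interface rate. This is consistent with the design's reliance on optical links that run faster than any single processor's share.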
Performance results based on implementations of various important algorithmic kernels show that our design could have a tremendous positive impact on massively parallel computing. 2-D and 3-D implementations of our design could achieve sustained PetaFLOPS performance before the end of the next decade.