Change search
ReferencesLink to record
Permanent link

Direct link
Matrix Multiplication on Hypercubes Using Full Bandwidth and Constant Storage
KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. (Parallelldatorcentrum)
1991 (English)Conference paper (Refereed)
Abstract [en]

For matrix multiplication on hypercube multiproces- sors with the product matrix accumulated in place a processor must receive about P2= p N elements of each input operand, with operands of size P P distributed evenly over N processors. With concurrent communi- cation on all ports, the number of element transfers in sequence can be reduced to P2= p N logN for each input operand. We present a two-level partitioning of the matrices and an algorithm for the matrix multipli- cation with optimal data motion and constant storage. The algorithm has sequential arithmetic complexity 2P3, and parallel arithmetic complexity 2P 3=N. The algorithm has been implemented on the Connection Machine model CM-2. For the performance on the 8K CM-2, we measured about 1.6 G ops, which would scale up to about 13 G ops for a 64K full machine.

Place, publisher, year, edition, pages
1991. 447-451 p.
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-65263OAI: diva2:483281
Sixth Distributed Memory Computing Conference
NR 20140805Available from: 2012-01-25 Created: 2012-01-25Bibliographically approved

Open Access in DiVA

No full text

Search in DiVA

By author/editor
Johnsson, Lennart
By organisation
Centre for High Performance Computing, PDC
Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 76 hits
ReferencesLink to record
Permanent link

Direct link