Matrix Multiplication on Hypercubes Using Full Bandwidth and Constant Storage
1991 (English)Conference paper (Refereed)
For matrix multiplication on hypercube multiproces- sors with the product matrix accumulated in place a processor must receive about P2= p N elements of each input operand, with operands of size P P distributed evenly over N processors. With concurrent communi- cation on all ports, the number of element transfers in sequence can be reduced to P2= p N logN for each input operand. We present a two-level partitioning of the matrices and an algorithm for the matrix multipli- cation with optimal data motion and constant storage. The algorithm has sequential arithmetic complexity 2P3, and parallel arithmetic complexity 2P 3=N. The algorithm has been implemented on the Connection Machine model CM-2. For the performance on the 8K CM-2, we measured about 1.6 G ops, which would scale up to about 13 G ops for a 64K full machine.
Place, publisher, year, edition, pages
1991. 447-451 p.
Computer and Information Science
IdentifiersURN: urn:nbn:se:kth:diva-65263OAI: oai:DiVA.org:kth-65263DiVA: diva2:483281
Sixth Distributed Memory Computing Conference
NR 201408052012-01-252012-01-25Bibliographically approved