Matrix Multiplication on the Connection Machine
1989 (English), Conference paper (Refereed)
A data parallel implementation of the multiplication of matrices of arbitrary shapes and sizes is presented. The implementation uses a systolic algorithm based on a rectangular processor layout. For a given operand, all processors hold submatrices of the same size. In the Connection Machine system CM-2 implementation, matrix-vector multiplication serves as the primitive for local matrix-matrix multiplication. The local matrix-matrix multiplication has a peak performance in excess of 20 Gflop/s. The overall algorithm, including all required data motion, has a peak performance of 5.8 Gflop/s.
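To make the abstract's two ideas concrete (a systolic scheme in which every processor holds equal-sized operand blocks, and a local matrix-matrix product built from matrix-vector products), here is a minimal sequential sketch. It is not the paper's CM-2 algorithm: it substitutes Cannon's algorithm, a classic systolic scheme, on a square q x q grid of simulated processors, and it assumes every matrix dimension is divisible by q. The paper's rectangular processor layout and its actual data-motion routines are not reproduced, and all function names (`matvec`, `local_matmul_accumulate`, `split_blocks`, `cannon_matmul`, `reference_matmul`) are hypothetical.

```python
# Sketch only: Cannon's systolic algorithm simulated sequentially on a q x q
# grid of virtual processors. Assumes A is m x k, B is k x n, and m, k, n are
# all divisible by q, so every processor holds same-sized blocks per operand.

def matvec(M, x):
    """Dense matrix-vector product y = M x (M is a list of rows)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def local_matmul_accumulate(C, A, B):
    """C += A B, computed one column of B at a time via matvec."""
    for j in range(len(B[0])):
        col = [B[k][j] for k in range(len(B))]
        y = matvec(A, col)
        for i in range(len(C)):
            C[i][j] += y[i]

def split_blocks(M, q):
    """Split M into a q x q grid of equal-sized blocks."""
    br, bc = len(M) // q, len(M[0]) // q
    return [[[row[j * bc:(j + 1) * bc] for row in M[i * br:(i + 1) * br]]
             for j in range(q)] for i in range(q)]

def cannon_matmul(A, B, q):
    Ab, Bb = split_blocks(A, q), split_blocks(B, q)
    br, bc = len(A) // q, len(B[0]) // q
    # Initial skew: processor (i, j) starts with A(i, i+j) and B(i+j, j).
    Ab = [[Ab[i][(i + j) % q] for j in range(q)] for i in range(q)]
    Bb = [[Bb[(i + j) % q][j] for j in range(q)] for i in range(q)]
    Cb = [[[[0] * bc for _ in range(br)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Every processor multiplies the blocks it currently holds.
        for i in range(q):
            for j in range(q):
                local_matmul_accumulate(Cb[i][j], Ab[i][j], Bb[i][j])
        # Systolic step: shift A blocks one column left, B blocks one row up.
        Ab = [[Ab[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        Bb = [[Bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Reassemble the distributed C blocks into one m x n matrix.
    C = []
    for i in range(q):
        for r in range(br):
            C.append([x for j in range(q) for x in Cb[i][j][r]])
    return C

def reference_matmul(A, B):
    """Straightforward triple-loop product, used only for checking."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

if __name__ == "__main__":
    A = [[i + j for j in range(6)] for i in range(4)]      # 4 x 6
    B = [[i * j - 1 for j in range(2)] for i in range(6)]  # 6 x 2
    assert cannon_matmul(A, B, 2) == reference_matmul(A, B)
```

After the initial skew, processor (i, j) at step s holds A(i, i+j+s) and B(i+j+s, j) (indices mod q), so over q steps it accumulates every term of its C block. The column-by-column `local_matmul_accumulate` mirrors the idea of using matrix-vector multiplication as the local primitive, though a real implementation would call an optimized kernel.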
Place, publisher, year, edition, pages
1989. 326-332 p.
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-65562
DOI: 10.1145/76263.76298
OAI: oai:DiVA.org:kth-65562
DiVA: diva2:483484
NR 20140805; 2012-01-25; 2012-01-25; Bibliographically approved