Minimizing the Communication Time for Matrix Multiplication on Multiprocessors
1993 (English)In: Parallel Computing, ISSN 0167-8191, Vol. 19, no 11, 1235-1257 p.Article in journal (Refereed) Published
We present one matrix multiplication algorithm for two-dimensional arrays of processing nodes, and one algorithm for three-dimensional nodal arrays. One-dimensional nodal arrays are treated as a degenerate case. The algorithms are designed to utilize fully the communications bandwidth in high degree networks in which the one-, two-, or three-dimensional arrays may be embedded. For binary n-cubes, our algorithms offer a speedup of the communication over previous algorithms for square matrices and square two-dimensional arrays by a factor of n/2. Configuring the N= 2(n) processing nodes as a three-dimensional array may reduce the communication complexity by a factor of N-1/6 compared to a two-dimensional nodal array. The three-dimensional algorithm requires temporary storage proportional to the length of the nodal array axis aligned with the axis shared between the multiplier and the multiplicand. The optimal two-dimensional nodal array shape with respect to communication has a ratio between the numbers of node rows and columns equal to the ratio between the numbers of matrix rows and columns of the product matrix, with the product matrix accumulated in-place. The optimal three-dimensional nodal array shape has a ratio between the lengths of the machine axes equal approximately to the ratio between the lengths of the three axes in matrix multiplication. For product matrices of extreme shape, one-dimensional nodal array shapes are optimal when N/n less than or similar to 2 P/R for P > R, or N/n less than or similar to 2R/P for R greater than or equal to P, where P is the number of rows and R the number of columns of the product matrix. All our algorithms use standard communication functions.
Place, publisher, year, edition, pages
1993. Vol. 19, no 11, 1235-1257 p.
Matrix multiplication; array of processor nodes; complexity analysis; optimal communication; binary-cube networks; experimental implementation
Computer and Information Science
IdentifiersURN: urn:nbn:se:kth:diva-90997DOI: 10.1016/0167-8191(93)90029-KOAI: oai:DiVA.org:kth-90997DiVA: diva2:507651
NR 201408052012-03-052012-03-052012-03-05Bibliographically approved