Change search
ReferencesLink to record
Permanent link

Direct link
Minimizing the Communication Time for Matrix Multiplication on Multiprocessors
KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. (Parallelldatorcentrum)
1993 (English)In: Parallel Computing, ISSN 0167-8191, Vol. 19, no 11, 1235-1257 p.Article in journal (Refereed) Published
Abstract [en]

We present one matrix multiplication algorithm for two-dimensional arrays of processing nodes, and one algorithm for three-dimensional nodal arrays. One-dimensional nodal arrays are treated as a degenerate case. The algorithms are designed to utilize fully the communications bandwidth in high degree networks in which the one-, two-, or three-dimensional arrays may be embedded. For binary n-cubes, our algorithms offer a speedup of the communication over previous algorithms for square matrices and square two-dimensional arrays by a factor of n/2. Configuring the N= 2(n) processing nodes as a three-dimensional array may reduce the communication complexity by a factor of N-1/6 compared to a two-dimensional nodal array. The three-dimensional algorithm requires temporary storage proportional to the length of the nodal array axis aligned with the axis shared between the multiplier and the multiplicand. The optimal two-dimensional nodal array shape with respect to communication has a ratio between the numbers of node rows and columns equal to the ratio between the numbers of matrix rows and columns of the product matrix, with the product matrix accumulated in-place. The optimal three-dimensional nodal array shape has a ratio between the lengths of the machine axes equal approximately to the ratio between the lengths of the three axes in matrix multiplication. For product matrices of extreme shape, one-dimensional nodal array shapes are optimal when N/n less than or similar to 2 P/R for P > R, or N/n less than or similar to 2R/P for R greater than or equal to P, where P is the number of rows and R the number of columns of the product matrix. All our algorithms use standard communication functions.

Place, publisher, year, edition, pages
1993. Vol. 19, no 11, 1235-1257 p.
Keyword [en]
Matrix multiplication; array of processor nodes; complexity analysis; optimal communication; binary-cube networks; experimental implementation
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-90997DOI: 10.1016/0167-8191(93)90029-KOAI: diva2:507651
NR 20140805Available from: 2012-03-05 Created: 2012-03-05 Last updated: 2012-03-05Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full text

Search in DiVA

By author/editor
Johnsson, Lennart
By organisation
Centre for High Performance Computing, PDC
In the same journal
Parallel Computing
Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 15 hits
ReferencesLink to record
Permanent link

Direct link