Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer
KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. (Parallelldatorcentrum)
1994 (English). In: Parallel Computing, ISSN 0167-8191, Vol. 20, no. 7, pp. 919-951. Article in journal (Refereed), Published
Abstract [en]

Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms.
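As an illustration of what a systolic matrix-matrix multiplication looks like, the following is a minimal sketch of Cannon's algorithm, a well-known textbook systolic scheme, simulated sequentially in plain Python. It is not taken from the article and is not necessarily the in-place algorithm the abstract refers to. It assumes square n x n operands, a square q x q processor grid, and n divisible by q, whereas the article makes no such assumptions. The function names (`matmul`, `split_blocks`, `cannon`) are hypothetical.

```python
def matmul(a, b):
    """Plain triple-loop product of two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_blocks(m, q):
    """Cut a square matrix into a q x q grid of equal square blocks."""
    s = len(m) // q
    return [[[row[c * s:(c + 1) * s] for row in m[r * s:(r + 1) * s]]
             for c in range(q)] for r in range(q)]


def cannon(a, b, q):
    """Simulate Cannon's algorithm on a q x q grid (assumes n divisible by q)."""
    s = len(a) // q
    A, B = split_blocks(a, q), split_blocks(b, q)
    # Initial skew: block row i of A rotates left by i,
    # block column j of B rotates up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    # The blocks of C never move: C is the stationary operand here.
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        for i in range(q):
            for j in range(q):
                p = matmul(A[i][j], B[i][j])  # local block product
                C[i][j] = [[C[i][j][r][c] + p[r][c] for c in range(s)]
                           for r in range(s)]
        # Systolic step: every A block moves one processor left,
        # every B block moves one processor up (with wraparound).
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Reassemble the block grid into one matrix.
    return [sum((C[i][j][r] for j in range(q)), [])
            for i in range(q) for r in range(s)]
```

In this sketch each processor only ever holds one block of each operand, which reflects the reduced temporary storage that systolic schemes can offer; the communication pattern of a real data parallel implementation would of course differ from this sequential simulation.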

We show that, in order to minimize the communication time, an algorithm that leaves the largest operand matrix stationary should be chosen for matrix-matrix multiplication. Furthermore, it is shown both analytically and experimentally that the optimum shape of the processor array yields square stationary submatrices in each processor, i.e. the ratio between the lengths of the axes of the processor array must be the same as the ratio between the lengths of the corresponding axes of the stationary matrix. The optimum processor array shape may yield a factor of five performance enhancement for the multiplication of square matrices. For rectangular matrices a factor of 30 improvement was observed for an optimum processor array shape compared to a poorly chosen processor array shape.
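The two selection rules above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not code from the article: the function names `largest_operand` and `best_array_shape` are hypothetical, the processor count is assumed to factor freely into any two integers, and any machine-specific constraints on array shapes are ignored.

```python
import math


def largest_operand(m, k, n):
    """For C (m x n) = A (m x k) * B (k x n), pick the operand with most elements.

    Ties are resolved in the order A, B, C (an arbitrary choice for this sketch).
    """
    shapes = {"A": (m, k), "B": (k, n), "C": (m, n)}
    name = max(shapes, key=lambda s: shapes[s][0] * shapes[s][1])
    return name, shapes[name]


def best_array_shape(rows, cols, p):
    """Choose pr x pc = p so the local submatrix (rows/pr) x (cols/pc) is as square as possible."""
    best = None
    for pr in range(1, p + 1):
        if p % pr:
            continue  # pr must divide the processor count
        pc = p // pr
        # Log of the local aspect ratio; zero means exactly square.
        skew = abs(math.log((rows / pr) / (cols / pc)))
        if best is None or skew < best[0]:
            best = (skew, pr, pc)
    return best[1], best[2]


# Example: C (100 x 1000) is the largest operand, so it is kept stationary.
name, (rows, cols) = largest_operand(100, 10, 1000)
shape_square = best_array_shape(1024, 1024, 64)  # (8, 8)
shape_tall = best_array_shape(4096, 256, 64)     # (32, 2)
```

For a 4096 x 256 stationary matrix on 64 processors, the 32 x 2 array gives each processor a 128 x 128 submatrix, so the ratio of the array axes (16:1) matches the ratio of the matrix axes, as the criterion requires.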

Place, publisher, year, edition, pages
1994. Vol. 20, no 7, 919-951 p.
Keyword [en]
Linear algebra; Matrix multiplication; Nonsystolic algorithm; Systolic algorithm; Distributed BLAS; Connection machine CM-200; Performance results
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-90990
DOI: 10.1016/0167-8191(94)90011-6
OAI: diva2:507641
NR 20140805
Available from: 2012-03-05 Created: 2012-03-05 Bibliographically approved

By author/editor
Johnsson, Lennart