Change search
ReferencesLink to record
Permanent link

Direct link
Block Cyclic Dense Linear Algebra
KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. (Parallelldatorcentrum)
1993 (English)In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 14, no 6, 1257-1286 p.Article in journal (Refereed) Published
Abstract [en]

 Block-cyclic order elimination algorithms for LU and QR factorization and solve routines are described for distributed memory architectures with processing nodes configured as two-dimensional arrays of arbitrary shape. The cyclic-order elimination, together with a consecutive data allocation, yields good load balance for both the factorization and solution phases for the solution of dense systems of equations by LU and QR decomposition. Blocking may offer a substantial performance enhancement on architectures for which the level-2 or level-3 BLAS (basic linear algebra subroutines) are ideal for operations local to a node. High-rank updates local to a node may have a performance that is a factor of four or more higher than a rank-1 update. This paper shows that in many parallel implementations, the O(N2) work in the factorization may be of the same significance as the O(N3) work, even for large matrices. The O(N2) work is poorly load balanced in two-dimensional nodal arrays, which are shown to be optimal with respect to communication for consecutive data allocation, block-cyclic order elimination, and a simple, but fairly general, communications model. In this Connection Machine system CM-200 implementation, the peak performance for LU factorization is about 9.4 Gflops/s in 64-bit precision and 16 Gflops/s in 32-bit precision. Blocking offers an overall performance enhancement of an approximate factor of two. The broadcast-and-reduce operations fully utilize the bandwidth available in the Boolean cube network interconnecting the nodes along each axis of the two-dimensional nodal array embedded in the cube network.

Place, publisher, year, edition, pages
1993. Vol. 14, no 6, 1257-1286 p.
Keyword [en]
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-90998DOI: 10.1137/0914075OAI: diva2:507652
NR 20140805Available from: 2012-03-05 Created: 2012-03-05Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full text

Search in DiVA

By author/editor
Johnsson, Lennart
By organisation
Centre for High Performance Computing, PDC
In the same journal
SIAM Journal on Scientific Computing
Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 13 hits
ReferencesLink to record
Permanent link

Direct link