Block Cyclic Dense Linear Algebra
1993 (English)In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 14, no 6, 1257-1286 p.Article in journal (Refereed) Published
Block-cyclic order elimination algorithms for LU and QR factorization and solve routines are described for distributed memory architectures with processing nodes configured as two-dimensional arrays of arbitrary shape. The cyclic-order elimination, together with a consecutive data allocation, yields good load balance for both the factorization and solution phases for the solution of dense systems of equations by LU and QR decomposition. Blocking may offer a substantial performance enhancement on architectures for which the level-2 or level-3 BLAS (basic linear algebra subroutines) are ideal for operations local to a node. High-rank updates local to a node may have a performance that is a factor of four or more higher than a rank-1 update. This paper shows that in many parallel implementations, the O(N2) work in the factorization may be of the same significance as the O(N3) work, even for large matrices. The O(N2) work is poorly load balanced in two-dimensional nodal arrays, which are shown to be optimal with respect to communication for consecutive data allocation, block-cyclic order elimination, and a simple, but fairly general, communications model. In this Connection Machine system CM-200 implementation, the peak performance for LU factorization is about 9.4 Gflops/s in 64-bit precision and 16 Gflops/s in 32-bit precision. Blocking offers an overall performance enhancement of an approximate factor of two. The broadcast-and-reduce operations fully utilize the bandwidth available in the Boolean cube network interconnecting the nodes along each axis of the two-dimensional nodal array embedded in the cube network.
Place, publisher, year, edition, pages
1993. Vol. 14, no 6, 1257-1286 p.
LINEAR ALGEBRA; DISTRIBUTED MEMORY; LU; QR; CYCLIC ORDER
Computer and Information Science
IdentifiersURN: urn:nbn:se:kth:diva-90998DOI: 10.1137/0914075OAI: oai:DiVA.org:kth-90998DiVA: diva2:507652
NR 201408052012-03-052012-03-05Bibliographically approved