Change search
ReferencesLink to record
Permanent link

Direct link
Solving Tridiagonal Systems on Ensemble Architectures
KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. (Parallelldatorcentrum)
1987 (English)In: SIAM Journal on Scientific and Statistical Computing, Vol. 8, no 3, 354-392 p.Article in journal (Refereed) Published
Abstract [en]

The concurrent solution of tridiagonal systems on linear and 2-dimensional arrays, complete binary trees, shuffle-exchange and perfect shuffle networks, and boolean cubes by elimination methods are devised and analyzed. The methods can be obtained by symmetric permutations of some rows and columns, and amounts to cyclic reduction or a combination of Gaussian elimination and cyclic reduction (GECR). The ensembles have only local storage and no global control. Synchronization is accomplished via message passing to neighboring processors. The parallel arithmetic complexity of GECR for $N$ equations on a $K$ processor ensemble is $O({ N / K } + \log _2 K)$, and the communication complexity is $O(K)$ for the linear array, $O(\sqrt K )$ for the 2-dimensional mesh, and $O(\log _2 K)$ for the networks of diameter $O(\log _2 K)$. The maximum speed-up for the linear array is attained at $K \approx {({ N / \alpha })}^{1/2} $ and for the 2-d mesh at $K \approx ({N / 2\alpha })^{2/3} $, where $\alpha $ (the time to communicate one floating-point number)/(the time for a floating-point arithmetic operation). For the binary tree the maximum speed-up is attained at $K = N$, and for the perfect shuffle and boolean $k$-cube networks, $K = N/(1 + \alpha )$ yields the maximum speed-up. The minimum time complexity is of order $O(N^{1/2} )$ for the linear array, of order $O(N^{1/3} )$ for the mesh, and of order $O(\log _2 N)$ for the binary tree, the shuffle-exchange, the perfect shuffle and the boolean $k$-cube. The relative decrease in computational complexity due to a truncation of the reduction process in a highly concurrent system is much greater than on a uniprocessor. The reduction in the arithmetic complexity is proportional to the number of steps avoided, if the number of processing elements equals the number of equations. So also is the reduction in the communication complexity for ensembles configured as binary trees, shuffle-exchange and perfect shuffle networks, and boolean cubes. Partitioning the ensemble into subsets of processors is shown to be more efficient for the solution of multiple independent problems than pipelining the solutions over the entire ensemble. A balanced cyclic reduction algorithm is presented for the case where each system is spread uniformly over the processing elements, and its complexity is compared with Gaussian elimination.

Place, publisher, year, edition, pages
1987. Vol. 8, no 3, 354-392 p.
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-91067DOI: 10.1137/0908040OAI: diva2:507924
NR 20140805Available from: 2012-03-06 Created: 2012-03-06Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full text

Search in DiVA

By author/editor
Johnsson, Lennart
By organisation
Centre for High Performance Computing, PDC
Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 14 hits
ReferencesLink to record
Permanent link

Direct link