Change search

Cite
Citation style
• apa
• harvard1
• ieee
• modern-language-association-8th-edition
• vancouver
• Other style
More styles
Language
• de-DE
• en-GB
• en-US
• fi-FI
• nn-NO
• nn-NB
• sv-SE
• Other locale
More languages
Output format
• html
• text
• asciidoc
• rtf
Solving Tridiagonal Systems on Ensemble Architectures
KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. (Parallelldatorcentrum)
1987 (English)In: SIAM Journal on Scientific and Statistical Computing, Vol. 8, no 3, 354-392 p.Article in journal (Refereed) Published
##### Abstract [en]

The concurrent solution of tridiagonal systems on linear and 2-dimensional arrays, complete binary trees, shuffle-exchange and perfect shuffle networks, and boolean cubes by elimination methods are devised and analyzed. The methods can be obtained by symmetric permutations of some rows and columns, and amounts to cyclic reduction or a combination of Gaussian elimination and cyclic reduction (GECR). The ensembles have only local storage and no global control. Synchronization is accomplished via message passing to neighboring processors. The parallel arithmetic complexity of GECR for $N$ equations on a $K$ processor ensemble is $O({ N / K } + \log _2 K)$, and the communication complexity is $O(K)$ for the linear array, $O(\sqrt K )$ for the 2-dimensional mesh, and $O(\log _2 K)$ for the networks of diameter $O(\log _2 K)$. The maximum speed-up for the linear array is attained at $K \approx {({ N / \alpha })}^{1/2}$ and for the 2-d mesh at $K \approx ({N / 2\alpha })^{2/3}$, where $\alpha$ (the time to communicate one floating-point number)/(the time for a floating-point arithmetic operation). For the binary tree the maximum speed-up is attained at $K = N$, and for the perfect shuffle and boolean $k$-cube networks, $K = N/(1 + \alpha )$ yields the maximum speed-up. The minimum time complexity is of order $O(N^{1/2} )$ for the linear array, of order $O(N^{1/3} )$ for the mesh, and of order $O(\log _2 N)$ for the binary tree, the shuffle-exchange, the perfect shuffle and the boolean $k$-cube. The relative decrease in computational complexity due to a truncation of the reduction process in a highly concurrent system is much greater than on a uniprocessor. The reduction in the arithmetic complexity is proportional to the number of steps avoided, if the number of processing elements equals the number of equations. So also is the reduction in the communication complexity for ensembles configured as binary trees, shuffle-exchange and perfect shuffle networks, and boolean cubes. Partitioning the ensemble into subsets of processors is shown to be more efficient for the solution of multiple independent problems than pipelining the solutions over the entire ensemble. A balanced cyclic reduction algorithm is presented for the case where each system is spread uniformly over the processing elements, and its complexity is compared with Gaussian elimination.

##### Place, publisher, year, edition, pages
1987. Vol. 8, no 3, 354-392 p.
##### National Category
Computer and Information Sciences
##### Identifiers
DOI: 10.1137/0908040OAI: oai:DiVA.org:kth-91067DiVA: diva2:507924
##### Note
NR 20140805Available from: 2012-03-06 Created: 2012-03-06 Last updated: 2018-01-12Bibliographically approved

#### Open Access in DiVA

No full text

Publisher's full text

#### Search in DiVA

##### By author/editor
Johnsson, Lennart
##### By organisation
Centre for High Performance Computing, PDC
##### On the subject
Computer and Information Sciences

doi
urn-nbn

#### Altmetric score

doi
urn-nbn
Total: 51 hits

Cite
Citation style
• apa
• harvard1
• ieee
• modern-language-association-8th-edition
• vancouver
• Other style
More styles
Language
• de-DE
• en-GB
• en-US
• fi-FI
• nn-NO
• nn-NB
• sv-SE
• Other locale
More languages
Output format
• html
• text
• asciidoc
• rtf
v. 2.30.0
| | | |