Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Solving Tridiagonal Systems on Ensemble Architectures
KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. (Parallelldatorcentrum)
1987 (English)In: SIAM Journal on Scientific and Statistical Computing, Vol. 8, no 3, p. 354-392Article in journal (Refereed) Published
Abstract [en]

The concurrent solution of tridiagonal systems on linear and 2-dimensional arrays, complete binary trees, shuffle-exchange and perfect shuffle networks, and boolean cubes by elimination methods are devised and analyzed. The methods can be obtained by symmetric permutations of some rows and columns, and amounts to cyclic reduction or a combination of Gaussian elimination and cyclic reduction (GECR). The ensembles have only local storage and no global control. Synchronization is accomplished via message passing to neighboring processors. The parallel arithmetic complexity of GECR for $N$ equations on a $K$ processor ensemble is $O({ N / K } + \log _2 K)$, and the communication complexity is $O(K)$ for the linear array, $O(\sqrt K )$ for the 2-dimensional mesh, and $O(\log _2 K)$ for the networks of diameter $O(\log _2 K)$. The maximum speed-up for the linear array is attained at $K \approx {({ N / \alpha })}^{1/2} $ and for the 2-d mesh at $K \approx ({N / 2\alpha })^{2/3} $, where $\alpha $ (the time to communicate one floating-point number)/(the time for a floating-point arithmetic operation). For the binary tree the maximum speed-up is attained at $K = N$, and for the perfect shuffle and boolean $k$-cube networks, $K = N/(1 + \alpha )$ yields the maximum speed-up. The minimum time complexity is of order $O(N^{1/2} )$ for the linear array, of order $O(N^{1/3} )$ for the mesh, and of order $O(\log _2 N)$ for the binary tree, the shuffle-exchange, the perfect shuffle and the boolean $k$-cube. The relative decrease in computational complexity due to a truncation of the reduction process in a highly concurrent system is much greater than on a uniprocessor. The reduction in the arithmetic complexity is proportional to the number of steps avoided, if the number of processing elements equals the number of equations. So also is the reduction in the communication complexity for ensembles configured as binary trees, shuffle-exchange and perfect shuffle networks, and boolean cubes. Partitioning the ensemble into subsets of processors is shown to be more efficient for the solution of multiple independent problems than pipelining the solutions over the entire ensemble. A balanced cyclic reduction algorithm is presented for the case where each system is spread uniformly over the processing elements, and its complexity is compared with Gaussian elimination.

Place, publisher, year, edition, pages
1987. Vol. 8, no 3, p. 354-392
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-91067DOI: 10.1137/0908040OAI: oai:DiVA.org:kth-91067DiVA, id: diva2:507924
Note
NR 20140805Available from: 2012-03-06 Created: 2012-03-06 Last updated: 2018-01-12Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Search in DiVA

By author/editor
Johnsson, Lennart
By organisation
Centre for High Performance Computing, PDC
Computer and Information Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 56 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf