Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer
KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. (Parallelldatorcentrum)
1994 (engelsk)Inngår i: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 20, nr 7, s. 919-951Artikkel i tidsskrift (Fagfellevurdert) Published
Abstract [en]

Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms.

We show that, in order to minimize the communication time, an algorithm that leaves the largest operand matrix stationary should be chosen for matrix-matrix multiplication. Furthermore, it is shown both analytically and experimentally that the optimum shape of the processor array yields square stationary submatrices in each processor, i.e. the ratio between the length of the axes of the processing array must be the same as the ratio between the corresponding axes of the stationary matrix. The optimum processor array shape may yield a factor of five performance enhancement for the multiplication of square matrices. For rectangular matrices a factor of 30 improvement was observed for an optimum processor array shape compared to a poorly chosen processor array shape.

sted, utgiver, år, opplag, sider
1994. Vol. 20, nr 7, s. 919-951
Emneord [en]
Linear algebra; Matrix multiplication; Nonsystolic algorithm; Systolic algorithm; Distributed BLAS; Connection machine CM-200; Performance results
HSV kategori
Identifikatorer
URN: urn:nbn:se:kth:diva-90990DOI: 10.1016/0167-8191(94)90011-6OAI: oai:DiVA.org:kth-90990DiVA, id: diva2:507641
Merknad
NR 20140805Tilgjengelig fra: 2012-03-05 Laget: 2012-03-05 Sist oppdatert: 2018-01-12bibliografisk kontrollert

Open Access i DiVA

Fulltekst mangler i DiVA

Andre lenker

Forlagets fulltekst

Søk i DiVA

Av forfatter/redaktør
Johnsson, Lennart
Av organisasjonen
I samme tidsskrift
Parallel Computing

Søk utenfor DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric

doi
urn-nbn
Totalt: 33 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf