All-to-all Communication Algorithms for Distributed BLAS
KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. (Parallelldatorcentrum)
1993 (English). Conference paper (Refereed)
Abstract [en]

Distributed BLAS (DBLAS) algorithms based on all-to-all broadcast and all-to-all reduce are presented. For DBLAS, each all-to-all step requires knowing both the data values and their indices. This contrasts with more traditional applications of all-to-all broadcast, such as an N-body solver, where the identity of the data values is of little interest. Detailed schedules for all-to-all broadcast and reduction are given for the data motion of arrays mapped to the processing nodes of binary cube networks using binary encoding and binary-reflected Gray encoding. The algorithms compute the indices of the communicated data locally, so no communication bandwidth is consumed for data array indices. On the Connection Machine system CM-200, Hamiltonian cycle based all-to-all communication algorithms improve performance by a factor of two to ten over a combination of tree, butterfly network, and router based algorithms. The data rate achieved for all-to-all broadcast on a 256 node CM-200 is 0.3 Gbytes/s. For a 2048 node system, the data motion rate for all-to-all broadcast is about 2.8 Gbytes/s when the time for index computations and local data reordering is included, and the measured rate is 5.6 Gbytes/s when that time is excluded. On a CM-200 with 2048 processing nodes, the overall performance of the distributed matrix vector multiply (DGEMV) and vector matrix multiply (DGEMV with TRANS) is 10.5 Gflop/s and 13.7 Gflop/s, respectively.
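The idea of computing indices locally can be illustrated with a small simulation. The sketch below is not the schedule from the paper: it assumes a single Hamiltonian cycle obtained from the binary-reflected Gray code, one block per node, and one forwarding step per round. The function names (`gray`, `gray_inverse`, `all_to_all_broadcast`) are hypothetical and chosen for illustration only. Each node forwards a block to its successor on the cycle, and the receiver works out which node the block originated from using only its own cycle position and the step count, so no index travels with the data.

```python
def gray(i: int) -> int:
    """Binary-reflected Gray code word at position i."""
    return i ^ (i >> 1)


def gray_inverse(g: int) -> int:
    """Position of Gray code word g in the binary-reflected sequence."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def all_to_all_broadcast(blocks):
    """Simulate all-to-all broadcast along one Gray-code Hamiltonian cycle.

    blocks[node] is the block initially held by cube node `node`.
    Returns received[node], a dict mapping source node -> block.
    Assumption: the number of nodes is a power of two (a binary cube).
    """
    p = len(blocks)
    assert p > 0 and p & (p - 1) == 0, "node count must be a power of two"

    received = [{node: blocks[node]} for node in range(p)]
    in_flight = list(blocks)  # block each node forwards next (no index attached)

    for step in range(1, p):
        arriving = [None] * p
        for node in range(p):
            pos = gray_inverse(node)
            successor = gray((pos + 1) % p)  # cube neighbour: differs in one bit
            arriving[successor] = in_flight[node]
        for node in range(p):
            pos = gray_inverse(node)
            # The source index is computed locally from position and step.
            source = gray((pos - step) % p)
            received[node][source] = arriving[node]
        in_flight = arriving

    return received
```

After `p - 1` steps on `p` nodes, every node holds every block, keyed by the node it came from. A single cycle uses only one cube link per node per step; the sketch makes no attempt to use the remaining links.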

National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-65264
OAI: diva2:483284
6th SIAM Conference on Parallel Processing for Scientific Computing
NR 20140805
Available from: 2012-01-25. Created: 2012-01-25. Bibliographically approved

By author/editor
Johnsson, Lennart