All–to–All Communication on the Connection Machine system CM–200
1995 (English)In: Scientific Programming, ISSN 1058-9244, E-ISSN 1875-919X, Vol. 4, no 4, 251-273 p.Article in journal (Refereed) Published
based on all--to--all broadcast and all--to--all reduce are presented. For DBLAS, at each all--to--all step, it is necessary to know the data values and the indices of the data values as well. This is in contrast to the more traditional applications of all--to--all broadcast (such as a N--body solver) where the identity of the data values is not of much interest. Detailed schedules for all--to--all broadcast and reduction are given for the data motion of arrays mapped to the processing nodes of binary cube networks using binary encoding and binary--reflected Gray encoding. The algorithms compute the indices for the communicated data locally. No communication bandwidth is consumed for data array indices. For the Connection Machine system CM--200, Hamiltonian cycle based all--to--all communication algorithms improve the performance by a factor of two to ten over a combination of tree, butterfly network, and router based algorithms. The data rate achieved for all--to--all broadcast on a 256 node Connection Machine system CM--200 is 0.3 Gbytes/sec. The data motion rate for all--to--all broadcast, including the time for index computations and local data reordering, is about 2.8 Gbytes/sec for a 2048 node system. Excluding the time for index computation and local memory reordering the measured data motion rate for all--to--all broadcast is 5.6 Gbytes/s. On a Connection Machine system, CM--200, with 2048 processing nodes, the overall performance of the distributed matrix vector multiply (DGEMV) and vector matrix multiply (DGEMV with TRANS) is 10.5 Gflops/s and 13.7 Gflops/s respectively
Place, publisher, year, edition, pages
1995. Vol. 4, no 4, 251-273 p.
Computer and Information Science
IdentifiersURN: urn:nbn:se:kth:diva-90981OAI: oai:DiVA.org:kth-90981DiVA: diva2:507631
NR 201408052012-03-052012-03-05Bibliographically approved