In a multiprocessor with distributed storage the data structures have a significant impact on the communication complexity. In this paper we present a few algorithms for performing matrix transposition on a Boolean $n$-cube. One algorithm performs the transpose in a time proportional to the lower bound both with respect to communication start-ups and to element transfer times. We present algorithms for transposing a matrix embedded in the cube by a binary encoding, a binary-reflected Gray code encoding of rows and columns, or combinations of these two encodings. The transposition of a matrix when several matrix elements are identified to a node by consecutive or cyclic partitioning is also considered and lower bound algorithms given. Experimental data are provided for the Intel iPSC and the Connection Machine.
NR 20140805