References$(function(){PrimeFaces.cw("TieredMenu","widget_formSmash_upper_j_idt152",{id:"formSmash:upper:j_idt152",widgetVar:"widget_formSmash_upper_j_idt152",autoDisplay:true,overlay:true,my:"left top",at:"left bottom",trigger:"formSmash:upper:referencesLink",triggerEvent:"click"});}); $(function(){PrimeFaces.cw("OverlayPanel","widget_formSmash_upper_j_idt153_j_idt156",{id:"formSmash:upper:j_idt153:j_idt156",widgetVar:"widget_formSmash_upper_j_idt153_j_idt156",target:"formSmash:upper:j_idt153:permLink",showEffect:"blind",hideEffect:"fade",my:"right top",at:"right bottom",showCloseIcon:true});});

Multiplication of Matrices of Arbitrary Shape on a Data Parallel ComputerPrimeFaces.cw("AccordionPanel","widget_formSmash_some",{id:"formSmash:some",widgetVar:"widget_formSmash_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_all",{id:"formSmash:all",widgetVar:"widget_formSmash_all",multiple:true});
function selectAll()
{
var panelSome = $(PrimeFaces.escapeClientId("formSmash:some"));
var panelAll = $(PrimeFaces.escapeClientId("formSmash:all"));
panelAll.toggle();
toggleList(panelSome.get(0).childNodes, panelAll);
toggleList(panelAll.get(0).childNodes, panelAll);
}
/*Toggling the list of authorPanel nodes according to the toggling of the closeable second panel */
function toggleList(childList, panel)
{
var panelWasOpen = (panel.get(0).style.display == 'none');
// console.log('panel was open ' + panelWasOpen);
for (var c = 0; c < childList.length; c++) {
if (childList[c].classList.contains('authorPanel')) {
clickNode(panelWasOpen, childList[c]);
}
}
}
/*nodes have styleClass ui-corner-top if they are expanded and ui-corner-all if they are collapsed */
function clickNode(collapse, child)
{
if (collapse && child.classList.contains('ui-corner-top')) {
// console.log('collapse');
child.click();
}
if (!collapse && child.classList.contains('ui-corner-all')) {
// console.log('expand');
child.click();
}
}
PrimeFaces.cw("AccordionPanel","widget_formSmash_responsibleOrgs",{id:"formSmash:responsibleOrgs",widgetVar:"widget_formSmash_responsibleOrgs",multiple:true}); 1994 (English)In: Parallel Computing, ISSN 0167-8191, Vol. 20, no 7, 919-951 p.Article in journal (Refereed) Published
##### Abstract [en]

##### Place, publisher, year, edition, pages

1994. Vol. 20, no 7, 919-951 p.
##### Keyword [en]

Linear algebra; Matrix multiplication; Nonsystolic algorithm; Systolic algorithm; Distributed BLAS; Connection machine CM-200; Performance results
##### National Category

Computer and Information Science
##### Identifiers

URN: urn:nbn:se:kth:diva-90990DOI: 10.1016/0167-8191(94)90011-6OAI: oai:DiVA.org:kth-90990DiVA: diva2:507641
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt455",{id:"formSmash:j_idt455",widgetVar:"widget_formSmash_j_idt455",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt461",{id:"formSmash:j_idt461",widgetVar:"widget_formSmash_j_idt461",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt467",{id:"formSmash:j_idt467",widgetVar:"widget_formSmash_j_idt467",multiple:true});
##### Note

NR 20140805Available from: 2012-03-05 Created: 2012-03-05Bibliographically approved

Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms.

We show that, in order to minimize the communication time, an algorithm that leaves the largest operand matrix stationary should be chosen for matrix-matrix multiplication. Furthermore, it is shown both analytically and experimentally that the optimum shape of the processor array yields square stationary submatrices in each processor, i.e. the ratio between the length of the axes of the processing array must be the same as the ratio between the corresponding axes of the stationary matrix. The optimum processor array shape may yield a factor of five performance enhancement for the multiplication of square matrices. For rectangular matrices a factor of 30 improvement was observed for an optimum processor array shape compared to a poorly chosen processor array shape.

References$(function(){PrimeFaces.cw("TieredMenu","widget_formSmash_lower_j_idt1196",{id:"formSmash:lower:j_idt1196",widgetVar:"widget_formSmash_lower_j_idt1196",autoDisplay:true,overlay:true,my:"left top",at:"left bottom",trigger:"formSmash:lower:referencesLink",triggerEvent:"click"});}); $(function(){PrimeFaces.cw("OverlayPanel","widget_formSmash_lower_j_idt1197_j_idt1199",{id:"formSmash:lower:j_idt1197:j_idt1199",widgetVar:"widget_formSmash_lower_j_idt1197_j_idt1199",target:"formSmash:lower:j_idt1197:permLink",showEffect:"blind",hideEffect:"fade",my:"right top",at:"right bottom",showCloseIcon:true});});