Development, Implementation, Optimization and Performance Analysis of Matrix-Vector Multiplication on Eight-Core Digital Signal Processor
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Numerical Analysis, NA.
2013 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This thesis work implements sparse matrix-vector multiplication on an eight-core Digital Signal Processor (DSP) and gives insight into how to optimize matrix-vector multiplication on a DSP to achieve high energy efficiency. We used two sparse matrix formats: Compressed Sparse Row (CSR) and Block Compressed Sparse Row (BCSR). We applied loop unrolling to the naive algorithm. In addition, we implemented Register-blocked and Cache-blocked sparse matrix-vector multiplication to optimize the naive algorithm.
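
As a rough illustration of the CSR format and the loop unrolling optimization mentioned above, the following C sketch (not code from the thesis; the function and array names and the unrolling factor of four are illustrative assumptions) shows a naive CSR sparse matrix-vector product and an unrolled variant of its inner loop.

#include <stddef.h>

/* Minimal sketch, not code from the thesis: naive CSR sparse matrix-vector
 * product y = A*x. row_ptr has n_rows + 1 entries; col_idx and values hold
 * the column index and value of each nonzero, stored row by row. */
void spmv_csr(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
              const double *values, const double *x, double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += values[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* Same kernel with the inner loop unrolled by 4 (the unrolling factor here
 * is an illustrative assumption, not necessarily the one used in the thesis). */
void spmv_csr_unrolled(size_t n_rows, const size_t *row_ptr,
                       const size_t *col_idx, const double *values,
                       const double *x, double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        size_t k   = row_ptr[i];
        size_t end = row_ptr[i + 1];
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (; k + 4 <= end; k += 4) {      /* main unrolled loop */
            s0 += values[k]     * x[col_idx[k]];
            s1 += values[k + 1] * x[col_idx[k + 1]];
            s2 += values[k + 2] * x[col_idx[k + 2]];
            s3 += values[k + 3] * x[col_idx[k + 3]];
        }
        double sum = (s0 + s1) + (s2 + s3);
        for (; k < end; ++k)                /* remainder loop */
            sum += values[k] * x[col_idx[k]];
        y[i] = sum;
    }
}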

The computational performance improvement from the loop unrolling technique was promising (≈12%). With this optimization, we observed a decrease in power usage (0.3 W) for a matrix of size 600 and an increase in power usage (1.2 W) for larger matrices. The Register-blocked algorithm proved to be the most efficient technique on the DSP. With this algorithm, we increased performance by a factor of six compared to the naive algorithm while retaining low power consumption (≈14 W). Cache-blocked sparse matrix-vector multiplication is known to suit architectures with coherent caches. However, because the DSP does not support cache coherency, this method did not show a large improvement in computational performance. In fact, power consumption for the Cache-blocked method was higher than for the other effective algorithms, namely Register-blocked sparse matrix-vector multiplication and loop unrolling of the naive algorithm. In conclusion, we found that the DSP delivers low power consumption, excellent computational performance and good energy efficiency when the Register-blocked sparse matrix-vector multiplication technique is used.
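
As a rough illustration of the Register-blocked kernel discussed above, the sketch below performs SpMV over a BCSR matrix with fixed 2x2 blocks in plain C; the block size, array names and storage layout are assumptions made for illustration, not the thesis implementation. Keeping the small block's partial sums in registers is what reduces memory traffic relative to the naive CSR loop.

#include <stddef.h>

/* Minimal sketch, not the thesis implementation: register-blocked SpMV over a
 * BCSR matrix with fixed 2x2 blocks. brow_ptr has n_brows + 1 entries,
 * bcol_idx gives the block column of each stored block, and values stores
 * each 2x2 block contiguously in row-major order. */
void spmv_bcsr_2x2(size_t n_brows, const size_t *brow_ptr,
                   const size_t *bcol_idx, const double *values,
                   const double *x, double *y)
{
    for (size_t bi = 0; bi < n_brows; ++bi) {
        double y0 = 0.0, y1 = 0.0;                 /* per-row accumulators,
                                                      kept in registers */
        for (size_t b = brow_ptr[bi]; b < brow_ptr[bi + 1]; ++b) {
            const double *blk = &values[4 * b];    /* the 2x2 block */
            const double *xb  = &x[2 * bcol_idx[b]];
            y0 += blk[0] * xb[0] + blk[1] * xb[1];
            y1 += blk[2] * xb[0] + blk[3] * xb[1];
        }
        y[2 * bi]     = y0;
        y[2 * bi + 1] = y1;
    }
}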

Place, publisher, year, edition, pages
2013. 67 p.
Series
TRITA-MAT-E, 2013:50
National Category
Computational Mathematics
URN: urn:nbn:se:kth:diva-131289
OAI: diva2:656708
Subject / course
Scientific Computing
Educational program
Master of Science - Scientific Computing
Available from: 2013-10-16. Created: 2013-10-11. Last updated: 2013-10-16. Bibliographically approved.

Open Access in DiVA

fulltext (pdf, 12694 kB)
File name: FULLTEXT01.pdf. File size: 12694 kB. Checksum: SHA-512. Type: fulltext. Mimetype: application/pdf.
