Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Optimering av blockbaserad Choleskyfaktorisering för moderna datorsystem: En studie om vektorisering och dess effekt på en befintlig blockalgoritm
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2016 (Swedish)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesisAlternative title
Optimization of Block-based Cholesky Factorization for Modern Computer Systems (English)
Abstract [en]

Linear systems and the solving of those is an important tool in many areas of science. The solving of linear systems is an operation of high complexity, and there are applications where systems of thousands of variables are used. Therefore, it is important to use methods and algorithms that can take full advantage of the performance of modern computers. Factorizing a matrix that represents a linear system makes solving it faster. If the matrix is symmetrical and positive definite Cholesky factorization can be used. J. Chen et. al. (2013) studied a block-based algorithm that gives better performance by using the cache memory more efficently when the matrix size increases. Since then, the conditions have changed. The cache memory of modern processors are subject to constant change, and modern processors capability of improving performance by vectorization have been vastly improved. This report examine how this block-based Cholesky factorization can be optimized for modern Intel processors. By using AVX2 instructions, the parts of the algorithm where most of the arithmetic operations are performed are vectorized. The report also study how the optimal block size, as well as how the breaking point between the naive algorithm and the block-based algorithm changes as the hardware develops. Using a fairly simple implementation with vectorization, the time required to factorize matrices of all sizes are cut in half. The breaking point between the naive and the block-based algorithm is now at matrices of sizes as small as 100 × 100. This is an interesting fact as prior research showed a trend where the breaking point seemed to move towards bigger matrices as the hardware developed.

Abstract [sv]

Linjära system och lösning av dessa är ett viktigt verktyg inom många vetenskapliga områden. Lösning av linjära system är en operation med hög komplexitet, samtidigt som det finns tillämpningar där system med tusentals variabler är vanligt. Därför är det viktigt att använda metoder och algoritmer som utnyttjar datorns prestanda maximalt. Genom att faktorisera en matris som representerar ett linjärt system kan det lösas snabbare. Är matrisen symmetrisk och positivt definit kan Choleskyfaktorisering användas. J. Chen et. al (2013) undersökte en blockbaserad algoritm som ger bättre prestanda genom förbättrad cachehantering. Förutsättningarna har dock förändrats sedan deras undersökning. Dels är processorns cacheminnen under ständig utveckling, och dels så har moderna processorer kraftigt förbättrade möjligheter att öka prestandan med hjälp av vektorisering. Denna rapport undersöker hur denna blockbaserade Choleskyfaktorisering kan optimeras för moderna Intelprocessorer. Genom att använda AVX2-instruktioner vektoriseras de delar av algoritmen där flest operationer utförs. Samtidigt undersöks hur valet av blockstorlek påverkas av, samt hur brytpunkten mellan en klassisk, naiv algoritm och den blockbaserade algoritmen förändras i takt med att hårdvaran utvecklas. Med hjälp av en relativt enkel implementation av vektorisering halveras tiden för att faktorisera matriser oavsett storlek. Brytpunkten mellan den naiva och de blockbaserade algoritmerna sker nu redan runt 100 × 100-matriser. Detta är extra intressant då utvecklingen tidigare gått mot allt större matriser.

Place, publisher, year, edition, pages
2016. , 37 p.
Keyword [sv]
Cholesky
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-186379OAI: oai:DiVA.org:kth-186379DiVA: diva2:927102
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2016-05-18 Created: 2016-05-11 Last updated: 2016-05-18Bibliographically approved

Open Access in DiVA

fulltext(880 kB)148 downloads
File information
File name FULLTEXT01.pdfFile size 880 kBChecksum SHA-512
b951e2b235922b5bc3d6f89f7ee62bb8f62350303b8d9e8f49712c658ca3aba4dc322d113cee35e2d1fbc6f5a8fb627811bfba80caf2861bf76270ddd8a00020
Type fulltextMimetype application/pdf

By organisation
School of Computer Science and Communication (CSC)
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 148 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 103 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf