Graphical lasso for covariance structure learning in the high dimensional setting
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2015 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Graphical lasso för kovariansstrukturs inlärning i högdimensionell miljö (Swedish)
Abstract [en]

This thesis considers the estimation of undirected Gaussian graphical models, especially in the high-dimensional setting where the true observations are assumed to be non-Gaussian distributed.

The first aim is to present and compare the performance of existing Gaussian graphical model estimation methods. Furthermore, since the models rely heavily on the normality assumption, various methods for relaxing that assumption are presented. In addition to the existing methods, a modified version of the joint graphical lasso method is introduced, which capitalizes on the strengths of the community Bayes method. The community Bayes method is used to partition the features (or variables) of datasets consisting of several classes into several communities, which are estimated to be mutually independent within each class. This allows the calculations of the joint graphical lasso method to be split into several smaller parts. The method is also inspired by the cluster graphical lasso and is applicable to both Gaussian and non-Gaussian data, provided that the normality assumption is relaxed.
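The partitioning idea described above can be illustrated with a minimal sketch. It is not the thesis's algorithm: a simple thresholding rule stands in for the community Bayes step, which the abstract does not specify, and the names `partition_features`, `sub_matrix`, `lam`, and the toy covariance matrices are all hypothetical. Two features are placed in the same community if their absolute sample covariance exceeds `lam` in at least one class; each community then yields a smaller per-class sub-problem that could be handed to a joint graphical lasso solver.

```python
from collections import defaultdict


def partition_features(S_by_class, lam):
    """Group feature indices into communities.

    Assumption (illustrative stand-in, not the community Bayes method):
    features i and j are linked when |S[i][j]| > lam in at least one class,
    and communities are the connected components of the resulting graph.
    """
    p = len(S_by_class[0])
    parent = list(range(p))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if any(abs(S[i][j]) > lam for S in S_by_class):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = defaultdict(list)
    for i in range(p):
        groups[find(i)].append(i)
    return sorted(groups.values())


def sub_matrix(S, idx):
    """Extract the rows and columns of S listed in idx."""
    return [[S[i][j] for j in idx] for i in idx]


# Toy sample covariance matrices for two classes (made-up numbers).
S_class_a = [
    [1.00, 0.60, 0.00, 0.05],
    [0.60, 1.00, 0.02, 0.00],
    [0.00, 0.02, 1.00, 0.70],
    [0.05, 0.00, 0.70, 1.00],
]
S_class_b = [
    [1.00, 0.40, 0.03, 0.00],
    [0.40, 1.00, 0.00, 0.01],
    [0.03, 0.00, 1.00, 0.50],
    [0.00, 0.01, 0.50, 1.00],
]

communities = partition_features([S_class_a, S_class_b], lam=0.1)

# One smaller estimation problem per community, each with one block per class.
sub_problems = [
    [sub_matrix(S, idx) for S in (S_class_a, S_class_b)]
    for idx in communities
]
```

Because the sub-problems share no features, they could in principle be solved independently of one another, which is what makes the splitting and the parallelization mentioned in the abstract possible.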

Results show that the introduced cluster joint graphical lasso method outperforms competing methods, producing graphical models that are easier to comprehend thanks to the added information obtained from the clustering step of the method. The cluster joint graphical lasso is applied to a real dataset consisting of p = 12582 features, which resulted in a computational gain of a factor of 35 compared with the competing method, a very significant gain when analysing large datasets. The method also allows for parallelization, where computations can be spread across several computers, greatly increasing computational efficiency.

Abstract [sv] (English translation)

This report addresses the estimation of undirected Gaussian graphical models, especially in the high-dimensional setting where the true observations are assumed to be non-Gaussian distributed.

The first aim is to present and compare the performance of existing methods for estimating Gaussian graphical models. Since the models depend heavily on the normality assumption, several methods for relaxing that assumption are presented. In addition to the existing methods, a modified version of the joint graphical lasso is introduced that builds on the strengths of the community Bayes method. The community Bayes method is used to partition the variables of datasets consisting of several classes into several communities that are assumed to be mutually independent within each class. This means that the computations of the joint graphical lasso can be split into several smaller problems. The method is also inspired by the cluster graphical lasso and is applicable to both Gaussian and non-Gaussian data, provided that the normality assumption is relaxed.

The results show that the introduced cluster joint graphical lasso method outperforms competing methods, producing graphical models that are easier to understand thanks to the extra information obtained from the clustering step of the method. The joint graphical lasso is also applied to a real dataset consisting of p = 12582 variables, which resulted in a reduction in computation time by a factor of 35 compared with competing methods. This is very significant when analysing large datasets. The method also enables parallelization, where computations can be spread across several computers, which further greatly increases computational efficiency.

Place, publisher, year, edition, pages
2015.
Series
TRITA-MAT-E, 2015:76
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-176485
OAI: oai:DiVA.org:kth-176485
DiVA: diva2:869025
Subject / course
Mathematical Statistics
Educational program
Master of Science - Mathematics
Supervisors
Examiners
Available from: 2015-11-12
Created: 2015-11-06
Last updated: 2015-11-12
Bibliographically approved

Open Access in DiVA

fulltext (3384 kB), 207 downloads
File information
File name: FULLTEXT01.pdf
File size: 3384 kB
Checksum (SHA-512): 6edb3781f893c40433ee665252376376c690a82569ffbe8670111ce8e74f38ca1c06284fa61bf49b7d58fc0c59118c51493db1d7b53a710a9b2f94201c4fc8e9
Type: fulltext
Mimetype: application/pdf
