Pathway analysis: methods and perspectives
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-4438-2325
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The amount of data being generated by high throughput molecular biology experiments grows every day, both in quantity and quality. With this comes the desire to have more powerful and comprehensive methods for statistical analysis that have been developed with the nature of this data in mind.

One of the lines of research that has been developed with this specific goal in mind is pathway analysis. Here, pathways are units of information that have been curated in a way that makes biological knowledge of cellular processes available in a programmatic way, and pathway analysis methods make use of this information to help understand the results of high throughput experiments.

This is an exploratory thesis on the field of pathway analysis. I give a brief introduction to the field, what motivated its development, the problems it tries to solve, and some of the proposed statistical methods, together with some discussion on the implications of this type of analysis.

I then present three original works on pathway analysis, each with a different perspective on the task. First, we present a more reliable null model for pathway analysis methods that use functional association networks, which results in better-calibrated statistics. Second, we show how we can combine pathway analysis methods with other statistical methods, such as survival analysis. We applied this method to a large breast cancer cohort and show that in this case pathways provide better prognostic power than individual genes. Third, we leverage concepts from information theory to design an original pathway analysis method that is very sensitive and flexible, while being practically without parameters. Together, all three papers contribute to furthering the field's usefulness and to the understanding of this type of analysis. 

Abstract [sv]

The amount of data generated in large-scale molecular biology experiments is increasing steadily, both in quantity and quality. As a consequence, the need for more powerful and more comprehensive methods for the interpretation and statistical analysis of such data is growing.

One research methodology that attempts to solve problems associated with the statistical analysis of large, heterogeneous biological datasets is pathway analysis (a pathway being a path or sequence of steps). A biological or biomedical pathway is a unit of annotated information that has been curated so that it represents prior biological knowledge. The programmatically available information about plausible connections in the large dataset can include metabolic processes, cellular localization, or biochemical function. The large number of pathways then enables systematic data integration and an increased understanding of large datasets from high-throughput experiments.

In this thesis we describe pathway analysis by first giving a brief introduction to the techniques, what motivated their development, the problems pathway analysis tries to solve, and some of the proposed statistical methods, together with some discussion of the implications of this type of analysis.

I then present three publications on pathway analysis, each with a different perspective on the task. First, we present a more reliable, graph-based statistical null model for pathway analysis methods that build on functional association networks, which results in better-calibrated statistics. In the second article, we show how pathway analysis methodology can be combined with other statistical methods, such as survival analysis. We applied this method to a large breast cancer cohort and show that in this case pathways give better prognostic power than individual genes. In the third article, we use concepts from information theory to design an improved pathway analysis methodology that is very sensitive and flexible while being practically parameter-free. Together, all three articles contribute to increasing the field's usefulness and the understanding of this type of analysis.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2022, p. 53
Series
TRITA-CBH-FOU ; 2022:41
Keywords [en]
pathway analysis, mutual information, survival analysis, enrichment analysis, transcriptomics.
National Category
Bioinformatics (Computational Biology)
Research subject
Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-316604
ISBN: 978-91-8040-322-1 (print)
OAI: oai:DiVA.org:kth-316604
DiVA, id: diva2:1689972
Public defence
2022-09-28, Air & Fire, Tomtebodavägen 23A, via Zoom: https://kth-se.zoom.us/j/61760412942, Solna, 14:00 (English)
Opponent
Supervisors
Note

QC 2022-08-25

Available from: 2022-08-25 Created: 2022-08-24 Last updated: 2022-09-16 Bibliographically approved
List of papers
1. A simple null model for inferences from network enrichment analysis
2018 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 13, no 11, article id e0206864. Article in journal (Refereed). Published
Abstract [en]

A prevailing technique for inferring function from lists of identifications obtained in high-throughput molecular biology experiments is over-representation analysis, where the identifications are compared to predefined sets of related genes, often referred to as pathways. As at least some pathways are known to be incompletely annotated, algorithmic efforts have been made to complement them with information from functional association networks. While the terminology varies in the literature, we here refer to such methods as Network Enrichment Analysis (NEA). Traditionally, the significance of inferences from NEA has been assigned using a null model constructed from randomizations of the network. Here we instead argue for a null model that relates more directly to the set of genes being studied, and have designed a dynamic programming algorithm that calculates the distribution of NEA scores, which makes it possible to assign unbiased mid p-values to inferences. We also implemented a random sampling method that carries out the same task. We demonstrate that our method obtains superior statistical calibration compared to the popular NEA inference engine BinoX, while also providing statistics that are easier to interpret.
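
To make the terms in this abstract concrete, the following is a minimal sketch of a mid p-value for plain over-representation analysis under a hypergeometric null. It is not the paper's NEA null model or its dynamic programming algorithm; the function names and the toy numbers are assumptions chosen for illustration only.

```python
from math import comb


def hypergeom_pmf(k, total_genes, pathway_size, list_size):
    """P(X = k): probability that exactly k of the list_size identified genes
    fall in a pathway of pathway_size genes, when drawing without replacement
    from total_genes genes."""
    return (
        comb(pathway_size, k)
        * comb(total_genes - pathway_size, list_size - k)
        / comb(total_genes, list_size)
    )


def ora_mid_p(k, total_genes, pathway_size, list_size):
    """Upper-tail mid p-value: P(X > k) + 0.5 * P(X = k).

    Counting only half of the probability of the observed overlap makes the
    statistic less conservative than the ordinary p-value for discrete data.
    """
    largest_overlap = min(pathway_size, list_size)
    strictly_greater = sum(
        hypergeom_pmf(x, total_genes, pathway_size, list_size)
        for x in range(k + 1, largest_overlap + 1)
    )
    return strictly_greater + 0.5 * hypergeom_pmf(
        k, total_genes, pathway_size, list_size
    )


# Hypothetical toy numbers: 10 genes in total, a pathway with 4 genes,
# 3 identified genes, 2 of which belong to the pathway.
mid_p = ora_mid_p(k=2, total_genes=10, pathway_size=4, list_size=3)
```

The ordinary upper-tail p-value would use the full weight of `P(X = k)`; the mid p-value differs only in the factor 0.5 on that term.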

Place, publisher, year, edition, pages
Public Library of Science (PLoS), 2018
National Category
Genetics and Genomics
Identifiers
urn:nbn:se:kth:diva-239780 (URN); 10.1371/journal.pone.0206864 (DOI); 000449772600027 (); 30412619 (PubMedID); 2-s2.0-85056317407 (Scopus ID)
Note

QC 20190108

Available from: 2019-01-08 Created: 2019-01-08 Last updated: 2025-02-07 Bibliographically approved
2. Survival analysis of pathway activity as a prognostic determinant in breast cancer
2022 (English). In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 18, no 3, article id e1010020. Article in journal (Refereed). Published
Abstract [en]

High-throughput biology enables the measurement of relative concentrations of thousands of biomolecules from, e.g., tissue samples. The process leaves the investigator with the problem of how best to interpret the potentially large number of differences between samples. Many activities in a cell depend on ordered reactions involving multiple biomolecules, often referred to as pathways. It hence makes sense to study differences between samples in terms of altered pathway activity, using so-called pathway analysis. Traditional pathway analysis assigns significance to differences in the pathway components' concentrations between sample groups; however, less frequently used methods for estimating individual samples' pathway activities have also been suggested. Here we demonstrate that such a method can be used for pathway-based survival analysis. Specifically, we investigate the association of pathway activities with patients' survival time based on the transcription profiles of the METABRIC dataset. Our implementation shows that pathway activities are better prognostic markers for survival time in METABRIC than the individual transcripts. We also demonstrate that we can regress out the effect of individual pathways on other pathways, which allows us to estimate the other pathways' residual association with survival. Furthermore, we illustrate how one can visualize the often interdependent measures over hierarchical pathway databases using sunburst plots.
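
As an illustration of what "prognostic power" can mean for a per-sample score, here is a minimal sketch of Harrell's concordance index for right-censored survival data. This is a generic measure swapped in for illustration and is an assumption, not necessarily the statistic used in the paper; the function name and the toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times       -- follow-up time per patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- per-patient score, e.g. a pathway activity estimate,
                   where a higher score is expected to mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event has the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort of four patients; the third is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
activity = [0.9, 0.7, 0.4, 0.1]
c_index = concordance_index(times, events, activity)
```

A value near 0.5 indicates no prognostic information and a value near 1 indicates that the score orders patients by survival almost perfectly, so the same function could be used to compare a pathway activity score against a single transcript.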

Place, publisher, year, edition, pages
Public Library of Science (PLoS), 2022
National Category
Cancer and Oncology
Identifiers
urn:nbn:se:kth:diva-311539 (URN); 10.1371/journal.pcbi.1010020 (DOI); 000778272600002 (); 35344554 (PubMedID); 2-s2.0-85128145575 (Scopus ID)
Note

QC 20220429

Available from: 2022-04-29 Created: 2022-04-29 Last updated: 2022-08-24 Bibliographically approved
3. Pathway Analysis Through Mutual Information
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Pathway analysis comes in many forms. Most seek to establish a connection between the activity of a certain biological pathway and a difference in phenotype, often relying on an upstream differential expression analysis to establish the difference between case and control. This process usually models the relationship using many assumptions, often of a linear nature, and may also involve statistical tests where the calculation of false discovery rates is not trivial.

Here, we propose a new method for pathway analysis, MIPath, that relies on information-theoretic principles and is therefore free of a model for the nature of the association between pathway activity and phenotype, resulting in a minimal set of assumptions. For this, we construct a different graph of samples for each pathway and score the association between the structure of this graph and any phenotype variable using Mutual Information, while adjusting for the effects of random chance in each score.

Our experiments show that this method produces robust and reproducible scores that place target pathways at a high rank on single-cell datasets, outperforming established methods for pathway analysis under the same conditions.
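
The scoring idea in this abstract can be sketched roughly as follows. This is a simplified stand-in, not the MIPath implementation: it assumes that each sample has already been given a discrete group label derived from the pathway-specific sample graph (how that is done is not stated here), and it adjusts for chance by subtracting a permutation estimate, which is one possible adjustment among several. All names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two discrete labelings of the
    same samples, estimated from the empirical joint distribution."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    return sum(
        (c / n) * math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def chance_adjusted_mi(groups, phenotype, n_permutations=200, seed=0):
    """Observed mutual information minus its average over random
    permutations of the phenotype labels (assumed adjustment scheme)."""
    rng = random.Random(seed)
    observed = mutual_information(groups, phenotype)
    shuffled = list(phenotype)
    null_total = 0.0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_total += mutual_information(groups, shuffled)
    return observed - null_total / n_permutations


# Hypothetical example: 20 samples split into two graph-derived groups.
groups = [0] * 10 + [1] * 10
matching_phenotype = ["case"] * 10 + ["control"] * 10
unrelated_phenotype = ["case", "control"] * 10

score_related = chance_adjusted_mi(groups, matching_phenotype)
score_unrelated = chance_adjusted_mi(groups, unrelated_phenotype)
```

Because the score depends only on how labels co-occur, it makes no assumption about the direction or linearity of the association, which is the property the abstract emphasizes.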

National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-316544 (URN)
Note

QC 20220831

Available from: 2022-08-22 Created: 2022-08-22 Last updated: 2022-08-31 Bibliographically approved

Open Access in DiVA

summary (2750 kB), 925 downloads
File information
File name: FULLTEXT01.pdf
File size: 2750 kB
Checksum (SHA-512): c79eb6adb9fd3ac97f4ff9eadfde4dd685274d775177401f8882a3d7692d9e7eb99ff28be8492d7e635588ae6ac0265aedefb01a66afe04674cea18a76ce9959
Type: fulltext
Mimetype: application/pdf

Authority records

Stolf Jeuken, Gustavo
