Publications (6 of 6)
Jeuken, G. S. & Käll, L. (2024). Pathway analysis through mutual information. Bioinformatics, 40(1)
2024 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 40, no. 1. Journal article (Refereed). Published
Abstract [en]

MOTIVATION: In pathway analysis, we aim to establish a connection between the activity of a particular biological pathway and a difference in phenotype. There are many available methods to perform pathway analysis; many of them rely on an upstream differential expression analysis, and many model the relations between the abundances of the analytes in a pathway as linear relationships.

RESULTS: Here, we propose a new method for pathway analysis, MIPath, that relies on information theoretical principles and, therefore, does not model the association between pathway activity and phenotype, resulting in relatively few assumptions. For this, we construct a graph of the data points for each pathway using a nearest-neighbor approach and score the association between the structure of this graph and the phenotype of these same samples using Mutual Information while adjusting for the effects of random chance in each score. The initial nearest neighbor approach evades individual gene-level comparisons, hence making the method scalable and less vulnerable to missing values. These properties make our method particularly useful for single-cell data. We benchmarked our method on several single-cell datasets, comparing it to established and new methods, and found that it produces robust, reproducible, and meaningful scores.
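
The general idea in the RESULTS paragraph can be illustrated with a minimal sketch. This is not the MIPath implementation: connected components of the nearest-neighbor graph stand in for "graph structure", and a permutation baseline stands in for the chance adjustment. Both choices, and all function names, are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

def knn_graph(points, k):
    """Undirected graph linking each sample to its k nearest neighbours."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def components(adj):
    """Label each node by connected component (assumed stand-in for structure)."""
    labels = [-1] * len(adj)
    current = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

def mutual_information(a, b):
    """Mutual information (in nats) between two discrete label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items()
    )

def chance_adjusted_mi(structure, phenotype, n_perm=200, seed=0):
    """Observed MI minus the mean MI under permuted phenotype labels."""
    rng = random.Random(seed)
    observed = mutual_information(structure, phenotype)
    shuffled = list(phenotype)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(mutual_information(structure, shuffled))
    return observed - sum(null) / n_perm

# Toy data: two well-separated groups of samples measured on one "pathway".
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
phenotype = ["a", "a", "a", "b", "b", "b"]
structure = components(knn_graph(points, k=2))
score = chance_adjusted_mi(structure, phenotype)
```

Because only distances between samples enter the graph, no gene-by-gene comparison is made, which is the property the abstract credits for scalability. A practical analysis would repeat the scoring once per pathway, each time using only the genes belonging to that pathway.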

AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/statisticalbiotechnology/mipath, or through the Python Package Index as "mipathway".

Place, publisher, year, edition, pages
Oxford University Press (OUP), 2024
National subject category
Bioinformatics (Computational Biology); Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:kth:diva-342621 (URN); 10.1093/bioinformatics/btad776 (DOI); 001141007500001 (); 38195928 (PubMedID); 2-s2.0-85182501807 (Scopus ID)
Note

QC 20240130

Available from: 2024-01-25. Created: 2024-01-25. Last updated: 2025-02-05. Bibliographically approved
Stolf Jeuken, G. (2022). Pathway analysis: methods and perspectives. (Doctoral dissertation). Stockholm, Sweden: KTH Royal Institute of Technology
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The amount of data being generated by high throughput molecular biology experiments grows every day, both in quantity and quality. With this comes the desire to have more powerful and comprehensive methods for statistical analysis that have been developed with the nature of this data in mind.

One of the lines of research that has been developed with this specific goal in mind is pathway analysis. Here, pathways are units of information that have been curated in a way that makes biological knowledge of cellular processes available in a programmatic way, and pathway analysis methods make use of this information to help understand the results of high throughput experiments.

This is an exploratory thesis on the field of pathway analysis. I give a brief introduction to the field, what motivated its development, the problems it tries to solve, and some of the proposed statistical methods, together with some discussion on the implications of this type of analysis.

I then present three original works on pathway analysis, each with a different perspective on the task. First, we present a more reliable null model for pathway analysis methods that use functional association networks, which results in better-calibrated statistics. Second, we show how we can combine pathway analysis methods with other statistical methods, such as survival analysis. We applied this method to a large breast cancer cohort and show that in this case pathways provide better prognostic power than individual genes. Third, we leverage concepts from information theory to design an original pathway analysis method that is very sensitive and flexible, while being practically without parameters. Together, all three papers contribute to furthering the field's usefulness and to the understanding of this type of analysis. 

Abstract [sv] (English translation)

The amount of data generated in large-scale molecular biology experiments is increasing steadily, in both quantity and quality. As a consequence, there is a growing need for more powerful and more comprehensive methods for the interpretation and statistical analysis of such data.

One research methodology that attempts to solve problems associated with the statistical analysis of large, mixed biological datasets is pathway analysis (a pathway being a path or sequence of steps). A biological or biomedical pathway is a unit of annotated information that has been curated in such a way that it represents prior biological knowledge. The programmatically available information about plausible connections in the large dataset can include metabolic processes, cellular localization, or biochemical function. The large number of pathways then enables systematic data integration and an increased understanding of large datasets from high-throughput experiments.

In this thesis, we describe pathway analysis by first giving a brief introduction to the techniques, what motivated their development, the problems pathway analysis tries to solve, and some of the proposed statistical methods, together with some discussion of the implications of this type of analysis.

I then present three publications on pathway analysis, each with a different perspective on the task. First, we present a more reliable, graph-based statistical null model for pathway analysis methods that build on functional association networks, which results in better-calibrated statistics. In the second article, we show how pathway analysis methodology can be combined with other statistical methods, such as survival analysis. We applied this method to a large breast cancer cohort and show that in this case pathways provide better prognostic power than individual genes. In the third article, we use concepts from information theory to design an improved pathway analysis methodology that is very sensitive and flexible while being practically without parameters. Together, all three articles contribute to increasing the field's usefulness and the understanding of this type of analysis.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2022. p. 53
Series
TRITA-CBH-FOU ; 2022:41
Keywords
pathway analysis, mutual information, survival analysis, enrichment analysis, transcriptomics.
National subject category
Bioinformatics (Computational Biology)
Research subject
Biotechnology
Identifiers
urn:nbn:se:kth:diva-316604 (URN); 978-91-8040-322-1 (ISBN)
Public defence
2022-09-28, Air & Fire, Tomtebodavägen 23A, via Zoom: https://kth-se.zoom.us/j/61760412942, Solna, 14:00 (English)
Opponent
Supervisors
Note

QC 2022-08-25

Available from: 2022-08-25. Created: 2022-08-24. Last updated: 2022-09-16. Bibliographically approved
Jeuken, G. S., Tobin, N. P. & Käll, L. (2022). Survival analysis of pathway activity as a prognostic determinant in breast cancer. PLoS Computational Biology, 18(3), Article ID e1010020.
2022 (English). In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 18, no. 3, article id e1010020. Journal article (Refereed). Published
Abstract [en]

High-throughput biology enables the measurement of relative concentrations of thousands of biomolecules from, e.g., tissue samples. The process leaves the investigator with the problem of how best to interpret the potentially large number of differences between samples. Many activities in a cell depend on ordered reactions involving multiple biomolecules, often referred to as pathways. It hence makes sense to study differences between samples in terms of altered pathway activity, using so-called pathway analysis. Traditional pathway analysis assigns significance to differences in the pathway components' concentrations between sample groups; however, less frequently used methods for estimating individual samples' pathway activities have also been suggested. Here we demonstrate that such a method can be used for pathway-based survival analysis. Specifically, we investigate the pathway activities' association with patients' survival time based on the transcription profiles of the METABRIC dataset. Our implementation shows that pathway activities are better prognostic markers for survival time in METABRIC than the individual transcripts. We also demonstrate that we can regress out the effect of individual pathways on other pathways, which allows us to estimate the association of the other pathways' residual activity with survival. Furthermore, we illustrate how one can visualize the often interdependent measures over hierarchical pathway databases using sunburst plots.

Place, publisher, year, edition, pages
Public Library of Science (PLoS), 2022
National subject category
Cancer and Oncology
Identifiers
urn:nbn:se:kth:diva-311539 (URN); 10.1371/journal.pcbi.1010020 (DOI); 000778272600002 (); 35344554 (PubMedID); 2-s2.0-85128145575 (Scopus ID)
Note

QC 20220429

Available from: 2022-04-29. Created: 2022-04-29. Last updated: 2022-08-24. Bibliographically approved
Liao, Y., Yeh, S. & Jeuken, G. S. (2019). From individual to collective behaviours: exploring population heterogeneity of human mobility based on social media data. EPJ Data Science, 8(1), Article ID 34.
2019 (English). In: EPJ Data Science, ISSN 2193-1127, Vol. 8, no. 1, article id 34. Journal article (Refereed). Published
Abstract [en]

This paper examines the population heterogeneity of travel behaviours from a combined perspective of individual actors and collective behaviours. We use a social media dataset of 652,945 geotagged tweets generated by 2,933 Swedish Twitter users covering an average time span of 3.6 years. No explicit geographical boundaries, such as national borders or administrative boundaries, are applied to the data. We use spatial features, such as geographical characteristics and network properties, and apply a clustering technique to reveal the heterogeneity of geotagged activity patterns. We find four distinct groups of travellers: local explorers (78.0%), local returners (14.4%), global explorers (7.3%), and global returners (0.3%). These groups exhibit distinct mobility characteristics, such as trip distance, diffusion process, percentage of domestic trips, visiting frequency of the most-visited locations, and total number of geotagged locations. Geotagged social media data are gradually being incorporated into travel behaviour studies as user-contributed data sources. While such data have many advantages, including easy access and the flexibility to capture movements across multiple scales (individual, city, country, and globe), more attention is still needed on data validation and on identifying potential biases associated with these data. We validate against data from a household travel survey and find that, despite good agreement on trip distances (one-day and long-distance trips), there are some differences in home location and in the frequency of international trips, possibly due to population bias and behaviour distortion in Twitter data. Future work includes identifying and removing additional biases so that results from geotagged activity patterns may be generalised to human mobility patterns. This study explores the heterogeneity of behavioural groups and their spatial mobility, including travel and day-to-day displacement. The findings of this paper could be relevant for disease prediction, transport modelling, and the broader social sciences.

Place, publisher, year, edition, pages
SpringerOpen, 2019
Keywords
Geotagged activity patterns, Individual mobility, Data mining, Hierarchical clustering
National subject category
Transport Systems and Logistics
Identifiers
urn:nbn:se:kth:diva-264880 (URN); 10.1140/epjds/s13688-019-0212-x (DOI); 000496585500001 (); 2-s2.0-85075102041 (Scopus ID)
Note

QC 20191218

Available from: 2019-12-18. Created: 2019-12-18. Last updated: 2022-06-26. Bibliographically approved
Jeuken, G. S. & Käll, L. (2018). A simple null model for inferences from network enrichment analysis. PLOS ONE, 13(11), Article ID e0206864.
2018 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 13, no. 11, article id e0206864. Journal article (Refereed). Published
Abstract [en]

A prevailing technique to infer function from lists of identifications from molecular biological high-throughput experiments is over-representation analysis, where the identifications are compared to predefined sets of related genes, often referred to as pathways. As at least some pathways are known to be incomplete in their annotation, algorithmic efforts have been made to complement them with information from functional association networks. While the terminology varies in the literature, we will here refer to such methods as Network Enrichment Analysis (NEA). Traditionally, the significance of inferences from NEA has been assigned using a null model constructed from randomizations of the network. Here we instead argue for a null model that more directly relates to the set of genes being studied, and have designed a dynamic programming algorithm that calculates the distribution of NEA scores, which makes it possible to assign unbiased mid p values to inferences. We also implemented a random sampling method that carries out the same task. We demonstrate that our method obtains a superior statistical calibration compared to the popular NEA inference engine, BinoX, while also providing statistics that are easier to interpret.
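
For readers unfamiliar with mid p values, the following minimal sketch shows how one is computed from a discrete null score distribution. The standard definition is used: the probability of a strictly larger score plus half the probability of an equal score. The distribution `null_pmf` is a made-up example, not output of the paper's dynamic programming algorithm, and the function name is hypothetical.

```python
from fractions import Fraction

def mid_p_value(null_pmf, observed):
    """Mid p value: P(S > s_obs) + 0.5 * P(S = s_obs) under a discrete null."""
    greater = sum(p for s, p in null_pmf.items() if s > observed)
    equal = null_pmf.get(observed, 0)
    return greater + equal / 2

def ordinary_p_value(null_pmf, observed):
    """Conventional one-sided p value: P(S >= s_obs)."""
    return sum(p for s, p in null_pmf.items() if s >= observed)

# Hypothetical null distribution over integer scores (probabilities sum to 1).
null_pmf = {
    0: Fraction(1, 2),
    1: Fraction(3, 10),
    2: Fraction(3, 20),
    3: Fraction(1, 20),
}

mid_p = mid_p_value(null_pmf, observed=2)
plain_p = ordinary_p_value(null_pmf, observed=2)
```

For discrete scores the ordinary p value is conservative because the full probability of a tie is counted as evidence against the null; the mid p value counts only half of it, which is why it is never larger than the ordinary p value.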

Place, publisher, year, edition, pages
Public Library of Science, 2018
National subject category
Genetics and Genomics
Identifiers
urn:nbn:se:kth:diva-239780 (URN); 10.1371/journal.pone.0206864 (DOI); 000449772600027 (); 30412619 (PubMedID); 2-s2.0-85056317407 (Scopus ID)
Note

QC 20190108

Available from: 2019-01-08. Created: 2019-01-08. Last updated: 2025-02-07. Bibliographically approved
Jeuken, G. S. & Käll, L. Pathway Analysis Through Mutual Information.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Pathway analysis comes in many forms. Most are seeking to establish a connection between the activity of a certain biological pathway and a difference in phenotype, often relying on an upstream differential expression analysis to establish the difference between case and control. This process usually models this relationship using many assumptions, often of a linear nature, and may also involve statistical tests where the calculation of false discovery rates is not trivial.

Here, we propose a new method for pathway analysis, MIPath, that relies on information theoretical principles and therefore lacks a model for the nature of the association between pathway activity and phenotype, resulting in a minimal set of assumptions. For this, we construct a different graph of samples for each pathway and score the association between the structure of this graph and any phenotype variable using Mutual Information, while adjusting for the effects of random chance in each score.

Our experiments show that this method produces robust and reproducible scores that rank target pathways highly on single-cell datasets, outperforming established methods for pathway analysis under these same conditions.

National subject category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-316544 (URN)
Note

QC 20220831

Available from: 2022-08-22. Created: 2022-08-22. Last updated: 2022-08-31. Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-4438-2325
