Publications (6 of 6)
Jeuken, G. S. & Käll, L. (2024). Pathway analysis through mutual information. Bioinformatics, 40(1)
2024 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 40, no. 1. Article in journal (Refereed), Published
Abstract [en]

MOTIVATION: In pathway analysis, we aim to establish a connection between the activity of a particular biological pathway and a difference in phenotype. Many methods are available for pathway analysis; many of them rely on an upstream differential expression analysis, and many model the relations between the abundances of the analytes in a pathway as linear relationships.

RESULTS: Here, we propose a new method for pathway analysis, MIPath, that relies on information theoretical principles and, therefore, does not model the association between pathway activity and phenotype, resulting in relatively few assumptions. For this, we construct a graph of the data points for each pathway using a nearest-neighbor approach and score the association between the structure of this graph and the phenotype of these same samples using Mutual Information while adjusting for the effects of random chance in each score. The initial nearest neighbor approach evades individual gene-level comparisons, hence making the method scalable and less vulnerable to missing values. These properties make our method particularly useful for single-cell data. We benchmarked our method on several single-cell datasets, comparing it to established and new methods, and found that it produces robust, reproducible, and meaningful scores.

AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/statisticalbiotechnology/mipath, or through the Python Package Index as "mipathway".
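
As a rough illustration of the idea described in the abstract above, the following Python sketch builds a nearest-neighbor graph over samples, summarizes its structure, and scores the association with a phenotype using mutual information with a chance correction. It is not the mipathway implementation: summarizing the graph by its connected components, correcting for chance by subtracting a permutation baseline, and all function names (knn_components, mutual_information, chance_adjusted_mi) are assumptions made for this example.

```python
import math
import random
from collections import Counter


def knn_components(points, k):
    """Label each sample by its connected component in a k-NN graph.

    Assumption: graph structure is summarized by connected components;
    the abstract does not specify how the graph is summarized.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Rank all other samples by Euclidean distance and keep the k closest.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete labelings."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def chance_adjusted_mi(groups, phenotype, n_perm=200, seed=0):
    """Observed MI minus the mean MI over shuffled phenotype labels.

    Assumption: a permutation baseline stands in for the chance adjustment.
    """
    rng = random.Random(seed)
    observed = mutual_information(groups, phenotype)
    shuffled = list(phenotype)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += mutual_information(groups, shuffled)
    return observed - total / n_perm
```

In this sketch, one would restrict each sample's feature vector to the genes of a single pathway, call knn_components on those vectors, and pass the resulting labels together with the phenotype to chance_adjusted_mi. Scores near or below zero suggest no association beyond chance; the choice of k and the number of permutations are free parameters of the sketch only.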

Place, publisher, year, edition, pages
Oxford University Press (OUP), 2024
National Category
Bioinformatics (Computational Biology); Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:kth:diva-342621 (URN); 10.1093/bioinformatics/btad776 (DOI); 001141007500001 (); 38195928 (PubMedID); 2-s2.0-85182501807 (Scopus ID)
Note

QC 20240130

Available from: 2024-01-25. Created: 2024-01-25. Last updated: 2025-02-05. Bibliographically approved
Stolf Jeuken, G. (2022). Pathway analysis: methods and perspectives. (Doctoral dissertation). Stockholm, Sweden: KTH Royal Institute of Technology
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The amount of data being generated by high throughput molecular biology experiments grows every day, both in quantity and quality. With this comes the desire to have more powerful and comprehensive methods for statistical analysis that have been developed with the nature of this data in mind.

One of the lines of research that has been developed with this specific goal in mind is pathway analysis. Here, pathways are units of information that have been curated in a way that makes biological knowledge of cellular processes available in a programmatic way, and pathway analysis methods make use of this information to help understand the results of high throughput experiments.

This is an exploratory thesis on the field of pathway analysis. I give a brief introduction to the field, what motivated its development, the problems it tries to solve, and some of the proposed statistical methods, together with some discussion on the implications of this type of analysis.

I then present three original works on pathway analysis, each with a different perspective on the task. First, we present a more reliable null model for pathway analysis methods that use functional association networks, which results in better-calibrated statistics. Second, we show how we can combine pathway analysis methods with other statistical methods, such as survival analysis. We applied this method to a large breast cancer cohort and show that in this case pathways provide better prognostic power than individual genes. Third, we leverage concepts from information theory to design an original pathway analysis method that is very sensitive and flexible, while being practically without parameters. Together, all three papers contribute to furthering the field's usefulness and to the understanding of this type of analysis. 

Abstract [sv] (translated to English)

The amount of data generated in large-scale molecular biology experiments is steadily increasing, in both quantity and quality. As a consequence, there is a growing need for more powerful and more comprehensive methods for the interpretation and statistical analysis of such data.

One research methodology that tries to solve problems associated with the statistical analysis of large, heterogeneous biological datasets is pathway analysis (the term pathway denotes a path or a sequence of steps). A biological or biomedical pathway is a unit of annotated information that has been curated so that it represents prior biological knowledge. The programmatically available information about plausible connections within the large dataset can cover metabolic processes, cellular localization, or biochemical function. The large number of pathways then enables systematic data integration and a better understanding of large datasets from high-throughput experiments.

In this thesis, we describe pathway analysis by first giving a brief introduction to the techniques, what motivated their development, the problems pathway analysis tries to solve, and some of the proposed statistical methods, together with some discussion of the implications of this type of analysis.

I then present three publications on pathway analysis, each with a different perspective on the task. First, we present a more reliable, graph-based statistical null model for pathway analysis methods that build on functional association networks, which results in better-calibrated statistics. In the second article, we show how pathway analysis methodology can be combined with other statistical methods, such as survival analysis. We applied this method to a large breast cancer cohort and show that, in this case, pathways provide better prognostic power than individual genes. In the third article, we use concepts from information theory to design an improved pathway analysis methodology that is very sensitive and flexible while being practically parameter-free. Together, all three articles contribute to increasing the field's usefulness and the understanding of this type of analysis.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2022. p. 53
Series
TRITA-CBH-FOU ; 2022:41
Keywords
pathway analysis, mutual information, survival analysis, enrichment analysis, transcriptomics.
National Category
Bioinformatics (Computational Biology)
Research subject
Biotechnology
Identifiers
urn:nbn:se:kth:diva-316604 (URN); 978-91-8040-322-1 (ISBN)
Public defence
2022-09-28, Air & Fire, Tomtebodavägen 23A, via Zoom: https://kth-se.zoom.us/j/61760412942, Solna, 14:00 (English)
Note

QC 2022-08-25

Available from: 2022-08-25. Created: 2022-08-24. Last updated: 2022-09-16. Bibliographically approved
Jeuken, G. S., Tobin, N. P. & Käll, L. (2022). Survival analysis of pathway activity as a prognostic determinant in breast cancer. PLoS Computational Biology, 18(3), Article ID e1010020.
2022 (English). In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 18, no. 3, article id e1010020. Article in journal (Refereed), Published
Abstract [en]

High throughput biology enables the measurement of relative concentrations of thousands of biomolecules from, e.g., tissue samples. The process leaves the investigator with the problem of how best to interpret the potentially large number of differences between samples. Many activities in a cell depend on ordered reactions involving multiple biomolecules, often referred to as pathways. It hence makes sense to study differences between samples in terms of altered pathway activity, using so-called pathway analysis. Traditional pathway analysis assigns significance to differences in the pathway components' concentrations between sample groups; however, less frequently used methods for estimating individual samples' pathway activities have also been suggested. Here we demonstrate that such a method can be used for pathway-based survival analysis. Specifically, we investigate the pathway activities' association with patients' survival time based on the transcription profiles of the METABRIC dataset. Our implementation shows that pathway activities are better prognostic markers for survival time in METABRIC than the individual transcripts. We also demonstrate that we can regress out the effect of individual pathways on other pathways, which allows us to estimate the effect of the other pathways' residual activity on survival. Furthermore, we illustrate how one can visualize the often interdependent measures over hierarchical pathway databases using sunburst plots.
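
The step of regressing out one pathway's effect on another can be pictured with a minimal Python sketch. It assumes that "regress out" means taking the residuals of a simple least-squares fit of one per-sample activity vector on another; the abstract does not state the exact model, and the function name regress_out is hypothetical.

```python
def regress_out(target, covariate):
    """Return the part of `target` not linearly explained by `covariate`.

    Assumption: ordinary least squares with one covariate and an intercept.
    Both arguments are per-sample pathway activity values of equal length.
    """
    n = len(target)
    mean_x = sum(covariate) / n
    mean_y = sum(target) / n
    # Slope and intercept of the least-squares line target ~ covariate.
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, target))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual activity: observed value minus the fitted value.
    return [y - (intercept + slope * x) for x, y in zip(covariate, target)]
```

The residuals have zero mean and are uncorrelated with the covariate pathway, so any remaining association with survival cannot be attributed to a linear effect of that pathway. The covariate must vary across samples, otherwise the slope is undefined.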

Place, publisher, year, edition, pages
Public Library of Science (PLoS), 2022
National Category
Cancer and Oncology
Identifiers
urn:nbn:se:kth:diva-311539 (URN); 10.1371/journal.pcbi.1010020 (DOI); 000778272600002 (); 35344554 (PubMedID); 2-s2.0-85128145575 (Scopus ID)
Note

QC 20220429

Available from: 2022-04-29. Created: 2022-04-29. Last updated: 2022-08-24. Bibliographically approved
Liao, Y., Yeh, S. & Jeuken, G. S. (2019). From individual to collective behaviours: exploring population heterogeneity of human mobility based on social media data. EPJ Data Science, 8(1), Article ID 34.
2019 (English). In: EPJ Data Science, ISSN 2193-1127, Vol. 8, no. 1, article id 34. Article in journal (Refereed), Published
Abstract [en]

This paper examines the population heterogeneity of travel behaviours from a combined perspective of individual actors and collective behaviours. We use a social media dataset of 652,945 geotagged tweets generated by 2,933 Swedish Twitter users covering an average time span of 3.6 years. No explicit geographical boundaries, such as national borders or administrative boundaries, are applied to the data. We use spatial features, such as geographical characteristics and network properties, and apply a clustering technique to reveal the heterogeneity of geotagged activity patterns. We find four distinct groups of travellers: local explorers (78.0%), local returners (14.4%), global explorers (7.3%), and global returners (0.3%). These groups exhibit distinct mobility characteristics, such as trip distance, diffusion process, percentage of domestic trips, visiting frequency of the most-visited locations, and total number of geotagged locations. Geotagged social media data are gradually being incorporated into travel behaviour studies as user-contributed data sources. While such data have many advantages, including easy access and the flexibility to capture movements across multiple scales (individual, city, country, and globe), more attention is still needed on data validation and on identifying potential biases associated with these data. We validate against data from a household travel survey and find that, despite good agreement in trip distances (one-day and long-distance trips), there are some differences in home location and the frequency of international trips, possibly due to population bias and behaviour distortion in Twitter data. Future work includes identifying and removing additional biases so that results from geotagged activity patterns may be generalised to human mobility patterns. This study explores the heterogeneity of behavioural groups and their spatial mobility, including travel and day-to-day displacement. The findings of this paper could be relevant for disease prediction, transport modelling, and the broader social sciences.

Place, publisher, year, edition, pages
SpringerOpen, 2019
Keywords
Geotagged activity patterns, Individual mobility, Data mining, Hierarchical clustering
National Category
Transport Systems and Logistics
Identifiers
urn:nbn:se:kth:diva-264880 (URN); 10.1140/epjds/s13688-019-0212-x (DOI); 000496585500001 (); 2-s2.0-85075102041 (Scopus ID)
Note

QC 20191218

Available from: 2019-12-18. Created: 2019-12-18. Last updated: 2022-06-26. Bibliographically approved
Jeuken, G. S. & Käll, L. (2018). A simple null model for inferences from network enrichment analysis. PLOS ONE, 13(11), Article ID e0206864.
2018 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 13, no. 11, article id e0206864. Article in journal (Refereed), Published
Abstract [en]

A prevailing technique to infer function from lists of identifications from molecular biological high-throughput experiments is over-representation analysis, where the identifications are compared to predefined sets of related genes, often referred to as pathways. As at least some pathways are known to be incomplete in their annotation, algorithmic efforts have been made to complement them with information from functional association networks. While the terminology varies in the literature, we will here refer to such methods as Network Enrichment Analysis (NEA). Traditionally, the significance of inferences from NEA has been assigned using a null model constructed from randomizations of the network. Here we instead argue for a null model that relates more directly to the set of genes being studied, and have designed a dynamic programming algorithm that calculates the distribution of NEA scores, making it possible to assign unbiased mid p-values to inferences. We also implemented a random sampling method that carries out the same task. We demonstrate that our method obtains superior statistical calibration compared to the popular NEA inference engine BinoX, while also providing statistics that are easier to interpret.
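
To make the combination of a dynamic-programming score distribution and mid p-values concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes that the score of a gene set is the sum of non-negative integer link counts between its genes and a pathway, and that the null draws gene sets of the same size uniformly at random. The names score_distribution, mid_p_value, and link_counts are hypothetical.

```python
from math import comb


def score_distribution(link_counts, k):
    """Exact null distribution of the summed link count of k random genes.

    Assumption: link_counts[i] is the (non-negative integer) number of
    network links from gene i to the pathway, and the null picks k distinct
    genes uniformly at random.
    """
    max_sum = sum(sorted(link_counts, reverse=True)[:k])
    # ways[j][s] = number of j-gene subsets whose link counts sum to s.
    ways = [[0] * (max_sum + 1) for _ in range(k + 1)]
    ways[0][0] = 1
    for w in link_counts:
        # Go through subset sizes downwards so each gene is used at most once.
        for j in range(k, 0, -1):
            for s in range(max_sum, w - 1, -1):
                ways[j][s] += ways[j - 1][s - w]
    total = comb(len(link_counts), k)
    return [count / total for count in ways[k]]


def mid_p_value(distribution, observed):
    """Mid p-value: P(S > observed) + 0.5 * P(S = observed)."""
    upper_tail = sum(distribution[observed + 1:])
    return upper_tail + 0.5 * distribution[observed]
```

For example, with link counts [0, 1, 2, 3] and k = 2, the six possible gene pairs give sums 1, 2, 3, 3, 4, and 5, so an observed score of 4 has a mid p-value of 1/6 + 0.5 * 1/6 = 0.25. Counting only half of the probability at the observed value is what keeps the p-value from being conservative for a discrete score.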

Place, publisher, year, edition, pages
Public Library of Science, 2018
National Category
Genetics and Genomics
Identifiers
urn:nbn:se:kth:diva-239780 (URN); 10.1371/journal.pone.0206864 (DOI); 000449772600027 (); 30412619 (PubMedID); 2-s2.0-85056317407 (Scopus ID)
Note

QC 20190108

Available from: 2019-01-08. Created: 2019-01-08. Last updated: 2025-02-07. Bibliographically approved
Jeuken, G. S. & Käll, L. Pathway Analysis Through Mutual Information.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Pathway analysis comes in many forms. Most seek to establish a connection between the activity of a certain biological pathway and a difference in phenotype, often relying on an upstream differential expression analysis to establish the difference between case and control. This process usually models the relationship using many assumptions, often of a linear nature, and may also involve statistical tests where the calculation of false discovery rates is not trivial.

Here, we propose a new method for pathway analysis, MIPath, that relies on information theoretical principles and therefore involves no model for the nature of the association between pathway activity and phenotype, resulting in a minimal set of assumptions. For this, we construct a different graph of samples for each pathway and score the association between the structure of this graph and any phenotype variable using Mutual Information, while adjusting for the effects of random chance in each score.

Our experiments show that this method produces robust and reproducible scores that rank target pathways highly on single-cell datasets, outperforming established methods for pathway analysis under these same conditions.

National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-316544 (URN)
Note

QC 20220831

Available from: 2022-08-22. Created: 2022-08-22. Last updated: 2022-08-31. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-4438-2325
