Change search
ReferencesLink to record
Permanent link

Direct link
A structural EM algorithm for learning hidden variable oncogenetic networks
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
(English)Manuscript (preprint) (Other academic)
National Category
Probability Theory and Statistics
URN: urn:nbn:se:kth:diva-121760OAI: diva2:619455

QS 2013

Available from: 2013-05-03 Created: 2013-05-03 Last updated: 2013-05-03Bibliographically approved
In thesis
1. Computational Modeling of Cancer Progression
Open this publication in new window or tab >>Computational Modeling of Cancer Progression
2013 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Cancer is a multi-stage process resulting from accumulation of genetic mutations. Data obtained from assaying a tumor only contains the set of mutations in the tumor and lacks information about their temporal order. Learning the chronological order of the genetic mutations is an important step towards understanding the disease. The probability of introduction of a mutation to a tumor increases if certain mutations that promote it, already happened. Such dependencies induce what we call the monotonicity property in cancer progression. A realistic model of cancer progression should take this property into account.

In this thesis, we present two models for cancer progression and algorithms for learning them. In the first model, we propose Progression Networks (PNs), which are a special class of Bayesian networks. In learning PNs the issue of monotonicity is taken into consideration. The problem of learning PNs is reduced to Mixed Integer Linear Programming (MILP), which is a NP-hard problem for which very good heuristics exist. We also developed a program, DiProg, for learning PNs.

In the second model, the problem of noise in the biological experiments is addressed by introducing hidden variable. We call this model Hidden variable Oncogenetic Network (HON). In a HON, there are two variables assigned to each node, a hidden variable that represents the progression of cancer to the node and an observable random variable that represents the observation of the mutation corresponding to the node. We devised a structural Expectation Maximization (EM) algorithm for learning HONs. In the M-step of the structural EM algorithm, we need to perform a considerable number of inference tasks. Because exact inference is tractable only on Bayesian networks with bounded treewidth, we also developed an algorithm for learning bounded treewidth Bayesian networks by reducing the problem to a MILP.

Our algorithms performed well on synthetic data. We also tested them on cytogenetic data from renal cell carcinoma. The learned progression networks from both algorithms are in agreement with the previously published results.

MicroRNAs are short non-coding RNAs that are involved in post transcriptional regulation. A-to-I editing of microRNAs converts adenosine to inosine in the double stranded RNA. We developed a method for determining editing levels in mature microRNAs from the high-throughput RNA sequencing data from the mouse brain. Here, for the first time, we showed that the level of editing increases with development. 

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2013. iv, 54 p.
Trita-CSC-A, ISSN 1653-5723 ; 2013:05
Cancer progression, Bayesian networks, microRNA editing
National Category
Probability Theory and Statistics Bioinformatics and Systems Biology
Research subject
SRA - Molecular Bioscience
urn:nbn:se:kth:diva-121597 (URN)978-91-7501-724-2 (ISBN)
Public defence
2013-05-24, Petrén Salen, Wargentinhuset, Nobels Väg 12A, Solna, 13:30 (English)

QC 20130503

Available from: 2013-05-03 Created: 2013-05-02 Last updated: 2013-05-03Bibliographically approved

Open Access in DiVA

No full text

Search in DiVA

By author/editor
Shahrabi Farahani, HosseinLagergren, Jens
By organisation
Computational Biology, CB
Probability Theory and Statistics

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 88 hits
ReferencesLink to record
Permanent link

Direct link