1 - 42 of 42
  • 1.
    Armerin, Fredrik
    et al.
    KTH, School of Architecture and the Built Environment (ABE), Real Estate and Construction Management, Building and Real Estate Economics.
    Hallgren, Jonas
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.).
    Forecasting Ranking in Harness Racing Using Probabilities Induced by Expected Positions (2019). In: Applied Artificial Intelligence, ISSN 0883-9514, E-ISSN 1087-6545, Vol. 33, no 2, p. 171-189. Article in journal (Refereed)
    Abstract [en]

    Ranked events are pivotal in many important AI applications such as Question Answering and recommendation systems. This paper studies ranked events in the setting of harness racing. For each horse there exists a probability distribution over its possible rankings. In the paper, it is shown that a set of expected positions (and more generally, higher moments) for the horses induces this probability distribution. The main contribution of the paper is a method which extracts this induced probability distribution from a set of expected positions. An algorithm is proposed where the extraction of the induced distribution is given by the estimated expectations. MATLAB code is provided for the methodology. This approach gives freedom to model the horses in many different ways without the restrictions imposed by, for instance, logistic regression. To illustrate this point, we employ a neural network and ordinary ridge regression. The method is applied to predicting the distribution of the finishing positions for horses in harness racing. It outperforms both multinomial logistic regression and the market odds. The ease of use, combined with the strong results of the suggested approach, makes it a relevant addition to the increasingly important field of ranked events.
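    For orientation, the forward direction of the relationship used in this paper is easy to state: a distribution over finishing positions determines each horse's expected position. The sketch below shows only that forward map with made-up numbers; it is not the paper's inversion method.

```python
import numpy as np

# Hypothetical example: 3 horses, rows = horses, columns = finishing positions 1..3.
# Each row is that horse's probability distribution over its possible rankings.
rank_probs = np.array([
    [0.6, 0.3, 0.1],   # horse A
    [0.3, 0.4, 0.3],   # horse B
    [0.1, 0.3, 0.6],   # horse C
])

positions = np.arange(1, rank_probs.shape[1] + 1)

# Expected finishing position of each horse: E[pos] = sum_r r * P(pos = r).
expected_positions = rank_probs @ positions
print(expected_positions)  # [1.5, 2.0, 2.5]

# The paper goes in the opposite direction: it recovers an induced distribution
# over rankings from a set of estimated expected positions (and higher moments).
```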

  • 2. Corander, J.
    et al.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Pavlenko, Tatjana
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Tillander, A.
    Bayesian block-diagonal predictive classifier for Gaussian data (2013). In: Synergies of Soft Computing and Statistics for Intelligent Data Analysis, Springer, 2013, p. 543-551. Conference paper (Refereed)
    Abstract [en]

    The paper presents a method for constructing a Bayesian predictive classifier in a high-dimensional setting. Given that classes are represented by Gaussian distributions with a block-structured covariance matrix, a closed form expression for the posterior predictive distribution of the data is established. Due to factorization of this distribution, the resulting Bayesian predictive and marginal classifier provides an efficient solution to the high-dimensional problem by splitting it into smaller tractable problems. In a simulation study we show that the suggested classifier outperforms several alternative algorithms such as linear discriminant analysis based on block-wise inverse covariance estimators and the shrunken centroids regularized discriminant analysis.
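    The computational benefit of the block-diagonal assumption can be illustrated outside the Bayesian setting: when the covariance is block-diagonal, the Gaussian log-density factorises into a sum of low-dimensional terms, one per block. A minimal sketch of that factorisation (illustrative only, not the paper's predictive classifier):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Two blocks of dimensions 2 and 3; build a block-diagonal covariance.
def random_spd(d):
    a = rng.normal(size=(d, d))
    return a @ a.T + d * np.eye(d)

blocks = [random_spd(2), random_spd(3)]
sigma = np.block([
    [blocks[0], np.zeros((2, 3))],
    [np.zeros((3, 2)), blocks[1]],
])
mu = np.zeros(5)
x = rng.normal(size=5)

# Full 5-dimensional log-density ...
full = multivariate_normal(mean=mu, cov=sigma).logpdf(x)

# ... equals the sum of the per-block log-densities, so a high-dimensional
# problem splits into smaller tractable ones.
split = (multivariate_normal(mean=np.zeros(2), cov=blocks[0]).logpdf(x[:2])
         + multivariate_normal(mean=np.zeros(3), cov=blocks[1]).logpdf(x[2:]))

assert np.isclose(full, split)
```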

  • 3.
    Corander, Jukka
    et al.
    University of Helsinki.
    Cui, Yao
    University of Helsinki.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Sirén, Jukka
    University of Helsinki.
    Have I seen you before?: Principles of Bayesian predictive classification revisited (2013). In: Statistics and Computing, ISSN 0960-3174, E-ISSN 1573-1375, Vol. 23, no 1, p. 59-73. Article in journal (Refereed)
    Abstract [en]

    A general inductive Bayesian classification framework is considered using a simultaneous predictive distribution for test items. We introduce a principle of generative supervised and semi-supervised classification based on marginalizing the joint posterior distribution of labels for all test items. The simultaneous and marginalized classifiers arise under different loss functions, while both acknowledge jointly all uncertainty about the labels of test items and the generating probability measures of the classes. We illustrate for data from multiple finite alphabets that such classifiers achieve higher correct classification rates than a standard marginal predictive classifier which labels all test items independently, when training data are sparse. In the supervised case for multiple finite alphabets the simultaneous and the marginal classifiers are proven to become equal under generalized exchangeability when the amount of training data increases. Hence, the marginal classifier can be interpreted as an asymptotic approximation to the simultaneous classifier for finite sets of training data. It is also shown that such convergence is not guaranteed in the semi-supervised setting, where the marginal classifier does not provide a consistent approximation.

  • 4.
    Corander, Jukka
    et al.
    University of Helsinki.
    Cui, Yaqiong
    University of Helsinki.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Inductive Inference and Partition Exchangeability in Classification (2013). In: Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence: Papers from the Ray Solomonoff 85th Memorial Conference / [ed] Dowe, David L., Springer Berlin/Heidelberg, 2013, p. 91-105. Conference paper (Refereed)
    Abstract [en]

    Inductive inference has been a subject of intensive research efforts over several decades. In particular, for classification problems substantial advances have been made and the field has matured into a wide range of powerful approaches to inductive inference. However, a considerable challenge arises when deriving principles for an inductive supervised classifier in the presence of unpredictable or unanticipated events corresponding to unknown alphabets of observable features. Bayesian inductive theories based on de Finetti type exchangeability which have become popular in supervised classification do not apply to such problems. Here we derive an inductive supervised classifier based on partition exchangeability due to John Kingman. It is proven that, in contrast to classifiers based on de Finetti type exchangeability which can optimally handle test items independently of each other in the presence of infinite amounts of training data, a classifier based on partition exchangeability still continues to benefit from a joint prediction of labels for the whole population of test items. Some remarks about the relation of this work to generic convergence results in predictive inference are also given.

  • 5. Corander, Jukka
    et al.
    Diekmann, Odo
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    A tribute to Mats Gyllenberg, on the occasion of his 60th birthday (2016). In: Journal of Mathematical Biology, ISSN 0303-6812, E-ISSN 1432-1416, Vol. 72, no 4, p. 793-795. Article in journal (Refereed)
  • 6. Corander, Jukka
    et al.
    Ekdahl, Magnus
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Parallell interacting MCMC for learning of topologies of graphical models (2008). In: Data Mining and Knowledge Discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 17, no 3, p. 431-456. Article in journal (Refereed)
    Abstract [en]

    Automated statistical learning of graphical models from data has attracted a considerable degree of interest in the machine learning and related literature. Many authors have discussed and/or demonstrated the need for consistent stochastic search methods that would not be as prone to yield locally optimal model structures as simple greedy methods. However, at the same time most of the stochastic search methods are based on a standard Metropolis-Hastings theory that necessitates the use of relatively simple random proposals and prevents the utilization of intelligent and efficient search operators. Here we derive an algorithm for learning topologies of graphical models from samples of a finite set of discrete variables by utilizing and further enhancing a recently introduced theory for non-reversible parallel interacting Markov chain Monte Carlo-style computation. In particular, we illustrate how the non-reversible approach allows for a novel type of creativity in the design of search operators. Also, the parallel aspect of our method illustrates well how the adaptive nature of the search operators helps to avoid trapping states in the vicinity of locally optimal network topologies.

  • 7.
    Corander, Jukka
    et al.
    University of Helsinki .
    Ekdahl, Magnus
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Bayesian Unsupervised Learning of DNA Regulatory Binding Regions (2009). In: Advances in Artificial Intelligence, ISSN 1687-7470, E-ISSN 1687-7489, article id 219743. Article in journal (Refereed)
    Abstract [en]

    Identification of regulatory binding motifs, that is, short specific words, within DNA sequences is a commonly occurring problem in computational bioinformatics. A wide variety of probabilistic approaches have been proposed in the literature to either scan for previously known motif types or to attempt de novo identification of a fixed number (typically one) of putative motifs. Most approaches assume the existence of reliable biodatabase information to build a probabilistic a priori description of the motif classes. Examples of attempts to do probabilistic unsupervised learning about the number of putative de novo motif types and their positions within a set of DNA sequences are very rare in the literature. Here we show how such a learning problem can be formulated using a Bayesian model that aims to simultaneously maximize the marginal likelihood of sequence data arising under multiple motif types as well as under the background DNA model, which is a variable-length Markov chain. It is demonstrated how the adopted Bayesian modelling strategy combined with recently introduced nonstandard stochastic computation tools yields a more tractable learning procedure than is possible with the standard Monte Carlo approaches. Improvements and extensions of the proposed approach are also discussed.

  • 8. Corander, Jukka
    et al.
    Gyllenberg, Mats
    Koski, Timo
    Bayesian model learning based on a parallel MCMC strategy (2006). In: Statistics and Computing, ISSN 0960-3174, E-ISSN 1573-1375, Vol. 16, no 4, p. 355-362. Article in journal (Refereed)
    Abstract [en]

    We introduce a novel Markov chain Monte Carlo algorithm for estimation of posterior probabilities over discrete model spaces. Our learning approach is applicable to families of models for which the marginal likelihood can be analytically calculated, either exactly or approximately, given any fixed structure. It is argued that for certain model neighborhood structures, the ordinary reversible Metropolis-Hastings algorithm does not yield an appropriate solution to the estimation problem. Therefore, we develop an alternative, non-reversible algorithm which can avoid the scaling effect of the neighborhood. To efficiently explore a model space, a finite number of interacting parallel stochastic processes is utilized. Our interaction scheme enables exploration of several local neighborhoods of a model space simultaneously, while it prevents the absorption of any particular process to a relatively inferior state. We illustrate the advantages of our method by an application to a classification model. In particular, we use an extensive bacterial database and compare our results with results obtained by different methods for the same data.

  • 9.
    Corander, Jukka
    et al.
    University of Helsinki.
    Gyllenberg, Mats
    University of Helsinki.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Bayesian unsupervised classification framework based on stochastic partitions of data and a parallel search strategy (2009). In: Advances in Data Analysis and Classification, ISSN 1862-5347, E-ISSN 1862-5355, Vol. 3, no 1, p. 3-24. Article in journal (Refereed)
    Abstract [en]

    Advantages of statistical model-based unsupervised classification over heuristic alternatives have been widely demonstrated in the scientific literature. However, the existing model-based approaches are often both conceptually and numerically unstable for large and complex data sets. Here we consider a Bayesian model-based method for unsupervised classification of discrete-valued vectors that has certain advantages over standard solutions based on latent class models. Our theoretical formulation defines a posterior probability measure on the space of classification solutions corresponding to stochastic partitions of observed data. To efficiently explore the classification space we use a parallel search strategy based on non-reversible stochastic processes. A decision-theoretic approach is utilized to formalize the inferential process in the context of unsupervised classification. Both real and simulated data sets are used to illustrate the discussed methods.

  • 10. Corander, Jukka
    et al.
    Gyllenberg, Mats
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Learning Genetic Population Structures Using Minimization of Stochastic Complexity (2010). In: Entropy, ISSN 1099-4300, E-ISSN 1099-4300, Vol. 12, no 5, p. 1102-1124. Article in journal (Refereed)
    Abstract [en]

    Considerable research efforts have been devoted to probabilistic modeling of genetic population structures within the past decade. In particular, a wide spectrum of Bayesian models have been proposed for unlinked molecular marker data from diploid organisms. Here we derive a theoretical framework for learning genetic population structure of a haploid organism from bi-allelic markers for which potential patterns of dependence are a priori unknown and to be explicitly incorporated in the model. Our framework is based on the principle of minimizing stochastic complexity of an unsupervised classification under tree augmented factorization of the predictive data distribution. We discuss a fast implementation of the learning framework using deterministic algorithms.

  • 11. Corander, Jukka
    et al.
    Gyllenberg, Mats
    Koski, Timo
    Random partition models and exchangeability for Bayesian identification of population structure (2007). In: Bulletin of Mathematical Biology, ISSN 0092-8240, E-ISSN 1522-9602, Vol. 69, no 3, p. 797-815. Article in journal (Refereed)
    Abstract [en]

    We introduce a Bayesian theoretical formulation of the statistical learning problem concerning the genetic structure of populations. The two key concepts in our derivation are exchangeability in its various forms and random allocation models. Implications of our results to empirical investigation of the population structure are discussed.

  • 12. Corander, Jukka
    et al.
    Xiong, Jie
    Cui, Yaqiong
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Optimal Viterbi Bayesian predictive classification for data from finite alphabets (2013). In: Journal of Statistical Planning and Inference, ISSN 0378-3758, E-ISSN 1873-1171, Vol. 143, no 2, p. 261-275. Article in journal (Refereed)
    Abstract [en]

    A family of Viterbi Bayesian predictive classifiers has recently been popularized for speech recognition applications with continuous acoustic signals modeled by finite mixture densities embedded in a hidden Markov framework. Here we generalize such classifiers to sequentially observed data from multiple finite alphabets and derive the optimal predictive classifier under exchangeability of the emitted symbols. We demonstrate that the optimal predictive classifier, which learns from unlabelled test items, improves considerably upon the marginal maximum a posteriori rule in the presence of sparse training data. It is shown that the learning process saturates when the amount of test data tends to infinity, such that no further gain in classification accuracy is possible upon arrival of new test items in the long run.

  • 13. Cui, Y.
    et al.
    Sirén, J.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Corander, J.
    Simultaneous Predictive Gaussian Classifiers (2016). In: Journal of Classification, ISSN 0176-4268, E-ISSN 1432-1343, p. 1-30. Article in journal (Refereed)
    Abstract [en]

    The Gaussian distribution has for several decades been ubiquitous in the theory and practice of statistical classification. Despite the early proposals motivating the use of predictive inference to design a classifier, this approach has gained relatively little attention apart from certain specific applications, such as speech recognition, where its optimality has been widely acknowledged. Here we examine statistical properties of different inductive classification rules under a generic Gaussian model and demonstrate the optimality of considering simultaneous classification of multiple samples under an attractive loss function. It is shown that the simpler independent classification of samples leads asymptotically to the same optimal rule as the simultaneous classifier when the amount of training data increases, if the dimensionality of the feature space is bounded in an appropriate manner. Numerical investigations suggest that the simultaneous predictive classifier can lead to higher classification accuracy than the independent rule in the low-dimensional case, whereas the simultaneous approach suffers more from noise when the dimensionality increases.

  • 14. Ekdahl, Magnus
    et al.
    Koski, Timo
    Bounds for the loss in probability of correct classification under model based approximation (2006). In: Journal of Machine Learning Research, ISSN 1532-4435, E-ISSN 1533-7928, Vol. 7, p. 2449-2480. Article in journal (Refereed)
    Abstract [en]

    In many pattern recognition/classification problems the true class-conditional model and class probabilities are approximated, to reduce complexity and/or to ease statistical estimation. The approximated classifier is expected to have worse performance, here measured by the probability of correct classification. We present an analysis valid in general, and easily computable formulas for estimating the degradation in probability of correct classification when compared to the optimal classifier. An example of such an approximation is the Naive Bayes classifier. We show that the performance of Naive Bayes depends on the degree of functional dependence between the features and labels. We also provide a sufficient condition for zero loss of performance.
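    As background for the kind of approximation the paper analyses, the Naive Bayes classifier replaces each class-conditional joint distribution of the features with the product of its marginals. A minimal sketch with made-up binary data, showing how strong feature dependence can degrade the approximated classifier (illustrative only, not the paper's bounds):

```python
import numpy as np

# Hypothetical class-conditional joint distributions over two binary features.
# Index order: p_joint[class][x1, x2].
p_joint = {
    0: np.array([[0.40, 0.10],
                 [0.10, 0.40]]),
    1: np.array([[0.10, 0.40],
                 [0.40, 0.10]]),
}
prior = {0: 0.5, 1: 0.5}

def naive_approx(p):
    """Product-of-marginals approximation used by Naive Bayes."""
    p1 = p.sum(axis=1)   # marginal of x1
    p2 = p.sum(axis=0)   # marginal of x2
    return np.outer(p1, p2)

def classify(x1, x2, model):
    scores = {c: prior[c] * model[c][x1, x2] for c in model}
    return max(scores, key=scores.get)

p_naive = {c: naive_approx(p) for c, p in p_joint.items()}

# With this XOR-like dependence, the marginals are uninformative and the
# naive classifier loses accuracy relative to the optimal (joint) classifier.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "optimal:", classify(*x, p_joint), "naive:", classify(*x, p_naive))
```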

  • 15.
    Geilhufe, Matthias
    et al.
    KTH, Centres, Nordic Institute for Theoretical Physics NORDITA. Stockholm Univ, Roslagstullsbacken 23, SE-10691 Stockholm, Sweden.
    Olsthoorn, Bart
    KTH, Centres, Nordic Institute for Theoretical Physics NORDITA. Stockholm Univ, Roslagstullsbacken 23, SE-10691 Stockholm, Sweden.
    Ferella, Alfredo D.
    Fysikum Stockholm Univ, Oskar Klein Ctr Cosmoparticle Phys, Roslagstullsbacken 21, SE-10961 Stockholm, Sweden.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Kahlhoefer, Felix
    Rhein Westfal TH Aachen, Inst Theoret Particle Phys & Cosmol TTK, D-52056 Aachen, Germany.
    Conrad, Jan
    Fysikum Stockholm Univ, Oskar Klein Ctr Cosmoparticle Phys, Roslagstullsbacken 21, SE-10961 Stockholm, Sweden.
    Balatsky, Alexander V.
    KTH, Centres, Nordic Institute for Theoretical Physics NORDITA. Stockholm Univ, Roslagstullsbacken 23, SE-10691 Stockholm, Sweden.; Univ Connecticut, 2152 Hillside Rd, U-3046, Storrs, CT 06269 USA.; Los Alamos Natl Lab, Inst Mat Sci, Los Alamos, NM 87545 USA.
    Materials Informatics for Dark Matter Detection (2018). In: Physica Status Solidi. Rapid Research Letters, ISSN 1862-6254, E-ISSN 1862-6270, Vol. 12, no 11, article id 1800293. Article in journal (Refereed)
    Abstract [en]

    Dark Matter particles are commonly assumed to be weakly interacting massive particles (WIMPs) with a mass in the GeV to TeV range. However, recent interest has shifted toward lighter WIMPs, which are more difficult to probe experimentally. A detection of sub-GeV WIMPs will require the use of small-gap materials in sensors. Using recent estimates of the WIMP mass, we identify the relevant target space toward small-gap materials (100 to 10 meV). Dirac Materials, a class of small- or zero-gap materials, emerge as natural candidates for sensors for Dark Matter detection. We propose the use of informatics tools to rapidly assay materials band structures to search for small-gap semiconductors and semimetals, rather than focusing on a few preselected compounds. As a specific example of the proposed strategy, we use the organic materials database to identify organic candidates for sensors: the narrow-band-gap semiconductors BNQ-TTF and DEBTTT with gaps of 40 and 38 meV, and the Dirac-line semimetal (BEDT-TTF)·Br, which exhibits a tiny gap of approximately 50 meV when spin-orbit coupling is included. We outline a novel and powerful approach to search for dark matter detection sensor materials by means of a rapid assay of materials using informatics tools.

  • 16.
    Hallgren, Jonas
    et al.
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Decomposition Sampling Applied to Parallelization of Metropolis-Hastings. Manuscript (preprint) (Other academic)
  • 17.
    Hallgren, Jonas
    et al.
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Testing for Causality in Continuous time Bayesian Network Models of High-Frequency Data. Manuscript (preprint) (Other academic)
  • 18. Jääskinen, Väinö
    et al.
    Xiong, Jie
    Corander, Jukka
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Sparse Markov Chains for Sequence Data (2014). In: Scandinavian Journal of Statistics, ISSN 0303-6898, E-ISSN 1467-9469, Vol. 41, no 3, p. 639-655. Article in journal (Refereed)
    Abstract [en]

    Finite memory sources and variable-length Markov chains have recently gained popularity in data compression and mining, in particular, for applications in bioinformatics and language modelling. Here, we consider denser data compression and prediction with a family of sparse Bayesian predictive models for Markov chains in finite state spaces. Our approach lumps transition probabilities into classes composed of invariant probabilities, such that the resulting models need not have a hierarchical structure as in context tree-based approaches. This can lead to a substantially higher rate of data compression, and such non-hierarchical sparse models can be motivated for instance by data dependence structures existing in the bioinformatics context. We describe a Bayesian inference algorithm for learning sparse Markov models through clustering of transition probabilities. Experiments with DNA sequence and protein data show that our approach is competitive in both prediction and classification when compared with several alternative methods on the basis of variable memory length.

  • 19. Koski, Timo
    A Bayesian molecular interaction library (2003). In: Journal of Computer-Aided Molecular Design, ISSN 0920-654X, E-ISSN 1573-4951, Vol. 17, no 7, p. 435-461. Article in journal (Refereed)
    Abstract [en]

    We describe a library of molecular fragments designed to model and predict non-bonded interactions between atoms. We apply the Bayesian approach, whereby prior knowledge and uncertainty of the mathematical model are incorporated into the estimated model and its parameters. The molecular interaction data are strengthened by narrowing the atom classification to 14 atom types, focusing on independent molecular contacts that lie within a short cutoff distance, and symmetrizing the interaction data for the molecular fragments. Furthermore, the locations of atoms in contact with a molecular fragment are modeled by Gaussian mixture densities whose maximum a posteriori estimates are obtained by applying a version of the expectation-maximization algorithm that incorporates hyperparameters for the components of the Gaussian mixtures. A routine is introduced providing the hyperparameters and the initial values of the parameters of the Gaussian mixture densities. A model selection criterion, based on the concept of a 'minimum message length', is used to automatically select the optimal complexity of a mixture model and the most suitable orientation of a reference frame for a fragment in a coordinate system. The type of atom interacting with a molecular fragment is predicted by values of the posterior probability function and the accuracy of these predictions is evaluated by comparing the predicted atom type with the actual atom type seen in crystal structures. The fact that an atom will simultaneously interact with several molecular fragments forming a cohesive network of interactions is exploited by introducing two strategies that combine the predictions of atom types given by multiple fragments. The accuracy of these combined predictions is compared with those based on an individual fragment. Exhaustive validation analyses and qualitative examples (e.g., the ligand-binding domain of glutamate receptors) demonstrate that these improvements lead to effective modeling and prediction of molecular interactions.
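    For readers unfamiliar with the core computational tool mentioned here, a bare-bones expectation-maximization loop for a one-dimensional Gaussian mixture is sketched below on synthetic data. The paper's version additionally uses hyperparameters for MAP estimation, which is not reproduced in this illustrative sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 1-D data drawn from two Gaussian components.
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

K = 2
w = np.full(K, 1.0 / K)              # mixing weights
mu = rng.choice(x, size=K)           # initial means
var = np.full(K, x.var())            # initial variances

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # E-step: responsibilities r[n, k] = P(component k | x_n).
    dens = np.stack([w[k] * normal_pdf(x, mu[k], var[k]) for k in range(K)], axis=1)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: maximum likelihood updates (a MAP version would add prior terms here).
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(w, mu, var)   # should roughly recover the generating parameters
```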

  • 20. Koski, Timo
    A dissimilarity matrix between protein atom classes based on Gaussian mixtures (2002). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 18, no 9, p. 1257-1263. Article in journal (Refereed)
    Abstract [en]

    Motivation: Previously, Rantanen et al. (2001; J. Mol. Biol., 313, 197-214) constructed a protein atom-ligand fragment interaction library embodying experimentally solved, high-resolution three-dimensional (3D) structural data from the Protein Data Bank (PDB). The spatial locations of protein atoms that surround ligand fragments were modeled with Gaussian mixture models, the parameters of which were estimated with the expectation-maximization (EM) algorithm. In the validation analysis of this library, there was strong indication that the protein atom classification, 24 classes, was too large and that a reduction in the classes would lead to improved predictions. Results: Here, a dissimilarity (distance) matrix that is suitable for comparison and fusion of 24 pre-defined protein atom classes has been derived. Jeffreys' distances between Gaussian mixture models are used as a basis to estimate dissimilarities between protein atom classes. The dissimilarity data are analyzed both with a hierarchical clustering method and independently by using multidimensional scaling analysis. The results provide additional insight into the relationships between different protein atom classes, giving us guidance on, for example, how to readjust protein atom classification and, thus, they will help us to improve protein-ligand interaction predictions.
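    The Jeffreys distance used here is the symmetrised Kullback-Leibler divergence. For two single univariate Gaussians it has a closed form, shown in the sketch below; for full Gaussian mixtures, as used in the paper, it is typically approximated (for example by Monte Carlo), which is not reproduced here.

```python
import math

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for univariate Gaussians p = N(mu_p, var_p), q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def jeffreys_gauss(mu_p, var_p, mu_q, var_q):
    """Jeffreys divergence = KL(p || q) + KL(q || p); symmetric, usable as a dissimilarity."""
    return kl_gauss(mu_p, var_p, mu_q, var_q) + kl_gauss(mu_q, var_q, mu_p, var_p)

# Hypothetical numbers: two atom-class location models summarised by one Gaussian each.
print(jeffreys_gauss(0.0, 1.0, 1.5, 2.0))
```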

  • 21. Koski, Timo
    A fragment library based on Gaussian mixtures predicting favorable molecular interactions (2001). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 313, no 1, p. 197-214. Article in journal (Refereed)
    Abstract [en]

    Here, a protein atom-ligand fragment interaction library is described. The library is based on experimentally solved structures of protein-ligand and protein-protein complexes deposited in the Protein Data Bank (PDB) and it is able to characterize binding sites given a ligand structure suitable for a protein. A set of 30 ligand fragment types were defined to include three or more atoms in order to unambiguously define a frame of reference for interactions of ligand atoms with their receptor proteins. Interactions between ligand fragments and 24 classes of protein target atoms plus a water oxygen atom were collected and segregated according to type. The spatial distributions of individual fragment-target atom pairs were visually inspected in order to obtain rough-grained constraints on the interaction volumes. Data fulfilling these constraints were given as input to an iterative expectation-maximization algorithm that produces as output maximum likelihood estimates of the parameters of the finite Gaussian mixture models. Concepts of statistical pattern recognition and the resulting mixture model densities are used (i) to predict the detailed interactions between Chlorella virus DNA ligase and the adenine ring of its ligand and (ii) to evaluate the "error" in prediction for both the training and validation sets of protein-ligand interactions found in the PDB. These analyses demonstrate that this approach can successfully narrow down the possibilities for both the interacting protein atom type and its location relative to a ligand fragment.

  • 22. Koski, Timo
    Application of sliding-window discretization and minimization of stochastic complexity for the analysis of fAFLP genotyping fingerprint patterns of Vibrionaceae (2005). In: International Journal of Systematic and Evolutionary Microbiology, ISSN 1466-5026, E-ISSN 1466-5034, Vol. 55, p. 57-66. Article in journal (Refereed)
    Abstract [en]

    Minimization of stochastic complexity (SC) was used as a method for classification of genotypic fingerprints. The method was applied to fluorescent amplified fragment length polymorphism (fAFLP) fingerprint patterns of 507 Vibrionaceae representatives. As the current BinClass implementation of the optimization algorithm for classification only works on binary vectors, the original fingerprints were discretized in a preliminary step using the sliding-window band-matching method, in order to maximally preserve the information content of the original band patterns. The novel classification generated using the BinClass software package was subjected to an in-depth comparison with a hierarchical classification of the same dataset, in order to acknowledge the applicability of the new classification method as a more objective algorithm for the classification of genotyping fingerprint patterns. Recent DNA-DNA hybridization and 16S rRNA gene sequence experiments proved that the classification based on SC-minimization forms separate clusters that contain the fAFLP patterns for all representatives of the species Enterovibrio norvegicus, Vibrio fortis, Vibrio diazotrophicus or Vibrio campbellii, while previous hierarchical cluster analysis had suggested more heterogeneity within the fAFLP patterns by splitting the representatives of the above-mentioned species into multiple distant clusters. As a result, the new classification methodology has highlighted some previously unseen relationships within the biodiversity of the family Vibrionaceae.

  • 23. Koski, Timo
    Clustering by Adaptive Local Search with multiple search operators (2000). In: Pattern Analysis and Applications, ISSN 1433-7541, E-ISSN 1433-755X, Vol. 3, no 4, p. 348-357. Article in journal (Refereed)
    Abstract [en]

    Local Search (LS) has proven to be an efficient optimisation technique in clustering applications and in the minimisation of stochastic complexity of a data set. In the present paper, we propose two ways of organising LS in these contexts, the Multi-operator Local Search (MOLS) and the Adaptive Multi-Operator Local Search (AMOLS), and compare their performance to single operator (random swap) LS method and repeated GLA (Generalised Lloyd Algorithm). Both of the proposed methods use several different LS operators to solve the problem. MOLS applies the operators cyclically in the same order, whereas AMOLS adapts itself to favour the operators which manage to improve the result more frequently. We use a large database of binary vectors representing strains of bacteria belonging to the family Enterobacteriaceae and a binary image as our test materials. The new techniques turn out to be very promising in these tests.

  • 24. Koski, Timo
    Hidden Markov models for bioinformatics (2001). Book (Refereed)
  • 25.
    Koski, Timo
    KTH, Superseded Departments, Mathematics.
    Hidden Markov Models for Bioinformatics (2001). Book (Refereed)
  • 26.
    Koski, Timo
    Linköpings universitet.
    Lectures at RNI on probabilistic models and inference for phylogenetics (2004, ed. 1). Book (Other academic)
    Abstract [en]

    The core of these lecture notes corresponds to the contents of a series of seminars held during November-December, 2003, at the Division of Biometry of the Rolf Nevanlinna Institute (RNI) (a research institute of mathematics, computer science and statistics, University of Helsinki). The seminars were organized within the “Centre of Population Genetic Analyses” at RNI. The centre is funded by a grant from the Academy of Finland.

    The author thanks Professor Elja Arjas, head of the Division of Biometry, for the invitation to give these lectures and for several comments that have improved both the contents and the presentation.

    The original impetus for the studies underlying these notes was to understand certain issues in bacterial taxonomy (see e.g., Busse et al. 1996, Gaunt et al. 2001, Mougel et al. 2002, Van de Peer et al. 1993), especially when constructing hidden Markov models using the Hobohm algorithm for preprocessing of sequence data.

    The benefit of these lectures for their audience might be to have been exposed to many of the concepts and models that are mentioned in the methods section of a biological research paper, e.g., (Gaunt et al. 2001), and applied there, or in documents like the PAML Manual (Yang 2002), the MrBayes Manual (Huelsenbeck and Ronquist 2001), and others.

    The current version of the notes is not final; several additional sections and chapters are under construction. One obvious shortcoming of this version is that not every item in the Bibliography is necessarily referred to in the text. Some of the figures in the text are of a sketchy standard.

  • 27.
    Koski, Timo
    KTH, Superseded Departments, Mathematics.
    Minimizing stochastic complexity using local search and GLA with applications to classification of bacteria (2000). In: Biosystems (Amsterdam. Print), ISSN 0303-2647, E-ISSN 1872-8324, Vol. 57, no 1, p. 37-48. Article in journal (Refereed)
    Abstract [en]

    In this paper, we compare the performance of two iterative clustering methods when applied to an extensive data set describing strains of the bacterial family Enterobacteriaceae. In both methods, the classification (i.e. the number of classes and the partitioning) is determined by minimizing stochastic complexity. The first method performs the minimization by repeated application of the generalized Lloyd algorithm (GLA). The second method uses an optimization technique known as local search (LS). The method modifies the current solution by making global changes to the class structure and then performs local fine-tuning to find a local optimum. It is observed that if we fix the number of classes, the LS finds a classification with a lower stochastic complexity value than GLA. In addition, the variance of the solutions is much smaller for the LS due to its more systematic method of searching. Overall, the two algorithms produce similar classifications, but they merge certain natural classes with microbiological relevance in different ways.

  • 28. Koski, Timo
    New methods for the analysis of binarized BIOLOG GN data of vibrio species: Minimization of stochastic complexity and cumulative classification (2002). In: Systematic and Applied Microbiology, ISSN 0723-2020, E-ISSN 1618-0984, Vol. 25, no 3, p. 403-415. Article in journal (Refereed)
    Abstract [en]

    We apply minimization of stochastic complexity and the closely related method of cumulative classification to analyse the extensively studied BIOLOG GN data of Vibrio spp. Minimization of stochastic complexity provides an objective tool of bacterial taxonomy as it produces classifications that are optimal from the point of view of information theory. We compare the outcome of our results with previously published classifications of the same data set. Our results both confirm earlier detected relationships between species and discover new ones.

  • 29. Koski, Timo
    Sliding window discretization: a new method for multiple band matching of bacterial genotyping fingerprints (2004). In: Bulletin of Mathematical Biology, ISSN 0092-8240, E-ISSN 1522-9602, Vol. 66, no 6, p. 1575-1596. Article in journal (Refereed)
    Abstract [en]

    Microbiologists have traditionally applied hierarchical clustering algorithms as their mathematical tool of choice to unravel the taxonomic relationships between micro-organisms. However, the interpretation of such hierarchical classifications suffers from being subjective, in that a variety of ad hoc choices must be made during their construction. On the other hand, the application of more profound and objective mathematical methods, such as the minimization of stochastic complexity, for the classification of bacterial genotyping fingerprint data is hampered by the prerequisite that such methods only act upon vectorized data. In this paper we introduce a new method, coined sliding window discretization, for the transformation of genotypic fingerprint patterns into binary vector format. In the context of an extensive amplified fragment length polymorphism (AFLP) data set of 507 strains from the Vibrionaceae family that has previously been analysed, we demonstrate by comparison with a number of other discretization methods that this new discretization method results in minimal loss of the original information content captured in the banding patterns. Finally, we investigate the implications of the different discretization methods on the classification of bacterial genotyping fingerprints by minimization of stochastic complexity, as implemented in the BinClass software package for probabilistic clustering of binary vectors. The new taxonomic insights learned from the resulting classification of the AFLP patterns prove the value of combining sliding window discretization with minimization of stochastic complexity as an alternative classification algorithm for bacterial genotyping fingerprints.

  • 30.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    The Likelihood Ratio Statistic for Testing Spatial Independence using a Separable Covariance Matrix (2009). Report (Other academic)
    Abstract [en]

    This paper deals with the problem of testing spatial independence for dependent observations. The sample observation matrix is assumed to follow a matrix normal distribution with a separable covariance matrix; in other words, it can be written as a Kronecker product of two positive definite matrices. Two cases are considered: when the temporal covariance is known and when it is unknown. When the temporal covariance is known, the maximum likelihood estimates are computed and the asymptotic null distribution is given. In the case when the temporal covariance is unknown, the maximum likelihood estimates of the parameters are found by an iterative alternating algorithm.

  • 31. Koski, Timo
    The Wold isomorphism for cyclostationary sequences (2004). In: Signal Processing, ISSN 0165-1684, E-ISSN 1872-7557, Vol. 84, no 5, p. 813-824. Article in journal (Refereed)
    Abstract [en]

    In 1948 Wold introduced an isometric isomorphism between a Hilbert (linear) space formed from the weighted shifts of a numerical sequence and a suitable Hilbert space of values of a second-order stochastic sequence. Motivated by a recent resurrection of the idea in the context of cyclostationary sequences and processes, we present the details of the Wold isomorphism between cyclostationary stochastic sequences and cyclostationary numerical sequences. We show how Hilbert-space representations of cyclostationary sequences are interpreted in the case of numerical CS sequences.

  • 32.
    Koski, Timo
    et al.
    Department of Mathematics, Linköpings University.
    Ekdahl, Magnus
    Department of Mathematics, Linköpings University.
    On Concentration of Discrete Distributions with Applications to Supervised Learning of Classifiers (2007). In: Machine Learning and Data Mining in Pattern Recognition, Proceedings / [ed] Perner, Petra, Berlin: Springer Berlin/Heidelberg, 2007, p. 2-16. Conference paper (Refereed)
    Abstract [en]

    Computational procedures using independence assumptions in various forms are popular in machine learning, although checks on empirical data have given inconclusive results about their impact. Some theoretical understanding of when they work is available, but a definite answer seems to be lacking. This paper derives distributions that maximize the statewise difference to the respective product of marginals. These distributions are, in a sense, the worst distributions for predicting an outcome of the data generating mechanism by independence. We also restrict the scope of the new theoretical results by showing explicitly that, depending on context, independent ('Naïve') classifiers can be as bad as tossing coins. Regardless of this, independence may beat the generating model in learning supervised classification, and we explicitly provide one such scenario.

  • 33.
    Koski, Timo J. T.
    et al.
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Noble, John M.
    Rios, Felix L.
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    The Minimal Hoppe-Beta Prior Distribution for Directed Acyclic Graphs and Structure Learning. Manuscript (preprint) (Other academic)
    Abstract [en]

    The main contribution of this article is a new prior distribution over directed acyclic graphs intended for structured Bayesian networks, where the structure is given by an ordered block model. That is, the nodes of the graph are objects which fall into categories or blocks; the blocks have a natural ordering or ranking. The presence of a relationship between two objects is denoted by a directed edge, from the object of category of lower rank to the object of higher rank. The models considered here were introduced in Kemp et al. [7] for relational data and extended to multivariate data in Mansinghka et al. [12].

    We consider the situation where the nodes of the graph represent random variables, whose joint probability distribution factorises along the DAG. We use a minimal layering of the DAG to express the prior. We describe Monte Carlo schemes, based on a generative mechanism similar to the one used for the prior, for finding the optimal a posteriori structure given a data matrix, and compare the performance with Mansinghka et al. and also with the uniform prior.

  • 34.
    Koski, Timo
    et al.
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.).
    Jung, Brita
    Abo Akad Univ, Dept Nat Sci, FIN-20500 Turku, Finland.
    Hognas, Goran
    Abo Akad Univ, Dept Nat Sci, FIN-20500 Turku, Finland.
    Exit times for ARMA processes (2018). In: Advances in Applied Probability, ISSN 0001-8678, E-ISSN 1475-6064, Vol. 50, no A, p. 191-195. Article in journal (Refereed)
    Abstract [en]

    We study the asymptotic behaviour of the expected exit time from an interval for the ARMA process, when the noise level approaches 0.

  • 35.
    Koski, Timo
    et al.
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Noble, John
    University of Warsaw.
    A Review of Bayesian Networks and Structure Learning (2012). In: Mathematica Applicanda (Matematyka Stosowana), ISSN 2299-4009, Vol. 40, no 1, p. 51-103. Article in journal (Refereed)
    Abstract [en]

    This article reviews the topic of Bayesian networks. A Bayesian network is a factorisation of a probability distribution along a directed acyclic graph. The relation between graphical d-separation and independence is described. A short article from 1853 by Arthur Cayley [8] is discussed, which contains several ideas later used in Bayesian networks: factorisation, the noisy 'or' gate, and applications of algebraic geometry to Bayesian networks. The ideas behind Pearl's intervention calculus when the DAG represents a causal dependence structure, and the relation between the work of Cayley and Pearl, are commented on. Most of the discussion is about structure learning, outlining the two main approaches, search and score versus constraint based. Constraint based algorithms often rely on the assumption of faithfulness, that the data to which the algorithm is applied is generated from distributions satisfying a faithfulness assumption where graphical d-separation and independence are equivalent. The article presents some considerations for constraint based algorithms based on recent data analysis, indicating a variety of situations where the faithfulness assumption does not hold. There is a short discussion about the causal discovery controversy, the idea that causal relations may be learned from data.
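    The factorisation property referred to in the review can be stated in a few lines. A minimal sketch with a hypothetical three-node DAG A → B, A → C and made-up numbers (not taken from the article):

```python
# DAG: A -> B, A -> C. The joint factorises as P(A, B, C) = P(A) P(B | A) P(C | A).
# All variables are binary; the tables below are made-up illustrative numbers.

p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1},   # P(B | A=0)
               1: {0: 0.4, 1: 0.6}}   # P(B | A=1)
p_c_given_a = {0: {0: 0.8, 1: 0.2},
               1: {0: 0.3, 1: 0.7}}

def joint(a, b, c):
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]

# Sanity check: the factorised joint sums to one.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0

# d-separation in this DAG: B and C are independent given A (but not marginally).
p_bc_given_a0 = joint(0, 1, 1) / p_a[0]
print(p_bc_given_a0, p_b_given_a[0][1] * p_c_given_a[0][1])  # equal
```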

  • 36.
    Koski, Timo
    et al.
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Noble, John
    Linkoping University.
    Bayesian Networks: An Introduction (2009, 1st ed.). Book (Refereed)
  • 37.
    Koski, Timo
    et al.
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.).
    Sandström, Erik
    Sandström, Ulf
    KTH, School of Industrial Engineering and Management (ITM), Industrial Economics and Management (Dept.), Sustainability and Industrial Dynamics.
    Towards field-adjusted production: Estimating research productivity from a zero-truncated distribution (2016). In: Journal of Informetrics, ISSN 1751-1577, E-ISSN 1875-5879, Vol. 10, no 4, p. 1143-1152. Article in journal (Refereed)
    Abstract [en]

    Measures of research productivity (e.g. peer reviewed papers per researcher) are a fundamental part of bibliometric studies, but are often restricted by the properties of the data available. This paper addresses that fundamental issue and presents a detailed method for estimating productivity (peer reviewed papers per researcher) based on data available in bibliographic databases (e.g. Web of Science and Scopus). The method can, for example, be used to estimate average productivity in different fields, and such field reference values can be used to produce field-adjusted production values. Being able to produce such field-adjusted production values could dramatically increase the relevance of bibliometric rankings and other bibliometric performance indicators. The results indicate that the estimations are reasonably stable given a sufficiently large data set.
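    To make the zero-truncation idea concrete: if only researchers with at least one indexed paper are visible in the database, the observed mean count is biased upwards, and the untruncated mean can be recovered by inverting the truncated-mean equation. The sketch below does this for a zero-truncated Poisson model; the Poisson choice is an illustrative assumption here, not necessarily the distribution used in the paper.

```python
import math
from scipy.optimize import brentq

def truncated_mean(lam):
    """Mean of a Poisson(lam) variable conditioned on being > 0."""
    return lam / (1.0 - math.exp(-lam))

def estimate_lambda(observed_mean):
    """Recover the untruncated Poisson rate from the mean of the zero-truncated sample."""
    # truncated_mean is increasing and exceeds 1 for all lam > 0, so bracket and solve.
    return brentq(lambda lam: truncated_mean(lam) - observed_mean, 1e-9, 1e3)

obs = 2.4                         # hypothetical mean papers per *visible* researcher
lam = estimate_lambda(obs)
print(lam, truncated_mean(lam))   # lam < obs; the truncated mean reproduces obs
```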

  • 38. Nyman, Henrik
    et al.
    Pensar, Johan
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Corander, Jukka
    Context-specific independence in graphical log-linear models (2016). In: Computational Statistics (Zeitschrift), ISSN 0943-4062, E-ISSN 1613-9658, Vol. 31, no 4, p. 1493-1512. Article in journal (Refereed)
    Abstract [en]

    Log-linear models are the popular workhorses for analyzing contingency tables. A log-linear parameterization of an interaction model can be more expressive than a direct parameterization based on probabilities, leading to a powerful way of defining restrictions derived from marginal, conditional and context-specific independence. However, parameter estimation is often simpler under a direct parameterization, provided that the model enjoys certain decomposability properties. Here we introduce a cyclical projection algorithm for obtaining maximum likelihood estimates of log-linear parameters under an arbitrary context-specific graphical log-linear model, which need not satisfy criteria of decomposability. We illustrate that lifting the restriction of decomposability makes the models more expressive, such that additional context-specific independencies embedded in real data can be identified. It is also shown how a context-specific graphical model can correspond to a non-hierarchical log-linear parameterization with a concise interpretation. This observation can pave the way for further development of non-hierarchical log-linear models, which have been largely neglected due to their believed lack of interpretability.
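    As classical background only (this is not the paper's algorithm, which handles context-specific and non-decomposable models), maximum likelihood fitting of an ordinary graphical log-linear model can be done by iterative proportional fitting: cycle through the cliques and match the corresponding margins. A minimal sketch for a three-way binary table with cliques {X,Y} and {Y,Z}, i.e. X independent of Z given Y:

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed 2x2x2 contingency table (made-up counts) for binary variables X, Y, Z.
n = rng.integers(1, 30, size=(2, 2, 2)).astype(float)

fitted = np.full(n.shape, n.sum() / n.size)   # start from the uniform table

for _ in range(50):
    # Match the XY margin.
    fitted *= (n.sum(axis=2) / fitted.sum(axis=2))[:, :, None]
    # Match the YZ margin.
    fitted *= (n.sum(axis=0) / fitted.sum(axis=0))[None, :, :]

# The fitted table reproduces the clique margins exactly ...
assert np.allclose(fitted.sum(axis=2), n.sum(axis=2))
assert np.allclose(fitted.sum(axis=0), n.sum(axis=0))
# ... and satisfies X independent of Z given Y (cross-product ratio 1 in each Y slice).
for y in (0, 1):
    s = fitted[:, y, :]
    print(s[0, 0] * s[1, 1] / (s[0, 1] * s[1, 0]))   # ~ 1.0
```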

  • 39. Nyman, Henrik
    et al.
    Pensar, Johan
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Corander, Jukka
    Stratified Graphical Models: Context-Specific Independence in Graphical Models (2014). In: Bayesian Analysis, ISSN 1931-6690, Vol. 9, no 4, p. 883-908. Article in journal (Refereed)
    Abstract [en]

    Theory of graphical models has matured over more than three decades to provide the backbone for several classes of models that are used in a myriad of applications such as genetic mapping of diseases, credit risk evaluation, reliability and computer security. Despite their generic applicability and wide adoption, the constraints imposed by undirected graphical models and Bayesian networks have also been recognized to be unnecessarily stringent under certain circumstances. This observation has led to the proposal of several generalizations that aim at more relaxed constraints by which the models can impose local or context-specific dependence structures. Here we consider an additional class of such models, termed stratified graphical models. We develop a method for Bayesian learning of these models by deriving an analytical expression for the marginal likelihood of data under a specific subclass of decomposable stratified models. A non-reversible Markov chain Monte Carlo approach is further used to identify models that are highly supported by the posterior distribution over the model space. Our method is illustrated and compared with ordinary graphical models through application to several real and synthetic datasets.

  • 40. Pensar, Johan
    et al.
    Nyman, Henrik
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    Corander, Jukka
    Labeled directed acyclic graphs: a generalization of context-specific independence in directed graphical models (2015). In: Data Mining and Knowledge Discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 29, no 2, p. 503-533. Article in journal (Refereed)
    Abstract [en]

    We introduce a novel class of labeled directed acyclic graph (LDAG) models for finite sets of discrete variables. LDAGs generalize earlier proposals for allowing local structures in the conditional probability distribution of a node, such that unrestricted label sets determine which edges can be deleted from the underlying directed acyclic graph (DAG) for a given context. Several properties of these models are derived, including a generalization of the concept of Markov equivalence classes. Efficient Bayesian learning of LDAGs is enabled by introducing an LDAG-based factorization of the Dirichlet prior for the model parameters, such that the marginal likelihood can be calculated analytically. In addition, we develop a novel prior distribution for the model structures that can appropriately penalize a model for its labeling complexity. A non-reversible Markov chain Monte Carlo algorithm combined with a greedy hill climbing approach is used for illustrating the useful properties of LDAG models for both real and synthetic data sets.

  • 41.
    Singull, Martin
    et al.
    Linköpings universitet.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
    On the Distribution of Matrix Quadratic Forms (2012). In: Communications in Statistics - Theory and Methods, ISSN 0361-0926, E-ISSN 1532-415X, Vol. 41, no 18, p. 3403-3415. Article in journal (Refereed)
    Abstract [en]

    A characterization of the distribution of the multivariate quadratic form given by XAX', where X is a p x n normally distributed matrix and A is an n x n symmetric real matrix, is presented. We show that the distribution of the quadratic form is the same as the distribution of a weighted sum of non-central Wishart distributed matrices. This is applied to derive the distribution of the sample covariance between the rows of X when the expectation is the same for every column and is estimated with the regular mean.
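    The algebraic identity behind this characterization can be checked numerically: writing the spectral decomposition A = Σ_i λ_i v_i v_i', the quadratic form XAX' equals Σ_i λ_i (X v_i)(X v_i)', a weighted sum of rank-one terms built from normally distributed vectors X v_i. A small sketch of that identity only (not the distributional derivation in the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 6

X = rng.normal(size=(p, n))          # p x n matrix with standard normal entries
A = rng.normal(size=(n, n))
A = 0.5 * (A + A.T)                  # n x n symmetric real matrix

# Spectral decomposition A = sum_i lam_i v_i v_i'.
lam, V = np.linalg.eigh(A)

# X A X' as a weighted sum of rank-one terms (X v_i)(X v_i)'.
terms = [lam[i] * np.outer(X @ V[:, i], X @ V[:, i]) for i in range(n)]
assert np.allclose(X @ A @ X.T, sum(terms))
```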

  • 42. Westerlind, H.
    et al.
    Imrell, K.
    Ramanujam, Ryan
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.). Karolinska Institutet, Sweden.
    Myhr, K.-M.
    Celius, E. G.
    Harbo, H. F.
    Oturai, A. B.
    Hamsten, A.
    Alfredsson, L.
    Olsson, T.
    Kockum, I.
    Koski, Timo
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.).
    Hillert, J.
    Identity-by-descent mapping in a Scandinavian multiple sclerosis cohort (2015). In: European Journal of Human Genetics, ISSN 1018-4813, E-ISSN 1476-5438, Vol. 23, no 5, p. 688-692. Article in journal (Refereed)
    Abstract [en]

    In an attempt to map chromosomal regions carrying rare gene variants contributing to the risk of multiple sclerosis (MS), we identified segments shared identical-by-descent (IBD) using the refined IBD analysis of the software BEAGLE 4.0. IBD mapping aims at identifying segments inherited from a common ancestor and shared more frequently in case-case pairs. A total of 2106 MS patients of Nordic origin and 624 matched controls were genotyped on the Illumina Human Quad 660 chip, and an additional 1352 ethnically matched controls typed on Illumina HumanHap 550 and Illumina 1M were added. The quality control left a total of 441 731 markers for the analysis. After identification of segments shared by descent and significance testing, a filter function for markers with low IBD sharing was applied. Four regions on chromosomes 5, 9, 14 and 19 were found to be significantly associated with the risk for MS. However, all markers but one were located telomerically, including the very distal markers. For methodological reasons, such segments have a low sharing of IBD signals and are prone to be false positives. One marker on chromosome 19 reached genome-wide significance and was not one of the distal markers. This marker was located within the GNA11 gene, for which no previous association with MS has been reported. We conclude that IBD mapping is not sufficiently powered to identify MS risk loci even in ethnically relatively homogeneous populations, or, alternatively, that rare variants are not adequately present.
