QPS - quadratic programming sampler: a motif finder using biophysical modeling
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
2008 (English). Article in journal (Other academic). Submitted.
Abstract [en]

We present a Markov chain Monte Carlo algorithm for local alignments of nucleotide sequences, aimed at inferring putative transcription factor binding sites and referred to as the quadratic programming sampler. The new motif finder incorporates detailed biophysical modeling of transcription factor binding site recognition, which gives rise to an intrinsic threshold discriminating putative binding sites from other (background) sequences.

We validate the principal functioning of the algorithm on a sample of four promoter regions from Escherichia coli. The resulting description of the motif can be readily evaluated on the whole genome to identify new putative binding sites.
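
As a rough illustration of how a motif description of this kind might be evaluated along a sequence, the following minimal sketch scores every window with a position-specific energy matrix and keeps the windows that fall below a threshold. It is not the article's implementation: the function names, the toy matrix values, the threshold, and the convention that a lower energy means stronger binding are all assumptions made for the example.

```python
# Hypothetical illustration only: the matrix values and the threshold are
# invented for this example and are not taken from the article.
BASES = "ACGT"


def site_energy(site, energy_matrix):
    """Sum the per-position energy contributions of one candidate site.

    Raises ValueError if the site contains a character other than A, C, G, T.
    """
    return sum(row[BASES.index(base)] for row, base in zip(energy_matrix, site))


def scan(sequence, energy_matrix, threshold):
    """Return (offset, site, energy) for each window with energy below threshold."""
    width = len(energy_matrix)
    hits = []
    for offset in range(len(sequence) - width + 1):
        site = sequence[offset:offset + width]
        energy = site_energy(site, energy_matrix)
        # Assumed convention: lower energy corresponds to stronger binding.
        if energy < threshold:
            hits.append((offset, site, energy))
    return hits


# Toy 3-position matrix: 0.0 marks the preferred base, 2.0 is a mismatch penalty.
TOY_MATRIX = [
    [0.0, 2.0, 2.0, 2.0],  # position 1 prefers A
    [2.0, 0.0, 2.0, 2.0],  # position 2 prefers C
    [2.0, 2.0, 0.0, 2.0],  # position 3 prefers G
]

if __name__ == "__main__":
    print(scan("TTACGTT", TOY_MATRIX, threshold=1.0))
```

In this sketch the threshold is passed in by hand; in the approach described above it arises from the biophysical model itself.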

Place, publisher, year, edition, pages
2008.
Keyword [en]
Transcription Factor Protein, Binding Site Inference, Energy Matrix, MCMC
National Category
Condensed Matter Physics
Identifiers
URN: urn:nbn:se:kth:diva-7973
OAI: oai:DiVA.org:kth-7973
DiVA: diva2:13167
Note
QS 20120314
Available from: 2008-02-12. Created: 2008-02-12. Last updated: 2012-03-14. Bibliographically approved.
In thesis
1. Statistical models of TF/DNA interaction
2008 (English). Licentiate thesis, comprehensive summary (Other scientific).
Abstract [en]

Gene expression is regulated in response to metabolic necessities and environmental changes throughout the life of a cell.

A major part of this regulation is governed at the level of transcription, deciding whether messengers to specific genes are produced or not.

This decision is triggered by the action of transcription factors, proteins which interact with specific sites on DNA and thus influence the rate of transcription of proximal genes.

Mapping the organisation of these transcription factor binding sites sheds light on potential causal relations between genes and is the key to establishing networks of genetic interactions, which determine how the cell adapts to external changes.

In this work I briefly review the basics of genetics and summarise popular approaches to describing transcription factor binding sites, from the most straightforward to a biophysically motivated representation based on the estimation of free energies of molecular interactions.

Two articles on transcription factors are contained in this thesis, one published (Aurell, Fouquier d'Hérouël, Malmnäs and Vergassola, 2007) and one submitted (Fouquier d'Hérouël, 2008).

Both rely strongly on the representation of binding sites by matrices accounting for the affinity of the proteins to specific nucleotides at the different positions of the binding sites.

The importance of non-specific binding of transcription factors to DNA is briefly addressed in the text and extensively discussed in the first appended article:

In a study on the affinity of yeast transcription factors for their binding sites, we conclude that measured in vivo protein concentrations are marginally sufficient to guarantee the occupation of functional sites, as opposed to non-specific locations on the genomic sequence.

Because the inference of binding site motifs is a common task, the most common statistical method is reviewed in detail. Building on it, I constructed an alternative, biophysically motivated approach, exemplified in the second appended article.

Place, publisher, year, edition, pages
Stockholm: KTH, 2008. x, 41 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2008:01
Keyword
gene expression, regulation, transcription factor, binding motif, matrix representations, gibbs sampling, binding affinity, non-specific binding
National Category
Condensed Matter Physics
Identifiers
urn:nbn:se:kth:diva-4633 (URN)
978-91-7178-874-0 (ISBN)
Presentation
2008-02-29, RB 35, Roslagstullsbacken 35, Stockholm, 13:00
Note
QC 20101110
Available from: 2008-02-12. Created: 2008-02-12. Last updated: 2010-11-10. Bibliographically approved.
2. On diverse biophysical aspects of genetics: from the action of regulators to the characterization of transcripts
2011 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Genetics is among the most rewarding fields of biology for the theoretically inclined, offering both room and need for modeling approaches in the light of an abundance of experimental data of different kinds. Many aspects of the field are today understood in terms of physical and chemical models, joined by information theoretical descriptions. This thesis discusses different mechanisms and phenomena related to genetics, employing tools from statistical physics along with experimental biomolecular methods. Five articles support this work.

Two articles deal with interactions between proteins and DNA. The first one reports on the properties of non-specific binding of transcription factor proteins in the yeast Saccharomyces cerevisiae, due to an effective background free energy which describes the affinity of a single protein for random locations on DNA. We argue that a background pool of non-specific binding sites is filled up before specific binding sites can be occupied with high probability, thus presenting a natural filter for genetic responses to spurious transcription factor production. The second article describes an algorithm for the inference of transcription factor binding sites using a realistic physical model. The functionality of the method is verified on a set of known binding sequences for Escherichia coli transcription factors.

The third article describes a possible genetic feedback mechanism between human cells and the ubiquitous Epstein-Barr virus (EBV). 40 binding regions for the major EBV transcription factor EBNA1 are identified in human DNA. Several of these are located near genes of particular relevance in the context of EBV infection, and the most interesting ones are discussed.

The fourth article describes results obtained from a positional autocorrelation analysis of the human genome, a simple technique to visualize and classify sequence repeats, which constitute large parts of eukaryotic genomes. Applying this analysis to genome sequences from which previously known repeats have been removed gives rise to signals corroborating the existence of as yet unclassified repeats of surprisingly long periods.
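
To give a feel for what a positional autocorrelation can reveal, here is a minimal sketch under one simple, assumed definition: for each lag, the fraction of positions whose base equals the base that many positions further along. The abstract does not state the exact definition used in the article, so the function name and the normalisation below are illustrative assumptions only.

```python
def positional_autocorrelation(sequence, max_lag):
    """Map each lag in 1..max_lag to the fraction of positions i
    for which sequence[i] == sequence[i + lag].

    Assumed definition for illustration; lags that leave no pairs are skipped.
    """
    result = {}
    for lag in range(1, max_lag + 1):
        pairs = len(sequence) - lag
        if pairs <= 0:
            break
        matches = sum(1 for i in range(pairs) if sequence[i] == sequence[i + lag])
        result[lag] = matches / pairs
    return result


if __name__ == "__main__":
    # A perfect period-4 repeat: the signal peaks at lags 4 and 8.
    profile = positional_autocorrelation("ACGT" * 5, max_lag=8)
    for lag, value in profile.items():
        print(lag, round(value, 2))
```

With this definition, a tandem repeat of period p shows up as elevated values at lags that are multiples of p, which is the kind of signal a repeat classification could be built on.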

The fifth article combines computational predictions with a novel molecular biological method based on the rapid amplification of cDNA ends (RACE), coined 5’tagRACE. The first search for non-coding RNAs encoded in the genome of the opportunistic bacterium Enterococcus faecalis is performed here. Applying 5’tagRACE allows us to discover and map 29 novel ncRNAs, 10 putative novel mRNAs and 16 antisense transcriptional organizations.

Further studies, which are not included as articles, on the monitoring of secondary structure formation of nucleic acids during thermal renaturation and the inference of genetic couplings of various kinds from massive gene expression data and computational predictions, are outlined in the central chapters.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2011. xxii, 98 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2011:04
Keyword
transcription regulation, regulatory motifs, binding affinity, genetic interactions, secondary structure, sequence repeats, transcript characterization
National Category
Condensed Matter Physics
Identifiers
urn:nbn:se:kth:diva-31490 (URN)
978-91-7415-911-0 (ISBN)
Public defence
2011-04-06, Sal FB53, Roslagstullsbacken 21, AlbaNova, Stockholm, 10:00 (English)
Note
QC 20110316
Available from: 2011-03-17. Created: 2011-03-16. Last updated: 2011-03-17. Bibliographically approved.

Open Access in DiVA

No full text

Other links

http://arxiv.org/abs/0802.0258

Author
Fouquier d'Hérouël, Aymeric
