Advances and optimization of processing and analysis of CAGE data
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
In transcriptomics, Cap Analysis of Gene Expression (CAGE) has been recognized as a powerful tool to identify the locations of transcription start sites and to investigate gene transcription activity and promoter usage on a genome-wide scale. However, despite substantial efforts from the FANTOM consortium, CAGE has not yet become the leading method of choice for gene expression investigations: its data processing and analysis methods await further development before they can be applied as easily and commonly as, for instance, the currently popular RNA-seq method. In this thesis, we aimed at developing such CAGE-oriented methods with the ultimate goal of enabling CAGE studies and broadening their applicability. In particular, we focused on the computational prediction of the location of active enhancers and on the application of CAGE to extending the reference transcriptome of a non-model species for which the reference genome is not yet known.
In terms of enhancer prediction, we predicted the genomic locations of the active enhancers transcribed in white adipose tissue (WAT) using two methods termed Enhancer Intersection (EI) and Enhancer Prediction (EP). Both methods were developed and optimized around a previously published study which documented enhancer properties of transcribed bidirectional capped RNAs that can be measured via CAGE. Following the elimination of false positives via a set of designed filters, 5,976 uniquely identified enhancer candidates were obtained and assessed in the biological context of obesity in WAT.
In terms of the reference transcriptome, two methods of extending the reference transcriptome of the red spotted salamander (Notophthalmus viridescens) were likewise developed and tested, i.e. de novo contigs-based assembly (DMA) and mapping assembly (MA). The two methods yielded comparable results, with ca. 4% of the reference transcripts being extended and the DMA method slightly outperforming the MA method in terms of newly added bases. Alongside the reference transcriptome extension, lower-than-standard amounts of RNA for preparing the CAGE library were tested, showing that as little as 100 ng of total RNA could work well for the library preparation as well as for the subsequent data processing and analysis.
Together, this thesis presents methods and their application results that could be viewed as new directions for the applicability of the CAGE technique in genome-wide gene expression and regulation studies. The limited size of the available test data unfortunately does not allow statistically valid conclusions to be drawn, yet the results clearly highlight the potential of CAGE to computationally predict the location of enhancer candidates and to extend the reference transcriptome when the reference genome is not known. We hope that this work will increase awareness of CAGE and will direct its future application to investigating gene regulation via enhancers and to addressing genomic questions in studies involving non-model species.
RNA, gene expression, CAGE, enhancer, missing reference genome
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-173313
OAI: oai:DiVA.org:kth-173313
DiVA: diva2:852471
Daub, Carsten O.
Hrydziuszko, Olga