Decoding a substantial set of samples in parallel by massive sequencing
KTH, Centres, Science for Life Laboratory, SciLifeLab. (Gene Technology)
KTH, Centres, Science for Life Laboratory, SciLifeLab.
KTH, School of Biotechnology (BIO), Molecular Biotechnology (closed 20130101). ORCID iD: 0000-0002-1495-8338
KTH, Centres, Science for Life Laboratory, SciLifeLab.
2011 (English)
In: PLoS ONE, ISSN 1932-6203, Vol. 6, no 3
Article in journal (Refereed) Published
Abstract [en]

The dramatic increase of throughput seen in the field of sequence analysis during the last years has opened up new possibilities of sequencing a multitude of samples in parallel. Here we present a novel strategy where the combination of two tags is used to link reads to their origins in a pool of samples. The two tags are incorporated in two steps leading to lowering of sample handling complexity by nearly 100 times. The method described here enables accurate identification and typing of thousands of samples in parallel and is scalable. In this study the system was designed to test 4992 samples using only 122 tags.

To prove the concept of the two-tagging method, the highly polymorphic 2nd exon of DLA-DRB1 in dogs and wolves was sequenced using the 454 GS FLX Titanium Chemistry. By requiring a minimum sequence depth of 20 reads per sample, 94% of the successfully amplified samples were genotyped. In addition, the method allowed digital detection of chimeric fragments. These results demonstrate that it is possible to sequence thousands of samples in parallel without complex pooling patterns or primer combinations. Furthermore, the method is scalable and increasing the sample size by 960 samples requires only 10 additional tags.
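The read assignment described in the abstract can be illustrated with a minimal sketch. It assumes each read has already been reduced to a pair of tag identifiers (one per tagging step) and that a lookup table from tag pair to sample exists; the function names, the tag labels, and the input format are hypothetical and are not taken from the paper. Only the threshold of 20 reads per sample comes from the abstract.

```python
from collections import Counter

MIN_DEPTH = 20  # minimum reads per sample, as stated in the abstract


def count_reads_per_sample(reads, sample_by_tags):
    """Count reads per sample from (first_tag, second_tag) pairs.

    reads: iterable of (first_tag, second_tag) tuples (hypothetical format).
    sample_by_tags: dict mapping (first_tag, second_tag) -> sample name.
    Reads whose tag pair is not in the table are ignored.
    """
    counts = Counter()
    for first_tag, second_tag in reads:
        sample = sample_by_tags.get((first_tag, second_tag))
        if sample is not None:
            counts[sample] += 1
    return counts


def samples_passing_depth(counts, min_depth=MIN_DEPTH):
    """Return the sorted names of samples with at least min_depth reads."""
    return sorted(name for name, n in counts.items() if n >= min_depth)
```

Because a sample is identified by a pair of tags, the number of addressable samples grows with the product of the two tag sets while the number of tags grows only with their sum, which is the general reason a two-tag design scales well.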

Place, publisher, year, edition, pages
San Francisco: Public Library of Science, 2011. Vol. 6, no 3
Keyword [en]
HIGH-THROUGHPUT; HIGH-RESOLUTION; DNA; PROBES
National Category
Genetics
Identifiers
URN: urn:nbn:se:kth:diva-24353
DOI: 10.1371/journal.pone.0017785
ISI: 000288170900047
Scopus ID: 2-s2.0-79952430921
OAI: oai:DiVA.org:kth-24353
DiVA: diva2:347893
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20100906. Updated from manuscript to article in journal 20110407

Available from: 2010-09-06 Created: 2010-09-03 Last updated: 2012-11-16
Bibliographically approved
In thesis
1. Tagging systems for sequencing large cohorts
2010 (English)
Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Advances in sequencing technologies constantly improve the throughput and accuracy of sequencing instruments. Together with this development come new demands and opportunities to fully take advantage of the massive amounts of data produced within a sequence run. One way of doing this is by analyzing a large set of samples in parallel by pooling them together prior to sequencing and associating the reads to the corresponding samples using DNA sequence tags. Amplicon sequencing is a common application for this technique, enabling ultra deep sequencing and identification of rare allelic variants. However, a common problem for amplicon sequencing projects is formation of unspecific PCR products and primer dimers occupying large portions of the data sets.

This thesis is based on two papers exploring these new kinds of possibilities and issues. In the first paper, a method for including thousands of samples in the same sequencing run without dramatically increasing the cost or sample handling complexity is presented. The second paper presents how the amount of high quality data from an amplicon sequencing run can be maximized.

The findings from the first paper show that a two-tagging system, where the first tag is introduced by PCR and the second tag is introduced by ligation, can be used to effectively sequence a cohort of 3500 samples using the 454 GS FLX Titanium chemistry. The tagging procedure allows for simple and easily scalable sample handling during sequence library preparation. The first, PCR-introduced tags, which are present in both ends of the fragments, enable detection of chimera formation and hence avoid false typing in the data set.
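The chimera check mentioned above can be sketched as follows. The sketch assumes that a non-chimeric fragment carries the same PCR tag at both ends and that the tags have already been read out of each fragment; the abstract only states that the PCR tags are present in both ends, so the equality rule, the function name, and the tag labels are illustrative assumptions rather than the thesis's actual procedure.

```python
def classify_fragment(tag_start, tag_end, known_tags):
    """Classify a fragment from the PCR tags found at its two ends.

    Assumption (not confirmed by the source): an intact fragment has the
    same known tag at both ends, and differing known tags indicate a chimera.
    """
    if tag_start not in known_tags or tag_end not in known_tags:
        return "unassigned"  # at least one end carries no recognisable tag
    if tag_start != tag_end:
        return "chimeric"    # ends originate from different samples
    return "ok"


# Hypothetical tag labels for illustration only.
known = {"P1", "P2", "P3"}
results = [
    classify_fragment("P1", "P1", known),
    classify_fragment("P1", "P2", known),
    classify_fragment("P1", "XX", known),
]
```

Fragments flagged as chimeric would be excluded before genotyping, so that a read built from two samples is not counted towards either.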

In the second paper, a FACS machine is used to sort and enrich target DNA covered emPCR beads. This is facilitated by tagging quality beads using hybridization of a fluorescently labeled target specific DNA probe prior to sorting. The system was evaluated by sequencing two amplicon libraries, one FACS sorted and one standard enriched, on the 454 showing a three-fold increase of quality data obtained.

Place, publisher, year, edition, pages
Stockholm, 2010. 38 p.
Series
Trita-BIO-Report, ISSN 1654-2312 ; 2010:15
Keyword
next generation sequencing, genotyping, massive parallel sequencing, 454, Pyrosequencing, amplicon sequencing, enrichment, DNA barcodes
National Category
Genetics
Identifiers
urn:nbn:se:kth:diva-24365 (URN)
978-91-7415-706-2 (ISBN)
Presentation
2010-09-24, FA32, Roslagstullsbacken 21, Stockholm, AlbaNova, 10:15 (Swedish)
Supervisors
Note
QC 20100907
Available from: 2010-09-07 Created: 2010-09-06 Last updated: 2012-03-23
Bibliographically approved
2. Methods to Prepare DNA for Efficient Massive Sequencing
2012 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Massive sequencing has transformed the field of genome biology due to the continuous introduction and evolution of new methods. In recent years, the technologies available to read through genomes have undergone an unprecedented rate of development in terms of cost reduction. Generating sequence data has essentially ceased to be a bottleneck for analyzing genomes and has instead been replaced by limitations in sample preparation and data analysis. In this work, new strategies are presented to increase both the throughput of library generation prior to sequencing, and the informational content of libraries to aid post-sequencing data processing. The protocols developed aim to enable new possibilities for genome research concerning project scale and sequence complexity.

The first two papers that underpin this thesis deal with scaling library production by means of automation. Automated library preparation is first described for the 454 sequencing system based on a generic solid-phase polyethylene-glycol precipitation protocol for automated DNA handling. This was one of the first descriptions of automated sample handling for producing next generation sequencing libraries, and substantially improved sample throughput. Building on these results, the use of a double precipitation strategy to replace the manual agarose gel excision step for Illumina sequencing is presented. This protocol considerably improved the scalability of library construction for Illumina sequencing. The third and fourth papers present advanced strategies for library tagging in order to multiplex the information available in each library. First, a dual tagging strategy for massive sequencing is described in which two sets of tags are added to a library to trace back the origins of up to 4992 amplicons using 122 tags. The tagging strategy takes advantage of the previously automated pipeline and was used for the simultaneous sequencing of 3700 amplicons. Following that, an enzymatic protocol was developed to degrade long-range PCR amplicons and form triple-tagged libraries containing information on sample origin, clonal origin and local positioning for the short-read sequences. Through tagging, this protocol makes it possible to analyze a longer continuous sequence region than would be possible based on the read length of the sequencing system alone. The fifth study investigates commonly used enzymes for constructing libraries for massive sequencing. We analyze restriction enzymes capable of digesting unknown sequences located some distance from their recognition sequence. Some of these enzymes have previously been extensively used for massive nucleic acid analysis. In this first high throughput study of such enzymes, we investigated their restriction specificity in terms of the distance from the recognition site and their sequence dependence. The phenomenon of slippage is characterized and shown to vary significantly between enzymes. The results obtained should favor future protocol development and enzymatic understanding.

Through these papers, this work aspires to aid the development of methods for massive sequencing in terms of scale, quality and knowledge, thereby contributing to the general applicability of the new paradigm of sequencing instruments.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. ii, 61 p.
Series
Trita-BIO-Report, ISSN 1654-2312 ; 2012:22
Keyword
DNA, Massive sequencing, Next Generation Sequencing, Library Preparation, Barcoding, Multiplexing
National Category
Other Industrial Biotechnology; Biomedical Laboratory Science/Technology
Identifiers
urn:nbn:se:kth:diva-105116 (URN)
978-91-7501-548-4 (ISBN)
Public defence
2012-12-07, Gardaulan, Smittskyddsinstitutet, Nobels väg 18, Solna, 10:00 (English)
Opponent
Supervisors
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20121126

Available from: 2012-11-16 Created: 2012-11-16 Last updated: 2013-04-15
Bibliographically approved

Open Access in DiVA

fulltext (800 kB), 395 downloads
File information
File name: FULLTEXT01.pdf
File size: 800 kB
Checksum (SHA-512): e254e8c6269a889ff1609598f6cc8b745ed3551ee6c85a92b84cf547c0db77a9094db0f176c368c0e5dbd8b174086d5e7be27f6ef8a8b3e5aa9be20c089096a0
Type: fulltext
Mimetype: application/pdf

By author/editor
Neiman, Mårten; Lundin, Sverker; Savolainen, Peter; Ahmadian, Afshin
By organisation
Science for Life Laboratory, SciLifeLab; Molecular Biotechnology (closed 20130101)
