Methods to Prepare DNA for Efficient Massive Sequencing
KTH, School of Biotechnology (BIO), Gene Technology.
2012 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Massive sequencing has transformed genome biology through the continuous introduction and evolution of new methods. In recent years, the cost of the technologies available for reading through genomes has fallen at an unprecedented rate. Generating sequence data has essentially ceased to be the bottleneck in genome analysis; the limitations now lie in sample preparation and data analysis. In this work, new strategies are presented to increase both the throughput of library generation prior to sequencing and the information content of libraries, in order to aid post-sequencing data processing. The protocols developed aim to open new possibilities for genome research in terms of project scale and sequence complexity.

The first two papers underpinning this thesis deal with scaling up library production through automation. Automated library preparation is first described for the 454 sequencing system, based on a generic solid-phase polyethylene glycol (PEG) precipitation protocol for automated DNA handling. This was one of the first descriptions of automated sample handling for producing next-generation sequencing libraries, and it substantially improved sample throughput. Building on these results, a double precipitation strategy is presented that replaces the manual agarose gel excision step in library preparation for Illumina sequencing; this protocol considerably improved the scalability of library construction for Illumina sequencing. The third and fourth papers present advanced strategies for library tagging that multiplex the information carried by each library. First, a dual tagging strategy for massive sequencing is described in which two sets of tags are added to a library, making it possible to trace the origins of up to 4992 amplicons using 122 tags. The tagging strategy builds on the previously automated pipeline and was used to sequence 3700 amplicons simultaneously. Next, an enzymatic protocol was developed that degrades long-range PCR amplicons and forms triple-tagged libraries encoding the sample origin, clonal origin and local position of the short-read sequences. Through this tagging, the protocol makes it possible to analyze a longer continuous sequence region than the read length of the sequencing system alone would allow. The fifth study investigates enzymes commonly used to construct libraries for massive sequencing: restriction enzymes that cleave unknown sequence at some distance from their recognition site. Some of these enzymes have previously been used extensively for massive nucleic acid analysis.
In this first high-throughput study of such enzymes, we investigated their cleavage specificity in terms of distance from the recognition site and sequence dependence. The phenomenon of slippage is characterized and shown to vary significantly between enzymes. The results should inform future protocol development and deepen the understanding of these enzymes.
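The idea behind the double precipitation strategy can be illustrated with an idealized model. This is a sketch under stated assumptions, not the published protocol: each bead/PEG precipitation is treated as a sharp size cutoff (in practice binding is graded and the cutoff depends on the PEG concentration), and the cutoff values are hypothetical. The first precipitation binds and removes fragments above an upper cutoff; the second recovers, from the remaining supernatant, the fragments above a lower cutoff, leaving a size band between the two.

```python
from typing import List


def double_precipitation(fragment_lengths: List[int],
                         upper_cutoff: int,
                         lower_cutoff: int) -> List[int]:
    """Idealized two-step bead precipitation (all lengths in bp).

    Assumes each precipitation behaves as a sharp cutoff; real binding is
    graded, and both cutoff values used here are hypothetical.
    """
    if lower_cutoff >= upper_cutoff:
        raise ValueError("lower_cutoff must be smaller than upper_cutoff")
    # Step 1: at a lower PEG concentration only long fragments bind the beads;
    # they are discarded and the supernatant (shorter fragments) is kept.
    supernatant = [n for n in fragment_lengths if n < upper_cutoff]
    # Step 2: adding more PEG makes fragments down to lower_cutoff bind;
    # these bead-bound fragments are eluted as the size-selected library.
    return [n for n in supernatant if n >= lower_cutoff]


fragments = [150, 220, 300, 380, 450, 700]
print(double_precipitation(fragments, upper_cutoff=400, lower_cutoff=250))  # [300, 380]
```

The point of the model is that two one-sided selections combine into a two-sided band, which is the role agarose gel excision played in the manual workflow.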

Through these papers, this work aims to advance methods for massive sequencing in terms of scale, quality and knowledge, thereby contributing to the general applicability of the new sequencing paradigm.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. ii, 61 p.
Series
Trita-BIO-Report, ISSN 1654-2312 ; 2012:22
Keyword [en]
DNA, Massive sequencing, Next Generation Sequencing, Library Preparation, Barcoding, Multiplexing
National Category
Other Industrial Biotechnology; Biomedical Laboratory Science/Technology
Identifiers
URN: urn:nbn:se:kth:diva-105116
ISBN: 978-91-7501-548-4 (print)
OAI: oai:DiVA.org:kth-105116
DiVA: diva2:570053
Public defence
2012-12-07, Gardaulan, Smittshyddsinstitutet, Nobels väg 18, Solna, 10:00 (English)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20121126

Available from: 2012-11-16. Created: 2012-11-16. Last updated: 2013-04-15. Bibliographically approved.
List of papers
1. Increased Throughput by Parallelization of Library Preparation for Massive Sequencing
2010 (English). In: PLOS ONE, ISSN 1932-6203, Vol. 5, no. 3, e10029. Article in journal (Refereed), Published
Abstract [en]

Background: Massively parallel sequencing systems continue to improve in data output, leaving labor-intensive library preparation as a potential bottleneck. Efforts are currently under way to relieve the crucial and time-consuming work of preparing DNA for high-throughput sequencing.

Methodology/Principal Findings: In this study, we demonstrate an automated parallel library preparation protocol that uses generic carboxylic acid-coated superparamagnetic beads and polyethylene glycol precipitation as a reproducible and flexible method for DNA fragment length separation. With this approach, library preparation for DNA sequencing can easily be adjusted to a desired fragment length. The automated protocol, demonstrated here on the GS FLX Titanium instrument, was compared with the standard manual library preparation and showed higher yield and throughput as well as high reproducibility. In addition, 12 libraries were prepared and uniquely tagged in parallel, and the distribution of sequence reads between these indexed samples could be improved using quantitative PCR-assisted pooling.

Conclusions/Significance: We present a novel automated procedure that makes it possible to prepare 36 indexed libraries per person and day, which can be increased to up to 96 libraries processed simultaneously. The yield, speed and robust performance of the protocol constitute a substantial improvement over present manual methods, without the need for extensive investment in equipment. The described procedure enables a considerable increase in efficiency for small to midsize sequencing centers.
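qPCR-assisted pooling amounts to adding each indexed library in inverse proportion to its measured concentration, so that every library contributes the same molar amount to the pool. A minimal sketch, assuming concentrations in nM obtained by qPCR; the library names, concentrations and target amount are hypothetical, and the paper's actual pooling calculation may differ.

```python
from typing import Dict


def pooling_volumes(conc_nM: Dict[str, float],
                    fmol_per_library: float) -> Dict[str, float]:
    """Volume (µL) of each library needed for an equimolar pool.

    Uses 1 nM == 1 fmol/µL, so volume = amount / concentration.
    Concentrations are assumed to come from qPCR quantification.
    """
    volumes = {}
    for name, conc in conc_nM.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {name}")
        volumes[name] = fmol_per_library / conc
    return volumes


# Hypothetical qPCR concentrations (nM) for three indexed libraries.
qpcr = {"lib01": 10.0, "lib02": 4.0, "lib03": 25.0}
print(pooling_volumes(qpcr, fmol_per_library=20.0))
# {'lib01': 2.0, 'lib02': 5.0, 'lib03': 0.8}
```

qPCR is used here rather than a generic DNA stain because it measures only amplifiable, adapter-ligated molecules, which are the ones that actually yield reads.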

Keyword
POLYETHYLENE-GLYCOL, DNA, PRECIPITATION
National Category
Other Industrial Biotechnology
Identifiers
urn:nbn:se:kth:diva-28306 (URN); 10.1371/journal.pone.0010029 (DOI); 000276420400007; 2-s2.0-77956313182 (Scopus ID)
Funder
Swedish Research Council; Knut and Alice Wallenberg Foundation
Note
QC 20110113
Available from: 2011-01-13. Created: 2011-01-12. Last updated: 2012-11-16. Bibliographically approved.
2. Large Scale Library Generation for High Throughput Sequencing
2011 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 6, no. 4, e19119. Article in journal (Refereed), Published
Abstract [en]

Background: Considerable effort has recently gone into automating the sample preparation protocols for massively parallel sequencing in order to match increasing instrument throughput. Still, size selection by agarose gel electrophoresis remains a labor-intensive bottleneck in these protocols.

Methodology/Principal Findings: In this study, a method for automated library preparation and size selection on a liquid handling robot is presented. The method uses selective precipitation of DNA molecules of certain sizes onto paramagnetic beads for cleanup and size selection after standard enzymatic reactions.

Conclusions/Significance: The method is used to generate libraries for de novo sequencing and re-sequencing on the Illumina HiSeq 2000 instrument, with a throughput of 12 samples per instrument in approximately 4 hours. The resulting output data show quality scores and pass-filter rates comparable to those of manually prepared samples. The size distribution of the libraries can be adjusted for each application, and the method is suitable for any high-throughput DNA processing protocol that needs to control size intervals.

Keyword
Cell lines
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-33950 (URN); 10.1371/journal.pone.0019119 (DOI); 000290019400031; 2-s2.0-79955691833 (Scopus ID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note
QC 20110609
Available from: 2011-06-09. Created: 2011-05-23. Last updated: 2017-12-11. Bibliographically approved.
3. Decoding a substantial set of samples in parallel by massive sequencing
2011 (English). In: PLoS ONE, ISSN 1932-6203, Vol. 6, no. 3. Article in journal (Refereed), Published
Abstract [en]

The dramatic increase in throughput seen in the field of sequence analysis in recent years has opened up new possibilities for sequencing a multitude of samples in parallel. Here we present a novel strategy in which a combination of two tags is used to link reads to their origins in a pool of samples. The two tags are incorporated in two steps, reducing sample-handling complexity nearly 100-fold. The method described here enables accurate identification and typing of thousands of samples in parallel and is scalable. In this study, the system was designed to test 4992 samples using only 122 tags.
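The principle of linking a read to its origin through a combination of two tags can be sketched as a lookup on the tag pair. This is an illustration under assumptions, not the paper's decoding software: the read layout (outer tag, then inner tag, then amplicon), the tag sequences and the names `outer_tags`/`inner_tags` are all hypothetical, and only exact tag matches are accepted.

```python
from typing import Dict, Optional, Tuple


def demultiplex(read: str,
                outer_tags: Dict[str, str],
                inner_tags: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Assign a read to (outer tag ID, inner tag ID) and return the insert.

    Assumed read layout (hypothetical): [outer tag][inner tag][amplicon].
    All tags within a set must share one length; only exact matches count.
    """
    len_outer = len(next(iter(outer_tags)))
    len_inner = len(next(iter(inner_tags)))
    outer = read[:len_outer]
    inner = read[len_outer:len_outer + len_inner]
    if outer not in outer_tags or inner not in inner_tags:
        return None  # unknown tag combination: discard the read
    return outer_tags[outer], inner_tags[inner], read[len_outer + len_inner:]


# Hypothetical tag sets: 2 outer x 2 inner tags label 4 samples with 4 tags.
outer_tags = {"ACGT": "pool1", "TGCA": "pool2"}
inner_tags = {"AAC": "well01", "GGT": "well02"}
print(demultiplex("TGCA" + "GGT" + "CATTAG", outer_tags, inner_tags))
# ('pool2', 'well02', 'CATTAG')
```

Because a sample is identified by its tag pair, m tags in one set and n in the other can label up to m × n samples using only m + n tags. The abstract's figure of 960 additional samples for 10 additional tags corresponds to 96 samples per added tag.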

To prove the concept of the two-tag method, the highly polymorphic second exon of DLA-DRB1 in dogs and wolves was sequenced using 454 GS FLX Titanium chemistry. By requiring a minimum sequence depth of 20 reads per sample, 94% of the successfully amplified samples were genotyped. In addition, the method allowed digital detection of chimeric fragments. These results demonstrate that it is possible to sequence thousands of samples in parallel without complex pooling patterns or primer combinations. Furthermore, the method is scalable: increasing the sample size by 960 samples requires only 10 additional tags.
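The depth criterion can be expressed as a simple filter applied after demultiplexing. A minimal sketch with hypothetical sample IDs and read assignments; only the threshold of 20 reads comes from the abstract, and the allele-calling step that follows the filter is not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple

MIN_DEPTH = 20  # minimum reads per sample, as stated in the abstract


def depth_filter(read_sample_ids: Iterable[str],
                 amplified_samples: List[str],
                 min_depth: int = MIN_DEPTH) -> Tuple[List[str], float]:
    """Return the samples with enough reads to genotype, and their fraction.

    read_sample_ids holds one sample ID per demultiplexed read; samples
    with no reads at all count as depth 0.
    """
    if not amplified_samples:
        raise ValueError("no amplified samples given")
    depth = Counter(read_sample_ids)
    passing = [s for s in amplified_samples if depth[s] >= min_depth]
    return passing, len(passing) / len(amplified_samples)


reads = ["s1"] * 25 + ["s2"] * 19 + ["s3"] * 40  # hypothetical assignments
print(depth_filter(reads, ["s1", "s2", "s3", "s4"]))  # (['s1', 's3'], 0.5)
```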

Place, publisher, year, edition, pages
San Francisco: Public Library of Science, 2011
Keyword
HIGH-THROUGHPUT; HIGH-RESOLUTION; DNA; PROBES
National Category
Genetics
Identifiers
urn:nbn:se:kth:diva-24353 (URN); 10.1371/journal.pone.0017785 (DOI); 000288170900047; 2-s2.0-79952430921 (Scopus ID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20100906. Updated from manuscript to journal article 20110407

Available from: 2010-09-06. Created: 2010-09-03. Last updated: 2012-11-16. Bibliographically approved.
4. Hierarchical molecular tagging to resolve long continuous sequences by massively parallel sequencing
2013 (English). In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 3, 1186. Article in journal (Refereed), Published
Abstract [en]

Here we demonstrate the use of short-read massive sequencing systems to achieve, in effect, longer read lengths through hierarchical molecular tagging. We show how indexed and PCR-amplified targeted libraries are degraded, sub-sampled and arrested at timed intervals to produce pools of differing average length, each of which is indexed with a new tag. Through this process, indices of sample origin, molecular origin and degree of degradation are incorporated to create a nested hierarchical structure, which is later used in data processing to order the reads over a longer distance than the sequencing system originally allows. With this protocol we show how continuous regions beyond 3000 bp can be decoded by an Illumina sequencing system, and we illustrate potential applications by calling variants in the lambda genome, analysing TP53 in cancer cell lines and targeting a variable canine mitochondrial region.
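The ordering step can be illustrated with a deliberately simplified sketch. It assumes each read has already been decoded into its three tags (sample origin, molecular origin, degradation interval) and that reads from later degradation intervals lie further along the original amplicon. The `TaggedRead` type and its field names are hypothetical, and the published data processing is not reproduced here; this only shows how the nested tags yield a coarse positional order.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class TaggedRead(NamedTuple):
    sample: str     # tag for sample origin
    molecule: str   # tag for molecular origin
    interval: int   # tag for degradation interval (0 = least degraded)
    sequence: str


def order_reads(reads: List[TaggedRead]) -> Dict[Tuple[str, str], List[str]]:
    """Group reads by (sample, molecule) and sort each group by interval.

    Assumes a larger interval index means the read comes from further along
    the original amplicon, which gives a coarse positional order.
    """
    groups: Dict[Tuple[str, str], List[TaggedRead]] = defaultdict(list)
    for read in reads:
        groups[(read.sample, read.molecule)].append(read)
    return {key: [r.sequence for r in sorted(group, key=lambda r: r.interval)]
            for key, group in groups.items()}


reads = [
    TaggedRead("S1", "M1", 2, "GGGG"),
    TaggedRead("S1", "M1", 0, "AAAA"),
    TaggedRead("S1", "M2", 1, "TTTT"),
    TaggedRead("S1", "M1", 1, "CCCC"),
]
print(order_reads(reads))
# {('S1', 'M1'): ['AAAA', 'CCCC', 'GGGG'], ('S1', 'M2'): ['TTTT']}
```

Grouping by molecule before ordering matters: reads are only placed relative to other reads derived from the same original amplicon.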

Keyword
Human Genome, Structural Variation, Domestic Dog, Dna, Mutations
National Category
Other Industrial Biotechnology
Identifiers
urn:nbn:se:kth:diva-105124 (URN); 10.1038/srep01186 (DOI); 000315767100001; 2-s2.0-84875360897 (Scopus ID)
Funder
EU, FP7, Seventh Framework Programme, 222913; Swedish Research Council
Note

QC 20130405. Updated from submitted to published.

Available from: 2012-11-16. Created: 2012-11-16. Last updated: 2017-12-07. Bibliographically approved.
5. Endonuclease specificity and sequence dependence of Type IIS restriction enzymes
2015 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 10, no. 1, e0117059. Article in journal (Refereed), Published
Abstract [en]

Restriction enzymes that recognize specific sequences but cleave unknown sequence outside the recognition site are extensively used tools in molecular biology. Despite this, systematic functional categorization of their cleavage performance has largely been lacking. We established a simple, automatable model system to assay variation in cleavage distance (termed slippage) and its sequence dependence, and coupled it to massively parallel sequencing to provide sensitive and accurate measurement. With this system, 14 enzymes were assayed (AcuI, BbvI, BpmI, BpuEI, BseRI, BsgI, Eco57I, Eco57MI, EcoP15I, FauI, FokI, GsuI, MmeI and SmuI). We report significant variation in slippage, ranging from 1% to 54%, variation in sequence-context dependence, and variation between isoschizomers. We believe this largely overlooked property of enzymes with shifted cleavage would benefit from further large-scale classification and from engineering efforts seeking to improve performance. Insights into in vitro performance may also aid understanding of these enzymes in vivo.
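One way to quantify slippage from sequencing data is as the fraction of cleavage events that fall at a distance other than the expected one. This is an assumed, simplified definition for illustration; the paper's exact metric may differ, and the distances below are hypothetical, not measurements for any of the listed enzymes.

```python
from collections import Counter
from typing import Iterable


def slippage(cut_distances: Iterable[int], expected_distance: int) -> float:
    """Fraction of cleavage events not at the expected distance (in nt)
    from the recognition site (assumed definition)."""
    counts = Counter(cut_distances)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cleavage events observed")
    return 1.0 - counts[expected_distance] / total


# Hypothetical cut distances read off sequenced cleavage products.
observed = [20] * 90 + [21] * 8 + [19] * 2
print(round(slippage(observed, expected_distance=20), 3))  # 0.1
```

Keeping the full `Counter` of distances, rather than only the summary fraction, also preserves the shape of the distribution (for example, whether off-target cuts lie mostly one nucleotide further from or closer to the site), which is useful when comparing enzymes.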

Place, publisher, year, edition, pages
Public Library of Science, 2015
National Category
Other Industrial Biotechnology
Identifiers
urn:nbn:se:kth:diva-105132 (URN); 10.1371/journal.pone.0117059 (DOI); 000348732100060; 2-s2.0-84922424353 (Scopus ID)
Funder
EU, FP7, Seventh Framework Programme, 222913; Swedish Foundation for Strategic Research
Note

Updated from Submitted to Published. QC 20150407

Available from: 2012-11-16. Created: 2012-11-16. Last updated: 2017-12-07. Bibliographically approved.

Author: Lundin, Sverker
Organisation: Gene Technology
