Bioinformatics for microbiome analysis
Delgado, Luis Fernando
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. (Envgen) ORCID iD: 0000-0001-7850-5285
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Marine ecosystems harbour a vast diversity of microbes, which play a crucial role in ecosystem functioning. Advancements in DNA sequencing technologies have transformed our ability to analyse microbial populations comprehensively. Metagenomic sequencing has emerged as a pivotal tool for characterising microbial communities across various environments. Bioinformatics, an interdisciplinary field, facilitates the analysis and interpretation of large biological datasets, including microbiome data.

This thesis aims to enhance bioinformatics approaches for analysing marine microbiomes. It comprises four papers covering bioinformatic developments and genomic data analysis across multiple topics, including metagenomics, pangenomics, comparative genomics and population genomics:

Paper I evaluated three assembly strategies for constructing gene catalogues from metagenomic samples: individual sample assembly with gene clustering, co-assembly of all samples, and a new hybrid approach, mix assembly. The mix-assembly approach extracted the most information from the metagenomic samples, offering opportunities for further exploration in microbial ecology and environmental genomics.

Using the mix-assembly approach, we conducted a comprehensive analysis of 124 metagenomic samples sourced from the Baltic Sea, resulting in the refinement of the Baltic Sea Gene Set (BAGS v1.1), which now encompasses 66.53 million genes annotated for both functionality and taxonomy. In Paper II, we introduced an open-access initiative that provided the mix-assembly pipeline code. We also developed the BAGS-Shiny web application to facilitate user interaction with this extensive gene catalogue.

Paper III focused on whole-genome sequencing and assembly of 82 environmental Vibrio vulnificus strains from the Baltic Sea, enabling comprehensive comparative genomic analysis. I developed the PhyloBOTL pipeline, which uses a phylogeny-based approach to identify genes associated with pathogenicity. Comparative genomics of 208 clinical isolates and 199 environmental isolates revealed 58 enriched orthologs in pathogenic strains, including known virulence factors and novel genes. Potential biomarkers for pathogenic V. vulnificus were identified, and primers suitable for PCR-based environmental monitoring were designed in silico.

In Paper IV, population genomic analysis was carried out using the Input_Pogenom pipeline and the POGENOM tool to explore intraspecific biogeographical patterns. Geographical barriers were found to significantly influence the distribution of aquatic bacteria, with greater genetic differentiation observed between the Baltic and Caspian Seas than within the Baltic Sea's salinity gradient.

Abstract [sv]

Marine environments host an enormous microbial diversity that plays a crucial role in ecosystem functioning. Advances in DNA sequencing technologies have revolutionised our ability to analyse microbial populations comprehensively. Metagenomic sequencing has emerged as a central tool for characterising microbial communities in different environments. Bioinformatics, an interdisciplinary field, facilitates the analysis and interpretation of large biological datasets, including microbiome data.

This thesis aims to improve bioinformatic methods for analysing marine microbiomes. It consists of four papers covering bioinformatic development and the analysis of genomic data within several areas, including metagenomics, pangenomics, comparative genomics and population genetics:

Paper I evaluated three assembly strategies for constructing gene catalogues from metagenomic samples: assembly of individual samples with gene clustering, co-assembly of all samples, and a new hybrid method, mix assembly. The effectiveness of the mix-assembly method for maximising information extraction from metagenomic samples was highlighted, which opens the way for further exploration in microbial ecology and environmental genomics.

Using the mix-assembly method, we carried out a comprehensive analysis of 124 metagenomic samples from the Baltic Sea, resulting in the refinement of the Baltic Sea Gene Set (BAGS v1.1), which now comprises 66.53 million genes annotated for both function and taxonomy. In Paper II, we introduced an open initiative that provided the code for the mix-assembly pipeline. We also developed the BAGS-Shiny web application to facilitate user interaction with this extensive gene catalogue.

Paper III focused on whole-genome sequencing and assembly of 82 environmental V. vulnificus strains from the Baltic Sea, which enabled comprehensive comparative genomic analysis. I developed the PhyloBOTL pipeline, which uses a phylogeny-based method to identify genes associated with pathogenicity. Comparative genomics of 208 clinical isolates and 199 environmental isolates revealed 58 enriched orthologs in pathogenic strains, including known virulence factors and novel genes. Potential biomarkers for pathogenic V. vulnificus were identified, and primers suitable for PCR-based environmental monitoring were designed (in silico).

In Paper IV, population genetic analysis was performed using the Input_Pogenom pipeline and the POGENOM tool to explore intraspecific biogeographical patterns. Geographical barriers were found to significantly affect the distribution of aquatic bacteria, with greater genetic differentiation observed between the Baltic Sea and the Caspian Sea than within the Baltic Sea's salinity gradient.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2024, p. 51
Series
TRITA-CBH-FOU ; 2024:26
Keywords [en]
Baltic Sea, Bioinformatics, Comparative genomics, Metagenomics, Vibrio vulnificus
National Category
Bioinformatics (Computational Biology)
Research subject
Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-346285
ISBN: 978-91-8040-956-8 (print)
OAI: oai:DiVA.org:kth-346285
DiVA, id: diva2:1857951
Public defence
2024-06-14, Marie, Widerströmska huset, Tomtebodavägen 18a, via Zoom: https://kth-se.zoom.us/j/67263907871, Solna, 13:00 (English)
Opponent
Supervisors
Note

QC 2024-05-15

Available from: 2024-05-15 Created: 2024-05-15 Last updated: 2024-09-23 Bibliographically approved
List of papers
1. Evaluating metagenomic assembly approaches for biome-specific gene catalogues
2022 (English) In: Microbiome, E-ISSN 2049-2618, Vol. 10, no 1, article id 72. Article in journal (Refereed) Published
Abstract [en]

Background: For many environments, biome-specific microbial gene catalogues are being recovered using shotgun metagenomics followed by assembly and gene calling on the assembled contigs. The assembly is typically conducted either by individually assembling each sample or by co-assembling reads from all the samples. The co-assembly approach can potentially recover genes that display too low abundance to be assembled from individual samples. On the other hand, combining samples increases the risk of mixing data from closely related strains, which can hamper the assembly process. In this respect, assembly on individual samples followed by clustering of (near) identical genes is preferable. Thus, both approaches have potential pros and cons, but it remains to be evaluated which assembly strategy is most effective. Here, we have evaluated three assembly strategies for generating gene catalogues from metagenomes using a dataset of 124 samples from the Baltic Sea: (1) assembly on individual samples followed by clustering of the resulting genes, (2) co-assembly on all samples, and (3) mix assembly, combining individual and co-assembly.

Results: The mix-assembly approach resulted in a more extensive nonredundant gene set than the other approaches and with more genes predicted to be complete and that could be functionally annotated. The mix assembly consists of 67 million genes (Baltic Sea gene set, BAGS) that have been functionally and taxonomically annotated. The majority of the BAGS genes are dissimilar (< 95% amino acid identity) to the Tara Oceans gene dataset, and hence, BAGS represents a valuable resource for brackish water research.

Conclusion: The mix-assembly approach represents a feasible approach to increase the information obtained from metagenomic samples.
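
The three strategies compared in this abstract can be illustrated with a toy sketch. It assumes that each strategy yields a list of predicted protein sequences and that redundancy is removed by greedy clustering at an assumed 95% similarity cut-off. The sequences, the `similarity` helper (a `difflib` ratio standing in for a real alignment-based identity) and the function names are all hypothetical and are not taken from the published pipeline.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def nonredundant(proteins, threshold=0.95):
    """Greedy clustering: visit longest sequences first and keep one
    representative per cluster (threshold is an assumed cut-off)."""
    representatives = []
    for seq in sorted(proteins, key=len, reverse=True):
        if not any(similarity(seq, rep) >= threshold for rep in representatives):
            representatives.append(seq)
    return representatives


# Hypothetical toy proteins (not real data).
GENE_A = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # abundant: assembled by both strategies
GENE_B = "MSDNELLKKAIEWAGGNPLHTTVWDFSQ"       # strain-specific: only resolved per sample
GENE_C = "MGHHHHHHPPPPCCCCWWWWYYYY"           # low abundance: only recovered by co-assembly

individual_assemblies = [[GENE_A, GENE_B], [GENE_A]]  # one gene list per sample
co_assembly = [GENE_A, GENE_C]

# Strategy 1: individual assembly followed by clustering of the resulting genes.
individual_set = nonredundant([g for sample in individual_assemblies for g in sample])
# Strategy 2: co-assembly of all samples.
co_set = nonredundant(co_assembly)
# Strategy 3: mix assembly, pooling both gene sources before clustering.
mix_set = nonredundant(individual_set + co_set)

print(len(individual_set), len(co_set), len(mix_set))
```

In this toy case the mix set retains all three genes, whereas each of the other two strategies misses one. A real catalogue would rely on a dedicated sequence clustering tool rather than pairwise `difflib` comparisons, which do not scale to millions of genes.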

Place, publisher, year, edition, pages
Springer Nature, 2022
Keywords
Gene catalogue, Brackish water, Metagenomics, Assembly approach, Mix assembly, Baltic Sea
National Category
Medical Biotechnology; Genetics and Genomics; Geophysics
Identifiers
urn:nbn:se:kth:diva-312783 (URN)
10.1186/s40168-022-01259-2 (DOI)
000791807300002 ()
35524337 (PubMedID)
2-s2.0-85129664757 (Scopus ID)
Funder
Swedish Research Council, 2019-00242
Swedish Research Council, 2021-05563
Swedish Research Council Formas, 2019-2020
KTH Royal Institute of Technology
Note

QC 20220523

Available from: 2022-05-23 Created: 2022-05-23 Last updated: 2025-02-01 Bibliographically approved
2. BAGS-Shiny: a web-based interactive tool for exploring the Baltic Sea microbial gene set
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Biome-specific gene catalogues have been recovered for many environments using shotgun metagenomics, followed by assembly and gene calling on the assembled contigs. We recently proposed a novel mix-assembly strategy, combining individual and co-assembly approaches, and used this approach to assemble an extensive non-redundant gene set from 124 Baltic Sea metagenome samples. The Baltic Sea Gene Set (BAGS v1.1) comprises 66.53 million functionally and taxonomically annotated genes. To enable interactive exploration of this gene catalogue we have developed an RShiny application, BAGS-Shiny, that allows users to perform searches by sequence similarity (BLAST) and/or taxonomic and functional annotation. The gene catalogue and web application will serve as valuable tools for exploring microbial gene functions in brackish ecosystems. In addition, we here make available a pipeline to create gene catalogues based on the mix-assembly approach.

National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-346269 (URN)
Note

QC 20240514

Available from: 2024-05-10 Created: 2024-05-10 Last updated: 2024-05-15 Bibliographically approved
3. Phylogeny-based comparative genomics of Vibrio vulnificus links genetic traits to pathogenicity
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Vibrio vulnificus is a natural part of the microbiome of brackish waters worldwide. It is also an opportunistic pathogen that can cause severe infections and septicemia via consumption of seafood or through wound infections. The species possesses diverse virulence factors, yet its precise disease mechanism remains undefined. Comparative genomics between clinical and environmental isolates offers a means to identify key virulence genes, but the scarcity of environmental isolates for V. vulnificus has constituted a significant limitation. Here we sequenced genomes of 82 V. vulnificus isolates from water, sediment and seagrass surface from stations along the Baltic Sea coast and complemented these with 208 and 117 previously sequenced clinical and environmental genomes, respectively, in a comparative analysis. Phylogenetic reconstruction corroborated earlier analysis with four main lineages forming within the species. Strains from the Baltic Sea region were confined to certain phylogenetic lineages (L4 and sublineages L2c and L2e) whereas clinical and environmental strains were found in all lineages, indicating that the phylogenetic structure of V. vulnificus reflects adaptations to specific environmental conditions rather than pathogenicity. Employing orthologue enrichment analysis in a phylogenetic framework using the PhyloBOTL pipeline developed in this work revealed 58 significantly enriched orthologs in clinical compared to environmental isolates. These orthologs were grouped into 18 co-localisation clusters based on the corresponding genes' proximity in the genomes. The co-localisation clusters included clusters with genes previously linked with pathogenicity in V. vulnificus, such as genes for capsular polysaccharide (CPS) synthesis and biofilm formation, but also clusters with genes not previously associated with virulence in the species. Examples of the latter were genes for pilus biosynthesis of the chaperone-usher (CU) pathway, for spermidine synthesis, and for effector proteins of the Type VI secretion system. Finally, we leveraged the clinically enriched genes to design PCR primers for detection and surveillance of pathogenic V. vulnificus strains.
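
Two steps described in this abstract, testing whether an ortholog is enriched among clinical isolates and grouping enriched genes by genomic proximity, can be sketched as follows. This is a simplified illustration and not the PhyloBOTL method: a plain one-sided Fisher's exact test is swapped in for the phylogeny-aware analysis and therefore ignores the relatedness of isolates. The carrier counts, gene names, coordinates and the `max_gap` parameter are hypothetical.

```python
from math import comb


def fisher_greater(clin_with, clin_without, env_with, env_without):
    """One-sided Fisher's exact test: probability of seeing at least
    `clin_with` clinical carriers if carriage were independent of source."""
    total = clin_with + clin_without + env_with + env_without
    carriers = clin_with + env_with
    clinical = clin_with + clin_without
    upper = min(carriers, clinical)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, clinical - k)
        for k in range(clin_with, upper + 1)
    )
    return tail / comb(total, clinical)


def colocalisation_clusters(genes, max_gap=5000):
    """Group (contig, start, name) records: a gene joins the current cluster
    when it lies on the same contig within `max_gap` bp of the previous gene."""
    clusters = []
    previous = None
    for contig, start, name in sorted(genes):
        if previous and previous[0] == contig and start - previous[1] <= max_gap:
            clusters[-1].append(name)
        else:
            clusters.append([name])
        previous = (contig, start)
    return clusters


# Invented carrier counts for one ortholog among 208 clinical and
# 199 environmental genomes (the group sizes come from the abstract).
p_value = fisher_greater(clin_with=150, clin_without=58, env_with=60, env_without=139)
print(f"one-sided p = {p_value:.3g}")

# Invented coordinates for four enriched genes.
enriched = [
    ("chr1", 10_000, "orth_01"),
    ("chr1", 12_500, "orth_02"),
    ("chr1", 400_000, "orth_03"),
    ("chr2", 8_000, "orth_04"),
]
print(colocalisation_clusters(enriched))
```

With 58 orthologs tested, some form of multiple-testing correction would normally be applied to the p-values; that step is omitted here for brevity.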

National Category
Microbiology
Identifiers
urn:nbn:se:kth:diva-346270 (URN)
Note

QC 20240514

Available from: 2024-05-10 Created: 2024-05-10 Last updated: 2024-05-15 Bibliographically approved
4. Large-scale phylogenomics of aquatic bacteria reveal molecular mechanisms for adaptation to salinity
2023 (English) In: Science Advances, E-ISSN 2375-2548, Vol. 9, no 21, article id eadg2059. Article in journal (Refereed) Published
Abstract [en]

The crossing of environmental barriers poses major adaptive challenges. Rareness of freshwater-marine transitions separates the bacterial communities, but how these are related to brackish counterparts remains elusive, as do the molecular adaptations facilitating cross-biome transitions. We conducted large-scale phylogenomic analysis of freshwater, brackish, and marine quality-filtered metagenome-assembled genomes (11,248). Average nucleotide identity analyses showed that bacterial species rarely existed in multiple biomes. In contrast, distinct brackish basins cohosted numerous species, but their intraspecific population structures displayed clear signs of geographic separation. We further identified the most recent cross-biome transitions, which were rare, ancient, and most commonly directed toward the brackish biome. Transitions were accompanied by systematic changes in amino acid composition and isoelectric point distributions of inferred proteomes, which evolved over millions of years, as well as convergent gains or losses of specific gene functions. Therefore, adaptive challenges entailing proteome reorganization and specific changes in gene content constrain the cross-biome transitions, resulting in species-level separation between aquatic biomes.
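
The two proteome properties named in this abstract, amino acid composition and isoelectric point, can be computed per protein as in the minimal sketch below. It is an illustration under stated assumptions, not the authors' workflow: the pKa values are one illustrative set (published sets differ between tools), the isoelectric point is found by bisection on the Henderson-Hasselbalch net charge, and the two example sequences are invented.

```python
from collections import Counter

# Illustrative side-chain and terminal pKa values (assumption).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERMINUS, PKA_C_TERMINUS = 8.6, 3.6


def composition(protein):
    """Fraction of each amino acid in the sequence."""
    counts = Counter(protein)
    return {aa: n / len(protein) for aa, n in counts.items()}


def net_charge(protein, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts = Counter(protein)
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERMINUS))
    positive += sum(counts[aa] / (1 + 10 ** (ph - pka)) for aa, pka in PKA_POSITIVE.items())
    negative = 1 / (1 + 10 ** (PKA_C_TERMINUS - ph))
    negative += sum(counts[aa] / (1 + 10 ** (pka - ph)) for aa, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(protein, tolerance=1e-4):
    """pH at which the net charge is zero; the charge decreases
    monotonically with pH, so bisection on [0, 14] converges."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Invented example sequences: one rich in acidic residues, one in basic residues.
acidic_protein = "MDDEEDLEDAEE"
basic_protein = "MKKRKLKRAKK"

for name, seq in [("acidic", acidic_protein), ("basic", basic_protein)]:
    print(name, round(isoelectric_point(seq), 2), composition(seq))
```

Applying `isoelectric_point` to every protein of an inferred proteome and collecting the values into a histogram would give a distribution of the kind the abstract refers to.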

Place, publisher, year, edition, pages
American Association for the Advancement of Science (AAAS), 2023
National Category
Medical Biotechnology (with a focus on Cell Biology (including Stem Cell Biology), Molecular Biology, Microbiology, Biochemistry or Biopharmacy)
Identifiers
urn:nbn:se:kth:diva-332191 (URN)
10.1126/sciadv.adg2059 (DOI)
001009447100015 ()
37235649 (PubMedID)
2-s2.0-85160380925 (Scopus ID)
Note

QC 20231122

Available from: 2023-07-21 Created: 2023-07-21 Last updated: 2024-05-16 Bibliographically approved

Open Access in DiVA

Kappa (3491 kB), 1386 downloads
File information
File name: FULLTEXT01.pdf
File size: 3491 kB
Checksum SHA-512: 8edb9c76121fe5ba126c75fc70416cc62233da445caa690453c3096ff84cb5a16b4c192f25c6c475a5a65d49603239183c3c48d99f6419f911f110abdaa86d7f
Type: summary
Mimetype: application/pdf

Authority records

Delgado, Luis Fernando
