Deep learning approaches for denoising, artifact correction, and radiology report generation in CT and chest X-ray imaging
KTH, School of Engineering Sciences (SCI), Physics, Particle Physics, Astrophysics and Medical Imaging.
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Medical imaging is a cornerstone of modern healthcare delivery, providing essential insights for effective diagnosis and treatment planning. Among the myriad imaging modalities, computed tomography (CT) and chest X-rays stand out for their widespread clinical use, with approximately 400 million CT and 1.4 billion chest X-ray examinations performed globally each year. Recent advancements in detector technology have given rise to photon-counting CT, which promises improved spatial and energy resolution along with enhanced low-dose imaging capabilities. However, elevated image noise and ring artifacts, stemming from the higher spatial and energy resolution and from inconsistencies between detector elements, pose significant hurdles, degrading image quality and complicating the diagnostic process. Beyond CT imaging, the volume of chest X-ray examinations continues to grow, placing increasing pressure on radiology departments that are already stretched thin. Moreover, advanced and innovative CT techniques lead to a steady increase in the number of images that radiologists are required to read, further exacerbating their workload. To address these challenges, this thesis leverages generative artificial intelligence methods throughout the medical imaging value chain. For photon-counting CT imaging, the thesis addresses inverse problems using diffusion and Poisson flow models. Syn2Real synthesizes realistic ring artifacts to efficiently generate training data for deep learning-based artifact correction. For image denoising, the thesis introduces methods that capitalize on the robustness of PFGM++ in supervised and unsupervised versions of posterior sampling Poisson flow generative models, culminating in Poisson flow consistency models: a novel family of deep generative models that combines the robustness of PFGM++ with the efficient single-step sampling and the flexibility of consistency models. Moreover, this thesis works towards addressing the global shortage of radiologists by improving medical vision-language models through CheXalign: a novel framework that leverages publicly available datasets, containing paired chest X-rays and radiology reports written in a clinical setting, together with reference-based metrics to generate high-quality preference data. This in turn enables the application of direct alignment algorithms that increase the probability of good reports while decreasing the probability of bad ones, improving the overall results. Partial automation of chest X-ray radiology report generation, in which language models are used to draft initial reports, holds great promise for more efficient workflows, reduced burnout, and allowing radiologists to allocate more time to more advanced imaging studies, such as photon-counting CT.
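To make the direct alignment step more concrete, below is a minimal sketch of a DPO-style preference loss, one common direct alignment algorithm that raises the likelihood of preferred reports relative to rejected ones. It is an illustration only: the function and parameter names, and the choice of DPO as the concrete algorithm, are assumptions rather than details taken from the thesis.

```python
# Minimal sketch of a DPO-style direct alignment loss (illustrative only;
# names and the beta value are assumptions, not taken from the thesis).
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct preference optimization loss for a batch of report pairs.

    Each argument is a tensor of per-sequence log-probabilities (summed over
    tokens) under the trainable policy model or the frozen reference model.
    """
    # Log-ratio of policy vs. reference for the preferred and rejected reports.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Increasing this margin raises the probability of "good" reports
    # relative to "bad" ones, as described in the abstract.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```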

Abstract [sv]

Medicinsk avbildning är en hörnsten i den moderna sjukvården och ger avgörande insikter för effektiv diagnos och behandlingsplanering. Bland de många bildbehandlingsmetoderna utmärker sig datortomografi (CT) och lungröntgen för sin utbredda kliniska användning, där årligen cirka 400 miljoner CT-undersökningar och 1,4 miljarder lungröntgenundersökningar utförs globalt. Nya framsteg inom detektorteknik har lett till utvecklingen av fotonuppräknande CT, vilket lovar förbättrad rumslig och energiresolution samt förbättrade möjligheter för lågdosavbildning. Emellertid utgör förhöjt bildbrus och ringartifakter—till följd av högre rumslig och energiresolution samt inkonsekvenser i detektorelement—betydande hinder, vilket försämrar bildkvaliteten och komplicerar den diagnostiska processen. Utöver CT-avbildning fortsätter volymen av lungröntgenundersökningar att öka, vilket sätter ytterligare press på redan överbelastade radiologiavdelningar. Dessutom leder avancerade och innovativa tekniker inom CT till en stadig ökning av antalet bilder som radiologerna måste tolka, vilket ytterligare förvärrar arbetsbelastningen. För att möta dessa utmaningar utnyttjar denna avhandling generativa metoder inom artificiell intelligens genom hela värdekedjan för medicinsk avbildning. För fotonuppräknande CT-avbildning behandlar avhandlingen inversa problem med hjälp av diffusions- och Poisson-flödesmodeller. Syn2Real syntetiserar realistiska ringartifakter för att effektivt generera träningsdata för djupinlärningsbaserad artefaktkorrigering. För brusreducering i bilder introducerar avhandlingen metoder som utnyttjar robustheten hos PFGM++ i både övervakade och icke-övervakade versioner av posterior sampling Poisson-flödes generativa modeller, vilket kulminerar i Poisson-flödes konsistensmodeller—en ny familj av djupa generativa modeller som kombinerar robustheten hos PFGM++ med effektiv enkelstegsprovtagning och flexibiliteten hos konsistensmodeller. Dessutom arbetar denna avhandling för att tackla den globala bristen på radiologer genom att förbättra medicinska vision-språkmodeller med hjälp av CheXalign: ett nytt ramverk som utnyttjar offentligt tillgängliga dataset, innehållande parade lungröntgenbilder och radiologiska rapporter skrivna i en klinisk miljö, samt referensbaserade mått för att generera högkvalitativ preferensdata. Detta möjliggör i sin tur tillämpningen av direkta justeringsalgoritmer som ökar sannolikheten för goda rapporter samtidigt som sannolikheten för dåliga minskar, vilket förbättrar de övergripande resultaten. Delvis automatisering av genereringen av lungröntgenrapporter—där språkmodeller används för att utarbeta initiala rapporter—lovar stora möjligheter till effektivare arbetsflöden, minskad utbrändhet och att radiologerna kan avsätta mer tid för mer avancerade avbildningsstudier, såsom fotonuppräknande CT.

Place, publisher, year, edition, pages
Universitetsservice US-AB, Sweden, 2025.
Series
TRITA-SCI-FOU ; 2025:29
Keywords [en]
CT, photon-counting CT, chest X-rays, diffusion models, PFGM++, large language models, vision-language models, post-training, reinforcement learning from human feedback, direct alignment algorithms
Keywords [sv]
CT, fotonräknande CT, lungröntgen, diffusionsmodeller, PFGM++, stora språkmodeller, vision-språkmodeller, efterträning, förstärkningsinlärning från mänsklig feedback, direktjusteringsalgoritmer
National Category
Radiology and Medical Imaging; Other Physics Topics
Research subject
Physics, Biological and Biomedical Physics
Identifiers
URN: urn:nbn:se:kth:diva-363233
ISBN: 978-91-8106-316-5 (print)
OAI: oai:DiVA.org:kth-363233
DiVA, id: diva2:1957256
Public defence
2025-06-05, FD5, Roslagstullsbacken 21, Stockholm, 09:15 (English)
Note

QC 2025-05-09

Available from: 2025-05-09. Created: 2025-05-08. Last updated: 2025-05-09. Bibliographically approved.
List of papers
1. Syn2Real: synthesis of CT image ring artifacts for deep learning-based correction
2025 (English) In: Physics in Medicine and Biology, ISSN 0031-9155, E-ISSN 1361-6560, Vol. 70, no. 4, article id 04NT01. Article in journal (Refereed). Published.
Abstract [en]

Objective. We strive to overcome the challenges posed by ring artifacts in x-ray computed tomography (CT) by developing a novel approach for generating training data for deep learning-based methods. Training such networks requires large, high-quality datasets, which are often generated in the data domain, a process that is time-consuming and expensive. Our objective is to develop a technique for synthesizing realistic ring artifacts directly in the image domain, enabling scalable production of training data without relying on specific imaging system physics. Approach. We develop 'Syn2Real,' a computationally efficient pipeline that generates realistic ring artifacts directly in the image domain. To demonstrate the effectiveness of our approach, we train two versions of UNet, a vanilla version and a high-capacity version with self-attention layers that we call UNetpp, with ℓ2 and perceptual losses, as well as a diffusion model, on energy-integrating CT images with and without these synthetic ring artifacts. Main Results. Despite being trained on conventional single-energy CT images, our models effectively correct ring artifacts across various monoenergetic images, at different energy levels and slice thicknesses, from a prototype photon-counting CT system. This generalizability validates the realism and versatility of our ring artifact generation process. Significance. Ring artifacts in x-ray CT pose a unique challenge to image quality and clinical utility. By focusing on data generation, our work provides a foundation for developing more robust and adaptable ring artifact correction methods for pre-clinical, clinical and other CT applications.
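As a rough illustration of what image-domain ring synthesis can look like (a sketch only, not the actual Syn2Real pipeline, whose details are given in the paper), one can exploit the fact that ring artifacts are concentric around the iso-center: generate a one-dimensional, radius-dependent gain-error profile and superimpose it on the slice. All function and parameter names below are hypothetical.

```python
# Rough sketch of adding synthetic concentric ring artifacts directly in the
# image domain (illustrative; not the actual Syn2Real pipeline).
import numpy as np

def add_ring_artifacts(image, num_rings=30, max_amplitude=20.0, rng=None):
    """Superimpose radius-dependent intensity offsets that mimic ring artifacts.

    image: 2D CT slice (e.g. in HU). Returns a corrupted copy.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0            # assume rings centered at the iso-center
    y, x = np.ogrid[:h, :w]
    radius = np.sqrt((y - cy) ** 2 + (x - cx) ** 2)  # radius of every pixel

    # 1D gain-error profile over radius: a few "detector elements" are off.
    r_max = int(radius.max()) + 1
    profile = np.zeros(r_max + 1)
    bad_radii = rng.choice(r_max, size=num_rings, replace=False)
    profile[bad_radii] = rng.uniform(-max_amplitude, max_amplitude, size=num_rings)

    # Look up the offset for each pixel from its (rounded) radius.
    rings = profile[np.round(radius).astype(int)]
    return image + rings
```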

Place, publisher, year, edition, pages
IOP Publishing, 2025
Keywords
deep learning, CT, photon-counting CT, ring artifacts, data synthesis, UNet
National Category
Radiology and Medical Imaging; Medical Imaging; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-360399 (URN)
10.1088/1361-6560/adad2c (DOI)
001415391700001 ()
39842097 (PubMedID)
2-s2.0-85218222563 (Scopus ID)
Note

QC 20250226

Available from: 2025-02-26. Created: 2025-02-26. Last updated: 2025-05-08. Bibliographically approved.
2. PPFM: Image Denoising in Photon-Counting CT Using Single-Step Posterior Sampling Poisson Flow Generative Models
2024 (English) In: IEEE Transactions on Radiation and Plasma Medical Sciences, ISSN 2469-7311, Vol. 8, no. 7, p. 788-799. Article in journal (Refereed). Published.
Abstract [en]

Diffusion and Poisson flow models have shown impressive performance in a wide range of generative tasks, including low-dose CT (LDCT) image denoising. However, one limitation in general, and for clinical applications in particular, is slow sampling. Due to their iterative nature, the number of function evaluations (NFEs) required is usually on the order of 10 to 10³, both for conditional and unconditional generation. In this article, we present posterior sampling Poisson flow generative models (PPFMs), a novel image denoising technique for low-dose and photon-counting CT that produces excellent image quality whilst keeping NFE = 1. Updating the training and sampling processes of Poisson flow generative models (PFGM++), we learn a conditional generator which defines a trajectory between the prior noise distribution and the posterior distribution of interest. We additionally hijack and regularize the sampling process to achieve NFE = 1. Our results shed light on the benefits of the PFGM++ framework compared to diffusion models. In addition, PPFM is shown to perform favorably compared to current state-of-the-art diffusion-style models with NFE = 1, consistency models, as well as popular deep learning and non-deep learning-based image denoising techniques, on clinical LDCT images and clinical images from a prototype photon-counting CT system.
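The "hijack and regularize" idea for NFE = 1 can be sketched as follows: start the sampler from the noisy measurement rather than from prior noise, call the trained conditional generator once, and blend the result with the observation for data consistency. This is a simplified illustration under assumed interfaces; the model signature, hijack point t_start, and mixing weight are placeholders, not the paper's actual settings.

```python
# Minimal sketch of single-step (NFE = 1) posterior sampling, assuming a trained
# conditional denoiser; the hijack point and mixing weight are placeholders.
import torch

@torch.no_grad()
def single_step_denoise(model, ldct_image, t_start=2.0, mix_weight=0.6):
    """One network call: "hijack" the sampler by starting at the noisy LDCT image
    instead of pure prior noise, then regularize toward the observation.

    model: callable (x_t, condition, t) -> estimate of the clean image.
    ldct_image: low-dose / photon-counting CT image, shape (B, 1, H, W).
    """
    # Hijack: treat the noisy measurement, perturbed to the noise level implied
    # by t_start, as the intermediate state x_t on the trajectory.
    x_t = ldct_image + t_start * torch.randn_like(ldct_image)

    # Single function evaluation of the conditional generator (NFE = 1).
    t = torch.full((ldct_image.shape[0],), t_start, device=ldct_image.device)
    x0_hat = model(x_t, ldct_image, t)

    # Regularize: blend the estimate with the observation for data consistency.
    return mix_weight * x0_hat + (1.0 - mix_weight) * ldct_image
```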

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Computed tomography, Noise, Photonics, Image denoising, Training, Image reconstruction, Computational modeling, Deep learning, denoising, diffusion models, photon-counting computed tomography (PCCT), Poisson flow generative models (PFGMs)
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-354350 (URN)
10.1109/TRPMS.2024.3410092 (DOI)
001309978300005 ()
2-s2.0-85196090996 (Scopus ID)
Note

QC 20241003

Available from: 2024-10-03. Created: 2024-10-03. Last updated: 2025-05-08. Bibliographically approved.
3. Noise suppression in photon-counting computed tomography using unsupervised Poisson flow generative models
2024 (English) In: Visual Computing for Industry, Biomedicine, and Art, ISSN 2096-496X, Vol. 7, no. 1, article id 24. Article in journal (Refereed). Published.
Abstract [en]

Deep learning (DL) has proven to be important for computed tomography (CT) image denoising. However, such models are usually trained under supervision, requiring paired data that may be difficult to obtain in practice. Diffusion models offer unsupervised means of solving a wide range of inverse problems via posterior sampling. In particular, using the estimated unconditional score function of the prior distribution, obtained via unsupervised learning, one can sample from the desired posterior via hijacking and regularization. However, due to the iterative solvers used, the number of function evaluations (NFE) required may be orders of magnitude larger than for single-step samplers. In this paper, we present a novel image denoising technique for photon-counting CT by extending the unsupervised approach to inverse problem solving to the case of Poisson flow generative models (PFGM++). By hijacking and regularizing the sampling process we obtain a single-step sampler, that is, NFE = 1. Our proposed method incorporates posterior sampling using diffusion models as a special case. We demonstrate that the added robustness afforded by the PFGM++ framework yields significant performance gains. Our results indicate competitive performance compared to popular supervised techniques, including state-of-the-art diffusion-style models with NFE = 1 (consistency models), as well as unsupervised and non-DL-based image denoising techniques, on clinical low-dose CT data and clinical images from a prototype photon-counting CT system developed by GE HealthCare.
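For contrast with the supervised, conditional sketch above, a minimal sketch of the unsupervised variant is shown below: the network is trained only on images without paired counterparts and takes no condition input, so data consistency enters solely through hijacking the sampling process and mixing with the noisy observation. Interfaces and weights are again hypothetical placeholders, not the paper's settings.

```python
# Sketch of the unsupervised counterpart: the denoiser is learned without paired
# data and sees no condition input; data consistency comes only from hijacking
# and regularization against the noisy observation.
import torch

@torch.no_grad()
def unsupervised_posterior_sample(prior_model, noisy_obs, t_start=2.0, weight=0.5):
    """prior_model: callable (x_t, t) -> clean-image estimate, trained on
    unpaired images. noisy_obs: photon-counting CT image to be denoised, (B, 1, H, W)."""
    x_t = noisy_obs + t_start * torch.randn_like(noisy_obs)              # hijack
    t = torch.full((noisy_obs.shape[0],), t_start, device=noisy_obs.device)
    x0_hat = prior_model(x_t, t)                                         # NFE = 1
    return weight * x0_hat + (1.0 - weight) * noisy_obs                  # regularize
```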

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Deep learning, Denoising, Diffusion models, Photon-counting CT, Poisson flow generative models
National Category
Computer graphics and computer vision; Other Physics Topics
Identifiers
urn:nbn:se:kth:diva-354278 (URN)
10.1186/s42492-024-00175-6 (DOI)
001319529000001 ()
2-s2.0-85204916575 (Scopus ID)
Note

QC 20241008

Available from: 2024-10-02. Created: 2024-10-02. Last updated: 2025-05-08. Bibliographically approved.
4. PFCM: Poisson flow consistency models for low-dose CT image denoising
2025 (English) In: IEEE Transactions on Medical Imaging, ISSN 0278-0062, E-ISSN 1558-254X, p. 1-1. Article in journal (Refereed). Published.
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Other Physics Topics
Identifiers
urn:nbn:se:kth:diva-363187 (URN)
10.1109/tmi.2025.3558019 (DOI)
40215159 (PubMedID)
2-s2.0-105002738267 (Scopus ID)
Note

QC 20250507

Available from: 2025-05-07. Created: 2025-05-07. Last updated: 2025-05-08. Bibliographically approved.
5. CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback
2025 (English) Manuscript (preprint) (Other academic)
Abstract [en]

Radiologists play a crucial role in translating medical images into actionable reports. However, the field faces staffing shortages and increasing workloads. While automated approaches using vision-language models (VLMs) show promise as assistants, they require exceptionally high accuracy. Most current VLMs in radiology rely solely on supervised fine-tuning. Meanwhile, additional preference fine-tuning in the post-training pipeline has become standard practice in the general domain. The challenge in radiology lies in the prohibitive cost of obtaining radiologist feedback at scale. To address this challenge, we propose an automated pipeline for preference feedback, focusing on chest X-ray radiology report generation (RRG). Specifically, our method leverages publicly available datasets containing pairs of images and radiologist-written reference reports with reference-based metrics, or Judges, eliminating the need for additional radiologist feedback. We investigate reward overoptimization via length exploitation in this setting and introduce a length-controlled version of the GREEN score. Our best-performing setup achieves state-of-the-art CheXbert scores on the MIMIC-CXR dataset for the RRG task while on average maintaining robust performance across six additional image perception and reasoning tasks. 
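A hypothetical sketch of the preference-data construction: sample several candidate reports from the vision-language model, score each against the radiologist-written reference report with a reference-based metric acting as Judge (for example, a GREEN-style score), and keep the best and worst candidates as the chosen/rejected pair for direct alignment training. The object and function names below are placeholders, not the actual CheXalign implementation.

```python
# Hypothetical sketch of turning reference reports into preference pairs with a
# reference-based metric ("Judge"); names are placeholders, not CheXalign's code.
def build_preference_pair(vlm, image, reference_report, judge, num_candidates=4):
    """Sample candidate reports, score them against the radiologist-written
    reference, and return (chosen, rejected) for direct alignment training."""
    candidates = [vlm.generate(image) for _ in range(num_candidates)]
    scores = [judge(candidate, reference_report) for candidate in candidates]
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0])
    rejected, chosen = ranked[0][1], ranked[-1][1]   # lowest vs. highest score
    return chosen, rejected
```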

National Category
Other Physics Topics
Identifiers
urn:nbn:se:kth:diva-363190 (URN)
10.48550/arXiv.2410.07025 (DOI)
Note

QC 20250507

Available from: 2025-05-07. Created: 2025-05-07. Last updated: 2025-05-09. Bibliographically approved.

Open Access in DiVA

fulltext (27454 kB), 70 downloads
File information
File name: FULLTEXT01.pdf
File size: 27454 kB
Checksum SHA-512: 2f1467de19452442dae438549fd88800c7c660997b33a79c69c86220cb561cd23bfeeea8de52c7f196eabf0a1940346461f2d6b37b52104ce34241d0e17edf24
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Hein, Dennis
By organisation
Particle Physics, Astrophysics and Medical Imaging
Radiology and Medical Imaging; Other Physics Topics

