KTH Publications (kth.se)
Publications (10 of 10)
Christiansen, F., Konuk, E., Ganeshan, A. R., Welch, R., Palés Huix, J., Czekierdowski, A., . . . Epstein, E. (2025). International multicenter validation of AI-driven ultrasound detection of ovarian cancer. Nature Medicine, 31(1), 189-196
2025 (English). In: Nature Medicine, ISSN 1078-8956, E-ISSN 1546-170X, Vol. 31, no. 1, p. 189-196. Article in journal (Refereed). Published.
Abstract [en]

Ovarian lesions are common and often incidentally detected. A critical shortage of expert ultrasound examiners has raised concerns about unnecessary interventions and delayed cancer diagnoses. Deep learning has shown promising results in the detection of ovarian cancer in ultrasound images; however, external validation is lacking. In this international multicenter retrospective study, we developed and validated transformer-based neural network models using a comprehensive dataset of 17,119 ultrasound images from 3,652 patients across 20 centers in eight countries. Using a leave-one-center-out cross-validation scheme, for each center in turn, we trained a model using data from the remaining centers. The models demonstrated robust performance across centers, ultrasound systems, histological diagnoses and patient age groups, significantly outperforming both expert and non-expert examiners on all evaluated metrics, namely F1 score, sensitivity, specificity, accuracy, Cohen’s kappa, Matthews correlation coefficient, diagnostic odds ratio and Youden’s J statistic. Furthermore, in a retrospective triage simulation, artificial intelligence (AI)-driven diagnostic support reduced referrals to experts by 63% while significantly surpassing the diagnostic performance of the current practice. These results show that transformer-based models exhibit strong generalization and above human expert-level diagnostic accuracy, with the potential to alleviate the shortage of expert ultrasound examiners and improve patient outcomes.
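The leave-one-center-out cross-validation scheme described in the abstract can be sketched as follows. The `train_fn` and `eval_fn` hooks are placeholder assumptions, not the study's actual training or evaluation code.

```python
from collections import defaultdict

def leave_one_center_out(samples, train_fn, eval_fn):
    """Leave-one-center-out CV: samples is a list of (center_id, x, y)."""
    by_center = defaultdict(list)
    for center, x, y in samples:
        by_center[center].append((x, y))
    results = {}
    for held_out in by_center:
        # Train on data pooled from every other center...
        train = [xy for c, xys in by_center.items() if c != held_out for xy in xys]
        model = train_fn(train)
        # ...and evaluate on the held-out center only.
        results[held_out] = eval_fn(model, by_center[held_out])
    return results
```

Each model is thus tested on a center it never saw during training, which is what supports the generalization claim.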

Place, publisher, year, edition, pages
Springer Nature, 2025
National Category
Cancer and Oncology; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-371960 (URN); 10.1038/s41591-024-03329-4 (DOI); 001388159800001 (); 39747679 (PubMedID); 2-s2.0-85214010322 (Scopus ID)
Note

Not duplicate with diva 1905526

QC 20251022

Available from: 2025-10-22. Created: 2025-10-22. Last updated: 2025-10-22. Bibliographically approved.
Sorkhei, M., Matsoukas, C., Fredin Haslum, J., Konuk, E. & Smith, K. (2025). k-NN as a Simple and Effective Estimator of Transferability. Transactions on Machine Learning Research, 2025-October
2025 (English). In: Transactions on Machine Learning Research, E-ISSN 2835-8856, Vol. 2025-October. Article in journal (Refereed). Published.
Abstract [en]

How well can one expect transfer learning to work in a new setting where the domain is shifted, the task is different, and the architecture changes? Many transfer learning metrics have been proposed to answer this question. But how accurate are their predictions in a realistic new setting? We conducted an extensive evaluation involving over 42,000 experiments comparing 23 transferability metrics across 16 different datasets to assess their ability to predict transfer performance for image classification tasks. Our findings reveal that none of the existing metrics perform well across the board. However, we find that a simple k-nearest neighbor evaluation – as is commonly used to evaluate feature quality for self-supervision – not only surpasses existing metrics, but also offers better computational efficiency and ease of implementation.
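The k-NN evaluation the paper advocates can be sketched as leave-one-out nearest-neighbor accuracy on precomputed feature vectors from the candidate source model. The function below is an illustrative minimal version, not the paper's implementation.

```python
import numpy as np

def knn_score(features, labels, k=5):
    """Leave-one-out k-NN accuracy on precomputed feature vectors."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances; a point never counts as its own neighbor.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    correct = 0
    for i in range(len(labels)):
        neighbors = labels[np.argsort(dists[i])[:k]]
        values, counts = np.unique(neighbors, return_counts=True)
        correct += int(values[np.argmax(counts)] == labels[i])
    return correct / len(labels)
```

A higher score suggests the frozen features already separate the target classes well, which is the intuition behind using it as a transferability estimate.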

Place, publisher, year, edition, pages
Transactions on Machine Learning Research, 2025
National Category
Computer graphics and computer vision; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-372408 (URN); 2-s2.0-105018634464 (Scopus ID)
Note

QC 20251106

Available from: 2025-11-06. Created: 2025-11-06. Last updated: 2025-11-06. Bibliographically approved.
Konuk, E., Welch, R., Christiansen, F., Epstein, E. & Smith, K. (2024). A framework for assessing joint human-AI systems based on uncertainty estimation. Paper presented at MICCAI 2024, 27th International Conference on Medical Image Computing and Computer Assisted Intervention, Marrakesh, October 6-10, 2024.
2024 (English). Conference paper, Published paper (Refereed).
Abstract [en]

We investigate the role of uncertainty quantification in aiding medical decision-making. Existing evaluation metrics fail to capture the practical utility of joint human-AI decision-making systems. To address this, we introduce a novel framework to assess such systems and use it to benchmark a diverse set of confidence and uncertainty estimation methods. Our results show that certainty measures enable joint human-AI systems to outperform both standalone humans and AIs, and that for a given system there exists an optimal balance in the number of cases to refer to humans, beyond which the system’s performance degrades.
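One way to read the "optimal balance" finding is as a sweep over a referral budget: the most uncertain cases go to a human, the rest stay with the AI. The sketch below is illustrative; the variable names and the referral rule are assumptions, not the paper's framework.

```python
import numpy as np

def joint_accuracy(ai_correct, human_correct, uncertainty, refer_fraction):
    """Accuracy of a joint system that refers the most uncertain cases to a human.

    ai_correct / human_correct: per-case 0/1 outcome arrays;
    uncertainty: per-case model uncertainty; refer_fraction: share referred.
    """
    ai_correct = np.asarray(ai_correct)
    human_correct = np.asarray(human_correct)
    order = np.argsort(np.asarray(uncertainty))[::-1]   # most uncertain first
    n_refer = int(round(refer_fraction * len(order)))
    referred = np.zeros(len(order), dtype=bool)
    referred[order[:n_refer]] = True
    return float(np.where(referred, human_correct, ai_correct).mean())
```

Sweeping `refer_fraction` from 0 to 1 traces the trade-off curve on which the optimal referral balance described in the abstract would lie.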

Keywords
Uncertainty, Confidence, Selective classification, Human-AI systems
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-354836 (URN)
Conference
MICCAI 2024, 27th International Conference on Medical Image Computing and Computer Assisted Intervention, Marrakesh, October 6-10, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20241030

Available from: 2024-10-14. Created: 2024-10-14. Last updated: 2025-03-12. Bibliographically approved.
Konuk, E., Welch, R., Christiansen, F., Epstein, E. & Smith, K. (2024). A Framework for Assessing Joint Human-AI Systems Based on Uncertainty Estimation. In: M. G. Linguraru, Q. Dou, A. Feragen, S. Giannarou, B. Glocker, K. Lekadir & J. A. Schnabel (Eds.), MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT X. Paper presented at 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), October 6-10, 2024, Palmeraie Conf Ctr, Marrakesh, Morocco (pp. 3-12). Springer Nature, 15010.
2024 (English). In: MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT X / [ed] M. G. Linguraru, Q. Dou, A. Feragen, S. Giannarou, B. Glocker, K. Lekadir & J. A. Schnabel, Springer Nature, 2024, Vol. 15010, p. 3-12. Conference paper, Published paper (Refereed).
Abstract [en]

We investigate the role of uncertainty quantification in aiding medical decision-making. Existing evaluation metrics fail to capture the practical utility of joint human-AI decision-making systems. To address this, we introduce a novel framework to assess such systems and use it to benchmark a diverse set of confidence and uncertainty estimation methods. Our results show that certainty measures enable joint human-AI systems to outperform both standalone humans and AIs, and that for a given system there exists an optimal balance in the number of cases to refer to humans, beyond which the system's performance degrades.

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords
Uncertainty, Selective Classification, Ultrasound
National Category
Information Systems
Identifiers
urn:nbn:se:kth:diva-357579 (URN); 10.1007/978-3-031-72117-5_1 (DOI); 001342237100001 ()
Conference
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), October 6-10, 2024, Palmeraie Conf Ctr, Marrakesh, Morocco
Note

Part of ISBN 978-3-031-72116-8; 978-3-031-72117-5

QC 20241209

Available from: 2024-12-09. Created: 2024-12-09. Last updated: 2025-03-12. Bibliographically approved.
Scabini, L., Zielinski, K., Fares, R., Konuk, E., Miranda, G., Kolb, R., . . . Bruno, O. (2024). Deep Texture Feature Aggregation on Leaf Microscopy Images for Brazilian Plant Species Recognition. In: Proceedings of the 2024 9th International Conference on Machine Learning Technologies, ICMLT 2024. Paper presented at 9th International Conference on Machine Learning Technologies, ICMLT 2024, Oslo, Norway, May 24-26, 2024 (pp. 209-213). Association for Computing Machinery (ACM).
2024 (English). In: Proceedings of the 2024 9th International Conference on Machine Learning Technologies, ICMLT 2024, Association for Computing Machinery (ACM), 2024, p. 209-213. Conference paper, Published paper (Refereed).
Abstract [en]

In this work, we explore various computer vision techniques, with a focus on texture recognition approaches, for the task of plant species detection. We particularly emphasize the study of a challenging dataset consisting of 50 Brazilian plant species' leaf midrib cross-sections using microscope images. The research focuses on a recent method named Random Encoding of Aggregated Deep Activation Maps (RADAM) that leverages deep features from pre-trained Convolutional Neural Networks (CNNs) for improved plant species identification. This method demonstrates significant advancement over traditional texture analysis and deep learning approaches, showcasing the potential of combining deep feature engineering with texture analysis for accurate plant species recognition.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
Computer Vision, Deep Learning, Plant Sciences, Texture Analysis
National Category
Computer graphics and computer vision; Computer Sciences; Medical Imaging
Identifiers
urn:nbn:se:kth:diva-366879 (URN); 10.1145/3674029.3674063 (DOI); 001342512100034 (); 2-s2.0-85204695300 (Scopus ID)
Conference
9th International Conference on Machine Learning Technologies, ICMLT 2024, Oslo, Norway, May 24-26, 2024
Note

Part of ISBN 9798400716379

QC 20250711

Available from: 2025-07-11. Created: 2025-07-11. Last updated: 2025-07-11. Bibliographically approved.
Konuk, E., Matsoukas, C., Sorkhei, M., Lertsiravarameth, P. & Smith, K. (2024). Learning from Offline Foundation Features with Tensor Augmentations. In: A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak and C. Zhang (Eds.), Advances in Neural Information Processing Systems 37 (NeurIPS 2024). Paper presented at NeurIPS 2024, the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, December 10-15, 2024. Curran Associates.
2024 (English). In: Advances in Neural Information Processing Systems 37 (NeurIPS 2024) / [ed] A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak and C. Zhang, Curran Associates, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

We introduce Learning from Offline Foundation Features with Tensor Augmentations (LOFF-TA), an efficient training scheme designed to harness the capabilities of foundation models in limited resource settings where their direct development is not feasible. LOFF-TA involves training a compact classifier on cached feature embeddings from a frozen foundation model, resulting in up to 37× faster training and up to 26× reduced GPU memory usage. Because the embeddings of augmented images would be too numerous to store, yet the augmentation process is essential for training, we propose to apply tensor augmentations to the cached embeddings of the original non-augmented images. LOFF-TA makes it possible to leverage the power of foundation models, regardless of their size, in settings with limited computational capacity. Moreover, LOFF-TA can be used to apply foundation models to high-resolution images without increasing compute. In certain scenarios, we find that training with LOFF-TA yields better results than directly fine-tuning the foundation model.
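A minimal sketch of the training scheme described above: embeddings from a frozen foundation model are computed once and cached, and augmentation is applied directly to the cached tensors rather than to images. The additive-noise augmentation and all names here are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def cache_embeddings(frozen_encoder, images):
    """Run the frozen foundation model once; keep only the embeddings."""
    return np.stack([frozen_encoder(img) for img in images])

def tensor_augment(embeddings, rng, noise_scale=0.05):
    """Augment cached embeddings directly, so augmented images never need encoding."""
    return embeddings + rng.normal(0.0, noise_scale, embeddings.shape)

def training_batches(embeddings, labels, rng, steps, batch_size=32):
    """Yield augmented (features, labels) batches for the compact classifier."""
    for _ in range(steps):
        idx = rng.integers(0, len(labels), size=batch_size)
        yield tensor_augment(embeddings[idx], rng), labels[idx]
```

Because the expensive encoder runs only once per image, training cost depends on the compact classifier alone, which matches the speed and memory savings the abstract reports.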

Place, publisher, year, edition, pages
Curran Associates, 2024
Keywords
Adaptation, Transfer learning, Foundation models, Augmentation
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-354832 (URN); 2-s2.0-105000782383 (Scopus ID)
Conference
NeurIPS 2024, the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, December 10-15, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20250408

Available from: 2024-10-14. Created: 2024-10-14. Last updated: 2025-04-08. Bibliographically approved.
Konuk, E. (2024). Robust and generalizable AI for medical image processing. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
2024 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Artificial intelligence (AI) offers significant potential to enhance the accuracy and efficiency of medical diagnosis, monitoring, and treatment. In ovarian cancer, where 70% of cases are detected only at stage III or IV, AI-driven tools could enable earlier detection and improve patient outcomes. However, the safety-critical nature of medicine, where even minor errors can have serious consequences, has led to cautious adoption of AI technologies. To be integrated into clinical practice, AI must demonstrate not only good performance but also robustness and generalizability across diverse clinical settings.

This thesis investigates the development and evaluation of generalizable and robust AI systems, with a focus on medical image analysis. We begin by addressing key gaps in our understanding of how complexity influences generalization, exploring scaling laws across increasingly complex tasks and analyzing how the performance of foundation models is impacted. Foundation models are becoming vital for AI development in medical imaging, particularly in addressing data scarcity challenges. Adapting these models for medical applications often demands substantial computational resources, particularly due to their large size. To mitigate these computational demands, we propose an efficient method for adapting the robust representations of large foundation models trained on diverse datasets to specific medical tasks, aiming to make foundation models more accessible for medical use without compromising their effectiveness.

Using ovarian cancer as a case study, we develop and rigorously evaluate AI systems for ovarian tumor classification. Our systems demonstrate superior performance compared to both non-expert and expert doctors, with a strong emphasis on ensuring accuracy, generalizability across hospitals, and robustness across diverse patient subgroups. We implement a comprehensive evaluation strategy that tests the AI systems in varied clinical settings, ensuring that they maintain high performance.

Finally, we explore the integration of AI systems into clinical workflows, with a focus on the development of joint human-AI systems. By designing AI systems that collaborate effectively with healthcare professionals, we aim to enhance diagnostic accuracy, reduce doctors' workloads, and optimize the use of healthcare resources. Our collaborative human-AI system is designed to be generalizable across different clinical settings to improve patient care and advance the broader adoption of AI in medical practice, paving the way for more efficient and effective healthcare solutions.


Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. p. vii, 88
Series
TRITA-EECS-AVL ; 2024:81
Keywords
Medical imaging, Generalization, Robustness, Uncertainty, Medicinsk avbildning, Generalisering, Robusthet, Osäkerhet
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-354838 (URN); 978-91-8106-078-2 (ISBN)
Public defence
2024-11-08, Kollegiesalen, Brinellvägen 6, 114 28, Stockholm, 13:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20241017

Available from: 2024-10-17. Created: 2024-10-15. Last updated: 2024-10-21. Bibliographically approved.
Matsoukas, C., Bou Hernandez, A. I., Liu, Y., Dembrower, K., Miranda, G., Konuk, E., . . . Smith, K. (2020). Adding Seemingly Uninformative Labels Helps in Low Data Regimes. In: Proceedings of Machine Learning Research - International Conference on Machine Learning, ICML 2020. Paper presented at 37th International Conference on Machine Learning, ICML 2020, Virtual, July 13-18, 2020 (pp. 6775-6784). ML Research Press.
2020 (English). In: Proceedings of Machine Learning Research - International Conference on Machine Learning, ICML 2020, ML Research Press, 2020, p. 6775-6784. Conference paper, Published paper (Refereed).
Abstract [en]

Evidence suggests that networks trained on large datasets generalize well not solely because of the numerous training examples, but also class diversity which encourages learning of enriched features. This raises the question of whether this remains true when data is scarce – is there an advantage to learning with additional labels in low-data regimes? In this work, we consider a task that requires difficult-to-obtain expert annotations: tumor segmentation in mammography images. We show that, in low-data settings, performance can be improved by complementing the expert annotations with seemingly uninformative labels from non-expert annotators, turning the task into a multi-class problem. We reveal that these gains increase when less expert data is available, and uncover several interesting properties through further studies. We demonstrate our findings on CSAW-S, a new dataset that we introduce here, and confirm them on two public datasets.
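The labeling idea can be sketched as follows: the expert tumor mask is merged with non-expert auxiliary masks into one multi-class segmentation target. The class indices and the override rule below are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def build_multiclass_target(tumor_mask, auxiliary_masks):
    """Merge an expert tumor mask with non-expert masks into one label map.

    tumor_mask: HxW bool array; auxiliary_masks: list of HxW bool arrays.
    Returns int labels: 0 = background, 1 = tumor, 2.. = auxiliary classes.
    """
    target = np.zeros(tumor_mask.shape, dtype=np.int64)
    for class_id, aux in enumerate(auxiliary_masks, start=2):
        target[aux] = class_id
    target[tumor_mask] = 1   # expert label takes precedence where masks overlap
    return target
```

Training against this multi-class target instead of the binary tumor mask is what turns the seemingly uninformative annotations into extra supervision.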

Place, publisher, year, edition, pages
ML Research Press, 2020
Series
Proceedings of Machine Learning Research ; 119
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-373868 (URN); 2-s2.0-105022421154 (Scopus ID)
Conference
37th International Conference on Machine Learning, ICML 2020, Virtual, July 13-18, 2020
Note

Not duplicate with diva 1599878

QC 20251211

Available from: 2025-12-11. Created: 2025-12-11. Last updated: 2025-12-11. Bibliographically approved.
Konuk, E. & Smith, K. (2019). An empirical study of the relation between network architecture and complexity. In: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019. Paper presented at 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019, 27-28 October 2019, Seoul, South Korea (pp. 4597-4599). Institute of Electrical and Electronics Engineers Inc.
2019 (English). In: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019, Institute of Electrical and Electronics Engineers Inc., 2019, p. 4597-4599. Conference paper, Published paper (Refereed).
Abstract [en]

In this preregistration submission, we propose an empirical study of how networks handle changes in complexity of the data. We investigate the effect of network capacity on generalization performance in the face of increasing data complexity. For this, we measure the generalization error for an image classification task where the number of classes steadily increases. We compare a number of modern architectures at different scales in this setting. The methodology, setup, and hypotheses described in this proposal were evaluated by peer review before experiments were conducted.
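The protocol described above can be sketched as a sweep: restrict the dataset to a growing number of classes and record the generalization error of each trained model. The `train_and_eval` hook is a placeholder assumption, not the study's code.

```python
def complexity_sweep(dataset, class_counts, train_and_eval):
    """dataset: list of (x, label) with integer labels 0..C-1.

    For each k in class_counts, keep only samples whose label is < k and
    record the error returned by train_and_eval on that restricted task.
    """
    results = {}
    for k in class_counts:
        subset = [(x, y) for x, y in dataset if y < k]
        results[k] = train_and_eval(subset, num_classes=k)
    return results
```

Running this sweep for several architectures at different scales yields the generalization-versus-complexity curves the study proposes to compare.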

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2019
Keywords
Complexity, Generalization error, Image classification, Computer vision, Network architecture, Data complexity, Empirical studies, Generalization performance, Modern architectures, Network capacity, Number of classes, Complex networks
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-274757 (URN); 10.1109/ICCVW.2019.00563 (DOI); 000554591604082 (); 2-s2.0-85082452340 (Scopus ID)
Conference
17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019, 27-28 October 2019, Seoul, South Korea
Note

QC 20200625

Part of ISBN 9781728150239

Available from: 2020-06-25. Created: 2020-06-25. Last updated: 2024-10-15. Bibliographically approved.
Christiansen, F., Konuk, E., Raju, A., Welch, R., Huix, J. P., Czekierdowski, A., . . . Epstein, E. International multicenter validation of AI-driven ultrasound detection of ovarian cancer.
(English). Manuscript (preprint) (Other academic).
Abstract [en]

Ovarian lesions are common and often incidentally detected. A critical shortage of expert ultrasound examiners has raised concerns about unnecessary interventions and delayed cancer diagnoses. Deep learning has shown promising results in the detection of ovarian cancer in ultrasound images; however, external validation is lacking. In this international multicenter retrospective study, we developed and validated transformer-based neural network models using a comprehensive dataset of 17,119 ultrasound images from 3,652 patients across 20 centers in eight countries. Using a leave-one-center-out cross-validation scheme, for each center in turn, we trained a model using data from the remaining centers. The models demonstrated robust performance across centers, ultrasound systems, histological diagnoses and patient age groups, significantly outperforming both expert and non-expert examiners on all evaluated metrics, namely F1 score, sensitivity, specificity, accuracy, Cohen’s kappa, Matthews correlation coefficient, diagnostic odds ratio and Youden’s J statistic. Furthermore, in a retrospective triage simulation, artificial intelligence (AI)-driven diagnostic support reduced referrals to experts by 63% while significantly surpassing the diagnostic performance of the current practice. These results show that transformer-based models exhibit strong generalization and above human expert-level diagnostic accuracy, with the potential to alleviate the shortage of expert ultrasound examiners and improve patient outcomes.

Keywords
Deep learning, Generalization, External validity, Ultrasound, Ovarian cancer
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-354833 (URN)
Note

QC 20241015

Accepted for publication

Available from: 2024-10-14. Created: 2024-10-14. Last updated: 2025-02-07. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-9437-4553
