Pena, Francisco J.
Publications (3 of 3)
Pena, F. J., Hübinger, C., Payberah, A. H. & Jaramillo, F. (2024). DEEPAQUA: Semantic segmentation of wetland water surfaces with SAR imagery using deep neural networks without manually annotated data. International Journal of Applied Earth Observation and Geoinformation, 126, Article ID 103624.
2024 (English). In: International Journal of Applied Earth Observation and Geoinformation, ISSN 1569-8432, E-ISSN 1872-826X, Vol. 126, article id 103624. Article in journal (Refereed). Published
Abstract [en]

Deep learning and remote sensing techniques have significantly advanced water surface monitoring; however, the need for annotated data remains a challenge. This is particularly problematic in wetland detection, where water extent varies over time and space, demanding multiple annotations for the same area. In this paper, we present DEEPAQUA, a deep learning model inspired by knowledge distillation (a.k.a. the teacher-student model) that generates labeled data automatically and eliminates the need for manual annotations during the training phase. We use the Normalized Difference Water Index (NDWI) as a teacher model to train a Convolutional Neural Network (CNN) to segment water in Synthetic Aperture Radar (SAR) images. To train the student model, we exploit cases where optical- and radar-based water masks coincide, enabling the detection of both open and vegetated water surfaces. DEEPAQUA represents a significant advancement in computer vision techniques for water detection by effectively training semantic segmentation models without any manually annotated data. Experimental results show that DEEPAQUA outperforms other unsupervised methods, improving accuracy by 3%, Intersection over Union by 11%, and F1-score by 6%. This approach offers a practical solution for monitoring changes in wetland water extent without the need for ground-truth data, making it highly adaptable and scalable for wetland monitoring.
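The teacher side of the setup described above can be sketched as follows: an NDWI threshold over optical bands produces a binary water mask that serves as a pseudo-label for training the SAR student model. This is an illustrative reconstruction under assumed band inputs and a zero threshold, not the paper's released code.

```python
import numpy as np

def ndwi_water_mask(green: np.ndarray, nir: np.ndarray,
                    threshold: float = 0.0) -> np.ndarray:
    """Pseudo-label water pixels from optical bands via the NDWI.

    NDWI = (Green - NIR) / (Green + NIR); pixels above `threshold`
    are treated as water. The resulting binary mask can act as the
    "teacher" label when training a SAR segmentation network.
    """
    denom = green + nir
    # Guard against division by zero where both bands are zero.
    safe = np.where(denom == 0, 1.0, denom)
    ndwi = np.where(denom != 0, (green - nir) / safe, 0.0)
    return (ndwi > threshold).astype(np.uint8)

# Tiny synthetic scene: the left column reflects strongly in green
# and weakly in near-infrared, i.e. behaves like open water.
green = np.array([[0.8, 0.2], [0.7, 0.1]])
nir   = np.array([[0.1, 0.6], [0.2, 0.5]])
mask = ndwi_water_mask(green, nir)  # left column labeled as water
```

In the paper's pipeline these masks would be paired with co-registered SAR tiles before CNN training; the thresholding step alone is shown here.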

Place, publisher, year, edition, pages
Elsevier BV, 2024
Keywords
Automated data labeling, CNN, Deep learning, Remote sensing, Semantic segmentation, Vegetated water, Wetland mapping
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-341922 (URN), 10.1016/j.jag.2023.103624 (DOI), 001142090200001, 2-s2.0-85180567035 (Scopus ID)
Note

QC 20240108

Available from: 2024-01-08. Created: 2024-01-08. Last updated: 2025-02-07. Bibliographically approved
Pena, F. J., Gonzalez Lopez, A. L., Pashami, S., Al-Shishtawy, A. & Payberah, A. H. (2022). SIAMBERT: Siamese Bert-based Code Search. In: 34th Workshop of the Swedish Artificial Intelligence Society, SAIS 2022. Paper presented at the 34th Workshop of the Swedish Artificial Intelligence Society, SAIS 2022, Stockholm, 13-14 June 2022 (pp. 64-70). Institute of Electrical and Electronics Engineers (IEEE)
2022 (English). In: 34th Workshop of the Swedish Artificial Intelligence Society, SAIS 2022, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 64-70. Conference paper, Published paper (Refereed)
Abstract [en]

Code search is a practical tool that helps developers navigate growing source code repositories by connecting natural language queries with code snippets. Platforms such as StackOverflow host coding questions and answers; however, they cannot perform a semantic search through the code itself. Moreover, poorly documented code makes searching for snippets in repositories even harder. To tackle this challenge, this paper presents SIAMBERT, a BERT-based model that takes a question in natural language and returns relevant code snippets. The SIAMBERT architecture consists of two stages: the first stage, inspired by Siamese Neural Networks, retrieves the top-K code snippets relevant to the input question, and the second stage re-ranks the snippets returned by the first stage. The experiments show that SIAMBERT outperforms non-BERT-based models, with improvements ranging from 12% to 39% on the Recall@1 metric, and improves inference time, making it 15x faster than standard BERT models.
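The first, Siamese-style retrieval stage amounts to ranking precomputed snippet embeddings by similarity to a query embedding. A minimal sketch of that ranking step, with toy vectors standing in for BERT encoder outputs (function name and dimensions are illustrative, not SIAMBERT's actual interface):

```python
import numpy as np

def top_k_snippets(query_vec: np.ndarray, snippet_vecs: np.ndarray,
                   k: int = 2) -> np.ndarray:
    """Stage-1 retrieval: rank candidate code snippets by cosine
    similarity between the query embedding and each snippet embedding,
    returning the indices of the k best candidates."""
    q = query_vec / np.linalg.norm(query_vec)
    s = snippet_vecs / np.linalg.norm(snippet_vecs, axis=1, keepdims=True)
    sims = s @ q                      # cosine similarity per snippet
    return np.argsort(-sims)[:k]      # indices, best match first

# Toy 2-d embeddings: snippet 0 is nearly parallel to the query,
# snippet 1 is orthogonal, snippet 2 is in between.
query = np.array([1.0, 0.0])
snippets = np.array([[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
idx = top_k_snippets(query, snippets, k=2)  # top-2 candidate indices
```

In the two-stage design, only these k candidates would be passed to the slower second-stage ranker, which is what makes the pipeline faster than scoring every snippet with a full BERT model.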

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
National Category
Information Systems
Identifiers
urn:nbn:se:kth:diva-319425 (URN), 10.1109/SAIS55783.2022.9833051 (DOI), 000855561800008, 2-s2.0-85136132400 (Scopus ID)
Conference
34th Workshop of the Swedish Artificial Intelligence Society, SAIS 2022, Stockholm, 13-14 June 2022
Note

QC 20220930

Part of proceedings: ISBN 978-1-6654-7126-8

Available from: 2022-09-30. Created: 2022-09-30. Last updated: 2022-09-30. Bibliographically approved
Hägglund, M., Pena, F. J., Pashami, S., Al-Shishtawy, A. & Payberah, A. H. (2021). COCLUBERT: Clustering Machine Learning Source Code. In: M. A. Wani, I. Sethi, W. Shi, G. Qu, D. S. Raicu & R. Jin (Eds.), 20th IEEE International Conference on Machine Learning and Applications (ICMLA 2021). Paper presented at the 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 13-16 December 2021, online (pp. 151-158). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 20th IEEE International Conference on Machine Learning and Applications (ICMLA 2021) / [ed] M. A. Wani, I. Sethi, W. Shi, G. Qu, D. S. Raicu & R. Jin, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 151-158. Conference paper, Published paper (Refereed)
Abstract [en]

Nowadays, machine learning (ML) applications can be found in nearly every aspect of modern life, and more developers are engaged in the field than ever. To facilitate the development of new ML applications, it would be beneficial to provide services that enable developers to share, access, and search for source code easily. A step towards such a service is to cluster source code by functionality. In this work, we present COCLUBERT, a BERT-based model for embedding source code based on its functionality and clustering it accordingly. We build COCLUBERT on CuBERT, a variant of BERT pre-trained on source code, and present three ways to fine-tune it for the clustering task. In the experiments, we compare COCLUBERT with a baseline model that clusters source code using CuBERT embeddings without fine-tuning. We show that COCLUBERT significantly outperforms the baseline, increasing the Dunn Index by a factor of 141, the Silhouette Score by a factor of two, and the Adjusted Rand Index by a factor of 11.
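The clustering step applied to the (fine-tuned or baseline) embeddings can be illustrated with a tiny k-means loop over toy vectors standing in for CuBERT outputs. This is a generic sketch of "cluster source code by embedding", not COCLUBERT's actual training or clustering code.

```python
import numpy as np

def kmeans_cluster(embeddings: np.ndarray, k: int,
                   iters: int = 10, seed: int = 0) -> np.ndarray:
    """Cluster embedding vectors with a minimal k-means loop:
    assign each vector to its nearest centroid, then recompute
    centroids, for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = embeddings[rng.choice(len(embeddings), k, replace=False)]
    labels = np.zeros(len(embeddings), dtype=int)
    for _ in range(iters):
        # Distance of every embedding to every centroid.
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = embeddings[labels == j].mean(axis=0)
    return labels

# Two well-separated toy "functionality" groups of code embeddings.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels = kmeans_cluster(emb, k=2)  # each group ends up in one cluster
```

Metrics such as the Dunn Index, Silhouette Score, and Adjusted Rand Index reported in the abstract would then be computed over these cluster assignments against held-out functionality labels.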

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Source Code Clustering, NLP, BERT, CuBERT
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-312970 (URN), 10.1109/ICMLA52953.2021.00031 (DOI), 000779208200023, 2-s2.0-85125848071 (Scopus ID)
Conference
20th IEEE International Conference on Machine Learning and Applications (ICMLA), 13-16 December 2021, online
Note

QC 20220530

Part of proceedings: ISBN 978-1-6654-4337-1

Available from: 2022-05-30. Created: 2022-05-30. Last updated: 2022-06-25. Bibliographically approved