Publications (10 of 64)
Chu, H. Y., Zhao, H. & Flierl, M. (2024). Adversarial Training with Maximal Coding Rate Reduction. In: Conference Record of the 58th Asilomar Conference on Signals, Systems and Computers, ACSSC 2024. Paper presented at 58th Asilomar Conference on Signals, Systems and Computers, ACSSC 2024, Hybrid, Pacific Grove, United States of America, Oct 27 2024 - Oct 30 2024 (pp. 1866-1870). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Conference Record of the 58th Asilomar Conference on Signals, Systems and Computers, ACSSC 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 1866-1870. Conference paper, Published paper (Refereed)
Abstract [en]

Deep convolutional networks can solve various complex tasks in the field of image processing. However, adversarial attacks have been shown to fool deep learning models. Adversarial training, which incorporates adversarial examples into the training process, is a commonly used strategy to improve the robustness of deep learning models against adversarial examples. Traditionally, cross-entropy is used as the loss function during this process. To further improve robustness against adversarial examples, we propose in this paper two new methods of adversarial training that apply the principle of Maximal Coding Rate Reduction (MCR²). We evaluate the different adversarial training methods by comparing their clean accuracy and adversarial accuracy. It is shown that adversarial training with the MCR² loss function yields a more robust network than the traditional adversarial training method. In our experiments, adversarial accuracies are improved by up to 10%. The two loss functions are discussed by using a model.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
adversarial attack, adversarial example, adversarial training, deep neural networks, Machine learning, quadratic similarity queries on compressed data
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-362682 (URN), 10.1109/IEEECONF60004.2024.10942802 (DOI), 2-s2.0-105002685564 (Scopus ID)
Conference
58th Asilomar Conference on Signals, Systems and Computers, ACSSC 2024, Hybrid, Pacific Grove, United States of America, Oct 27 2024 - Oct 30 2024
Note

Part of ISBN 9798350354058

QC 20250425

Available from: 2025-04-23. Created: 2025-04-23. Last updated: 2025-04-25. Bibliographically approved
Zhang, M., Dong, L., Huang, Z. & Flierl, M. (2024). TSPNet: Temporal-Spatial Pyramid Network for Infrared Maritime Object Detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17, 18058-18069
2024 (English). In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, ISSN 1939-1404, E-ISSN 2151-1535, Vol. 17, p. 18058-18069. Article in journal (Refereed), Published
Abstract [en]

Infrared object detection is one of the critical technologies for maritime search and rescue. However, it remains challenging due to strong background clutter interference and the lack of small object information. We propose a temporal-spatial pyramid network for infrared maritime object detection. A nested temporal pyramid represents the temporal features through motion difference maps and energy accumulation maps to distinguish wave clutter from objects. A dense spatial pyramid learns the spatial features and the differences between temporal maps, and then clarifies and locates objects. For training, we design a scale-related composite loss function with correlated location description and weighted confidence loss. Finally, ablation and comparison experiments show that the proposed method performs better on maritime infrared sequences.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Composite loss function, infrared, object detection, spatial pyramid, temporal pyramid
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-355798 (URN), 10.1109/JSTARS.2024.3452674 (DOI), 001336265700002 (), 2-s2.0-85203625676 (Scopus ID)
Note

QC 20241104

Available from: 2024-11-04. Created: 2024-11-04. Last updated: 2024-11-04. Bibliographically approved
Shen, Q., Mahima, K., De Zoysa, K., Mottola, L., Voigt, T. & Flierl, M. (2023). CNN-Based Estimation of Water Depth from Multispectral Drone Imagery for Mosquito Control. In: 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings. Paper presented at 30th IEEE International Conference on Image Processing, ICIP 2023, Kuala Lumpur, Malaysia, Oct 8 2023 - Oct 11 2023 (pp. 3250-3254). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 3250-3254. Conference paper, Published paper (Refereed)
Abstract [en]

We present a machine learning approach that uses a custom Convolutional Neural Network (CNN) for estimating the depth of water pools from multispectral drone imagery. Using drones to obtain this information offers a cheaper, timelier, and more accurate solution than alternative methods such as manual inspection. This information, in turn, helps to identify potential breeding sites of mosquito larvae, which grow only in shallow water pools. As a significant part of the world's population is affected by mosquito-borne viral infections, including Dengue and Zika, identifying mosquito breeding sites is key to controlling their spread. Experiments with 5-band drone imagery show that our CNN-based approach measures shallow water depths with a root mean square error of less than 0.5 cm, outperforming state-of-the-art Random Forest methods and empirical approaches.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Bathymetry Retrieval, Convolutional Neural Networks, Drones, Multispectral Imagery
National Category
Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-342093 (URN), 10.1109/ICIP49359.2023.10222934 (DOI), 001106821003061 (), 2-s2.0-85180771937 (Scopus ID)
Conference
30th IEEE International Conference on Image Processing, ICIP 2023, Kuala Lumpur, Malaysia, Oct 8 2023 - Oct 11 2023
Note

QC 20240111

Part of ISBN 978-1-7281-9835-4

Available from: 2024-01-11. Created: 2024-01-11. Last updated: 2024-03-12. Bibliographically approved
Mahima, K. T., Weerasekara, M., Zoysa, K. D., Keppitiyagama, C., Flierl, M., Mottola, L. & Voigt, T. (2023). MM4Drone: A Multi-spectral Image and mmWave Radar Approach for Identifying Mosquito Breeding Grounds via Aerial Drones. In: Pervasive Computing Technologies for Healthcare - 16th EAI International Conference, PervasiveHealth 2022, Proceedings. Paper presented at 16th EAI International Conference on Pervasive Computing Technologies for Healthcare, PH 2022, Thessaloniki, Greece, Dec 12 2022 - Dec 14 2022 (pp. 412-426). Springer Nature
2023 (English). In: Pervasive Computing Technologies for Healthcare - 16th EAI International Conference, PervasiveHealth 2022, Proceedings, Springer Nature, 2023, p. 412-426. Conference paper, Published paper (Refereed)
Abstract [en]

Mosquitoes spread diseases such as Dengue and Zika that affect a significant portion of the world population. One approach to hamper the spread of these diseases is to identify the mosquitoes’ breeding places. Recent studies use drones to detect breeding sites, due to their low cost and flexibility. In this paper, we investigate the applicability of drone-based multi-spectral imagery and mmWave radios to discover breeding habitats. Our approach is based on the detection of water bodies. We introduce our Faster R-CNN-MSWD, an extended version of the Faster R-CNN object detection network, which can be used to identify water retention areas in both urban and rural settings using multi-spectral images. We also show promising results for estimating extremely shallow water depth using drone-based multi-spectral images. Further, we present an approach to detect water with mmWave radios from drones. Finally, we emphasize the importance of fusing the data of the two sensors and outline future research directions.

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Aerial Drones, mmWave Radar, Multispectral Imagery, Object Detection
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-334526 (URN), 10.1007/978-3-031-34586-9_27 (DOI), 2-s2.0-85164166242 (Scopus ID)
Conference
16th EAI International Conference on Pervasive Computing Technologies for Healthcare, PH 2022, Thessaloniki, Greece, Dec 12 2022 - Dec 14 2022
Note

Part of ISBN 9783031345852

QC 20230823

Available from: 2023-08-23. Created: 2023-08-23. Last updated: 2023-08-23. Bibliographically approved
Mahima, K. T., Weerasekara, M., De Zoysa, K., Keppitiyagama, C., Mottola, L., Voigt, T. & Flierl, M. (2022). Poster: Fighting Dengue Fever with Aerial Drones. In: International Conference on Embedded Wireless Systems and Networks. Paper presented at International Conference on Embedded Wireless Systems and Networks, EWSN 2022, 3 October 2022 through 5 October 2022. Junction Publishing
2022 (English). In: International Conference on Embedded Wireless Systems and Networks, Junction Publishing, 2022. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Junction Publishing, 2022
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-328979 (URN), 2-s2.0-85141862706 (Scopus ID)
Conference
International Conference on Embedded Wireless Systems and Networks, EWSN 2022, 3 October 2022 through 5 October 2022
Note

QC 20230614

Available from: 2023-06-14. Created: 2023-06-14. Last updated: 2023-06-14. Bibliographically approved
Wu, H., Gattami, A. & Flierl, M. (2020). Conditional mutual information-based contrastive loss for financial time series forecasting. In: Proceedings ICAIF '20: The First ACM International Conference on AI in Finance. Paper presented at ICAIF '20: The First ACM International Conference on AI in Finance, New York, NY, USA, October 15-16, 2020. Association for Computing Machinery (ACM)
2020 (English). In: Proceedings ICAIF '20: The First ACM International Conference on AI in Finance, Association for Computing Machinery (ACM), 2020. Conference paper, Published paper (Refereed)
Abstract [en]

We present a representation learning framework for financial time series forecasting. One challenge of using deep learning models for financial forecasting is the shortage of available training data. Direct trend classification using deep neural networks trained on small datasets is susceptible to overfitting. In this paper, we propose to first learn compact representations from time series data, and then use the learned representations to train a simpler model for predicting time series movements. We consider a class-conditioned latent variable model. We train an encoder network to maximize the mutual information between the latent variables and the trend information conditioned on the encoded observed variables. We show that conditional mutual information maximization can be approximated by a contrastive loss. Then, the problem is transformed into a classification task of determining whether two encoded representations are sampled from the same class or not. This is equivalent to performing pairwise comparisons of the training datapoints, and thus improves the generalization ability of the encoder network. We use deep autoregressive models as our encoder to capture long-term dependencies of the sequence data. Empirical experiments indicate that our proposed method has the potential to advance state-of-the-art performance.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Keywords
Classification (of information), Deep neural networks, Equivalence classes, Finance, Signal encoding, Time series, Compact representation, Conditional mutual information, Financial time series forecasting, Learn+, Learning frameworks, Learning models, Over fitting problem, Small data set, Time-series data, Training data, Forecasting
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-313547 (URN), 10.1145/3383455.3422550 (DOI), 2-s2.0-85095337230 (Scopus ID)
Conference
ICAIF '20: The First ACM International Conference on AI in Finance, New York, NY, USA, October 15-16, 2020
Note

Part of ISBN 9781450375849

QC 20220614

Available from: 2022-06-14. Created: 2022-06-14. Last updated: 2022-06-25. Bibliographically approved
Wu, H. & Flierl, M. (2020). Vector Quantization-Based Regularization for Autoencoders. In: Thirty-fourth AAAI conference on artificial intelligence, the thirty-second innovative applications of artificial intelligence conference and the tenth AAAI symposium on educational advances in artificial intelligence. Paper presented at 34th AAAI Conference on Artificial Intelligence / 32nd Innovative Applications of Artificial Intelligence Conference / 10th AAAI Symposium on Educational Advances in Artificial Intelligence, Feb 7-12, 2020, New York, NY (pp. 6380-6387). Association for the Advancement of Artificial Intelligence
2020 (English). In: Thirty-fourth AAAI conference on artificial intelligence, the thirty-second innovative applications of artificial intelligence conference and the tenth AAAI symposium on educational advances in artificial intelligence, Association for the Advancement of Artificial Intelligence, 2020, p. 6380-6387. Conference paper, Published paper (Refereed)
Abstract [en]

Autoencoders and their variations provide unsupervised models for learning low-dimensional representations for downstream tasks. Without proper regularization, autoencoder models are susceptible to the overfitting problem and the so-called posterior collapse phenomenon. In this paper, we introduce a quantization-based regularizer in the bottleneck stage of autoencoder models to learn meaningful latent representations. We combine both perspectives of Vector Quantized-Variational AutoEncoders (VQ-VAE) and classical denoising regularization methods of neural networks. We interpret quantizers as regularizers that constrain latent representations while fostering a similarity-preserving mapping at the encoder. Before quantization, we impose noise on the latent codes and use a Bayesian estimator to optimize the quantizer-based representation. The introduced bottleneck Bayesian estimator outputs the posterior mean of the centroids to the decoder, and thus, is performing soft quantization of the noisy latent codes. We show that our proposed regularization method results in improved latent representations for both supervised learning and clustering downstream tasks when compared to autoencoders using other bottleneck structures.
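The soft quantization step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes isotropic Gaussian noise of standard deviation `sigma` on the latent code and a uniform prior over the centroids, and the names `soft_quantize`, `centroids`, and `sigma` are hypothetical. Under those assumptions, the posterior mean of the centroids is a softmax-weighted average.

```python
import numpy as np

def soft_quantize(z_noisy, centroids, sigma):
    """Posterior mean of the centroids given a noisy latent code.

    Assumes (for illustration only) a uniform prior over centroids and
    isotropic Gaussian noise with standard deviation `sigma`.
    z_noisy: shape (d,), centroids: shape (K, d).
    """
    # Squared Euclidean distance from the noisy code to each centroid.
    d2 = np.sum((centroids - z_noisy) ** 2, axis=1)
    # Log-posterior up to a constant; shift by the max for numerical stability.
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max()
    weights = np.exp(logits)
    weights /= weights.sum()
    # Soft-quantized output: posterior-weighted average of the centroids.
    return weights @ centroids, weights
```

As `sigma` shrinks, the weights concentrate on the nearest centroid and the output approaches hard nearest-neighbour quantization; larger values blend neighbouring centroids.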

Place, publisher, year, edition, pages
Association for the Advancement of Artificial Intelligence, 2020
Series
AAAI Conference on Artificial Intelligence, ISSN 2159-5399 ; 34
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:kth:diva-299731 (URN), 000667722806057 (), 2-s2.0-85106408043 (Scopus ID)
Conference
34th AAAI Conference on Artificial Intelligence / 32nd Innovative Applications of Artificial Intelligence Conference / 10th AAAI Symposium on Educational Advances in Artificial Intelligence, Feb 7-12, 2020, New York, NY
Note

QC 20210816

Available from: 2021-08-16. Created: 2021-08-16. Last updated: 2023-04-05. Bibliographically approved
Liu, D. & Flierl, M. (2019). Fractional-Pel Accurate Motion-Adaptive Transforms. IEEE Transactions on Image Processing, 28(6), 2731-2742, Article ID 8590746.
2019 (English). In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 28, no 6, p. 2731-2742, article id 8590746. Article in journal (Refereed), Published
Abstract [en]

Fractional-pel accurate motion is widely used in video coding. For subband coding, fractional-pel accuracy is challenging since it is difficult to handle the complex motion field with temporal transforms. In our previous work, we designed integer-accurate motion-adaptive transforms (MAT) which can transform integer-accurate motion-connected coefficients. In this paper, we extend the integer MAT to fractional-pel accuracy. The integer MAT allows only one reference coefficient to be the low-band coefficient. In this paper, we design the transform such that it permits multiple references and generates multiple low-band coefficients. In addition, our fractional-pel MAT can incorporate a general interpolation filter into the basis vector, such that the high-band coefficient produced by the transform is the same as the prediction error from the interpolation filter. The fractional-pel MAT is always orthonormal. Thus, the energy is preserved by the transform. We compare the proposed fractional-pel MAT, the integer MAT, and the half-pel motion-compensated orthogonal transform (MCOT), while HEVC intra coding is used to encode the temporal subbands. The experimental results show that the proposed fractional-pel MAT outperforms the integer MAT and the half-pel MCOT. The gain achieved by the proposed MAT over the integer MAT can reach up to 1 dB in PSNR.

Place, publisher, year, edition, pages
IEEE, 2019
Keywords
Fractional-pel accurate motion; motion-adaptive transforms; orthonormal transforms for video coding
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-248684 (URN), 10.1109/TIP.2018.2889917 (DOI), 000462386000008 (), 30596576 (PubMedID), 2-s2.0-85059251036 (Scopus ID)
Funder
Swedish Research Council, 2011-5841
Note

QC 20190424

Available from: 2019-04-09. Created: 2019-04-09. Last updated: 2022-06-26. Bibliographically approved
Wu, H. & Flierl, M. (2019). Learning product codebooks using vector-quantized autoencoders for image retrieval. In: GlobalSIP 2019 - 7th IEEE Global Conference on Signal and Information Processing, Proceedings. Paper presented at 7th IEEE Global Conference on Signal and Information Processing, GlobalSIP 2019, 11 November 2019 through 14 November 2019. Institute of Electrical and Electronics Engineers Inc.
2019 (English). In: GlobalSIP 2019 - 7th IEEE Global Conference on Signal and Information Processing, Proceedings, Institute of Electrical and Electronics Engineers Inc., 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Vector-Quantized Variational Autoencoders (VQ-VAE) [1] provide an unsupervised model for learning discrete representations by combining vector quantization and autoencoders. In this paper, we study the use of VQ-VAE for representation learning of downstream tasks, such as image retrieval. First, we describe the VQ-VAE in the context of an information-theoretic framework. Then, we show that the regularization effect on the learned representation is determined by the size of the embedded codebook before the training. As a result, we introduce a hyperparameter to balance the strength of the vector quantizer and the reconstruction error. By tuning the hyperparameter, the embedded bottleneck quantizer is used as a regularizer that forces the output of the encoder to share a constrained coding space. With that, the learned latent features better preserve the similarity relations of the data space. Finally, we incorporate the product quantizer into the bottleneck stage of VQ-VAE and use it as an end-to-end unsupervised learning model for image retrieval tasks. The product quantizer has the advantage of generating large and structured codebooks. Fast retrieval can be achieved by using lookup tables that store the distance between any pair of sub-codewords. State-of-the-art retrieval results are achieved by the proposed codebooks.
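The lookup-table retrieval mentioned in this abstract can be sketched as follows. This is a generic product-quantization illustration under stated assumptions, not the paper's code: the codebooks are taken as given (here an array of shape `(M, K, d_sub)`), squared Euclidean distance is assumed, and the function names `build_tables`, `encode`, and `code_distance` are hypothetical.

```python
import numpy as np

def build_tables(codebooks):
    """Precompute squared distances between every pair of sub-codewords.

    codebooks: shape (M, K, d_sub), i.e. M sub-quantizers with K codewords each.
    Returns tables of shape (M, K, K).
    """
    diff = codebooks[:, :, None, :] - codebooks[:, None, :, :]
    return np.sum(diff ** 2, axis=-1)

def encode(x, codebooks):
    """Map a vector of length M * d_sub to M sub-codeword indices."""
    M, K, d_sub = codebooks.shape
    parts = x.reshape(M, d_sub)
    d2 = np.sum((codebooks - parts[:, None, :]) ** 2, axis=-1)  # (M, K)
    return np.argmin(d2, axis=1)

def code_distance(code_a, code_b, tables):
    """Squared distance between two encoded items using table lookups only."""
    m = np.arange(tables.shape[0])
    return tables[m, code_a, code_b].sum()
```

With the tables in place, comparing two encoded items costs M lookups and additions instead of a full-dimensional distance computation, which is what makes retrieval over large structured codebooks fast.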

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2019
Keywords
Information theory, Learning systems, Table lookup, Vector quantization, Vectors, Constrained coding, Fast retrievals, Learning products, Reconstruction error, Similarity relations, State of the art, Structured codebooks, Vector quantizers, Image retrieval
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-274151 (URN), 10.1109/GlobalSIP45357.2019.8969272 (DOI), 000555454800086 (), 2-s2.0-85079284181 (Scopus ID)
Conference
7th IEEE Global Conference on Signal and Information Processing, GlobalSIP 2019, 11 November 2019 through 14 November 2019
Note

QC 20200622

Part of ISBN 9781728127231

Available from: 2020-06-22. Created: 2020-06-22. Last updated: 2025-02-07. Bibliographically approved
Wu, H. & Flierl, M. (2018). Component-based quadratic similarity identification for multivariate Gaussian sources. In: Data Compression Conference Proceedings. Paper presented at 2018 Data Compression Conference, DCC 2018, 27 March 2018 through 30 March 2018. Institute of Electrical and Electronics Engineers Inc.
2018 (English). In: Data Compression Conference Proceedings, Institute of Electrical and Electronics Engineers Inc., 2018. Conference paper, Published paper (Refereed)
Abstract [en]

This paper considers the problem of compression for similarity identification. Unlike classic compression problems, the focus is not on reconstructing the original data. Instead, compression is determined by the reliability of answering given queries. The problem is characterized by the identification rate of a source, which is the minimum compression rate that allows reliable answers for a given similarity threshold. In this work, we investigate component-based quadratic similarity identification for multivariate Gaussian sources. The decorrelated original data is processed by a distinct D-admissible system for each component. For a special case, we characterize the component-based identification rate for a correlated Gaussian source. Furthermore, we derive the optimal bit allocation for a given total rate constraint.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018
Keywords
Bit allocation, Similarity identification, Gaussian distribution, Component based, Compression rates, Gaussian sources, Identification rates, Optimal bit allocation, Rate constraints, Similarity threshold, Data compression
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-238074 (URN), 10.1109/DCC.2018.00086 (DOI), 000540644700079 (), 2-s2.0-85050969530 (Scopus ID), 9781538648834 (ISBN)
Conference
2018 Data Compression Conference, DCC 2018, 27 March 2018 through 30 March 2018
Note

Conference code: 138136; Export Date: 30 October 2018; Conference Paper; CODEN: DDCCF

QC 20180114

Available from: 2019-01-14. Created: 2019-01-14. Last updated: 2022-06-26. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-7807-5681