KTH Publications (kth.se)
Hu, Hao
Publications (2 of 2)
Hu, H., Baldassarre, F. & Azizpour, H. (2023). Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part III. Paper presented at 22nd Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022, Grenoble 19-23 September 2022 (pp. 409-426). Springer Nature
Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers
2023 (English). In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part III, Springer Nature, 2023, pp. 409-426. Conference paper, published paper (Refereed)
Abstract [en]

Vision transformers have recently shown remarkable performance in various visual recognition tasks, specifically for self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive biases, which make transformers amenable to learning rich representations from massive amounts of unlabelled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when fine-tuned on small labeled target datasets. Therefore, in this work, we make a simple yet effective architectural change by introducing new learnable masked tokens to vision transformers, whereby we reduce the effect of overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.
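As a rough illustration of the masking idea described above (not the paper's exact mechanism), the NumPy sketch below replaces a random subset of patch tokens with a single shared mask token before the transformer blocks. The `mask_token`, the mask ratio, and the token shapes are all assumptions of this sketch; in the actual model the mask token would be a trained parameter with gradients flowing through it.

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_masked_tokens(tokens, mask_token, mask_ratio=0.3, rng=rng):
    """Replace a random subset of patch tokens with one shared mask token.

    tokens: (num_patches, dim) patch embeddings for one image.
    mask_token: (dim,) vector, a learnable parameter in a real model.
    Returns the modified token matrix and the masked indices.
    """
    n, _ = tokens.shape
    num_mask = int(n * mask_ratio)
    idx = rng.choice(n, size=num_mask, replace=False)
    out = tokens.copy()
    out[idx] = mask_token  # broadcast the shared token over masked rows
    return out, idx

tokens = rng.standard_normal((196, 768))  # 14x14 patches, ViT-B width
mask_token = np.zeros(768)                # placeholder for a trained parameter
masked, idx = apply_masked_tokens(tokens, mask_token)
```

The masked token matrix would then be fed to the transformer encoder as usual; the regularizing effect comes from the model having to cope with partially masked inputs during fine-tuning.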

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13715
Keywords
Computer vision, Transfer learning, Vision transformer
National subject category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-325535 (URN)
10.1007/978-3-031-26409-2_25 (DOI)
000999043300025 ()
2-s2.0-85151048008 (Scopus ID)
Conference
22nd Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022, Grenoble, 19-23 September 2022
Note

QC 20230620

Available from: 2023-04-27 Created: 2023-04-27 Last updated: 2025-02-07 Bibliographically reviewed
Yao, J., Wang, D., Hu, H., Xing, W. & Wang, L. (2022). ADCNN: Towards learning adaptive dilation for convolutional neural networks. Pattern Recognition, 123, Article ID 108369.
ADCNN: Towards learning adaptive dilation for convolutional neural networks
2022 (English). In: Pattern Recognition, ISSN 0031-3203, E-ISSN 1873-5142, Vol. 123, article id 108369. Article in journal (Refereed) Published
Abstract [en]

Dilated convolution kernels are constrained by their shared dilation, keeping them from being aware of diverse spatial contents at different locations. We address this limitation by formulating the dilation as trainable weights with respect to individual positions. We propose Adaptive Dilation Convolutional Neural Networks (ADCNN), a lightweight extension that allows convolutional kernels to adjust their dilation value based on different contents at the pixel level. Unlike previous content-adaptive models, ADCNN dynamically infers pixel-wise dilation by modeling feed-forward inter-patterns, which provides a new perspective for developing adaptive network structures beyond sampling kernel spaces. Our evaluation results indicate that ADCNN can be easily integrated into various backbone networks and consistently outperforms its regular counterparts on various visual tasks.
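One simple way to picture content-adaptive dilation is to compute several dilated responses and blend them with per-pixel weights, as in the NumPy sketch below. This mixture-of-dilations view is only an illustration, not ADCNN's actual formulation (which treats the dilation value itself as trainable per-position weights); the `same_dilated_conv` helper, the candidate set {1, 2, 3}, and the random blending logits are all assumptions of this sketch, and the logits would come from a small sub-network in a real model.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Valid-padding single-channel 2D dilated convolution, naive loops."""
    kh, kw = kernel.shape
    eh = (kh - 1) * dilation + 1  # effective kernel height
    ew = (kw - 1) * dilation + 1  # effective kernel width
    H, W = x.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def same_dilated_conv(x, kernel, dilation):
    """Pad so the output has the same spatial size as the input."""
    pad = dilation * (kernel.shape[0] // 2)
    return dilated_conv2d(np.pad(x, pad), kernel, dilation)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))     # single-channel feature map
k = rng.standard_normal((3, 3))     # shared 3x3 kernel
dilations = [1, 2, 3]               # candidate dilation values (assumed)
branches = np.stack([same_dilated_conv(x, k, d) for d in dilations])

# Per-pixel softmax weights over candidate dilations; random here,
# predicted from the input by a small sub-network in a real model.
logits = rng.standard_normal((len(dilations), 8, 8))
w = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
y = (w * branches).sum(axis=0)      # content-adaptive response, shape (8, 8)
```

Because each output pixel gets its own blend of dilation values, the effective receptive field varies across the feature map, which is the behavior the abstract describes.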

Place, publisher, year, edition, pages
Elsevier BV, 2022
Keywords
Adaptive dilated convolution, Representation learning, Image classification
National subject category
Computer Sciences; Computer graphics and computer vision; Communication Systems
Identifiers
urn:nbn:se:kth:diva-305118 (URN)
10.1016/j.patcog.2021.108369 (DOI)
000711834400003 ()
2-s2.0-85117736740 (Scopus ID)
Note

QC 20211122

Available from: 2021-11-22 Created: 2021-11-22 Last updated: 2025-02-01 Bibliographically reviewed