A lightweight deep learning architecture for text embedding: Comparison between the usage of Transformers and Mixers for textual embedding
KTH, School of Electrical Engineering and Computer Science (EECS).
2023 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Alternative title: En lättviktsarkitektur för djupinlärning för inbäddning av text: Jämförelse mellan användningen av Transformers och Mixers för textinbäddning (Swedish)
Abstract [en]

Text embedding is a widely used method for comparing pieces of text by mapping them to a compact vector space. One such application is deduplication, which consists of finding textual records that refer to the same underlying idea in order to merge them or delete one of them. The current state of the art in this domain uses the Transformer architecture trained on a large corpus of text. In this work, we evaluate the performance of a recently proposed architecture, the Mixer. It offers two key advantages: its parameter count scales linearly with the context window, and it is built from simple MLP blocks that benefit from hardware acceleration. We found a 26% increase in performance when using the Mixer compared to a Transformer of similar size.
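The Mixer mentioned in the abstract follows the MLP-Mixer design: each block alternates a token-mixing MLP, applied across the positions of the context window, with a channel-mixing MLP applied to each position's features, so the learnable parameters tied to sequence length grow linearly with the context window. The sketch below is a minimal PyTorch illustration of one such block, not the exact model used in the thesis; the dimension names seq_len, d_model, and d_hidden are illustrative assumptions.

import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One token-mixing + channel-mixing block, in the spirit of MLP-Mixer (illustrative sketch)."""
    def __init__(self, seq_len: int, d_model: int, d_hidden: int):
        super().__init__()
        self.token_norm = nn.LayerNorm(d_model)
        # Token-mixing MLP: operates across the seq_len positions, so its
        # parameter count grows linearly with the context window.
        self.token_mlp = nn.Sequential(
            nn.Linear(seq_len, d_hidden), nn.GELU(), nn.Linear(d_hidden, seq_len)
        )
        self.channel_norm = nn.LayerNorm(d_model)
        # Channel-mixing MLP: operates on each position's d_model features.
        self.channel_mlp = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, d_model)
        y = self.token_norm(x).transpose(1, 2)           # (batch, d_model, seq_len)
        x = x + self.token_mlp(y).transpose(1, 2)        # residual token mixing
        x = x + self.channel_mlp(self.channel_norm(x))   # residual channel mixing
        return x

# Example usage: a toy batch of 8 sequences of 128 token embeddings of width 256.
block = MixerBlock(seq_len=128, d_model=256, d_hidden=512)
out = block(torch.randn(8, 128, 256))   # -> shape (8, 128, 256)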

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023, p. 51
Series
TRITA-EECS-EX ; 2023:630
Keywords [en]
Deep Learning, Entity Retrieval, Mixer, Transformer
National Category
Computer Sciences; Computer Engineering; Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-337052
OAI: oai:DiVA.org:kth-337052
DiVA, id: diva2:1799726
External cooperation
ELCA Informatique
Available from: 2023-09-28. Created: 2023-09-24. Last updated: 2023-09-28. Bibliographically approved.

Open Access in DiVA

fulltext (860 kB), 408 downloads
File information
File name: FULLTEXT01.pdf. File size: 860 kB. Checksum: SHA-512
a23951cf578fe9468abfdd33ffda64cc21e2ea5a2c746b5f4dc8e15ed74a8befeeb0421d74d861913ff646118590ff8e734d4902983ef73ffdf9925a51143408
Type: fulltext. Mimetype: application/pdf

By organisation
School of Electrical Engineering and Computer Science (EECS)
Computer Sciences; Computer Engineering; Computer and Information Sciences

Total: 408 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 419 hits