Text based image search using deep learning
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

This report investigates the possibility of using deep learning to perform text-to-image search. The investigation is carried out by creating a deep learning model that is trained and evaluated on three different data sets: one publicly available data set (Flickr-8K) and two data sets created from the private data of Avito.ru, the world's second largest classified advertisement site. The model performs the search by transforming the text queries and the images into a joint embedding space. Once both are in this space, a search reduces to retrieving the images that lie closest to the given query. The architecture of the model is adjusted slightly for each data set, both because the data sets differ in structure and for experimental purposes. The evaluation indicates that deep learning can successfully be used for text-to-image retrieval, but that the quality of the retrieval is highly dependent on the training data.
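The retrieval step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the text query and the images have already been mapped into the joint embedding space by trained encoders (not shown), and it assumes cosine similarity as the measure of closeness. The function names `l2_normalize` and `search` and the toy vectors are hypothetical.

```python
import math
from typing import Mapping, Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # A zero vector has no direction; keep it as zeros (it scores 0 against anything).
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def search(
    query_embedding: Sequence[float],
    image_embeddings: Mapping[str, Sequence[float]],
    k: int = 5,
) -> list[tuple[str, float]]:
    """Return the k (image_id, score) pairs closest to the query, best first.

    Assumption: query and image vectors come from (hypothetical) text and image
    encoders trained to share one joint embedding space of equal dimension.
    """
    q = l2_normalize(query_embedding)
    scored = []
    for image_id, emb in image_embeddings.items():
        v = l2_normalize(emb)
        if len(v) != len(q):
            raise ValueError(f"dimension mismatch for {image_id!r}")
        # Cosine similarity of two unit vectors is their dot product.
        score = sum(a * b for a, b in zip(q, v))
        scored.append((image_id, score))
    # Highest similarity first; brute-force ranking over every image.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for encoder outputs.
    images = {
        "dog.jpg": [0.9, 0.1, 0.0],
        "beach.jpg": [0.0, 0.2, 0.95],
        "cat.jpg": [0.7, 0.6, 0.1],
    }
    query = [1.0, 0.0, 0.0]  # stands in for the embedding of a text query
    print([image_id for image_id, _ in search(query, images, k=2)])
    # prints ['dog.jpg', 'cat.jpg']
```

The linear scan keeps the ranking explicit; for a large image collection it would typically be replaced by a vectorized or approximate nearest-neighbour index over precomputed image embeddings.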

Abstract [sv]

The report investigates the possibility of using deep learning to perform text-to-image search. The investigation is carried out by creating a deep learning model that is trained and evaluated on three different data sets: one public data set (Flickr-8K) and two data sets created from private data from Avito.ru, the world's second largest classified advertisement website. The deep learning model performs searches by transforming images and search strings into a shared vector space. The shared vector space makes it easy to search by looking at which images are closest to the given search string in that space. The model's architecture is changed somewhat for the different data sets and for experimental purposes. The results of the evaluations show that deep learning has great potential for text-to-image search, but that the quality of the model is highly dependent on the data.

Place, publisher, year, edition, pages
2019, p. 63
Series
TRITA-EECS-EX ; 2019:719
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-270736
OAI: oai:DiVA.org:kth-270736
DiVA, id: diva2:1414424
Subject / course
Computer Science
Educational program
Master of Science - Computer Science
Available from: 2020-03-13
Created: 2020-03-13
Last updated: 2020-03-13
Bibliographically approved

Open Access in DiVA

No full text in DiVA
