Guiding generation of 2D pixel art characters using text-image similarity models: A comparative study of generating 2D pixel art characters using PixelDraw and Diffusion Model guided by text-image similarity models
KTH, School of Electrical Engineering and Computer Science (EECS).
2024 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Guidad bildgeneration med använding av text-bild-likhetsmodeller för generation av 2D-pixel art karaktärer : En komparativ studie mellan bildgenerering av 2D-pixel art karaktärer med använding av PixelDraw och Diffusion model guidad av text-bild-likhetsmodeller (Swedish)
Abstract [en]

Image generation has taken large strides, and new models showing great potential have been created. A continuing difficulty is controlling the output: until recently there was no reliable way to guide generation toward what the user wants. Text-image similarity models have improved this, since they can be used together with an image generation model to guide the generation. This thesis examines this method and evaluates how well it can generate pixel art of humanoid characters. It compares the popular Diffusion model with a simple image generation method that relies solely on the text-image similarity model's guidance. The results show that combining a diffusion model with a text-image similarity model improves the results in almost every regard over using only the text-image similarity model. Using a text-image similarity model allows the user to guide the generation, although the model sometimes misinterprets the request.
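
The idea of a drawing method steered only by a text-image similarity score can be illustrated with a minimal sketch. This is not the thesis's PixelDraw implementation and it does not use CLIP: `TOY_TEXT_EMBEDDINGS`, `embed_text`, `embed_image` and `guided_draw` are hypothetical names, and the "encoders" are toy stand-ins that map a prompt to a fixed 4x4 binary pattern and an image to its flattened pixels. Only the guidance loop is the point: propose a pixel change, score the image against the text embedding with cosine similarity, and keep the change only if the score improves.

```python
import math
import random

# ASSUMPTION: toy stand-in for a text encoder. A real text-image similarity
# model such as CLIP learns its embeddings; here a prompt simply maps to a
# flattened 4x4 binary pattern so the example stays self-contained.
TOY_TEXT_EMBEDDINGS = {
    "plus sign": [0, 1, 1, 0,
                  1, 1, 1, 1,
                  1, 1, 1, 1,
                  0, 1, 1, 0],
}


def embed_text(prompt):
    """Hypothetical text encoder: look up the toy embedding for a prompt."""
    return [float(v) for v in TOY_TEXT_EMBEDDINGS[prompt]]


def embed_image(pixels):
    """Hypothetical image encoder: flatten the pixel grid row by row."""
    return [float(p) for row in pixels for p in row]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def guided_draw(prompt, sweeps=3, seed=0):
    """Greedy pixel search guided only by the similarity score."""
    target = embed_text(prompt)
    size = math.isqrt(len(target))
    rng = random.Random(seed)
    # Start from a random binary canvas.
    pixels = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    best = cosine_similarity(embed_image(pixels), target)
    for _ in range(sweeps):
        for r in range(size):
            for c in range(size):
                pixels[r][c] ^= 1  # propose flipping one pixel
                score = cosine_similarity(embed_image(pixels), target)
                if score > best:
                    best = score  # keep the flip: similarity improved
                else:
                    pixels[r][c] ^= 1  # revert the flip
    return pixels, best


if __name__ == "__main__":
    image, similarity = guided_draw("plus sign")
    for row in image:
        print("".join("#" if p else "." for p in row))
    print(f"similarity: {similarity:.3f}")
```

In a real setup the two encoders would come from a trained text-image similarity model, and the search would typically be gradient-based rather than a greedy pixel flip. The diffusion-based variant compared in the thesis would add a generative prior on top of this kind of guidance.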

Abstract [sv]

Image generation has taken large strides, and new models showing great potential have been developed. One of the continuing difficulties with image generation is controlling what the model generates. The new text-image similarity models now make it easier for users to steer an image generation model by using a text-image similarity model together with it. This thesis explores the new method and examines how well it can be used to generate human pixel art characters. The thesis compares the popular Diffusion model with a simple drawing method guided by text-image similarity models. The results show that combining a Diffusion model with a text-image similarity model improves performance in almost every respect compared with letting the text-image similarity model steer the image generator entirely. The text-image similarity model can be used to steer the generation, but the model sometimes misunderstands what is wanted.

Place, publisher, year, edition, pages
2024. p. 44
Series
TRITA-EECS-EX ; 2024:25
Keywords [en]
CLIP, Machine learning, Sprite generation, image generation
Keywords [sv]
CLIP, Maskininlärning, 2D-karaktärsprites generation, bildgeneration
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-345841
OAI: oai:DiVA.org:kth-345841
DiVA, id: diva2:1853314
External cooperation
NextML
Subject / course
Computer Science
Educational program
Master of Science - Computer Science
Supervisors
Examiners
Available from: 2024-05-07
Created: 2024-04-22
Last updated: 2024-05-07
Bibliographically approved

Open Access in DiVA

fulltext (1562 kB), 387 downloads
File information
File name: FULLTEXT02.pdf
File size: 1562 kB
Checksum (SHA-512):
f32ae62d3037c475208fd0d0dbbb2babfe4f90b4dde383348c829789a64bdead993665a002d64bb073ade494f93c66fe0cc42055be2af3c34b95e75608cf2b53
Type: fulltext
Mimetype: application/pdf
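
A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha512_of_file` and `matches_checksum` are illustrative, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's SHA-512 digest with an expected hex string."""
    return sha512_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the PDF was saved locally as FULLTEXT02.pdf):
# matches_checksum("FULLTEXT02.pdf", "<checksum from the record>")
```

A mismatch usually means an incomplete download or a different file version from the one the record describes.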

Total: 387 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 857 hits