A Comparative Study on the Importance of Image Resolution in Gesture Recognition
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
2022 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
En jämförande studie av bildupplösningens vikt inom gestigenkänning (Swedish)
Abstract [en]

Sign language translation applications could open a new avenue of communication. However, translating sign language involves deriving and handling information from images, which is a difficult task for computers. To be versatile, such a service should run on a mobile phone, which means limited processing power and storage capacity. This thesis investigates whether lowering the image quality is a viable way to reduce the processing power and storage needed while preserving as much accuracy as possible in the object detection step. A skeleton tracking model was used for hand detection, and both accuracy and processing time were measured over several resolutions. Accuracy was measured by the mean average precision defined in the COCO Keypoint Detection challenge [1] and by the overall recall. The study found that both overall recall and mean average precision decreased at lower resolutions. However, at the highest resolutions the decrease in accuracy was small compared with the decrease at lower resolutions. Processing time also showed a general downward trend as the resolution was lowered. The study concludes that lowering the resolution can save time and memory without a significant drop in accuracy.
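
The mean average precision used in the COCO Keypoint Detection evaluation is built on object keypoint similarity (OKS), which plays the role that box overlap plays in ordinary object detection. The following is a minimal sketch of OKS for a single hand, not the evaluation code used in the thesis. The function name `oks`, the uniform per-keypoint constant `sigma`, and the 21-keypoint hand layout in the usage lines are assumptions made for illustration.

```python
import math
import sys

def oks(pred, gt, visible, area, sigma=0.05):
    """Object keypoint similarity between predicted and ground-truth keypoints.

    pred, gt : sequences of (x, y) pixel coordinates, one pair per keypoint
    visible  : sequence of booleans, True where the ground truth is labelled
    area     : object area in pixels squared (the scale term s**2)
    sigma    : per-keypoint falloff; a single uniform value is an assumption
    """
    k2 = (2.0 * sigma) ** 2            # COCO uses k_i = 2 * sigma_i
    eps = sys.float_info.epsilon       # guards against area == 0
    terms = []
    for (px, py), (gx, gy), v in zip(pred, gt, visible):
        if not v:
            continue                   # unlabelled keypoints are ignored
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        terms.append(math.exp(-d2 / (2.0 * (area + eps) * k2)))
    # Average similarity over the labelled keypoints only.
    return sum(terms) / len(terms) if terms else 0.0

# Hypothetical 21-keypoint hand with every keypoint labelled.
gt = [(float(i), float(i)) for i in range(21)]
shifted = [(x + 5.0, y + 5.0) for x, y in gt]
print(oks(gt, gt, [True] * 21, area=100.0))       # identical keypoints
print(oks(shifted, gt, [True] * 21, area=100.0))  # every keypoint 5 px off
```

A prediction counts as correct at a given threshold when its OKS reaches that threshold, and the COCO metric averages the resulting average precision over OKS thresholds from 0.50 to 0.95 in steps of 0.05. Because the distance is normalised by object area, the same pixel error is penalised more heavily on a small hand than on a large one.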

Abstract [sv]

Automatic sign language interpretation has the potential to change how we communicate, if it is made available on mobile phones. The task comes with challenges, however, such as the difficulty of digitally handling and extracting information from images. A mobile phone also has limited storage and processing power, which needs to be taken into account when this type of program is developed. The aim of this study was to investigate whether lowering the image quality is a good method for reducing the need for storage and processing power without drastically affecting the detection rate. A machine learning model for skeleton tracking was used to identify the hand in the image, and the model was run on several sets of images with different resolutions. The model was evaluated by measuring detection rate (recall), accuracy of the skeleton placement (mean average precision), and running time. The study showed that both detection rate and accuracy deteriorated for lower-resolution images. For higher-resolution images, however, the deterioration was very small, whereas there was a drastic deterioration between the lower resolutions. Running time also generally decreased as the resolution decreased. The results show that both time and memory can be saved by reducing the image resolution without significantly affecting accuracy.
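
The evaluation procedure described here (run the same detector on several sets of images at different resolutions, then record recall and running time) can be sketched as a small harness. This is an illustrative outline only: `downscale`, `sweep`, and `toy_detect` are hypothetical names, the nearest-neighbour downscaling and the list-of-rows image format are assumptions, and `toy_detect` is a stand-in for a real skeleton tracking model.

```python
import time

def downscale(image, factor):
    """Nearest-neighbour downscale of a list-of-rows image by an integer factor."""
    return [row[::factor] for row in image[::factor]]

def sweep(images, detect, factors=(1, 2, 4, 8)):
    """Run `detect` on every image at each downscale factor.

    Returns {factor: (recall, mean_seconds_per_image)}. Recall is the fraction
    of images in which `detect` reported a hand; every image is assumed to
    contain exactly one hand.
    """
    results = {}
    for f in factors:
        hits = 0
        elapsed = 0.0
        for img in images:
            small = downscale(img, f)
            start = time.perf_counter()       # time only the detector call
            found = detect(small)
            elapsed += time.perf_counter() - start
            hits += bool(found)
        results[f] = (hits / len(images), elapsed / len(images))
    return results

def toy_detect(image):
    """Stand-in detector (assumption): succeeds only if the image has >= 4 rows."""
    return len(image) >= 4

images = [[[0] * 16 for _ in range(16)] for _ in range(3)]   # three 16x16 images
for factor, (recall, seconds) in sweep(images, toy_detect).items():
    print(factor, recall, seconds)
```

Timing only the detector call keeps the cost of resizing out of the measurement; whether resizing should be included depends on where it would happen in a deployed application.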

Place, publisher, year, edition, pages
2022. p. 22
Series
TRITA-EECS-EX ; 2022:489
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-319911
OAI: oai:DiVA.org:kth-319911
DiVA, id: diva2:1702464
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2022-10-11 Created: 2022-10-11 Last updated: 2022-10-11 Bibliographically approved

Open Access in DiVA

fulltext (823 kB), 587 downloads
File information
File name: FULLTEXT01.pdf
File size: 823 kB
Checksum SHA-512: 2b7945fe82ed4e84e97437651e8a0cff8596fb22dc33a4b447e270ffc657487c9c1a4f99f6ea65ac7ed6beeebbd0f56df7ce2bc74e850cc14213d5009eccf456
Type: fulltext
Mimetype: application/pdf

Total: 599 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 993 hits