kth.se Publications
Deep Learning for Automated Football Goal Detection: Comparing the performance of 2D and 3D Residual Networks
KTH, School of Electrical Engineering and Computer Science (EECS).
2024 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Alternative title
Automatisk igenkänning av fotbollsmål med deep learning : En jämförelse av 2D och 3D Residual Networks (Swedish)
Abstract [en]

Advancements in deep-learning architectures have revolutionized computer vision tasks across various domains, including sports. One such task is the creation of highlight clips, which involves editing broadcast sports events to feature the most important or interesting moments. This thesis focuses on automating parts of the highlight clip creation process for football matches, a currently manual and labor-intensive task for Sveriges Television AB (SVT), the Swedish national public service broadcast company. Building on prior work in football action recognition using Convolutional Neural Networks (CNNs), this study explores the integration of temporal information through 3D convolutions. More specifically, it compares the performance of a traditional 2D ResNet-50 model with a 3D ResNet-50 model, which processes sequences of frames using inflated convolution and pooling kernels as opposed to the standard 2D operations. The experiments focused on classifying football goals, as they are among the most significant highlights of a game and present a substantial classification challenge. The results show that while the 2D ResNet-50 achieves better classification performance than its 3D counterpart at higher sample rates, its performance degrades as the sample rate decreases. The 2D model retains accuracy, precision, and recall better than the 3D model on more challenging data, where the differences between goal and non-goal examples are more subtle. However, the 3D model has a potential advantage of shorter inference time on longer video sequences. In conclusion, the 2D ResNet-50 performs surprisingly well in classifying goals without learning any temporal context, demonstrating its potential effectiveness for automatic highlight detection in a production environment. This finding suggests that a simpler 2D model can efficiently be used to automate parts of highlight clip creation, potentially reducing costs from manual labor for broadcasters like SVT.
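
The abstract mentions that the 3D ResNet-50 uses inflated convolution kernels. As a rough illustration of what inflation means, the sketch below repeats a 2D kernel along a new time axis and rescales it by the temporal depth, which is one common convention for inflation. The abstract does not say how the thesis initialised or trained its 3D weights, so this is an assumption, and the helper names (`inflate_kernel`, `conv2d_valid`, `conv3d_valid`) and toy values are hypothetical. With this convention, a clip made of one repeated frame gives the same response from the 3D kernel as the 2D kernel gives on a single frame.

```python
def inflate_kernel(kernel_2d, depth):
    """Repeat a 2D kernel `depth` times along a new time axis, scaled by 1/depth.

    Assumed convention for illustration; not taken from the thesis.
    """
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def conv2d_valid(frame, kernel):
    """Plain 'valid' 2D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv3d_valid(clip, kernel):
    """'Valid' 3D cross-correlation over (time, height, width)."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t = len(clip) - kt + 1
    out_h = len(clip[0]) - kh + 1
    out_w = len(clip[0][0]) - kw + 1
    return [
        [
            [
                sum(clip[t + c][i + a][j + b] * kernel[c][a][b]
                    for c in range(kt) for a in range(kh) for b in range(kw))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        for t in range(out_t)
    ]


# Toy single-channel data: one 3x3 frame and a 2x2 kernel (hypothetical values).
frame = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 1.0, 1.0]]
kernel_2d = [[1.0, -1.0], [0.5, 2.0]]

# A static "video": the same frame repeated four times.
clip = [frame for _ in range(4)]
kernel_3d = inflate_kernel(kernel_2d, depth=2)

out_2d = conv2d_valid(frame, kernel_2d)
out_3d = conv3d_valid(clip, kernel_3d)
```

Here every time step of `out_3d` matches `out_2d`, because the clip contains no motion. On a real clip the frames differ, so the 3D kernel mixes information across neighbouring frames, which is the temporal context the 2D model does not see.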

Abstract [sv]

Advances in deep-learning architectures have enabled advanced computer vision tasks in various domains, including sports. One such task is the creation of highlight clips, which involves editing broadcast sports events to bring out the most important or interesting moments. This thesis focuses on automating parts of the highlight clip creation process for football matches, a currently manual and labor-intensive task for Sveriges Television AB (SVT), the Swedish national public service company. Building on prior work in football action recognition using convolutional neural networks (CNNs), this study explores the integration of temporal information through 3D convolutions. Specifically, a two-dimensional ResNet-50 model is compared with a three-dimensional ResNet-50 model, which processes sequences of frames with modified, three-dimensional convolution and pooling kernels. The experiments focused on classifying football goals because of their importance and the challenge they pose in a classification task. The results show that the 2D ResNet-50 achieves better classification performance than its 3D counterpart at higher sample rates, but its performance degrades as the sample rate decreases. The 2D model makes more correct classifications than the 3D model on more challenging data, where the differences between goals and non-goals are more subtle. However, the 3D model has a potential advantage of lower inference complexity on longer video sequences. In summary, the 2D ResNet-50 performs surprisingly well at classifying goals without learning the temporal context, which demonstrates its effectiveness as a method for automatic highlight detection. The results suggest that the relatively simpler 2D model can be used effectively to automate the creation of highlight clips, potentially reducing costs for media companies such as SVT.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024, p. 55
Series
TRITA-EECS-EX ; 2024:702
Keywords [en]
Deep learning, 3D convolutions, football action recognition
Keywords [sv]
Djupinlärning, 3D faltning, aktivitetigenkänning fotboll
National Category
Computer Sciences; Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-356223
OAI: oai:DiVA.org:kth-356223
DiVA, id: diva2:1912371
External cooperation
Sveriges Television AB
Available from: 2025-01-22. Created: 2024-11-11. Last updated: 2025-01-22. Bibliographically approved.

Open Access in DiVA

Full text (2216 kB), 60 downloads
File information
File name: FULLTEXT01.pdf
File size: 2216 kB
Checksum (SHA-512): dc0e59e31114c22fd98c326f8645f4fea1ae3c9af7fd854b73f998dc4a1c6d3b259aa1f1487973d988a95cc2eaa04b6139f860eeb9681b2008e85f1b99bc33b3
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Karimi Valdani, Adrian