Deep text classification of Instagram data using word embeddings and weak supervision
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-7786-9551
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0003-2339-2337
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-4722-0823
2020 (English) In: WEB INTELLIGENCE, ISSN 2405-6456, Vol. 18, no 1, p. 53-67
Article in journal (Refereed) Published
Abstract [en]

With the advent of social media, our online feeds increasingly consist of short, informal, and unstructured text. Instagram is one of the largest social media platforms, containing both text and images. However, most prior research on text processing in social media focuses on analyzing Twitter data, and little attention has been paid to text mining of Instagram data. Moreover, many text mining methods rely on training data annotated manually by humans, which in practice is both difficult and expensive to obtain. In this paper, we present methods for weakly supervised text classification of Instagram text. We analyze a corpus of Instagram posts from the fashion domain and train a deep clothing classifier with weak supervision to classify Instagram posts based on the associated text. With our experiments, we demonstrate that in the absence of annotated training data, using weak supervision to train models is a viable approach. With weak supervision we were able to label a large dataset in hours, something that would have taken months to do with human annotators. Using the dataset labeled with weak supervision in combination with generative modeling, an F1 score of 0.61 is achieved on the task of classifying the image contents of Instagram posts based solely on the associated text, which is on par with human performance.

Place, publisher, year, edition, pages
IOS PRESS, 2020. Vol. 18, no 1, p. 53-67
Keywords [en]
Instagram, weak supervision, word embeddings, deep learning
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-271954
DOI: 10.3233/WEB-200428
ISI: 000521596300004
Scopus ID: 2-s2.0-85082963198
OAI: oai:DiVA.org:kth-271954
DiVA, id: diva2:1423596
Note

QC 20200415

Available from: 2020-04-15 Created: 2020-04-15 Last updated: 2022-10-24 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Hammar, Kim; Jaradat, Shatha; Dokoohaki, Nima; Matskin, Mihhail
