Anti-analysis techniques to weaken author classification accuracy in compiled executables
KTH, School of Computer Science and Communication (CSC).
2016 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Programming languages such as C/C++ allow for great flexibility in how code can be written. This leads to programmers developing their own “code style” that can be used to identify them among a group of other programmers, in a setting such as a programming competition. Recent research has shown that some of the identifying stylistic features present in source code survive the compilation process, and that authorship classification can be performed on the compiled executables alone. This was originally performed by Rosenblum et al. in their 2011 paper on the subject.

This thesis takes the approach of Rosenblum et al. and investigates how the author classification process is affected by changes in the compilation process of the training dataset, specifically different levels of optimisation (-O1 to -O3) and static linkage. We find that full optimisation yields a 10% drop in accuracy in datasets with 413 and 20 authors respectively. Static linkage results in a significant drop in accuracy in datasets with 20 and 10 authors, respectively. In both cases, the classifiers still perform significantly better than random chance and as such these methods cannot guarantee anonymity to the programmer. It is not clear how these results translate to other datasets, although there is reason to believe they would be reproducible using other classifiers found in the literature.

Place, publisher, year, edition, pages
2016.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-186516
OAI: oai:DiVA.org:kth-186516
DiVA: diva2:927549
Supervisors
Examiners
Available from: 2016-05-12. Created: 2016-05-12. Last updated: 2016-05-12. Bibliographically approved

Open Access in DiVA

fulltext (1419 kB), 104 downloads
File information
File name: FULLTEXT01.pdf
File size: 1419 kB
Checksum (SHA-512):
c9a15b747e8010c2d1eca1b2bae25484a015683e108c4e8ad28399150ab7ab7440a0c413af503896520d2feacc978d50d3456596617099388948be0e0b092931
Type: fulltext
Mimetype: application/pdf

Total: 104 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 804 hits