Experiments to Investigate the Utility of Linguistically Informed Features for Detecting Textual Plagiarism.
KTH, School of Computer Science and Communication (CSC).
2011 (English)
Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits
Student thesis
Abstract [en]

We perform experiments that show whether two linguistic features are good indicators for automatically detecting plagiarism in digital texts.

Two experiments are performed. The first experiment evaluates a linguistic feature based on a semantic word-space model, and the second evaluates a linguistic feature based on stylometry. Both experiments are evaluated using a nearest-neighbor metric, since the features are multidimensional vectors.

We find that the first feature is a good indicator for detecting plagiarism that is an exact copy of its source. We find that the second feature performs equally well regardless of text obfuscation.
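
The nearest-neighbor evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which distance measure or feature dimensions the thesis uses, so the cosine distance, the function names, and the three-dimensional toy vectors below are assumptions chosen only for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbor(query, candidates):
    """Return (label, distance) of the candidate vector closest to query."""
    best_label, best_dist = None, float("inf")
    for label, vec in candidates.items():
        d = cosine_distance(query, vec)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


# Hypothetical feature vectors for two source passages (not thesis data).
sources = {
    "source_a": [0.9, 0.1, 0.0],
    "source_b": [0.1, 0.8, 0.3],
}
# Hypothetical feature vector for a suspicious passage.
suspicious = [0.85, 0.15, 0.05]

label, dist = nearest_neighbor(suspicious, sources)
```

In a setup like this, a small distance to the nearest source vector would flag the suspicious passage for closer inspection; where to set that threshold is a separate design choice that the abstract does not describe.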

Abstract [sv]

We perform experiments that show whether two linguistic features are good indicators to use for automatically detecting plagiarism in digital texts.

Two experiments are performed. The first experiment evaluates a linguistic feature based on a semantic word-space model, and the second evaluates a linguistic feature based on stylometry. Both experiments are evaluated using a nearest-neighbor measure, since the features are multidimensional vectors.

We find that the first feature is a good indicator for detecting plagiarism that is an exact copy of the source. We find that the second feature works equally well regardless of text obfuscation.

Place, publisher, year, edition, pages
2011.
Series
Trita-CSC-E, ISSN 1653-5715 ; 2011:109
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-130657
OAI: oai:DiVA.org:kth-130657
DiVA: diva2:654104
Educational program
Master of Science in Engineering - Computer Science and Technology
Uppsok
Technology
Available from: 2013-10-07 Created: 2013-10-07

Open Access in DiVA

No full text

Other links

http://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2011/rapporter11/almquist_per_11109.pdf