Using Style Markers for Detecting Plagiarism in Natural Language Documents
University of Skövde, Department of Computer Science.
2003 (English). Independent thesis, Advanced level (degree of Master (One Year)). Student thesis.
Abstract [en]

Most existing plagiarism detection systems compare a text to a database of other texts. These external approaches, however, are vulnerable because texts not contained in the database cannot be detected as source texts. This paper examines an internal plagiarism detection method that uses style markers from authorship attribution studies to find stylistic changes in a text. These changes might pinpoint plagiarized passages. Additionally, a new style marker called specific words is introduced. A pre-study tests whether the style markers can fingerprint an author's style and whether they are constant with sample size. It is shown that vocabulary richness measures do not fulfil these prerequisites. The other style markers (simple ratio measures, readability scores, frequency lists, and entropy measures) have these characteristics and are, together with the new specific words measure, used in a main study with an unsupervised approach for detecting stylistic changes in plagiarized texts at sentence and paragraph levels. It is shown that at these small levels the style markers generally cannot detect plagiarized sections because of intra-authorial stylistic variations (i.e. noise), and that at bigger levels the results are strongly affected by the sliding window approach. The specific words measure, however, can pinpoint single sentences written by another author.
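
To make the general idea of internal detection concrete, the following is a minimal Python sketch of one style marker computed over a sliding window of sentences. It is not the thesis's implementation. Average word length stands in for the "simple ratio measures" named in the abstract, and the function names, the window size, and the z-score threshold are all assumptions chosen for illustration.

```python
import re
import statistics

# Illustrative sketch only: names, window size and threshold are assumptions,
# not taken from the thesis.

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def avg_word_length(sentence):
    """A simple ratio measure: mean number of characters per word."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return sum(len(w) for w in words) / len(words) if words else 0.0

def window_scores(sentences, size=3):
    """Average the marker over each sliding window of `size` sentences."""
    per_sentence = [avg_word_length(s) for s in sentences]
    return [
        statistics.mean(per_sentence[i:i + size])
        for i in range(len(per_sentence) - size + 1)
    ]

def flag_outliers(scores, threshold=1.5):
    """Return indices of windows whose absolute z-score exceeds `threshold`."""
    if len(scores) < 2:
        return []
    mean = statistics.mean(scores)
    stdev = statistics.pstdev(scores)
    if stdev == 0:
        return []  # no variation, so nothing stands out
    return [i for i, s in enumerate(scores) if abs(s - mean) / stdev > threshold]
```

A caller would chain the three steps: `flag_outliers(window_scores(split_sentences(text)))`. Because overlapping windows share sentences, a single unusual sentence shifts the scores of several adjacent windows in this sketch, so a flagged window index marks a neighbourhood rather than an exact sentence.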

Place, publisher, year, edition, pages
Skövde: Institutionen för datavetenskap, 2003. 111 p.
Keyword [en]
plagiarism detection, stylometry, authorship attribution
National Category
Computer Science
Identifiers
URN: urn:nbn:se:his:diva-824
OAI: oai:DiVA.org:his-824
DiVA: diva2:3236
Presentation
(English)
Uppsok
Technology
Supervisors
Available from: 2008-02-15 Created: 2008-02-15 Last updated: 2009-10-19

Open Access in DiVA

fulltext (17124 kB), 1776 downloads
File information
File name: FULLTEXT01.ps
File size: 17124 kB
Checksum (MD5): 200a024d64a6e5b32c1d6506b22eb955af529eed036889bdf6c9e176f3bfcaa2d900101a
Type: fulltext
Mimetype: application/postscript
fulltext (1826 kB), 383 downloads
File information
File name: FULLTEXT02.pdf
File size: 1826 kB
Checksum (SHA-512): a443442cd8d31e98071fc426a5061b74e9ae6bf6f97028a43a3f3835ec66de7f73de0c8b65f56fcb1cf6389d2a82ba4fe6334ed3f07125a6726be1721547dcc2
Type: fulltext
Mimetype: application/pdf

Total: 2159 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 354 hits