Högskolan i Skövde

Using Style Markers for Detecting Plagiarism in Natural Language Documents
Högskolan i Skövde, Institutionen för datavetenskap.
2003 (English). Independent thesis, Advanced level (degree of Master (One Year)). Student thesis
Abstract [en]

Most existing plagiarism detection systems compare a text to a database of other texts. These external approaches are vulnerable, however, because texts not contained in the database cannot be detected as source texts. This paper examines an internal plagiarism detection method that uses style markers from authorship attribution studies to find stylistic changes in a text. These changes might pinpoint plagiarized passages. Additionally, a new style marker called specific words is introduced. A pre-study tests whether the style markers can fingerprint an author's style and whether they are constant with sample size. It is shown that vocabulary richness measures do not fulfil these prerequisites. The other style markers (simple ratio measures, readability scores, frequency lists, and entropy measures) have these characteristics and are, together with the new specific words measure, used in a main study with an unsupervised approach for detecting stylistic changes in plagiarized texts at sentence and paragraph levels. It is shown that at these small levels the style markers generally cannot detect plagiarized sections because of intra-authorial stylistic variations (i.e. noise), and that at larger levels the results are strongly affected by the sliding window approach. The specific words measure, however, can pinpoint single sentences written by another author.

Place, publisher, year, edition, pages
Skövde: Institutionen för datavetenskap, 2003. p. 111
Keywords [en]
plagiarism detection, stylometry, authorship attribution
HSV category
Identifiers
URN: urn:nbn:se:his:diva-824
OAI: oai:DiVA.org:his-824
DiVA, id: diva2:3236
Presentation
(English)
Uppsök
Technology
Supervisor
Available from: 2008-02-15 Created: 2008-02-15 Last updated: 2018-01-12

Open Access in DiVA

Full text (17124 kB), 2350 downloads
File information
File: FULLTEXT01.ps
File size: 17124 kB
Checksum SHA-1
036889bdf6c9e176f3bfcaa2d900101a200a024d64a6e5b32c1d6506b22eb955af529eed
Type: fulltext
Mimetype: application/postscript
Full text (1826 kB), 733 downloads
File information
File: FULLTEXT02.pdf
File size: 1826 kB
Checksum SHA-512
a443442cd8d31e98071fc426a5061b74e9ae6bf6f97028a43a3f3835ec66de7f73de0c8b65f56fcb1cf6389d2a82ba4fe6334ed3f07125a6726be1721547dcc2
Type: fulltext
Mimetype: application/pdf

Total: 3083 downloads
The number of downloads is the sum of all downloads of all full texts. It may, for example, include earlier versions that are no longer available.

Total: 1171 hits