Högskolan i Skövde

Evaluation of protein sequence classification patterns
University of Skövde, Department of Computer Science.
2002 (English). Independent thesis, Advanced level (degree of Master (One Year)). Student thesis.
Abstract [en]

Classification of protein sequences is one of the foundations of bioinformatics, as new proteins are sequenced every day. Each protein sequence represents a protein of a certain family, and its function can sometimes be predicted through sequence classification. Several approaches to sequence classification exist today; this work considers pattern-based approaches. A pattern is an expression representing a certain protein family, which the corresponding protein sequences are expected to match. PROSITE is a pattern collection that is well known in bioinformatics and therefore plays an important part in this project, together with the MAMA pattern collection. Evaluation of patterns today focuses on accuracy, i.e. sensitivity and specificity, but in this thesis information content is also considered. The first experiment, which looked for a relationship between accuracy and information content, found no clear connection. This led to the conclusion that information content might not be suitable as a measure for evaluating patterns. The second experiment concerned the fact that the same sequences are sometimes used for both training and testing, which probably gives misleadingly high accuracy values. This gave rise to the hypothesis that using an independent test set, distinct from the training set, reduces accuracy values, which a number of tests confirmed. Finally, the last experiment, which concerned creating a new system for evaluating whole pattern collections, is presented, with results showing that MAMA performs better than PROSITE according to this system.
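As an illustration of the terms used in the abstract, the following is a minimal sketch of how a PROSITE-style pattern could be matched against sequences and scored for sensitivity and specificity. It is not taken from the thesis: the function names, the toy pattern, and the toy sequences are hypothetical, only the basic PROSITE syntax elements are handled, and the common definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP) are assumed (the thesis may define them differently).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles only the basic syntax: '-' separators, 'x' wildcards,
    [..] allowed sets, {..} forbidden sets, (n) / (n,m) repeats,
    and the '<' / '>' terminal anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":               # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        # Single letters and [..] sets are already valid regex.
        quantifier = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + quantifier + suffix)
    return "".join(parts)


def evaluate(pattern, family, others):
    """Return (sensitivity, specificity) of a pattern on labelled sequences."""
    regex = re.compile(prosite_to_regex(pattern))
    tp = sum(1 for seq in family if regex.search(seq))   # true positives
    fn = len(family) - tp                                # false negatives
    fp = sum(1 for seq in others if regex.search(seq))   # false positives
    tn = len(others) - fp                                # true negatives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, invented purely for illustration.
TOY_PATTERN = "C-x(2)-[AG]-{P}-H."
FAMILY = ["MKCAAGLHT", "CTTAQHK", "GGCPPGPH"]   # sequences in the family
OTHERS = ["MKTAYIAK", "CWWGRHD", "HHHHCC"]      # sequences outside it

if __name__ == "__main__":
    print(prosite_to_regex(TOY_PATTERN))
    print(evaluate(TOY_PATTERN, FAMILY, OTHERS))
```

On the toy data the pattern matches two of the three family sequences and wrongly matches one of the three non-family sequences. Scoring the same sequences that were used to derive a pattern would tend to inflate both values, which is the concern behind the second experiment.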

Place, publisher, year, edition, pages
Skövde: Institutionen för datavetenskap, 2002. p. 76
Keywords [en]
performance, protein, sequence, classification, patterns
National Category
Information Systems
Identifiers
URN: urn:nbn:se:his:diva-738; OAI: oai:DiVA.org:his-738; DiVA, id: diva2:3141
Presentation
(English)
Uppsok
Social and Behavioural Science, Law
Supervisors
Available from: 2008-02-06 Created: 2008-02-06 Last updated: 2018-01-12

Open Access in DiVA

fulltext (5629 kB), 259 downloads
File information
File name: FULLTEXT01.ps; File size: 5629 kB; Checksum SHA-1
bb5af84114347a69800a07407fc6114c5550b2a10902daa3398eec941256da5455d991af
Type: fulltext; Mimetype: application/postscript
fulltext (795 kB), 311 downloads
File information
File name: FULLTEXT02.pdf; File size: 795 kB; Checksum SHA-512
d5df0ad9132e19d510cc52a17c0e2d100d20f8e41fcff7f125d390d831201a12bbadf4880f6a523555a0e4c43ef8c6edc28b6fcb7f429ff9a0937e94e359367a
Type: fulltext; Mimetype: application/pdf

By organisation
Department of Computer Science
Information Systems

Total: 570 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 297 hits