Finding remote protein homologs with hidden Markov models
University of Skövde, Department of Computer Science.
1997 (English). Independent thesis, advanced level (degree of Master (One Year)). Student thesis.
Abstract [en]

Detecting remote homologs by sequence similarity becomes increasingly difficult as the percentage of identical residues decreases. The aim of this work was to investigate whether the performance of hidden Markov models could be improved by ignoring the subsequences that exhibit high variability and concentrating only on the truly conserved regions. This is based on the underlying assumption that these high variability regions could be unnecessary, or even misleading, when searching for remote protein homologs.

In this paper we test this assumption by identifying the high and low variability regions of multiple alignments and modifying the models to focus on the conserved regions. The high variability regions are located with information theoretic measures and modeled by free insertion modules, which are special nodes that can be used to model arbitrarily long subsequences with a uniform probability distribution.
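
As an illustration of how an information theoretic measure can separate conserved from variable alignment columns, the following is a minimal sketch and not the method used in the thesis. It assumes per-column Shannon entropy as the measure, ignores gap characters, and uses an arbitrary threshold of 1.0 bit; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy in bits of the residues in one alignment column.

    Gap characters are ignored (an assumption made for this sketch).
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_variability_columns(alignment, threshold=1.0):
    """Return indices of columns whose entropy exceeds the threshold.

    `alignment` is a list of equal-length aligned sequences.
    The threshold value is illustrative, not taken from the thesis.
    """
    columns = zip(*alignment)  # transpose rows (sequences) into columns
    return [i for i, col in enumerate(columns) if column_entropy(col) > threshold]


# Hypothetical toy alignment: columns 0 and 1 are fully conserved,
# column 2 has four different residues, column 3 has two.
alignment = [
    "ACDA",
    "ACEA",
    "ACFG",
    "ACGG",
]

variable = high_variability_columns(alignment)
```

In a setup like the one described above, the flagged columns would be the candidates for replacement by a free insertion module, while the remaining columns keep their position-specific distributions.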

The results do not support a definitive conclusion: a few cases exhibit a performance increase, but the general trend is that performance decreases when high variability regions are ignored. Two supplementary tests suggest that when deleting high variability nodes causes a significant performance loss, a much smaller decrease occurs if the nodes are preserved but their position-specific amino acid distributions are removed. Taken together, these results support the hypothesis that the high variability regions contain some valuable information that enables the model to better discriminate between true and false homologs, and that other constructs for the high variability regions could perform better.

Place, publisher, year, edition, pages
Skövde: Institutionen för datavetenskap, 1997. 81 p.
Identifiers
URN: urn:nbn:se:his:diva-293
OAI: oai:DiVA.org:his-293
DiVA: diva2:2654
Presentation
(English)
Available from: 2007-11-26 Created: 2007-11-26 Last updated: 2009-11-16

Open Access in DiVA

fulltext (682 kB), 102 downloads
File information
File name: FULLTEXT01.ps
File size: 682 kB
Checksum MD5: b41e267afed8aad85be64462087f7b475a8b0437bc78036487cd75a31e66ece831cadcf5
Type: fulltext
Mimetype: application/postscript
fulltext (439 kB), 84 downloads
File information
File name: FULLTEXT02.pdf
File size: 439 kB
Checksum SHA-512: c7daeb5a54daf908d71ecba4602d3763e37b2f41e7930efa6f3ffb4c03cac1d89bf57a49716eff72d66322527eca1fe772895844eb94ba5e34464e3a333b62a1
Type: fulltext
Mimetype: application/pdf
