Högskolan i Skövde

Finding remote protein homologs with hidden Markov models
Högskolan i Skövde, Institutionen för datavetenskap.
1997 (English). Independent thesis, advanced level (degree of Master, one year). Student thesis
Abstract [en]

Detecting remote homologs by sequence similarity becomes increasingly difficult as the percentage of identical residues decreases. The aim of this work was to investigate whether the performance of hidden Markov models could be improved by ignoring the subsequences that exhibit high variability and concentrating only on the truly conserved regions. This is based on the underlying assumption that these high variability regions could be unnecessary, or even misleading, when searching for remote protein homologs.

In this paper we test this assumption by identifying the high and low variability regions of multiple alignments and modifying the models to focus on the conserved regions. The high variability regions are located with information-theoretic measures and modeled by free insertion modules, which are special nodes that can model arbitrarily long subsequences with a uniform probability distribution.
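As an illustration of how high variability regions might be located, the following is a minimal sketch that scores each column of a multiple alignment by its Shannon entropy and reports runs of columns above a cutoff. The abstract does not say which information-theoretic measure the thesis uses, so the choice of Shannon entropy, the function names `column_entropies` and `high_variability_regions`, and the `threshold` and `min_length` parameters are all assumptions made for this example.

```python
import math
from collections import Counter


def column_entropies(alignment, gap="-"):
    """Shannon entropy (in bits) of each alignment column, ignoring gaps.

    Assumption: Shannon entropy stands in for the unspecified
    information-theoretic measure mentioned in the abstract.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        total = len(residues)
        h = 0.0  # a column containing only gaps is scored as 0.0
        for count in Counter(residues).values():
            p = count / total
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies


def high_variability_regions(entropies, threshold, min_length=1):
    """Half-open (start, end) ranges of consecutive columns whose entropy
    exceeds `threshold` (a hypothetical cutoff, not taken from the thesis)."""
    regions = []
    start = None
    for i, h in enumerate(entropies):
        if h > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    if start is not None and len(entropies) - start >= min_length:
        regions.append((start, len(entropies)))
    return regions


# Toy alignment: two fully conserved columns followed by two variable ones.
toy_alignment = ["ACDA", "ACEC", "ACFD", "ACGE"]
toy_entropies = column_entropies(toy_alignment)
toy_regions = high_variability_regions(toy_entropies, threshold=1.0)
```

In a model built from such an alignment, the columns inside each reported range would be the candidates for replacement by a free insertion module, while the remaining columns keep their position-specific match states.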

The results do not support a definitive conclusion: a few cases exhibit a performance increase, but the general trend is that performance decreases when high variability regions are ignored. Two supplementary tests suggest that when deleting high variability nodes causes a significant performance loss, a much smaller decrease occurs if the nodes are preserved but their position-specific amino acid distributions are removed. Taken together, these results support the hypothesis that the high variability regions contain valuable information that enables the model to better discriminate between true and false homologs, and that other constructs for the high variability regions could perform better.
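The supplementary test, in which nodes are kept but their position-specific distributions are removed, can be pictured with a small sketch. It assumes that "removed" means replacing each affected match-state emission distribution with a uniform distribution over the 20 standard amino acids; the thesis may instead use a background distribution or another scheme. The function name `flatten_emissions`, the dictionary representation of emissions, and the half-open `(start, end)` region format are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def flatten_emissions(match_emissions, regions):
    """Return a copy of the match-state emissions in which every state
    inside `regions` emits all amino acids with equal probability.

    match_emissions: list of dicts mapping residue -> probability,
                     one dict per match state (hypothetical representation).
    regions:         list of half-open (start, end) index ranges.
    Assumption: a uniform distribution stands in for "removed" distributions.
    """
    uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    flattened = [dict(dist) for dist in match_emissions]  # leave input intact
    for start, end in regions:
        for i in range(start, end):
            flattened[i] = dict(uniform)
    return flattened


# Toy model with three match states; only the last one is flattened.
toy_emissions = [{"A": 1.0}, {"C": 1.0}, {"D": 0.5, "E": 0.5}]
toy_flattened = flatten_emissions(toy_emissions, [(2, 3)])
```

The number of nodes, and therefore the length information carried by the model, is unchanged by this operation; only the residue preferences at the selected positions are discarded.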

Place, publisher, year, edition, pages
Skövde: Institutionen för datavetenskap, 1997, p. 81
Identifiers
URN: urn:nbn:se:his:diva-293
OAI: oai:DiVA.org:his-293
DiVA, id: diva2:2654
Presentation
(English)
Available from: 2007-11-26. Created: 2007-11-26. Last updated: 2009-11-16.

Open Access in DiVA

fulltext (682 kB), 191 downloads
File information
File name: FULLTEXT01.ps
File size: 682 kB
Checksum (MD5):
b41e267afed8aad85be64462087f7b475a8b0437bc78036487cd75a31e66ece831cadcf5
Type: fulltext
MIME type: application/postscript

fulltext (439 kB), 195 downloads
File information
File name: FULLTEXT02.pdf
File size: 439 kB
Checksum (SHA-512):
c7daeb5a54daf908d71ecba4602d3763e37b2f41e7930efa6f3ffb4c03cac1d89bf57a49716eff72d66322527eca1fe772895844eb94ba5e34464e3a333b62a1
Type: fulltext
MIME type: application/pdf

By organisation
Institutionen för datavetenskap

Total: 386 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 235 hits