Högskolan i Skövde

Finding remote protein homologs with hidden Markov models
Högskolan i Skövde, Institutionen för datavetenskap.
1997 (English). Independent thesis, advanced level (degree of Master (One Year)). Student thesis
Abstract [en]

Detecting remote homologs by sequence similarity becomes increasingly difficult as the percentage of identical residues decreases. The aim of this work was to investigate whether the performance of hidden Markov models could be improved by ignoring the subsequences that exhibit high variability and concentrating only on the truly conserved regions. This is based on the underlying assumption that these high variability regions could be unnecessary, or even misleading, when searching for remote protein homologs.

In this paper we put this assumption to the test by identifying the high and low variability regions of multiple alignments and modifying the models so that they focus on the conserved regions. The high variability regions are located with information theoretic measures and modeled by free insertion modules, which are special nodes that can model arbitrarily long subsequences with a uniform probability distribution.

The results do not support a definitive conclusion: a few cases exhibit a performance increase, but the general trend is that performance decreases when high variability regions are ignored. Two supplementary tests suggest that when there is a significant performance loss due to deletion of high variability nodes, a much smaller decrease occurs when the nodes are preserved but the position-specific amino acid distributions are removed. Taken together, these results support the hypothesis that the high variability regions contain some valuable information that enables the model to better discriminate between true and false homologs, and that other constructs for the high variability regions could perform better.
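
The abstract says only that high variability regions are located "with information theoretic measures" and does not name the measure. As a rough illustration, the sketch below assumes per-column Shannon entropy over a gapped multiple alignment with a fixed cutoff. The function names, the gap character, the threshold and the toy alignment are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def column_entropies(alignment, gap="-"):
    """Return the Shannon entropy (in bits) of each alignment column.

    Assumption: gaps are ignored and a column containing only gaps
    is given an entropy of 0.0.
    """
    entropies = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            entropies.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        entropies.append(entropy)
    return entropies


def high_variability_columns(alignment, threshold=1.0):
    """Return indices of columns whose entropy exceeds the (hypothetical) threshold."""
    return [
        i for i, h in enumerate(column_entropies(alignment)) if h > threshold
    ]


# Toy alignment: the first two columns are fully conserved,
# the last two hold four different residues each.
toy_alignment = ["ACDE", "ACEF", "ACGH", "ACKL"]
variable = high_variability_columns(toy_alignment)  # columns 2 and 3
```

In a model built from such an alignment, runs of flagged columns would be the candidates for replacement by a free insertion module, while the remaining columns keep their position-specific distributions.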

Place, publisher, year, edition, pages
Skövde: Institutionen för datavetenskap, 1997. p. 81
Identifiers
URN: urn:nbn:se:his:diva-293
OAI: oai:DiVA.org:his-293
DiVA, id: diva2:2654
Presentation
(English)
Uppsök

Supervisor
Available from: 2007-11-26 Created: 2007-11-26 Last updated: 2009-11-16

Open Access in DiVA

fulltext (682 kB), 191 downloads
File information
File: FULLTEXT01.ps, file size: 682 kB, checksum: MD5
b41e267afed8aad85be64462087f7b475a8b0437bc78036487cd75a31e66ece831cadcf5
Type: fulltext, MIME type: application/postscript
fulltext (439 kB), 195 downloads
File information
File: FULLTEXT02.pdf, file size: 439 kB, checksum: SHA-512
c7daeb5a54daf908d71ecba4602d3763e37b2f41e7930efa6f3ffb4c03cac1d89bf57a49716eff72d66322527eca1fe772895844eb94ba5e34464e3a333b62a1
Type: fulltext, MIME type: application/pdf

Total: 386 downloads
The number of downloads is the sum of all downloads of all full texts. This may include, for example, earlier versions that are no longer available.

urn-nbn
Total: 235 hits