Clustering biological data using a hybrid approach: Composition of clusterings from different features
University of Skövde, School of Humanities and Informatics.
2008 (English). Independent thesis, Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Clustering of data is a well-researched topic in computer science, and many approaches have been designed for different tasks. In biology many of these approaches are hierarchical, and the result is usually represented as a dendrogram, e.g. a phylogenetic tree. However, many non-hierarchical clustering algorithms are also well established in biology, and the approach in this thesis is based on such common algorithms. The algorithm implemented as part of this thesis uses a non-hierarchical graph clustering algorithm to compute a hierarchical clustering in a top-down fashion. It performs the graph clustering iteratively, taking a previously computed cluster as the input set. The innovation is that each step focuses on a different feature of the data and clusters the data according to that feature. Common hierarchical approaches in biology cluster, for example, a set of genes according to the similarity of their sequences; the clustering then reflects a partitioning of the genes by sequence similarity. The approach introduced in this thesis instead uses many features of the same objects. These features can vary; in biology they include, for instance, similarities of sequences, of gene expression, or of motif occurrences in the promoter region. As part of this thesis, not only the algorithm itself was implemented and evaluated, but also a complete software package with a graphical user interface. The software was implemented as a framework providing the basic functionality, with the algorithm as a plug-in extending the framework. The software is meant to be extended in the future by integrating a set of algorithms and analysis tools for clustering and analysing data, not necessarily related to biology.
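
The top-down scheme described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not name the graph clustering algorithm, so connected components over a thresholded similarity graph are used here as a stand-in, and all names (`connected_components`, `hybrid_clustering`, `pair_similarity`, the `threshold` parameter, the genes g1 to g4) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# A feature is modelled as a pairwise similarity function (assumption).
Similarity = Callable[[str, str], float]


def connected_components(items: Sequence[str], sim: Similarity,
                         threshold: float) -> List[List[str]]:
    """Stand-in graph clustering (assumption): link two items when their
    similarity reaches the threshold, then return the connected components."""
    unvisited = list(items)
    clusters = []
    while unvisited:
        seed = unvisited.pop(0)
        component, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            linked = [x for x in unvisited if sim(current, x) >= threshold]
            for x in linked:
                unvisited.remove(x)
            component.extend(linked)
            frontier.extend(linked)
        clusters.append(sorted(component))
    return clusters


def hybrid_clustering(items: Sequence[str], features: Sequence[Similarity],
                      threshold: float = 0.5) -> Dict:
    """Build the hierarchy top-down: level k splits each cluster from
    level k-1 using only the k-th feature."""
    node = {"members": sorted(items), "children": []}
    if not features or len(items) < 2:
        return node
    first, rest = features[0], features[1:]
    for cluster in connected_components(items, first, threshold):
        node["children"].append(hybrid_clustering(cluster, rest, threshold))
    return node


def pair_similarity(similar_pairs) -> Similarity:
    """Toy feature: similarity 1.0 for listed pairs, 0.0 otherwise."""
    pairs = {frozenset(p) for p in similar_pairs}
    return lambda a, b: 1.0 if frozenset((a, b)) in pairs else 0.0


# Made-up toy data: two features of the same four genes.
sequence = pair_similarity([("g1", "g2"), ("g2", "g3")])
expression = pair_similarity([("g1", "g2")])
tree = hybrid_clustering(["g1", "g2", "g3", "g4"], [sequence, expression])
```

In this toy run the first level separates g4 from the other genes by sequence similarity, and the second level splits the remaining cluster by expression similarity. Any non-hierarchical graph clustering method could replace `connected_components` without changing the recursive structure.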

The thesis deals with topics in biology, data mining and software engineering and is divided into six chapters. The first chapter introduces the task and the biological background, gives an overview of common clustering approaches and explains the differences between them. Chapter two presents the idea behind the new clustering approach and points out differences and similarities between it and common clustering approaches. The third chapter discusses the software, including the algorithm: it illustrates the architecture and analyses the clustering algorithm. The fourth chapter describes the evaluation of the software after implementation and points out observations made when using the new algorithm. This chapter also discusses differences and similarities to related clustering algorithms and software. The thesis ends with the last two chapters, namely conclusions and suggestions for future work. Readers interested in repeating the experiments made as part of this thesis can contact the author by e-mail to obtain the relevant evaluation data, scripts or source code.

Place, publisher, year, edition, pages
Skövde: Institutionen för kommunikation och information, 2008. 106 p.
Keyword [en]
clustering, bioinformatics, hybrid
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:his:diva-1078
OAI: oai:DiVA.org:his-1078
DiVA: diva2:2196
Presentation
(English)
Uppsok
Physics, Chemistry, Mathematics
Available from: 2008-06-05 Created: 2008-06-05 Last updated: 2009-10-16

Open Access in DiVA

fulltext (1805 kB), 851 downloads
File information
File name: FULLTEXT01.pdf
File size: 1805 kB
Checksum (MD5):
520aae281ecfb3cb03b1e5afb054a6a115c496291efa3cffa59c1731d7e76a35d9f1448a
Type: fulltext
Mimetype: application/pdf

Total: 851 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1238 hits