Högskolan i Skövde

Enhancing SNV Prioritization in WGS Cancer Diagnostics
University of Skövde, School of Bioscience.
2024 (English). Independent thesis, Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Background: Whole-genome sequencing (WGS) has revolutionized cancer diagnostics, enabling the identification of somatic single nucleotide variants (SNVs) critical for precision oncology. However, interpreting these variants remains challenging due to annotation inconsistencies and biases, particularly in tumor-only sequencing workflows. To address these challenges, the Balsamic pipeline integrates a scoring model that prioritizes variants by their likelihood of pathogenicity. This thesis evaluates how well that model prioritizes clinically relevant variants in tumor-only analyses. Using an initial dataset of 8,145 variants and an extended dataset derived from ClinVar, it assesses the model's ability to distinguish between benign and pathogenic variants.
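
A scoring model of this kind can be pictured as a weighted sum of per-feature scores, with variants ranked by the total. The Python sketch below is illustrative only and is not the Balsamic implementation: the weights, the assumption that each feature is already normalized to a number, and the placeholder feature "OTHER" are all hypothetical. Only the feature names CON and CLIN come from this record.

```python
from dataclasses import dataclass

# Hypothetical weights: "CON" and "CLIN" are named in the abstract,
# "OTHER" is a placeholder for any remaining feature. Values are illustrative.
WEIGHTS = {"CON": 2.0, "CLIN": 2.0, "OTHER": 1.0}


@dataclass
class Variant:
    name: str
    features: dict[str, float]  # per-feature scores, assumed already normalized


def score(variant: Variant, weights: dict[str, float]) -> float:
    """Weighted sum of per-feature scores; a missing feature contributes 0."""
    return sum(w * variant.features.get(f, 0.0) for f, w in weights.items())


def prioritize(variants: list[Variant], weights: dict[str, float]) -> list[Variant]:
    """Return the variants ordered from highest to lowest score."""
    return sorted(variants, key=lambda v: score(v, weights), reverse=True)


if __name__ == "__main__":
    variants = [
        Variant("var_b", {"OTHER": 1.0}),
        Variant("var_a", {"CON": 1.0, "CLIN": 1.0}),
    ]
    for v in prioritize(variants, WEIGHTS):
        print(v.name, score(v, WEIGHTS))
```

In this framing, "reweighting" amounts to changing the values in the weights dictionary and re-ranking, which leaves the scoring code itself untouched.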

Results: The initial analysis showed high specificity, with 98% of benign variants correctly classified, but the model performed poorly on pathogenic variants, with a recall of 54% and an F1-score of 0.59 at the optimal threshold. Feature contribution analysis identified "Consequence" (CON) and "Clinical Significance" (CLIN) as key predictors, which motivated reweighting these features to improve the separation between benign and pathogenic variants. On the extended dataset, performance improved substantially, reaching a precision of 92% and an F1-score of 0.90 at the optimal threshold. This demonstrates the potential of balanced datasets and of combined features, such as the newly introduced "COMBINED SCORE", to improve classification accuracy.
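
The metrics reported above follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), specificity = TN / (TN + FP), and F1 = 2 · precision · recall / (precision + recall). The Python sketch below shows one common way to pick an "optimal threshold": try each distinct score as a cutoff and keep the one with the highest F1-score. This is an assumption, since the record does not state how the thesis defined the optimal threshold. It also assumes that a score at or above the cutoff is called pathogenic (label 1).

```python
def confusion_at(scores, labels, threshold):
    """Count (TP, FP, TN, FN), calling score >= threshold pathogenic (label 1)."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Precision, recall, specificity and F1; 0.0 wherever a denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def best_threshold(scores, labels):
    """Try each distinct score as a cutoff and keep the one with the highest F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        f1 = metrics(*confusion_at(scores, labels, t))["f1"]
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    # Toy data, not taken from the thesis.
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    print(best_threshold(scores, labels))
```

On imbalanced data, where benign variants far outnumber pathogenic ones, a model can combine high specificity with low recall, as in the initial analysis. That is one reason F1 rather than accuracy is a reasonable objective for choosing the cutoff.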

Conclusions: This work highlights the importance of tailored feature weighting, dataset balance, and new feature engineering in improving variant prioritization workflows. Future research should focus on integrating real-world clinical data and leveraging machine learning to further refine predictive capabilities. This work provides a foundation for improving variant interpretation workflows in precision oncology.

Place, publisher, year, edition, pages
2024. 50 p.
National Category
Medical Genetics and Genomics
Identifiers
URN: urn:nbn:se:his:diva-24866
OAI: oai:DiVA.org:his-24866
DiVA, id: diva2:1932387
External cooperation
Clinical Genomics, SciLifeLab
Subject / course
Bioinformatics
Educational program
Bioinformatics - Master’s Programme
Supervisors
Examiners
Available from: 2025-01-31. Created: 2025-01-29. Last updated: 2025-09-29. Bibliographically approved.

Open Access in DiVA

Full text: FULLTEXT01.pdf (3762 kB, application/pdf)
SHA-512 checksum: 6d9b5b35a50d2b60a38237e72760de9032a08b317e0d865c0693ff097c50d11df8d64dd0822db40a14a7bc7fe2e22559d989d844eda55ae7716b9392a605ed36
