Högskolan i Skövde

A blood transcriptomic pipeline for type 2 diabetes biomarker discovery: training on GSE9006 and external validation on GSE15932
University of Skövde, School of Bioscience.
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits. Student thesis.
Abstract [en]

Early detection of type 2 diabetes (T2D) is a public-health priority, yet externally validated, blood-based screening tools remain limited. This work implements a fully reproducible biomarker-discovery pipeline using whole-blood microarray data, trained on GSE9006 (Affymetrix GPL96; N_train = 117; Control/T2D = 105/12) and externally validated on GSE15932 (Affymetrix GPL570; N_val = 16; Control/T2D = 8/8). The objective was to derive a small, interpretable gene panel for T2D classification and to quantify its performance with rigorous uncertainty estimates. Probe-level intensities were mapped to HGNC symbols and collapsed to the gene level. Differential expression analysis (limma) was performed within cross-validation folds to avoid information leakage. Selection stability was quantified with two methods, Boruta and a bootstrapped elastic net (B = 500), and the combined evidence yielded a 12-gene candidate panel (e.g., ETS1, GJD2, RALGDS). To mitigate cross-study heterogeneity, the combined train/validation matrix (restricted to the panel intersection) was adjusted with limma's removeBatchEffect; PCA before and after adjustment showed attenuated separation between training and validation samples while class structure was retained. A shrunken elastic-net classifier was refit on the panel and evaluated in the independent cohort. Performance estimates were as follows: nested cross-validated AUC in training, 0.877; external AUC, 0.562 (95% CI 0.235-0.890; bootstrap B = 2000). At the Youden-optimal threshold, sensitivity was 0.375 and specificity was 0.250; assuming 11% screening prevalence, the corresponding PPV and NPV were 0.236 and 0.942. Calibration was modest (Brier score 0.629), and a label-permutation test indicated no evidence of performance above chance (p = 0.677).
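The prevalence-adjusted predictive values quoted above follow from sensitivity, specificity, and prevalence via Bayes' theorem. The sketch below is an illustration only, not the thesis code; the function name predictive_values is hypothetical. Under these standard formulas, the reported PPV 0.236 and NPV 0.942 are reproduced by sensitivity 0.625 and specificity 0.750 at 11% prevalence, whereas the stated pair 0.375 / 0.250 gives roughly 0.058 and 0.764. This suggests the stated sensitivity and specificity may be the complements of the values actually used, although the abstract alone cannot confirm that.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence                  # true-positive fraction
    fn = (1.0 - sensitivity) * prevalence          # false-negative fraction
    tn = specificity * (1.0 - prevalence)          # true-negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity/specificity exactly as stated in the abstract, 11% prevalence.
ppv_stated, npv_stated = predictive_values(0.375, 0.250, 0.11)

# Complementary operating point (an assumption made here for comparison).
ppv_alt, npv_alt = predictive_values(0.625, 0.750, 0.11)

print(f"stated: PPV={ppv_stated:.3f}, NPV={npv_stated:.3f}")
print(f"alt:    PPV={ppv_alt:.3f}, NPV={npv_alt:.3f}")
```

Because PPV and NPV depend strongly on the assumed prevalence, rerunning the function over a range of prevalence values is a quick way to see how the screening interpretation would shift.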
These findings indicate modest discrimination in the independent cohort, consistent with platform and cohort heterogeneity. The work nevertheless delivers a leakage-aware workflow, a stabilized small panel, and clinically interpretable metrics that can guide multi-cohort refinement and prospective assay development. The full code and intermediate artefacts are provided to facilitate reuse and extension.
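For readers who want to see how uncertainty estimates of this kind can be obtained for a small validation cohort, the following is a minimal sketch of an AUC with a bootstrap confidence interval and a label-permutation p-value. It is not the thesis implementation: the percentile interval, the one-sided permutation test, the function names, and the toy scores are all assumptions made here for illustration, and the toy data are invented rather than taken from GSE15932.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; resamples lacking one class are redrawn."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # AUC needs both classes present
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def permutation_p(labels, scores, n_perm=2000, seed=0):
    """One-sided p-value: share of label shuffles with AUC >= observed."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(shuffled, scores) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented toy cohort with 8 controls (0) and 8 cases (1); not thesis data.
toy_labels = [0] * 8 + [1] * 8
toy_scores = [0.2, 0.5, 0.4, 0.7, 0.3, 0.6, 0.1, 0.8,
              0.3, 0.6, 0.5, 0.9, 0.2, 0.7, 0.4, 0.6]

toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores)
toy_p = permutation_p(toy_labels, toy_scores)
print(toy_auc, toy_ci, toy_p)
```

With only 16 validation samples, intervals from such a procedure are expected to be wide, which is in line with the broad external confidence interval reported in the abstract.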

Place, publisher, year, edition, pages
2025. p. 29
National Category
Medical Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:his:diva-25889
OAI: oai:DiVA.org:his-25889
DiVA, id: diva2:2003037
Subject / course
Systems Biology
Educational program
Molecular Biotechnology - Master's Programme, 120 ECTS
Supervisors
Examiners
Available from: 2025-10-02. Created: 2025-10-02. Last updated: 2025-10-02. Bibliographically approved.

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 1350 kB
Type: fulltext
Mimetype: application/pdf
Checksum (SHA-512): 184d9dc438c92fa390e193c0a1b5394533cf90f78bf3518bfe60c9bf9c8bb090e4a8ec7ef00c919ac384e4c797e9ef096edb62e491e94e40b4791695b04e2858
