This thesis explores the diagnostic potential of cell-free deoxyribonucleic acid (cfDNA) fragmentation features for distinguishing cancer from non-cancer cases in a pan-cancer cohort. A total of 216 plasma-derived cfDNA samples were analyzed using targeted methylation sequencing, with a focus on three features: fragment size distributions, fragment end motif (FEM) frequencies, and the Motif Diversity Score (MDS). A custom bioinformatics pipeline was developed for data preprocessing, motif extraction, diversity scoring, and statistical analysis.
The MDS, which reflects global fragmentation diversity, did not differ significantly between cancer and non-cancer cases (p = 0.72). However, unsupervised clustering of FEM frequencies revealed subgroup-specific fragmentation patterns, particularly among hematopoietic and metastatic cancers. Motif-wise statistical analysis identified several 4-mer motifs, including motifs beginning with TG, that were significantly depleted in cancer samples, suggesting cancer-specific fragmentation signatures. Fragment size analysis further indicated a higher proportion of short cfDNA fragments (<150 bp) in cancer patients, consistent with known tumor-associated fragmentation patterns.
These findings suggest that although MDS alone lacked diagnostic power in this cohort, combining motif-specific fragmentation profiles with fragment size distributions may improve non-invasive cancer detection. Future studies should validate these results in larger, independent cohorts and investigate integrated machine learning models to enhance diagnostic performance.
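For orientation, the diversity scoring step can be illustrated with a minimal sketch. It assumes the commonly used definition of the MDS as the Shannon entropy of the 256 possible 4-mer end motif frequencies, normalized to the range 0 to 1; the abstract itself does not state the formula, so this is an assumption and not necessarily the pipeline's exact implementation. The function names `motif_frequencies` and `motif_diversity_score` are hypothetical, and the input is assumed to be a list of 4-mer end motifs that have already been extracted from the aligned fragments.

```python
import math
from collections import Counter
from itertools import product

# All 256 possible 4-mer motifs over the DNA alphabet.
ALL_4MERS = ["".join(p) for p in product("ACGT", repeat=4)]
VALID_4MERS = set(ALL_4MERS)


def motif_frequencies(end_motifs):
    """Return the relative frequency of each 4-mer in a list of end motifs.

    Motifs containing anything other than A, C, G, T (for example N)
    are ignored. This filtering rule is an assumption of this sketch.
    """
    counts = Counter(m for m in end_motifs if m in VALID_4MERS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid 4-mer end motifs supplied")
    return {motif: counts[motif] / total for motif in ALL_4MERS}


def motif_diversity_score(freqs):
    """Normalized Shannon entropy of the motif frequencies (assumed MDS).

    A value near 1 means the motifs are used almost uniformly; a value
    near 0 means a few motifs dominate. Zero-frequency motifs contribute
    nothing to the entropy.
    """
    entropy = -sum(p * math.log(p) for p in freqs.values() if p > 0)
    return entropy / math.log(len(ALL_4MERS))


if __name__ == "__main__":
    # Toy input for illustration only; these are not study data.
    toy_motifs = ["CCCA", "CCCA", "TGAA", "CCAG", "TGAA", "AAAA"]
    print(motif_diversity_score(motif_frequencies(toy_motifs)))
```

Under this definition, a sample in which all 256 motifs occur equally often scores 1, and a sample dominated by a single motif scores 0. Because the score collapses the whole motif distribution into one number, shifts confined to a few motifs can leave it nearly unchanged, which is in line with the motif-wise differences reported above despite the non-significant MDS result.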