Pancreatic ductal adenocarcinoma is characterized by high mortality and limited diagnostic and prognostic tools, creating a need for improved molecular classifiers. This study investigated whether variational autoencoders can capture nonlinear, high-dimensional gene expression features, and whether the resulting latent representations can be integrated with classical machine learning approaches. A dataset from collaborative sources was pre-processed and then subjected to dimensionality reduction and differential expression analysis. Despite group-level differences in gene expression, the variational autoencoder-based latent spaces did not improve the classification of low-grade pancreatic ductal adenocarcinoma tumors. These results highlight the challenges posed by class imbalance, the complexity of the heterogeneous pancreatic ductal adenocarcinoma transcriptome, and the limitations of purely unsupervised latent variable models in capturing clinically relevant distinctions.
The methodological framework developed here nonetheless demonstrates the potential of combining deep generative feature extraction with conventional classifiers, and it lays the groundwork for future refinements through semi-supervised learning and multi-omics integration. Overall, this work contributes insights into the design and optimization of machine learning pipelines aimed at improving pancreatic ductal adenocarcinoma stratification and informing targeted therapeutic strategies.