Adipocyte signaling, normally and in type 2 diabetes, is far from fully understood. We have earlier developed detailed dynamic mathematical models for several well-studied, partially overlapping signaling pathways in adipocytes. Still, these models cover only a fraction of the total cellular response. For broader coverage of the response, large-scale phosphoproteomic data and systems-level knowledge of protein interactions are key. However, methods that combine detailed dynamic models with large-scale data, using information about the confidence of included interactions, are lacking. We have developed a method that first establishes a core model by connecting existing models of adipocyte cellular signaling for: (1) lipolysis and fatty acid release, (2) glucose uptake, and (3) the release of adiponectin. Next, we use publicly available phosphoproteome data for the insulin response in adipocytes, together with prior knowledge of protein interactions, to identify phosphosites downstream of the core model. In a parallel pairwise approach with low computation time, we test whether identified phosphosites can be added to the model. We iteratively collect accepted additions into layers and continue the search for phosphosites downstream of these added layers. For the first 30 layers with the highest confidence (311 added phosphosites), the model predicts independent data well (70–90% correct), and the predictive capability gradually decreases when we add layers of decreasing confidence. In total, 57 layers (3059 phosphosites) can be added to the model while retaining predictive ability. Finally, our large-scale, layered model enables dynamic simulations of systems-wide alterations in adipocytes in type 2 diabetes.
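The layer-wise expansion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `build_layers`, the `interactions` dictionary format, and the `accept` callback (a stand-in for the pairwise test of whether a phosphosite can be added to the model) are all hypothetical names introduced here, and the handling of confidence is reduced to keeping the highest-confidence input for each candidate.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical format: source node -> list of (downstream site, confidence).
Interactions = Dict[str, List[Tuple[str, float]]]


def build_layers(
    core: Set[str],
    interactions: Interactions,
    accept: Callable[[str, str], bool],
    max_layers: int = 10,
) -> List[List[str]]:
    """Iteratively add layers of sites downstream of the current model.

    `accept(source, target)` is an assumed placeholder for the pairwise
    test that decides whether `target` can be added with `source` as input.
    """
    in_model = set(core)
    layers: List[List[str]] = []
    for _ in range(max_layers):
        # Collect candidates directly downstream of anything in the model,
        # keeping the highest-confidence input for each candidate.
        candidates: Dict[str, Tuple[str, float]] = {}
        for source in in_model:
            for target, confidence in interactions.get(source, []):
                if target in in_model:
                    continue
                if target not in candidates or confidence > candidates[target][1]:
                    candidates[target] = (source, confidence)
        # Each candidate is tested independently, so this step could run in parallel.
        layer = sorted(t for t, (s, _) in candidates.items() if accept(s, t))
        if not layer:
            break  # nothing more can be added
        layers.append(layer)
        in_model.update(layer)
    return layers


# Toy example with made-up interactions and an acceptance rule that rejects "X".
toy_interactions = {
    "AKT": [("AS160_S588", 0.9), ("X", 0.4)],
    "AS160_S588": [("Y", 0.8)],
}
toy_layers = build_layers({"AKT"}, toy_interactions, lambda s, t: t != "X")
```

Because each candidate is evaluated only against its own input, the acceptance tests within a layer are independent of one another, which is what makes a parallel pairwise search cheap in this sketch.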
Background: Transcription factors (TFs) are the upstream regulators that orchestrate gene expression, and are therefore a centrepiece of bioinformatics studies. While a core strategy for understanding the biological context of genes and proteins is annotation enrichment analysis, such as Gene Ontology term enrichment, these methods are not well suited for analysing groups of TFs. In particular, such methods do not aim to include downstream processes, so for a given set of TFs the expected top ontologies would revolve around transcription processes.
Results: We present the TFTenricher, a Python toolbox that focuses specifically on identifying gene ontology terms, cellular pathways, and diseases that are over-represented among genes downstream of user-defined sets of human TFs. We evaluated the inference of downstream gene targets with respect to false positive annotations, and found that inference based on co-expression best predicted downstream processes. Based on these downstream genes, the TFTenricher uses some of the most common databases for gene function, including GO, KEGG and Reactome, to calculate functional enrichments. By applying the TFTenricher to differential expression of TFs in 21 diseases, we found significant terms associated with disease mechanisms, while gene set enrichment analysis on the same dataset predominantly identified processes related to transcription.
Conclusions and availability: The TFTenricher package enables users to search for biological context in any set of TFs and their downstream genes. The TFTenricher is available as a Python 3 toolbox at https://github.com/rasma774/Tftenricher, under a GNU GPL license and with minimal dependencies.
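To make the idea of over-representation among downstream genes concrete, the following Python sketch computes a generic hypergeometric enrichment p-value per term. It does not use or reproduce the TFTenricher API, and the abstract does not state which statistical test the package applies; the function names `hypergeom_sf` and `enrichment`, and the toy gene and term names, are assumptions made for illustration only.

```python
from math import comb
from typing import Dict, Iterable


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn)."""
    total = comb(M, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return tail / total


def enrichment(
    target_genes: Iterable[str],
    term_to_genes: Dict[str, Iterable[str]],
    background: Iterable[str],
) -> Dict[str, float]:
    """Over-representation p-value of each term among the target genes."""
    universe = set(background)
    targets = set(target_genes) & universe
    p_values = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & universe
        overlap = len(targets & annotated)
        p_values[term] = hypergeom_sf(
            overlap, len(universe), len(annotated), len(targets)
        )
    return p_values


# Toy data: 20 background genes, 4 assumed downstream targets of a TF set.
background = [f"g{i}" for i in range(20)]
terms = {
    "term_A": [f"g{i}" for i in range(5)],       # contains all 4 targets
    "term_B": [f"g{i}" for i in range(10, 15)],  # contains none of them
}
p_values = enrichment(["g0", "g1", "g2", "g3"], terms, background)
```

In practice the p-values would also need correction for multiple testing across terms, which this sketch omits.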
Prediction algorithms for protein or gene structures, including transcription factor binding from sequence information, have been transformative in understanding gene regulation. Here we ask whether human transcriptomic profiles can be predicted solely from the expression of transcription factors (TFs). We find that the expression of 1600 TFs can explain >95% of the variance in 25,000 genes. Using the light-up technique to inspect the trained neural network (NN), we find an over-representation of known TF-gene regulations. Furthermore, the learned prediction network has a hierarchical organization. A smaller set of around 125 core TFs could explain close to 80% of the variance. Interestingly, reducing the number of TFs below 500 induces a rapid decline in prediction performance. Next, we evaluated the prediction model using transcriptional data from 22 human diseases. The TFs were sufficient to predict the dysregulation of the target genes (rho = 0.61, P < 10^−216). By inspecting the model, key causative TFs could be extracted for subsequent validation using disease-associated genetic variants. We demonstrate a methodology for constructing an interpretable neural network predictor, where analyses of the predictors identified key TFs that were inducing transcriptional changes during disease.
The different chambers of the human heart demonstrate regional physiological traits and may be differentially affected during pathologic remodeling, resulting in heart failure. Few previous studies have, however, characterized the different chambers at a transcriptomic level. We therefore conducted whole-tissue RNA sequencing and gene set enrichment analysis of biopsies collected from the four chambers of adult failing (n = 8) and nonfailing (n = 11) human hearts. Atria and ventricles demonstrated distinct transcriptional patterns. Compared to nonfailing ventricles, the transcriptional pattern of nonfailing atria was enriched for a large number of gene sets associated with cardiogenesis, the immune system and bone morphogenetic protein (BMP), transforming growth factor beta (TGF beta), MAPK/JNK and Wnt signaling. Differences between failing and nonfailing hearts were also determined. The transcriptional pattern of failing atria was distinct compared to that of nonfailing atria and enriched for gene sets associated with the innate and adaptive immune system, TGF beta/SMAD signaling, and changes in endothelial, smooth muscle cell and cardiomyocyte physiology. Failing ventricles were also enriched for gene sets associated with the immune system. Based on the transcriptomic patterns, upstream regulators associated with heart failure were identified. These included many immune response factors predicted to be similarly activated for all chambers of failing hearts. In summary, the heart chambers demonstrate distinct transcriptional patterns that differ between failing and nonfailing hearts. Immune system signaling may be a hallmark of all four heart chambers in failing hearts, and could constitute a novel therapeutic target.
BACKGROUND: Hub transcription factors, regulating many target genes in gene regulatory networks (GRNs), play important roles as disease regulators and potential drug targets. However, while numerous methods have been developed to predict individual regulator-gene interactions from gene expression data, few methods focus on inferring these hubs.
RESULTS: We have developed ComHub, a tool to predict hubs in GRNs. ComHub makes a community prediction of hubs by averaging over the predictions of a compendium of network inference methods. Benchmarking ComHub against the DREAM5 challenge data and two independent gene expression datasets showed robust performance across all datasets.
CONCLUSIONS: In contrast to other evaluated methods, ComHub consistently scored among the top performing methods on data from different sources. Lastly, we implemented ComHub both to work with predefined networks and to perform stand-alone network inference, which makes the method generally applicable.
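The community prediction described in the Results can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than ComHub's actual code or interface: it assumes that each inference method contributes a list of predicted (regulator, target) edges, that a regulator's score per method is its outdegree, and that the community score is the mean outdegree across methods. The function name `community_hubs` and the toy networks are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def community_hubs(
    networks: Dict[str, List[Edge]], regulators: List[str]
) -> List[Tuple[str, float]]:
    """Rank regulators by outdegree averaged over all inference methods."""
    mean_outdegree = {r: 0.0 for r in regulators}
    for edges in networks.values():
        # Count each distinct edge once per method.
        outdegree = Counter(reg for reg, _ in set(edges))
        for r in regulators:
            mean_outdegree[r] += outdegree[r] / len(networks)
    # Highest average outdegree first; ties broken alphabetically.
    return sorted(mean_outdegree.items(), key=lambda item: (-item[1], item[0]))


# Toy predictions from two made-up inference methods.
toy_networks = {
    "method_1": [("TF1", "a"), ("TF1", "b"), ("TF2", "a")],
    "method_2": [("TF1", "a"), ("TF2", "b"), ("TF2", "c"), ("TF2", "d")],
}
hub_ranking = community_hubs(toy_networks, ["TF1", "TF2", "TF3"])
```

Averaging across methods means that a regulator ranked highly by only one method is pulled down unless other methods agree, which is the usual motivation for community predictions.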