Welcome to OncoProExp
OncoProExp: An Interactive Platform for Cancer Proteomics and Phosphoproteomics Analysis
OncoProExp is a comprehensive web-based Shiny application designed for visualization, statistical analysis, and predictive modeling of cancer proteomics and phosphoproteomics data. The platform allows in-depth exploration and prediction of tumor versus normal samples across multiple cancer types. OncoProExp utilizes data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and includes the following proteome and phosphoproteome analysis modules:
1. Data Preprocessing: Imputation, normalization, outlier management, and distribution checks to ensure robust data quality.
2. Data Visualization: PCA, MDS, UMAP, heatmaps, and density plots to visualize protein and phosphoprotein distributions.
3. Gene Set Enrichment Analysis (GSEA): Analysis of pathways and biological processes relevant to cancer progression, such as the KRAS signaling pathway.
4. Differential Expression Analysis: Identification of differentially expressed proteins and phosphoproteins, revealing metabolic and molecular profiles specific to each cancer type.
5. Survival Analysis: Kaplan-Meier plots and Cox proportional hazards models to assess protein and phosphoprotein markers associated with patient survival outcomes across cancer types.
6. Protein-Protein Interactions and Protein-Drug Interactions: Visual exploration of protein networks, including potential drug targets, available in the Differential Expression tab.
7. AI-based Predictive Models: Predict cancer types with SVM, Random Forest, and ANN models tailored to handle complex proteomic and phosphoproteomic datasets.
If you find our tool useful, please cite our paper:
OncoProExp: A Web-Based Platform for Oncologically Relevant Proteome and Phosphoproteome Exploration and Machine Learning based Prediction in Human Cancers (Manuscript in preparation)
This tool is free of charge and is intended solely for non-commercial, academic, and research purposes. It should not be used for any commercial applications or profit-driven activities. Our mission is to support researchers, students, and professionals in advancing knowledge, and we kindly request that users respect these terms to help maintain the accessibility and integrity of the tool.
User Privacy Policy:
- No Personal Data Storage: We do not store any personal information, cookies, or uploaded files. All uploaded files are renamed with random identifiers for added security and are deleted immediately after processing.
- Anonymized Location Data: For general analytics and to improve user experience, we may use IP addresses to approximate your location (city, state, or country). This information is anonymized, used only for traffic pattern analysis, and is not linked to any specific user.
- Opt-out Option: You can disable location services in your browser settings if you prefer not to share approximate location data.
- Transparent Processing: All data handling is transparent and secure, and follows strict procedures to provide a better user experience without compromising your privacy.
Data preprocessing
Data preprocessing is the first crucial step in preparing raw data for a machine learning model. Finding missing values and detecting outliers are essential before performing any analysis. Some algorithms require the data to meet certain assumptions, such as following a normal distribution. Data transformation (e.g., log, z-score, or scaling) can reduce model bias and computation time. OncoProExp uses the missForest and dlookr packages to impute missing values and to diagnose, explore, and transform data. Please upload your data with samples organized in columns and features in rows, or load the example data.
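The preprocessing steps described above can be sketched in a few lines. This is a minimal illustration, not the code OncoProExp runs: the app relies on the R packages missForest and dlookr, whereas this sketch uses plain Python and substitutes simple median imputation for missForest's random-forest imputation. The function names and the example values are hypothetical.

```python
import statistics

def impute_median(row):
    """Replace missing entries (None) with the median of the observed values.

    A simple stand-in for missForest, which is NOT what this function implements.
    """
    observed = [v for v in row if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in row]

def iqr_outliers(row, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(row, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(row) if v < low or v > high]

def zscore(row):
    """Center a feature to mean 0 and scale it to sample standard deviation 1."""
    mean = statistics.fmean(row)
    sd = statistics.stdev(row)
    return [(v - mean) / sd for v in row]

# One feature (a row) measured across six samples (columns); None marks a missing value.
intensities = [10.0, 12.0, None, 11.0, 13.0, 100.0]

filled = impute_median(intensities)   # the missing value becomes the median of the rest
outlier_idx = iqr_outliers(filled)    # positions flagged by the IQR rule
scaled = zscore(filled)               # z-score transformation
```

In practice the outlier check would run per feature across all samples, and flagged values would be reviewed before deciding whether to cap, remove, or keep them.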
Check the separation field, file format, or load the example.
Head of table
General overview
Outlier detection
Missing values
Outlier distribution
Differentially Expressed Proteins and Phosphoproteins
OncoProExp initially calculates the Median Absolute Deviation (MAD) to select the most variable features and uses a scree plot to determine the number of principal components (PCs) to retain in PCA. It visualizes dimensionality reduction with PCA plots, shows cluster patterns with color-coded samples, and assesses variability through box plots, density plots, and heatmaps. Multidimensional Scaling (MDS) plots reveal sample distances and clustering. Differentially expressed proteins (DEPs) and phosphoproteins (DEPPs) are identified using empirical Bayes moderated linear models fitted with the `lmFit` function from the limma package, with significant changes highlighted in volcano plots and heatmaps. Gene Set Enrichment Analysis (GSEA) identifies impacted pathways, visualized through bar plots. For enrichment analysis, OncoProExp leverages the gprofiler2 package to explore gene ontology terms and pathways, with interactive Plotly plots and summarized dot plots highlighting key biological processes. OncoProExp uses the limma package to perform differential expression analysis, ggplot2 to visualize the results, and the ComplexHeatmap package to construct heatmaps.
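To make the first step concrete, the following sketch ranks features by their MAD and keeps the most variable ones. It is only an illustration under stated assumptions: the function names, the protein labels, and the values are hypothetical, and the app itself performs this step in R before PCA.

```python
import statistics

def mad(values):
    """Median Absolute Deviation: median of |x - median(x)|."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def top_variable_features(matrix, n_top):
    """Rank features by MAD (highest first) and return the names of the top n_top.

    `matrix` maps a feature name to its list of per-sample values.
    """
    scores = {name: mad(vals) for name, vals in matrix.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_top]

# Hypothetical abundance values for three proteins across four samples.
expression = {
    "PROT_A": [1.0, 1.1, 0.9, 1.0],   # nearly constant
    "PROT_B": [0.2, 2.5, 0.1, 2.8],   # strongly variable
    "PROT_C": [1.0, 1.5, 0.5, 1.2],   # moderately variable
}

selected = top_variable_features(expression, 2)
```

MAD is preferred over the standard deviation here because it is robust to single extreme samples, so one outlying measurement does not push a feature into the selected set.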
Users can upload their expression table (with gene symbols/IDs in the first column and sample data in the remaining columns) and metadata table (with sample names and class labels as the first two columns), perform filtering and normalization, and convert Ensembl IDs to gene symbols, facilitating a tailored and thorough analysis of their data.
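Before uploading, it can help to confirm that the two tables agree on sample names. The sketch below assumes comma-separated files with a header row; the column names (`Gene`, `Sample`, `Class`), the helper names, and the values are made up for illustration and are not required by OncoProExp.

```python
import csv
import io

def read_table(text, delimiter=","):
    """Parse delimited text into (header, rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return rows[0], rows[1:]

def check_sample_alignment(expr_header, meta_rows):
    """Compare expression columns with metadata sample names.

    Returns (samples missing from metadata, samples missing from expression).
    """
    expr_samples = expr_header[1:]                 # first column holds gene symbols/IDs
    meta_samples = [row[0] for row in meta_rows]   # first column holds sample names
    missing_in_meta = [s for s in expr_samples if s not in meta_samples]
    missing_in_expr = [s for s in meta_samples if s not in expr_samples]
    return missing_in_meta, missing_in_expr

# Made-up example tables in the layout described above.
expr_text = "Gene,S1,S2,S3\nTP53,1.2,0.8,1.5\nEGFR,2.1,2.4,1.9\n"
meta_text = "Sample,Class\nS1,Tumor\nS2,Normal\nS4,Tumor\n"

expr_header, expr_rows = read_table(expr_text)
meta_header, meta_rows = read_table(meta_text)
missing_in_meta, missing_in_expr = check_sample_alignment(expr_header, meta_rows)
```

Any name reported in either list points to a sample that would lack a class label or lack expression values.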
Check the separation field and file format in your expression and metadata files, or load the example.
Expression table and metadata
Table
Inspecting DEGs
Explore differentially expressed proteins on the DrugBank website to gain insights into therapeutic potential and identify relevant drugs.
CancerDrugs_DB is a curated database created by the Anticancer Fund, offering an accessible listing of licensed cancer drugs from sources such as the NCI, FDA, and EMA. This resource supports researchers, clinicians, and regulatory bodies with a comprehensive catalog of drugs approved specifically for cancer treatment, excluding supportive care, diagnostic agents, and investigational treatments. By leveraging CancerDrugs_DB, we aim to identify matching protein targets between licensed cancer drugs and our own differentially expressed proteins, enabling deeper insights into potential therapeutic connections.
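The matching idea can be illustrated as a set intersection between each drug's targets and the list of differentially expressed proteins. This is a hedged sketch only: the drug names, gene names, and the `match_drug_targets` helper are placeholders, and the actual CancerDrugs_DB file format is not assumed here.

```python
def match_drug_targets(drug_targets, deps):
    """Return {drug: sorted targets that are also DEPs}, skipping drugs with no match.

    Matching is case-insensitive on the gene/protein symbol.
    """
    dep_set = {g.upper() for g in deps}
    matches = {}
    for drug, targets in drug_targets.items():
        shared = sorted(t for t in targets if t.upper() in dep_set)
        if shared:
            matches[drug] = shared
    return matches

# Placeholder drug-to-target mapping and DEP list (not real database content).
drug_targets = {
    "DRUG_1": ["GENE_A", "GENE_B"],
    "DRUG_2": ["GENE_C"],
    "DRUG_3": ["GENE_B", "GENE_D"],
}
deps = ["gene_b", "GENE_D", "GENE_E"]

matches = match_drug_targets(drug_targets, deps)
```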
AI-based Predictive Models
In OncoProExp, cancer types are predicted from proteomic and phosphoproteomic data using AI-based models designed to handle the complexity and variability of the data. The app employs Support Vector Machines (SVM) from the e1071 package, Random Forests (RF) from the randomForest package, and Artificial Neural Networks (ANN) from keras with TensorFlow v2.10 as the backend. Model performance is evaluated using accuracy, sensitivity, specificity, precision, F1-score, and area under the curve (AUC); the pROC package is used to assess class separation. Please upload your table with gene symbols/IDs in the first column and sample data in the remaining columns.
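The evaluation metrics listed above follow directly from the confusion matrix. The sketch below shows their standard definitions for a two-class (tumor versus normal) case, plus a rank-based AUC. It is an illustration only: the app computes these in R (with pROC for the AUC), and the function names, labels, and scores here are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="Tumor"):
    """Compute standard metrics from true and predicted class labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

def auc(scores, y_true, positive="Tumor"):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, predictions, and predicted tumor probabilities for eight samples.
y_true = ["Tumor"] * 4 + ["Normal"] * 4
y_pred = ["Tumor", "Tumor", "Tumor", "Normal", "Normal", "Normal", "Normal", "Tumor"]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]

metrics = binary_metrics(y_true, y_pred)
auc_value = auc(scores, y_true)
```

The sketch does not guard against empty classes; a real implementation would need to handle the case where a denominator is zero.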
Check the separation field, file format, or load the example.
Uploaded table
Model
Model Performance
New data prediction
Real labels for comparison (if any)
Survival Analysis
The survival analysis section in OncoProExp provides comprehensive survival modeling tools to explore prognostic markers using the survival package. Kaplan-Meier plots are generated based on user-defined gene expression thresholds, enabling visual assessment of survival outcomes for specific biomarkers. Cox proportional hazards models are fitted for each split criterion (mean, median, and quantiles), allowing robust hazard ratio (HR) estimation and testing of covariate effects on survival. A unique feature of this module allows users to adjust for covariates when calculating hazard ratios, making the models suitable for multivariable analysis. Each gene’s best split point is automatically identified based on the optimal HR, to ensure balanced group comparisons.
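As a rough illustration of the Kaplan-Meier part, the sketch below splits patients into high and low expression groups at the median and computes the product-limit survival estimate for each group. It is not the app's implementation (which uses the R survival package) and does not cover Cox models or the automatic split-point search; the function names and the patient data are hypothetical.

```python
import statistics

def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival probability)] at each event time.

    `events` holds 1 for an observed event and 0 for a censored observation.
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)   # events plus censored at time t
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median expression."""
    cutoff = statistics.median(expression)
    return ["high" if v > cutoff else "low" for v in expression]

# Hypothetical data for six patients: marker expression, follow-up time, event flag.
expression = [2.0, 5.0, 3.0, 8.0, 7.0, 1.0]
times = [30, 10, 25, 6, 12, 40]
events = [1, 1, 0, 1, 1, 0]

groups = split_by_median(expression)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
```

Plotting each curve as a step function gives the familiar Kaplan-Meier plot; replacing the median with the mean or a quantile changes only the cutoff in `split_by_median`.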