Welcome to OncoProExp

OncoProExp: An Interactive Platform for Cancer Proteomics and Phosphoproteomics Analysis

OncoProExp is a comprehensive web-based Shiny application designed for visualization, statistical analysis, and predictive modeling of cancer proteomics and phosphoproteomics data. The platform allows in-depth exploration and prediction of tumor versus normal samples across multiple cancer types. OncoProExp utilizes data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and offers the following modules for proteome and phosphoproteome data:

1. Data Preprocessing: Imputation, normalization, outlier management, and distribution checks to ensure robust data quality.

2. Data Visualization: PCA, MDS, UMAP, heatmaps, and density plots to visualize protein and phosphoprotein distributions.

3. Gene Set Enrichment Analysis (GSEA): Analysis of pathways and biological processes relevant to cancer progression, such as the KRAS signaling pathway.

4. Differential Expression Analysis: Identification of differentially expressed proteins and phosphoproteins, revealing metabolic and molecular profiles specific to each cancer type.

5. Survival Analysis: Kaplan-Meier plots and Cox proportional hazards models to assess protein and phosphoprotein markers associated with patient survival outcomes across cancer types.

6. Protein-Protein Interactions and Protein-Drug Interactions: Visual exploration of protein networks, including potential drug targets, available in the Differential Expression tab.

7. AI-based Predictive Models: Prediction of cancer types with SVM, Random Forest, and ANN models tailored to handle complex proteomic and phosphoproteomic datasets.

If you find our tool useful, please cite our paper:

OncoProExp: A Web-Based Platform for Oncologically Relevant Proteome and Phosphoproteome Exploration and Machine Learning based Prediction in Human Cancers (Manuscript in preparation)

This tool is free of charge and is intended solely for non-commercial, academic, and research purposes. It should not be used for any commercial applications or profit-driven activities. Our mission is to support researchers, students, and professionals in advancing knowledge, and we kindly request that users respect these terms to help maintain the accessibility and integrity of the tool.

User Privacy Policy:

No Personal Data Storage: We do not store any personal information, cookies, or uploaded files. All uploaded files are renamed with random identifiers for added security and are deleted immediately after processing.
Anonymized Location Data: For general analytics and to improve user experience, we may use IP addresses to approximate your location (city, state, or country). This information is anonymized, used only for traffic pattern analysis, and is not linked to any specific user.
Opt-out Option: You can disable location services in your browser settings if you prefer not to share approximate location data.
Transparent Processing: All data handling is transparent and secure, and follows strict practices to provide a better user experience without compromising your privacy.

Data preprocessing

Data preprocessing is the first crucial step in preparing raw data and making it suitable for a machine learning model. Finding missing values and detecting outliers are essential before performing any analysis. Some algorithms require the data to meet certain assumptions, such as following a normal distribution. Data transformation (e.g., log, z-score, scaling) can prevent model bias and reduce computation time. OncoProExp uses the missForest and dlookr packages to impute missing values and to diagnose, explore, and transform data. Please upload your data with samples organized in columns and features in rows, or load the example data.
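The transformations and outlier check mentioned above can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the app's R code: the function names, the pseudocount, and the 1.5 x IQR rule are assumptions, and it does not reproduce what missForest or dlookr do internally.

```python
import math
import statistics

def log2_transform(values, pseudocount=1.0):
    """Log2-transform non-negative intensities; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

def z_score(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical intensities for one feature across eight samples.
intensities = [3.0, 7.0, 15.0, 15.0, 31.0, 31.0, 63.0, 65535.0]
logged = log2_transform(intensities)   # compress the dynamic range
scaled = z_score(logged)               # put the feature on a common scale
flagged = iqr_outliers(logged)         # positions of suspected outliers
print(flagged)
```

In this toy input only the last sample is flagged, because its log2 intensity lies far above the upper fence while the other seven values sit close together.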

Error

Check the separation field, file format, or load the example.

Head of table

General overview

Select one

Outlier detection

Select one

Change the outlier ratio %

Missing values

Outlier distribution

Select a variable

Head of imputed values

Save results

Head of imputed outliers

Save results

Visualization

Visualization method

Save plot

Differentially Expressed Proteins and Phosphoproteins

Info
Example

OncoProExp initially calculates the Median Absolute Deviation (MAD) to select the most variable features and uses a scree plot to determine the number of principal components (PCs) to retain through PCA. It visualizes dimensionality reduction with PCA plots, shows cluster patterns with color-coded samples, and assesses variability through box plots, density plots, and heatmaps. Multidimensional Scaling (MDS) plots reveal sample distances and clustering. Differentially expressed proteins (DEPs) and phosphoproteins (DEPPs) are identified using linear models fitted with the `lmFit` function from the limma package together with empirical Bayes moderation, with significant changes highlighted by volcano plots and heatmaps. Gene Set Enrichment Analysis (GSEA) identifies impacted pathways, visualized through bar plots. For enrichment analysis, OncoProExp leverages the gprofiler2 package to explore gene ontology terms and pathways, with interactive Plotly plots and summarized dot plots highlighting key biological processes. In summary, OncoProExp uses the limma package to perform differential expression analysis, ggplot2 to visualize the results, and the ComplexHeatmap package to construct heatmaps.
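The MAD-based selection of the most variable features can be sketched as follows. This is an illustrative Python example, not the app's implementation: the function names and the toy protein identifiers are hypothetical, and the MAD here is unscaled (R's `mad()` multiplies by a consistency constant by default, which does not change the ranking).

```python
import statistics

def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def top_variable_features(expression, n_top):
    """Rank features by MAD (descending) and return the names of the top n_top.

    expression: dict mapping a feature name to its values across samples.
    """
    ranked = sorted(expression, key=lambda name: mad(expression[name]), reverse=True)
    return ranked[:n_top]

# Hypothetical expression table: three features measured in four samples.
expression = {
    "PROT_A": [1.0, 1.1, 0.9, 1.0],   # nearly constant
    "PROT_B": [0.0, 4.0, 8.0, 12.0],  # highly variable
    "PROT_C": [2.0, 3.0, 2.0, 3.0],   # moderately variable
}
selected = top_variable_features(expression, n_top=2)
print(selected)
```

The MAD is preferred over the variance in this kind of filter because it is less sensitive to a single extreme sample.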

Users can upload their expression table (with gene symbols/IDs in the first column and sample data in the remaining columns) and metadata table (with sample names and class labels as the first two columns), perform filtering and normalization, and convert Ensembl IDs to gene symbols, facilitating a tailored and thorough analysis of their data.

Expression example:

Metadata example:

Error

Check the separation field and file format in your expression and metadata files, or load the example.

Visualization
DEG analysis

Expression table and metadata

Save table

Inspecting Samples

Calculate top variable genes (MAD)

Save plot

Table

Save table

Select

Inspecting DEGs

Target

Save plot

Show sample label

Target

Save plot

Save plot Save table

Select one

AI-based Predictive Models

In OncoProExp, cancer types are predicted from proteomic and phosphoproteomic data using AI-based models designed to handle the complexity and variability of the data. The app employs Support Vector Machines (SVM) from the e1071 package, Random Forests (RF) from the randomForest package, and Artificial Neural Networks (ANN) from keras with TensorFlow v2.10 as the backend. Model performance is evaluated using accuracy, sensitivity, specificity, precision, F1-score, and area under the curve (AUC); the pROC package is used to assess class separation. Please upload your table with gene symbols/IDs in the first column and sample data in the remaining columns.
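The threshold-based metrics listed above all follow from the confusion matrix. The sketch below shows the standard definitions for a two-class (Tumor versus Normal) case in plain Python; it is an illustration rather than the app's code, the function name and label strings are assumptions, and AUC is omitted because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive="Tumor"):
    """Compute standard binary classification metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)   # recall on the positive class
    specificity = ratio(tn, tn + fp)   # recall on the negative class
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical labels for eight samples.
y_true = ["Tumor", "Tumor", "Tumor", "Tumor", "Normal", "Normal", "Normal", "Normal"]
y_pred = ["Tumor", "Tumor", "Tumor", "Normal", "Normal", "Normal", "Normal", "Tumor"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For multi-class prediction of cancer types, the same quantities are usually computed per class in a one-versus-rest fashion and then averaged.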

Error

Check the separation field, file format, or load the example.

Table
Model

Uploaded table

Model

Model Performance

Save table

New data prediction

Save table

Real labels for comparison (if any)

Upload real labels

Format

Save table

Survival Analysis

The survival analysis section in OncoProExp provides survival modeling tools, built on the survival package, to explore prognostic markers. Kaplan-Meier plots are generated based on user-defined gene expression thresholds, enabling visual assessment of survival outcomes for specific biomarkers. Cox proportional hazards models are fitted for each split criterion (mean, median, and quantiles), allowing hazard ratio (HR) estimation and testing of covariate effects on survival. This module also lets users adjust for covariates when calculating hazard ratios, making the models suitable for multivariable analysis. Each gene’s best split point is automatically identified, based on the optimal HR, to ensure balanced group comparisons.
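The idea behind a threshold-based Kaplan-Meier comparison can be sketched in a few lines. This Python example is illustrative only and is not how the survival package is called in the app: the function names and the toy data are hypothetical, only the median split is shown, and Cox models and the automatic best-split search are not covered.

```python
import statistics

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival probability) at each event time.

    times:  follow-up time per patient
    events: 1 if the event (e.g., death) was observed, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median expression."""
    cutoff = statistics.median(expression)
    return ["high" if x > cutoff else "low" for x in expression]

# Hypothetical data for six patients.
times = [5, 8, 12, 12, 20, 25]
events = [1, 0, 1, 1, 0, 1]
expression = [9.1, 2.0, 8.5, 1.5, 7.9, 2.2]

groups = split_by_median(expression)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, curve)
```

Censored patients stay in the at-risk count until their last follow-up time, which is why they lower later survival steps without producing a step of their own.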

Save plot

Save table

Select a covariate

Save plot

Select a cancer type to download the expression and metadata files for both proteome and phosphoproteome data.

Data type

Download

Data type

Download