Orthogonal Partial Least Squares (OPLS)

Metabolites and lipids multivariate statistical analysis in R

Orthogonal Partial Least Squares (OPLS) uses orthogonal signal correction (OSC) to concentrate the covariance between predictors (X) and responses (Y) in the first, predictive latent variable (LV). The orthogonal components capture the systematic variation in X that is unrelated to Y.

Practical applications of OPLS-DA (examples)

OPLS is a handy tool for lipidomics and metabolomics scientists. Like PLS, it produces a low-dimensional representation of large data sets, which simplifies the investigation of differences and similarities between biological groups. If the differences in the lipidome or metabolome are large enough, the biological groups usually separate clearly along the x-axis of the scores plot, i.e., along the predictive component. OPLS-DA, the discriminant-analysis variant that classifies samples based on their metabolome or lipidome, is frequently used in the field and is also a good feature selection tool. OPLS can also be applied to predict clinical parameters from lipid or metabolite concentrations.

Check out the following applications of OPLS-DA:

  • D. Wolrab et al. Lipidomic profiling of human serum enables detection of pancreatic cancer. DOI: https://doi.org/10.1038/s41467-021-27765-9 (the authors of the study published in Nature Communications utilize OPLS-DA to differentiate between patients with pancreatic cancer and healthy controls; additionally, OPLS-DA is used for sample classification and then further employed to identify the most distinguishing lipids between these groups).

  • D. Wolrab et al. Plasma lipidomic profiles of kidney, breast and prostate cancer patients differ from healthy controls. DOI: https://doi.org/10.1038/s41598-021-99586-1 (classification of patients with kidney, breast, and prostate cancer based on plasma lipidomes; selection of distinguishing features).

  • A. Kvasnička et al. Alterations in lipidome profiles distinguish early-onset hyperuricemia, gout, and the effect of urate-lowering treatment. DOI: https://doi.org/10.1186/s13075-023-03204-6 (analysis of differences between different groups of patients and healthy controls based on plasma lipidome, sample classification, analysis of distinguishing features).

  • D. Olešová et al. Changes in lipid metabolism track with the progression of neurofibrillary pathology in tauopathies. DOI: https://doi.org/10.1186/s12974-024-03060-4 (the authors use OPLS-DA to analyze the separation of experimental groups, e.g., based on the cerebrospinal fluid (CSF) lipidome, exploring the global effects of aberrant brain metabolism on the composition of CSF).

  • J. Idkowiak et al. Robust and high-throughput lipidomic quantitation of human blood samples using flow injection analysis with tandem mass spectrometry for clinical use. DOI: https://doi.org/10.1007/s00216-022-04490-w (OPLS-DA is used for the analysis of differences between patients with pancreatic cancer and healthy controls, sample classification, selection of features, and comparison of outcomes across different analytical approaches to lipidome analysis).

Training OPLS in R

In R, OPLS is implemented in the ropls package. We will briefly introduce it here and prepare an OPLS-DA model to classify samples from our PDAC data set. Let's begin with the installation of ropls from Bioconductor.

# Installation of ropls package from Bioconductor.
# Highlight and run this code:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ropls")

# Calling the library:
library(ropls)

# We will also need functions from the tidymodels library:
library(tidymodels)

The opls() function fully automates the preparation of the OPLS-DA model in R. Take a look at the code below:
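A minimal sketch of such a call is given here. It assumes the sacurine example data bundled with ropls as a stand-in for the PDAC data set, and the object names (x, y, oplsda_model) are hypothetical. The settings predI = 1 and orthoI = NA ask for one predictive component and let ropls choose the number of orthogonal components by cross-validation, while subset = "odd" keeps every second sample aside for testing.

```r
library(ropls)

# Stand-in data (assumption): the sacurine urine metabolomics set shipped with ropls.
data(sacurine)
x <- sacurine$dataMatrix                   # samples in rows, features in columns
y <- sacurine$sampleMetadata[, "gender"]   # two-level factor used as the class label

# predI = 1   : one predictive component
# orthoI = NA : number of orthogonal components selected by cross-validation
# subset      : "odd" trains on odd-indexed samples, the rest form the test set
oplsda_model <- opls(x, y, predI = 1, orthoI = NA, subset = "odd")

# One-row summary of the fit (R2X, R2Y, Q2, number of components, ...)
getSummaryDF(oplsda_model)
```

For your own data, x would be the matrix of normalized lipid concentrations and y the factor with the N and T labels.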

After we run the last line, we obtain the following outputs from tuning:

And the visual output presents scores plot, loadings plot, observation diagnostics plot, and inertia barplot:

The ropls output from OPLS-DA model training.

1) Scores plot (lower left corner) presents the training data set after the dimensionality reduction (we clearly see the separation of healthy controls from PDAC patients).

2) Loadings plot (lower right corner) presents the most important lipids for the OPLS components; for example, the three most characteristic lipids for N are SM 39:1;O2, PC 37:2, and LPC 18:2, while for T they are Cer 34:1;O2, Cer 36:1;O2, and TG 56:5.

3) Inertia barplot (upper left corner) shows that one predictive component with two orthogonal components may be enough to capture most of the inertia (variance).

4) Observation diagnostics plot presents samples that can potentially bias the OPLS-DA computation.
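The four panels can also be drawn one at a time with the plot() method of ropls through its typeVc argument. The sketch below again assumes the sacurine stand-in data and hypothetical object names; the fig.pdfC and info.txtC arguments, which silence the automatic figure and console output, are taken from recent ropls releases and may be named differently in older ones.

```r
library(ropls)

# Stand-in data (assumption): sacurine replaces the PDAC lipidomics set.
data(sacurine)
x <- sacurine$dataMatrix
y <- sacurine$sampleMetadata[, "gender"]

# Fit quietly; the summary figure is not drawn at this point.
oplsda_model <- opls(x, y, predI = 1, orthoI = NA,
                     fig.pdfC = "none", info.txtC = "none")

# Draw each panel separately.
plot(oplsda_model, typeVc = "x-score")    # 1) scores plot
plot(oplsda_model, typeVc = "x-loading")  # 2) loadings plot
plot(oplsda_model, typeVc = "overview")   # 3) inertia barplot
plot(oplsda_model, typeVc = "outlier")    # 4) observation diagnostics
```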

We can now perform predictions using our OPLS-DA model. The ropls function opls() returns an S4 object. To access the elements (slots) of such an object, we use the @ symbol. To perform predictions, we need to run the following line of code:
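As a hedged illustration, the sketch below inspects the S4 object and predicts the classes of held-out samples with predict(). It assumes the sacurine stand-in data and an odd/even train-test split instead of the PDAC partition; all object names are hypothetical.

```r
library(ropls)

# Stand-in data (assumption): sacurine replaces the PDAC lipidomics set.
data(sacurine)
x <- sacurine$dataMatrix
y <- sacurine$sampleMetadata[, "gender"]

# Train on the odd-indexed samples only.
oplsda_model <- opls(x, y, predI = 1, orthoI = NA, subset = "odd",
                     fig.pdfC = "none", info.txtC = "none")

isS4(oplsda_model)           # TRUE: the model is an S4 object
slotNames(oplsda_model)      # names of the slots that can follow '@'
head(oplsda_model@scoreMN)   # predictive scores of the training samples

# Indices of the training samples; everything else is the test set.
train_idx <- getSubsetVi(oplsda_model)

# Predicted class labels for the test samples.
test_pred <- predict(oplsda_model, x[-train_idx, ])
head(test_pred)
```

Note that predict() returns class labels here; a ROC curve additionally needs a continuous score per sample.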

The ROC curve for our OPLS-DA:

The ROC curve for our OPLS-DA model.

As you can see, the OPLS-DA model prepared with the ropls library performs even better than the PLS-DA models we built in the previous subchapters using caret and tidymodels with mixOmics. Remember that in OPLS-DA, we concentrate the class-related variance in the first component and move the variation unrelated to the classes into the orthogonal component(s). Using the caret function confusionMatrix(), we can also compute the confusion matrix:
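A minimal sketch of this step, assuming the sacurine stand-in data, an odd/even train-test split, and "F" as the positive class (all of these are assumptions of the example, not the PDAC setup):

```r
library(ropls)

# Stand-in data (assumption): sacurine replaces the PDAC lipidomics set.
data(sacurine)
x <- sacurine$dataMatrix
y <- sacurine$sampleMetadata[, "gender"]

oplsda_model <- opls(x, y, predI = 1, orthoI = NA, subset = "odd",
                     fig.pdfC = "none", info.txtC = "none")
train_idx <- getSubsetVi(oplsda_model)

# Predicted and true labels of the test samples, forced onto the same levels.
test_pred  <- factor(as.character(predict(oplsda_model, x[-train_idx, ])),
                     levels = levels(y))
test_truth <- y[-train_idx]

# Confusion matrix with sensitivity, specificity, accuracy, and its 95% CI.
cm <- caret::confusionMatrix(data = test_pred, reference = test_truth,
                             positive = "F")
cm$table
cm$byClass[c("Sensitivity", "Specificity")]
cm$overall[c("Accuracy", "AccuracyLower", "AccuracyUpper")]
```

For the PDAC model, the positive class would be the patient group (T).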

The output:

The caret confusion matrix characterizing the ropls OPLS-DA performance at a 0.5 threshold.

We see that our OPLS-DA model classifies patient samples quite well at a threshold of 0.5 (sensitivity of ~97%). The specificity, on the other hand, is ~85%, meaning that the model is somewhat worse at classifying healthy control samples. The overall accuracy is 90.5% with a relatively narrow 95% confidence interval, and it is significantly above the no-information rate. The model therefore distinguishes most healthy individuals (controls) from patients with PDAC based on the concentrations of lipids in serum.

The low-dimensional representation of OPLS-DA was presented above, in the summary visualization of the OPLS-DA fit results. In the scores plot, we saw the separation of patients with PDAC from healthy volunteers, suggesting differences between the N and T groups in serum lipid profiles. Below, we will also create the loadings and VIP plots to investigate which lipid species contribute most to the separation of controls from cancer patients according to the OPLS-DA model. We will also show you how to prepare an S-plot.

Now, let's extract the scores and loadings from the 'OPLSDA' object and prepare elegant, publication-ready plots using ggplot2:
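The extraction can be sketched as follows, using the ropls accessors getScoreMN() and getLoadingMN(). The sacurine stand-in data, the object names, and the plot styling are assumptions of this example rather than the settings used for the PDAC figures.

```r
library(ropls)
library(ggplot2)

# Stand-in data (assumption): sacurine replaces the PDAC lipidomics set.
data(sacurine)
x <- sacurine$dataMatrix
y <- sacurine$sampleMetadata[, "gender"]

oplsda_model <- opls(x, y, predI = 1, orthoI = NA,
                     fig.pdfC = "none", info.txtC = "none")

# Scores: predictive component and first orthogonal component per sample.
scores_df <- data.frame(
  t_pred  = getScoreMN(oplsda_model)[, 1],
  t_ortho = getScoreMN(oplsda_model, orthoL = TRUE)[, 1],
  group   = y
)

# Loadings: one row per feature (assumes no feature was dropped by ropls).
loadings_df <- data.frame(
  feature = colnames(x),
  p_pred  = getLoadingMN(oplsda_model)[, 1],
  p_ortho = getLoadingMN(oplsda_model, orthoL = TRUE)[, 1]
)

p_scores <- ggplot(scores_df, aes(t_pred, t_ortho, colour = group)) +
  geom_point(size = 2) +
  stat_ellipse() +
  labs(x = "Predictive component", y = "Orthogonal component 1") +
  theme_bw()

p_loadings <- ggplot(loadings_df, aes(p_pred, p_ortho)) +
  geom_point(size = 2) +
  labs(x = "Predictive loadings", y = "Orthogonal loadings 1") +
  theme_bw()

print(p_scores)
print(p_loadings)
```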

Both plots together:

PLOT A (on the left side): OPLS-DA scores plot, PLOT B (on the right side): OPLS-DA loadings plot of the model prepared using ropls library.

A VIP plot can be prepared after extracting the variable importance in projection (VIP) scores with getVipVn():
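A minimal sketch of a VIP lollipop chart follows. It draws the chart with plain ggplot2 (geom_segment() plus geom_point()) rather than ggpubr, and it assumes the sacurine stand-in data; the cut-off of 20 features and the object names are arbitrary choices for illustration.

```r
library(ropls)
library(ggplot2)

# Stand-in data (assumption): sacurine replaces the PDAC lipidomics set.
data(sacurine)
x <- sacurine$dataMatrix
y <- sacurine$sampleMetadata[, "gender"]

oplsda_model <- opls(x, y, predI = 1, orthoI = NA,
                     fig.pdfC = "none", info.txtC = "none")

# Named numeric vector with one VIP score per feature.
vip <- getVipVn(oplsda_model)

vip_df  <- data.frame(feature = names(vip), vip = as.numeric(vip))
vip_top <- head(vip_df[order(vip_df$vip, decreasing = TRUE), ], 20)

# Lollipop chart of the 20 features with the highest VIP scores.
p_vip <- ggplot(vip_top, aes(x = vip, y = reorder(feature, vip))) +
  geom_segment(aes(x = 0, xend = vip, yend = reorder(feature, vip)),
               colour = "grey60") +
  geom_point(size = 3, colour = "steelblue") +
  labs(x = "VIP score", y = NULL) +
  theme_bw()

print(p_vip)
```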

The plot:

The ggpubr lollipop chart shows the importance scores of predictors (lipid species) according to the OPLS-DA model from the ropls library.

Finally, next to the VIP plot, the S-plot is a chart frequently used to present the most important features according to the OPLS model. For each lipid, the S-plot shows the Pearson correlation between the predictive OPLS scores and the normalized lipid concentrations, plotted against the covariance between the same two quantities. If the data are Pareto-scaled, a characteristic "S" shape usually appears in the middle of the chart. Autoscaling, in contrast, results in a straight line: every variable then has unit variance, so the correlation and the covariance differ only by a constant factor. Take a look at the code block below and the obtained S-plot:
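The S-plot coordinates can be computed with base R from the predictive scores and the scaled data matrix. The sketch below assumes the sacurine stand-in data, autoscaling (scaleC = "standard"), and hypothetical object names.

```r
library(ropls)
library(ggplot2)

# Stand-in data (assumption): sacurine replaces the PDAC lipidomics set.
data(sacurine)
x <- sacurine$dataMatrix
y <- sacurine$sampleMetadata[, "gender"]

# Autoscaled model (mean-centering and unit-variance scaling).
oplsda_model <- opls(x, y, predI = 1, orthoI = NA, scaleC = "standard",
                     fig.pdfC = "none", info.txtC = "none")

t_pred   <- getScoreMN(oplsda_model)[, 1]       # predictive scores
x_scaled <- scale(x, center = TRUE, scale = TRUE) # same scaling as the model

# Covariance and correlation of every feature with the predictive scores.
s_plot <- data.frame(
  feature = colnames(x),
  p_cov   = as.numeric(cov(t_pred, x_scaled)),
  p_corr  = as.numeric(cor(t_pred, x_scaled))
)

p_s <- ggplot(s_plot, aes(p_cov, p_corr)) +
  geom_point() +
  labs(x = "Covariance with predictive scores",
       y = "Correlation with predictive scores") +
  theme_bw()

print(p_s)
```

Because every autoscaled column has a standard deviation of 1, p_corr equals p_cov divided by sd(t_pred), which is why the points fall on a straight line.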

The plot:

The S-plot for our OPLS-DA model. The data are autoscaled.

Below, you will find one more S-plot, this time for Pareto-scaled data, together with the updated S-plot code:
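A hedged sketch of the Pareto-scaled variant: the model is refitted with scaleC = "pareto", and the data matrix is scaled in the same way (mean-centered, then divided by the square root of each column's standard deviation) before the covariances and correlations are computed. The sacurine stand-in data, the fixed single orthogonal component, and the object names are assumptions of this example.

```r
library(ropls)
library(ggplot2)

# Stand-in data (assumption): sacurine replaces the PDAC lipidomics set.
data(sacurine)
x <- sacurine$dataMatrix
y <- sacurine$sampleMetadata[, "gender"]

# Pareto-scaled model; orthoI = 1 is an arbitrary choice for illustration.
oplsda_pareto <- opls(x, y, predI = 1, orthoI = 1, scaleC = "pareto",
                      fig.pdfC = "none", info.txtC = "none")

# Pareto scaling: center each column, divide by the square root of its SD.
col_sd   <- apply(x, 2, sd)
x_pareto <- sweep(scale(x, center = TRUE, scale = FALSE), 2, sqrt(col_sd), "/")

t_pred <- getScoreMN(oplsda_pareto)[, 1]

s_plot <- data.frame(
  feature = colnames(x),
  p_cov   = as.numeric(cov(t_pred, x_pareto)),
  p_corr  = as.numeric(cor(t_pred, x_pareto))
)

p_s <- ggplot(s_plot, aes(p_cov, p_corr)) +
  geom_point() +
  labs(x = "Covariance with predictive scores",
       y = "Correlation with predictive scores") +
  theme_bw()

print(p_s)
```

After Pareto scaling the columns keep different variances, so the ratio between correlation and covariance changes from feature to feature and the points no longer lie on a single line.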

The S-plot if the data set is Pareto-scaled (forming the S-like shape):

The S-plot shape for Pareto-scaled data (based on the OPLS-DA model for the PDAC data set).

You can find more about the possibilities offered by the ropls package in the detailed vignette prepared by its authors:

The vignette of the ropls package
