
Virtual environments - let's begin

What virtual environments are, why we need them, and Anaconda | Conda

For a beginning data analyst, it is reasonable to prioritize learning foundational data analysis concepts and techniques over setting up virtual environments. While virtual environments are valuable for managing dependencies and isolating projects, they add complexity to the learning process for beginners. Beginners can therefore skip this chapter and focus on the following chapters. Once the analyst becomes more comfortable with R and/or Python and begins working on more complex projects or collaborating with others, they can return to this chapter and set up virtual environments to manage dependencies more efficiently.

Why do we need Virtual Environments?

In modern programming languages, programmers rarely write all the functionality they need from scratch. Libraries (a term often used interchangeably with packages) are reusable pieces of code shared with the community to make everyone's life easier. Modern programming languages often come with many valuable libraries preinstalled; for example, in Python, you can load the zipfile library to, you guessed it, read and write zip files. However, many of the most useful libraries for data scientists are not part of the standard installation of Python/R; they are created by independent third parties and can be installed with a package manager (e.g., pip for Python, as introduced in the next chapter).
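As a small illustration of using such a preinstalled library, the snippet below writes and reads a zip archive with zipfile; the archive and file names are only placeholders chosen for this example:

import zipfile

# Write a small text file into a new zip archive
with zipfile.ZipFile("results.zip", "w") as archive:
    archive.writestr("notes.txt", "Omics results will go here.")

# Reopen the archive and list its contents
with zipfile.ZipFile("results.zip", "r") as archive:
    print(archive.namelist())  # ['notes.txt']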

Data scientists rely heavily on these packages/libraries. The most popular packages evolve constantly: new versions add functionality, remove old obsolete functionality, or change the behavior of existing functions. For data scientists with some years of experience, this evolving landscape leads to a familiar situation: your recently written scripts probably use recent versions of third-party packages (because you wanted to use that cool new data analysis technique), while your older scripts rely on older versions of those same packages. Since some functions may have changed or been removed, your old scripts may no longer work with the latest versions of these packages! So, will you update your old scripts to work with the newer versions of the packages they rely on, or will you uninstall the new versions and reinstall the older versions each time you need to run an old script (and vice versa)? And we have not even touched on packages depending on specific versions of other packages! As you create more scripts, keeping compatible versions of packages installed can become daunting. A much more convenient solution was created to deal with this problem, which allows us to have independent "environments" into which different versions of packages can be installed: virtual environments.

What are Virtual Environments?

A virtual environment is a self-contained directory that houses a specific interpreter and its associated libraries, isolating project-specific dependencies from the global environment. This means that the libraries and packages required for one project can be managed independently of those for another project. This isolation helps avoid conflicts and ensures that each project has its own clean and consistent set of dependencies.

Important: The benefit of virtual environments lies in their isolation; each virtual environment operates independently. When a module (e.g., a Python package) is installed, it resides exclusively within the designated environment. This ensures that if a particular library disrupts functionality in one project, all other projects remain unaffected.
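If you are ever unsure whether your Python session is running inside a virtual environment, the standard sys module offers a quick check: inside a virtual environment, sys.prefix points to the environment's directory, while sys.base_prefix points to the base Python installation. A minimal sketch:

import sys

# Inside a virtual environment, these two paths differ
print("Current interpreter prefix:", sys.prefix)
print("Base installation prefix:", sys.base_prefix)
print("Inside a virtual environment:", sys.prefix != sys.base_prefix)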

Installing virtual environments in Python

To install and set up virtual environments in Python, you can use the built-in venv module. Here are the basic steps; a consolidated example session is shown after them:

1. Open a Terminal or Command Prompt

2. Navigate to Your Project Directory:

cd path/to/your/project

3. Create a Virtual Environment:

Run the following command to create a virtual environment named venv. You can replace venv with your preferred environment name.

python -m venv venv

If you're using a Python version older than 3.3 (where the venv module is not available), you might need to install and use the third-party virtualenv tool instead:

virtualenv venv

4. Activate the Virtual Environment:

Activate the virtual environment based on your operating system.

On Windows:

.\venv\Scripts\activate

On macOS/Linux:

source venv/bin/activate

You should see the virtual environment's name in your command prompt, indicating it's active.

5. Install Packages:

With the virtual environment activated, you can install Python packages using pip.

pip install package_name

6. Deactivate the Virtual Environment:

When you're done working in the virtual environment, deactivate it.

deactivate

The virtual environment's name should disappear from your command prompt.
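Putting the steps together, a typical session on macOS/Linux could look as follows; the project path and the installed packages (pandas, matplotlib) are just examples, and the pip freeze line records the exact package versions for reproducibility:

cd path/to/your/project
python -m venv venv
source venv/bin/activate        # on Windows: .\venv\Scripts\activate
pip install pandas matplotlib   # example packages for data analysis
pip freeze > requirements.txt   # record exact versions for reproducibility
deactivate

Later, the same set of package versions can be reinstalled in a fresh environment with pip install -r requirements.txt, which is convenient when sharing a project with collaborators.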

Concrete examples are shown below, directly using PyCharm and RStudio, which are among the most widely used environments for advanced data analysis, such as OMICs analysis.

Installing Anaconda / Conda (Python)

Anaconda facilitates the creation and management of virtual environments. Virtual environments allow users to isolate project-specific dependencies, making it easier to manage different projects with potentially conflicting requirements.

To install Conda, follow the installation instructions provided by the official documentation:

https://docs.conda.io/projects/conda/en/latest/user-guide/install/windows.html

and opt for the Anaconda distribution. Subsequently, add the complete path to the 'Scripts' directory of your Anaconda installation (adjusted to your actual installation location) to your PATH environment variable. If you encounter difficulties, feel free to reach out for assistance. You can confirm the successful addition by typing the following command into your command line:

conda --help
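Once conda is available, creating and using an isolated environment follows the same logic as with venv; the environment name omics-env and the packages below are just examples:

conda create --name omics-env python=3.10   # create an environment with a chosen Python version
conda activate omics-env                    # activate it
conda install pandas seaborn                # packages are installed into this environment only
conda env list                              # list all environments on your system
conda deactivate                            # return to the base environment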

Example in PyCharm

Here is a short example of how to set up a virtual environment in PyCharm and where to find the Terminal. A more detailed step-by-step manual is included in the following chapter, 'Getting started with Python'.

After opening PyCharm, we recommend working with the Terminal built into it (see the red rectangle).

First, you can see that ‘C:\User..’ is not specified; there is no virtual environment. Therefore, click on File (2.), choose Settings (3.), and then Project (4.). You will see the Python Interpreter option, in which you can choose your environment and click on Apply:

Here, Python 3.10 (venv) is chosen (6.). Afterwards, click on Apply (7.) and OK (8.):

After that, wait a moment for the skeletons update to complete (9.). Next, open a new session in the Terminal (10.).

Here, you can see that your virtual environment is set up correctly: venv (11.).

Virtual environments and R: Example in RStudio

In RStudio, it is possible to set up a virtual environment in the Terminal tab next to the Console. We mention this possibility here for completeness; however, compared to Python, it is not common to set up virtual environments when using R for OMICs analysis. Thus, just a brief remark, together with the small sketch below.
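For readers who do want per-project package isolation in R, a commonly used option is the renv package; the following is only a minimal sketch of a typical renv workflow (the rest of this book does not rely on it):

install.packages("renv")        # install renv from CRAN
renv::init()                    # create a project-local package library
install.packages("tidyverse")   # packages now install into the project library
renv::snapshot()                # record the package versions in renv.lock
renv::restore()                 # later, restore the exact recorded versions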
