Virtual environments - let's begin
What are virtual environments and why do we need them, Anaconda | Conda
For a beginning data analyst, it's reasonable to prioritize learning foundational data analysis concepts and techniques over setting up virtual environments initially. While virtual environments are valuable for managing dependencies and isolating project environments, they can add complexity to the learning process for beginners. Beginners can skip this chapter and focus on the following chapters. As the analyst becomes more comfortable with R and/or Python and begins working on more complex projects or collaborating with others, they can return to this chapter to explore setting up virtual environments to manage dependencies more efficiently.
Why do we need Virtual Environments?
In modern programming languages, programmers rarely write all the functionality they need from scratch. Libraries (a term often used interchangeably with packages) are reusable pieces of code shared with the community to make everyone's life easier. Modern programming languages often come preinstalled with many valuable libraries; for example, in Python, you can load the zipfile library to, you guessed it, read and write zip files. Many of the most useful libraries for data scientists are not part of the standard installation of Python/R but are created by independent third parties and can be installed with a package manager (e.g., pip for Python, which we introduce in the next chapter).
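As a small illustration of a library that ships with Python, the sketch below uses zipfile to write a text file into a zip archive held in memory and read it back. The function name roundtrip_zip and the file name notes.txt are made up for this example.

```python
import io
import zipfile


def roundtrip_zip(name: str, text: str) -> str:
    """Write `text` into an in-memory zip archive and read it back."""
    buffer = io.BytesIO()
    # Write a single text file into the archive.
    with zipfile.ZipFile(buffer, mode="w") as archive:
        archive.writestr(name, text)
    # Reopen the same bytes for reading and extract the file's contents.
    buffer.seek(0)
    with zipfile.ZipFile(buffer, mode="r") as archive:
        return archive.read(name).decode("utf-8")


if __name__ == "__main__":
    print(roundtrip_zip("notes.txt", "hello from the standard library"))
```

No installation step is needed here because zipfile and io are part of the standard library; a third-party package would first have to be installed with a package manager.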
Data scientists rely heavily on these packages/libraries. The most popular packages evolve constantly: new versions add functionality, remove obsolete functionality, or change the behavior of existing functions. For data scientists with some years of experience, this means that recently written scripts probably use recent versions of third-party packages (because you wanted to use that cool new data analysis technique), while older scripts rely on older versions of those same packages. Since some functions may have changed or been removed, your old scripts may no longer work with the latest versions! So, will you update your old scripts to work with the newer versions of the packages they rely on, or will you uninstall the new versions and reinstall the older ones each time you need to run an old script (and vice versa)? And we haven't even touched on packages that depend on specific versions of other packages! As you create more scripts, keeping compatible versions of packages installed can become daunting. A much more convenient solution to this problem lets us keep independent "environments" into which different versions of packages can be installed: virtual environments.
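To see which version of a package a script would actually pick up, you can ask the current interpreter directly. The following is a minimal sketch using the standard importlib.metadata module (available from Python 3.8); the helper name installed_version and the package names in the usage part are only illustrative.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Illustrative package names; the result depends on what is installed.
    for name in ("pip", "pandas"):
        print(name, installed_version(name) or "not installed")
```

Running the same script with two different interpreters (for example, from two different environments) may print different versions, which is exactly the situation described above.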
What are Virtual Environments?
A virtual environment is a self-contained directory that houses a specific interpreter and its associated libraries, isolating a project's dependencies from the global environment. This means that the libraries and packages required for one project can be managed independently of those for another project. This isolation helps avoid conflicts and ensures that each project has its own clean and consistent set of dependencies.
Important: The benefit of virtual environments lies in their isolation; each virtual environment operates independently. When a package is installed, it resides exclusively within the environment that was active at installation time. This ensures that if a particular library breaks one project, all other projects remain unaffected.
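One way to check from inside Python whether the running interpreter belongs to a virtual environment is to compare sys.prefix with sys.base_prefix. This is a minimal sketch that assumes an environment created with the venv module; the function name in_virtual_environment is made up here, and Conda environments are organized differently and are not detected by this check.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("prefix:     ", sys.prefix)
    print("base prefix:", sys.base_prefix)
    print("in a virtual environment:", in_virtual_environment())
```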
Installing virtual environments in Python
To create and set up a virtual environment, you can use Python's built-in venv module. Here are the six basic steps:
1. Open a Terminal or Command Prompt
2. Navigate to Your Project Directory
3. Create a Virtual Environment
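Run the following command to create a virtual environment named venv. You can replace the second venv with your preferred environment name. On some systems, the interpreter is invoked as python3 instead of python.
    python -m venv venv
If you're using a Python version older than 3.3, which does not include the venv module, you might need to use the third-party virtualenv package instead:
    virtualenv venv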
4. Activate the Virtual Environment
Activate the virtual environment based on your operating system.
On Windows:
    venv\Scripts\activate
On macOS/Linux:
    source venv/bin/activate
You should see the virtual environment's name in your command prompt, indicating it's active.
5. Install Packages
With the virtual environment activated, you can install Python packages using pip (package_name is a placeholder for the package you need):
    pip install package_name
6. Deactivate the Virtual Environment
When you're done working in the virtual environment, deactivate it:
    deactivate
The virtual environment's name should disappear from your command prompt.
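The creation step can also be approximated from Python itself with the same venv module. The sketch below is only an illustration under a few assumptions: it builds the environment in a temporary directory, skips installing pip (with_pip=False) to stay fast, and uses made-up helper names (create_env, env_prefix). Calling the environment's interpreter directly has a similar effect to activating the environment and then running python.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    # with_pip=False keeps the sketch fast; `python -m venv` installs pip by default.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"  # Windows layout
    return env_dir / "bin" / "python"  # macOS/Linux layout


def env_prefix(interpreter: Path) -> str:
    """Ask the environment's own interpreter which directory it belongs to."""
    result = subprocess.run(
        [str(interpreter), "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        interpreter = create_env(Path(tmp) / "venv")
        print("interpreter:", interpreter)
        print("sys.prefix: ", env_prefix(interpreter))
```

The printed sys.prefix should point into the newly created directory rather than the base Python installation, which is the isolation the steps above rely on.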
A concrete example is shown below, using PyCharm and RStudio directly; both are widely used environments for advanced data analysis such as OMICs analysis.
Installing Anaconda / Conda (Python)
Anaconda facilitates the creation and management of virtual environments. As described above, these environments isolate project-specific dependencies, which makes it easier to manage different projects with potentially conflicting requirements.
To install Conda, follow the official installation instructions and opt for the Anaconda distribution. Subsequently, add the complete path to the 'Scripts' directory in your Anaconda installation (replace the dots with the actual path corresponding to your installation location) to your PATH environment variable. If you encounter difficulties, feel free to reach out for assistance. You can confirm the successful addition by typing the following command into your command line:
    conda --version
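Whether a command can be found through the PATH environment variable can also be checked from Python. This is a minimal sketch using shutil.which from the standard library; the helper name find_on_path is made up for this example.

```python
import shutil
from typing import Optional


def find_on_path(command: str) -> Optional[str]:
    """Return the full path of `command` if it is on PATH, otherwise None."""
    return shutil.which(command)


if __name__ == "__main__":
    location = find_on_path("conda")
    if location is None:
        print("conda was not found on PATH; check the PATH setting described above.")
    else:
        print("conda found at:", location)
```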
Example in PyCharm
Here is a short example of how to set up the environment venv in PyCharm and where to find the terminal. More detailed step-by-step instructions are included in the following chapter, ‘Getting started with Python’.
After opening PyCharm, we recommend working with its built-in Terminal (see the red rectangle).
First, notice that the prompt shows only ‘C:\User..’ without an environment name, so no virtual environment is active. Therefore, click on File (2.), choose Settings (3.), and then Project (4.). Under Python Interpreter, you can choose your environment and then click on Apply:
Here, Python 3.10 (venv) is chosen (6.). Then click on Apply (7.) and OK (8.):
After that, wait a moment for the skeletons update to complete (9.). Next, open a new session in the Terminal (10.).
Here, you can see that your virtual environment is set up correctly: the prompt now shows venv (11.):
Virtual environments and R: Example in RStudio
In RStudio, it is possible to set up a virtual environment in the Terminal tab next to the Console. We mention this possibility only briefly, because setting up virtual environments is less common in R than in Python when R is used for OMICs analysis.