Getting started with Python
Installation of Python, PyCharm, Jupyter Notebook, Google Colab, and Python repositories and libraries
Last updated
Python is an open-source, interpreted programming language. It has become immensely popular because it is well suited to data analysis: its extensive ecosystem of libraries lets data analysts process, explore, and present data. Thanks to the open-source approach, Python has an active community that fosters continuous innovation and offers solutions and support for a wide range of data-related challenges.
The latest version of Python can be downloaded from the official Python website (python.org).
The installation process is straightforward: download the installer for your operating system from the official Python download page and run it. The official Python documentation gives more details on installation and getting started.
A lot of functionality is already built into the default installation of Python, but one of the real strengths of Python lies in the huge range of third-party packages that bring new capabilities. For example, standard Python does not have a user-friendly way of dealing with tabular data, but this is gracefully handled by the pandas package.
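To illustrate the point about tabular data, here is a minimal sketch that computes the mean of one column using only the standard library. The sample data and the names SAMPLE_CSV and column_mean are invented for illustration; with pandas, the same result would typically need only a read_csv call followed by a column mean.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
SAMPLE_CSV = """sample,intensity
A,10.0
B,12.5
C,11.5
"""


def column_mean(csv_text, column):
    """Return the mean of a numeric column using only the standard library."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every value arrives as a string, so it has to be converted by hand.
    values = [float(row[column]) for row in reader]
    return sum(values) / len(values)


print(column_mean(SAMPLE_CSV, "intensity"))
```

Even this small task requires manual parsing and type conversion, which is the kind of work a dedicated package takes over.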
The Python installer also provides pip, the most popular Python package manager. To install pandas, for example, type pip install pandas in the Command Prompt (on Windows) or in the Terminal (on macOS) and press Enter.
Note that on some operating system versions the above command does not work and you have to use pip3 instead of pip, i.e., pip3 install pandas.
Throughout this guide, we will assume that the standard pip command works on your system, but you can always swap pip for pip3 if it does not.
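If you are unsure which of the two commands exists on your system, a short script can check for you. This is only a sketch: the function names are made up for illustration, and the second function builds (but does not run) the alternative form "python -m pip install ...", which ties the installation to the interpreter that runs the script.

```python
import shutil
import sys


def available_pip_commands():
    """Return those of 'pip' and 'pip3' that are found on the PATH."""
    return [name for name in ("pip", "pip3") if shutil.which(name) is not None]


def pip_install_command(package):
    """Build an install command bound to the current interpreter (not executed here)."""
    python = sys.executable or "python"
    return [python, "-m", "pip", "install", package]


print("Found on PATH:", available_pip_commands())
print("Alternative command:", " ".join(pip_install_command("pandas")))
```

Nothing is installed by this script; it only reports what is available and prints a command you could copy into the terminal.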
The following section presents a selection of the available programming platforms. Each data analyst usually codes in an environment of their choice. Among the solutions most often used for OMICs analysis are notebook formats, e.g., Jupyter Notebook or Google Colab, its online counterpart operated by Google. In advanced OMICs analysis, you may also frequently encounter IDEs such as PyCharm or Spyder.
PyCharm is a robust integrated development environment (IDE) built specifically for Python development. Created by JetBrains, it provides Python developers and data scientists with an extensive set of tools for coding, debugging, testing, and project management. Features such as intelligent code completion, advanced code navigation, and a built-in visual debugger significantly boost productivity. Furthermore, the IDE supports version control systems such as Git. With its user-friendly interface and rich feature set, PyCharm is a good solution for both novices and seasoned Python developers.
The main advantage of the PyCharm IDE for OMICs scientists is that it is very similar to RStudio.
You can easily install PyCharm by following the instructions on the official JetBrains website.
Upon launching PyCharm, we opt for the "New Project" option in the initial window to create a new project:
In the next window, we set the project's location to the folder in which we intend to work.
It is essential to incorporate the previously configured interpreter (virtual environment) into PyCharm.
To achieve this, we choose the "Add Interpreter" option → "Add Local Interpreter." In the left column, we opt for the "Virtual Environment" choice, and on the right side, we select the "Existing" option for the Environment and then add the Interpreter (you will find more information in Virtual Environment - let's begin).
If you don't see "Add Local Interpreter", you have installed a newer version of the PyCharm IDE, which means that you have to set the interpreter type based on the environment you have chosen (more information in Virtual Environment - let's begin).
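To confirm that PyCharm really uses the interpreter from your virtual environment, you can run a short check inside the project. The sketch below assumes an environment created with venv or a recent virtualenv; conda environments are not detected this way. The function name is made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```

The printed interpreter path should lie inside the environment folder you selected in the interpreter settings.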
JupyterLab is a web-based interactive computational environment for creating notebook documents. JupyterLab makes it easy to manage and run your Python code.
Installing JupyterLab is as simple as running the command pip install jupyterlab in the terminal.
To start JupyterLab, run the command jupyter lab.
Once JupyterLab is running, you should see the following screen in your web browser:
In the menu on the left (red arrow 1 in the screenshot above) you can navigate to the folder where you want to create your project. Once navigated to the desired location, click on the Python 3 button in the Launcher window (red arrow 2). A new window "Untitled.ipynb" will open:
The file "Untitled.ipynb" is now created in the selected folder. By right-clicking on the file in the explorer on the left (red arrow 1 on the screenshot above) the file can be renamed. Python code can be typed into the field indicated by arrow 2. Extra fields can be created by clicking on the button shown by arrow 3.
To get started, let's try loading the pandas library. Type import pandas as pd into the code field.
Then, while the cursor is still active in the code field, press Shift+Enter on the keyboard to run the code. You should see the number 1 appear before the code field, which indicates that the cell has been executed; if no error message is shown, the code ran successfully.
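A slightly longer first cell might look like the sketch below. It guards the import so that a missing installation produces a readable hint instead of a traceback, and then builds a tiny table. The column names and values are invented for illustration.

```python
try:
    import pandas as pd
except ImportError:
    pd = None  # pandas is missing from this environment

if pd is not None:
    # Hypothetical example values, invented for illustration only.
    df = pd.DataFrame({"sample": ["A", "B", "C"], "intensity": [10.0, 12.5, 11.5]})
    print(df.shape)
else:
    print("pandas is not installed; install it with pip first")
```

If the cell prints the shape of the table, pandas is installed and ready to use in the notebook.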
Google Colab, short for Colaboratory, is a cloud-based platform offered by Google, designed to facilitate collaborative and interactive Python programming. It provides a Jupyter Notebook environment in which users can write, run, and share Python code.
Google Colab integrates seamlessly with Google Drive, enabling easy storage and sharing of notebooks. Another notable feature of Colab is free GPU access, which speeds up machine learning tasks and is a significant benefit in predictive modeling. The platform is valuable for researchers, students, and data scientists, and it aligns well with our GitBook.
If you have a Google (Gmail) account, navigate to your Google Drive.
Click the “+ New” button.
In “More”, you will find “Google Colaboratory”.
If you cannot find it, add it using “Connect more apps” –> “Install”. Then close the dialog and check the “More” menu again.
Click on Google Colaboratory, and you will see a new ‘.ipynb’ file where you can write your code:
For example, we can load the pandas library, which we will use extensively in the following OMICs analyses, by running import pandas as pd.
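As a small sketch, a first Colab cell could report where it is running before loading pandas. Checking for the google.colab module is a commonly used heuristic and not an official guarantee, and the function name is made up for illustration.

```python
import importlib.util
import sys


def running_in_colab():
    """Heuristic check: the google.colab module is loaded inside Colab runtimes."""
    return "google.colab" in sys.modules


print("Running in Colab:", running_in_colab())

if importlib.util.find_spec("pandas") is not None:
    import pandas as pd
    print("pandas version:", pd.__version__)
else:
    print("pandas not found in this environment")
```

The same cell also runs unchanged in JupyterLab or PyCharm, where the first line should report False.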
The last Python coding environment mentioned here is the Spyder IDE. Spyder is an integrated development environment (IDE) designed for scientific computing in Python. Installing Spyder is straightforward: users can typically obtain it by installing the Anaconda distribution – see the previous subchapter Virtual Environment - let's begin.
For the installation, follow the guidelines on the official Spyder website.
Every programmer or data analyst has their own preferred coding environment. Throughout this book, we will frequently use PyCharm, Google Colab, and Jupyter Notebook for Python. We have chosen these tools not only for their availability but also because of their intuitive interfaces and their support for programming standards such as PEP-8.