Omics data visualization in R and Python
  1. PERFORMING FUNDAMENTAL OPERATIONS ON OMICs DATA USING R

Fundamental data structures

A part of preparing data for analysis and visualization in OMICs analysis

Vectors in R

A vector is the most basic R object used to store data. Importantly, indexing of a vector in R starts from 1, not from 0 (unlike Python). Atomic vectors are homogeneous: all elements of one vector share a single data type. In R, two types of vectors can be distinguished:

  1. (Atomic) vectors:

  • Character (string) <chr>

Each element of such a vector is a string of one or more characters, e.g.:

y <- c('a', 'b', 'a1b', 'a2b', 'abc', 'my', 'favorite', 'vector')

# It is the same as:

y <- c("a", "b", "a1b", "a2b", "abc", "my", "favorite", "vector")
  • Integer <int>

Each element of such a vector is an integer (a whole number, not a fraction) or NA, e.g.:

y <- c(1L, 100L, 5L, 4L, NA)

# Note: without the L suffix, R stores whole numbers as doubles, not integers:

y <- c(1, 4, 5, 5, -5, -10, 0)

The L suffix distinguishes integer values from double (numeric) values.

  • Double <dbl> or numeric <num>

Each element of such a vector is a double-precision number; the special values NA, NaN, Inf, and -Inf are also allowed, e.g.:

y <- c(1.2, 3.5, 5, NA, Inf, -Inf, NaN)
  • Logical <logi>

This vector contains TRUE, FALSE, or NA entries, e.g.:

y <- c(TRUE, FALSE, TRUE, TRUE, FALSE, NA)
  • Complex <cplx>

This vector type allows for storing numbers with imaginary components, e.g.:

y <- c(1+0i, 2+2i, 3+0i, 5-0i, 6+6i, -10+100i)
  • Raw <raw>

The raw type vector is intended to hold raw bytes, e.g.:

y <- as.raw(c(0, 0, 0))  # c(00, 00, 00) would create a double vector, not a raw one
  2. Recursive vectors:

  • list

In R, lists contain heterogeneous elements. Lists can store, for example, numeric vectors, integer vectors, strings, and matrices, as well as other lists inside of one list.
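
As a minimal sketch of the heterogeneity described above, the list below mixes a string, a numeric vector, a logical value, and a nested list. All names and values (sample_info, sample_id, and so on) are hypothetical and chosen only for illustration.

```r
# Hypothetical example values; element names are illustrative only
sample_info <- list(
  sample_id      = "S01",               # character
  concentrations = c(1.2, 3.5, 5.0),    # double vector
  is_control     = TRUE,                # logical
  nested         = list(batch = 1L)     # a list inside a list
)

str(sample_info)                  # compact overview of each element and its type

sample_info[["concentrations"]]   # [[ ]] returns the element itself (a double vector)
sample_info$nested$batch          # $ reaches named elements, also in nested lists
sample_info["sample_id"]          # [ ] returns a shorter list, not the element
```

The difference between [ ] and [[ ]] matters in practice: functions such as mean() expect the vector returned by [[ ]], not the one-element list returned by [ ].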

The vector types most often used for lipidomics and metabolomics data are character and double (numeric) vectors, along with lists (recursive vectors). In some cases, integer or logical vectors are also needed.
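
The atomic vector types listed above can be checked with typeof(). The short sketch below also illustrates 1-based indexing and what happens when types are mixed in one atomic vector. The lipid names are hypothetical placeholders.

```r
# Hypothetical lipid names, used only for illustration
lipids <- c("PC 34:1", "SM 36:2", "Cer 42:1")

lipids[1]             # "PC 34:1" - the first element has index 1, not 0
typeof(lipids)        # "character"

typeof(c(1L, 5L))     # "integer" - the L suffix requests integers
typeof(c(1, 5))       # "double"  - whole numbers without L are doubles
typeof(c(TRUE, NA))   # "logical"

# Atomic vectors are homogeneous, so mixed inputs are coerced to one type
mixed <- c(1.5, "a", TRUE)
typeof(mixed)         # "character"
```

Because of this silent coercion, a single stray text entry in a concentration column is enough to turn the whole column into a character vector, which is worth checking after data import.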

Other data types used in R for OMICs analysis

  • matrix (matrices) - we can think of a matrix as an atomic vector with two-dimensional shape information (rows and columns), so all of its elements share one data type (e.g., a table containing only lipid or metabolite concentrations).

  • data frames (and tibbles) - are lists of vectors of the same length, where the vectors may differ in type (e.g., the whole set of lipidomics or metabolomics data containing sample names, biological groups, clinical data, and concentrations of lipids/metabolites). By extracting a single column of a data frame, we obtain a vector. In tidy data frames, every column represents one variable (a feature such as lipid concentration, metabolite concentration, gender, age, tumor grade, smoking status, etc.), every row represents one observation (one patient for whom all variables are collected in columns), and every cell holds a single value.

We will use both matrices and data frames (tibbles) when working with lipidomics and metabolomics data sets.
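
The relationship between data frames, column vectors, and matrices can be sketched with base R as follows. The data frame lipid_data, its column names, and all values are hypothetical; the example data sets in this book have a different layout.

```r
# Hypothetical data: three samples, a group label, two lipid concentrations
lipid_data <- data.frame(
  sample  = c("S01", "S02", "S03"),
  label   = c("N", "PAN", "T"),
  lipid_a = c(1.2, 3.5, 5.0),
  lipid_b = c(0.4, 0.9, 1.1)
)

str(lipid_data)                  # columns differ in type but share one length

# Extracting one column returns a plain vector
lipid_a <- lipid_data$lipid_a
typeof(lipid_a)                  # "double"

# Keeping only the numeric columns gives a matrix of concentrations
conc_matrix <- as.matrix(lipid_data[, c("lipid_a", "lipid_b")])
rownames(conc_matrix) <- lipid_data$sample
dim(conc_matrix)                 # 3 rows (samples) and 2 columns (lipids)
typeof(conc_matrix)              # "double" - a matrix holds a single type
```

Selecting the numeric columns before calling as.matrix() is a deliberate choice here: converting the whole data frame, including the character columns, would coerce every value to character.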

Further reading about data types in R (R-bloggers and the STAT 133 class notes) is listed at the end of this page.

Factors in R

Factors <fct> are categorical (grouping) variables in R. This data format is widely used in statistics. In -omics data, factors serve as labels denoting biological groups. In the case of our clinical data example, we will change the Label column from character to factor. The Label column contains the biological group that every sample belongs to, e.g., N - healthy volunteers, PAN - patients with pancreatitis, and T - patients with pancreatic cancer. A factor can take only a limited set of values, called levels. Internally, R stores a factor as a vector of integer codes, and the corresponding character labels are displayed when the factor is printed. More information about factors can be found in the "Factors in R" and "An Introduction to R" resources listed at the end of this page.
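
A minimal base R sketch of the character-to-factor conversion is shown below. The vector label is a hypothetical stand-in for the Label column, and setting the level order explicitly is an assumption made for illustration, not necessarily the approach used later in this book.

```r
# Hypothetical stand-in for the Label column of the clinical data
label <- c("N", "PAN", "T", "N", "T")

# Convert character to factor; the level order is set explicitly
label_fct <- factor(label, levels = c("N", "PAN", "T"))

levels(label_fct)       # "N" "PAN" "T" - the limited set of allowed values
as.integer(label_fct)   # 1 2 3 1 3     - the integer codes stored internally
typeof(label_fct)       # "integer"
table(label_fct)        # number of samples per biological group
```

Setting levels explicitly controls the order in which groups appear in tables and plots; without it, R sorts the levels alphabetically.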


  • R Data Types | R-bloggers
  • STAT 133 - Class Notes by Deborah Nolan and Duncan Temple Lang (UC Berkeley). Select: Data Types and Structures in R.
  • Factors in R - Statistics at UC Berkeley, further reading about R factors.
  • An Introduction to R - ETH Zurich, Introduction to R (about factors).