
# Fundamental data structures

## Vectors in R

A vector is the most basic R object used to store data. Importantly, R indexes vectors starting from 1, not from 0 as Python does. Atomic vectors are homogeneous, meaning all their elements share a single data type. In R, two types of vectors can be distinguished:

1. (Atomic) vectors:

* Character (string) \<chr>

Each element of such a vector is a string of one or more characters, e.g.:

```r
y <- c('a', 'b', 'a1b', 'a2b', 'abc', 'my', 'favorite', 'vector')

# It is the same as:

y <- c("a", "b", "a1b", "a2b", "abc", "my", "favorite", "vector")
```
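The short sketch below inspects the character vector defined above. Because R indexes from 1, `y[1]` returns the first element. It uses only base R functions (`typeof()`, `length()`, `nchar()`).

```r
y <- c("a", "b", "a1b", "a2b", "abc", "my", "favorite", "vector")

typeof(y)    # "character"
length(y)    # 8 elements
y[1]         # "a": the first element, since indexing starts at 1
y[c(7, 8)]   # "favorite" "vector"
nchar(y)     # 1 1 3 3 3 2 8 6: number of characters in each element
```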

* Integer \<int>

Each element of such a vector is an integer (a whole number, not a fraction) or NA, e.g.:

```r
y <- c(1L, 100L, 5L, 4L, NA)

# or

y <- c(1L, 4L, 5L, 5L, -5L, -10L, 0L)
```

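The `L` suffix marks a number as an integer and distinguishes integer vectors from numeric (double) vectors. Without it, e.g., `c(1, 4, 5)`, R stores even whole numbers as doubles.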
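A minimal base R check of the difference between integer and double vectors; the vectors `a` and `b` are illustrative only.

```r
a <- c(1L, 100L, 5L, 4L, NA)
b <- c(1, 4, 5)

typeof(a)                  # "integer"
typeof(b)                  # "double": whole numbers without L are stored as doubles
is.integer(b)              # FALSE
typeof(1:3)                # "integer": the : operator also returns integers
as.integer(c(2.9, -2.9))   # 2 -2: conversion truncates toward zero
```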

* Double \<dbl> or numeric \<num>

Each element of such a vector is a double-precision (floating-point) number, but the vector can also contain the special values NA, NaN, Inf, and -Inf, e.g.:

```r
y <- c(1.2, 3.5, 5, NA, Inf, -Inf, NaN)
```
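The special values behave differently when tested, which matters when filtering concentration data. The sketch below uses base R predicates; note that `is.na()` is also `TRUE` for `NaN`.

```r
y <- c(1.2, 3.5, 5, NA, Inf, -Inf, NaN)

typeof(y)                # "double"
is.na(y)                 # TRUE for both NA and NaN
is.nan(y)                # TRUE only for NaN
is.finite(y)             # TRUE only for 1.2, 3.5 and 5
mean(y[is.finite(y)])    # mean of the finite values only
```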

* Logical \<logi>

This vector contains TRUE, FALSE, or NA entries, e.g.:

```r
y <- c(TRUE, FALSE, TRUE, TRUE, FALSE, NA)
```
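Logical vectors are often used for counting and subsetting. In this sketch, `conc` holds hypothetical concentration values chosen only for illustration.

```r
y <- c(TRUE, FALSE, TRUE, TRUE, FALSE, NA)

typeof(y)              # "logical"
sum(y, na.rm = TRUE)   # 3: TRUE counts as 1 and FALSE as 0
sum(y)                 # NA, because one element is missing
which(y)               # 1 3 4: positions of TRUE values (NA is skipped)

conc <- c(0.5, 2.1, 3.3)   # hypothetical values
conc[conc > 1]             # 2.1 3.3: a logical vector used as an index
```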

* Complex \<cplx>

This vector type allows for storing numbers with imaginary components, e.g.:

```r
y <- c(1+0i, 2+2i, 3+0i, 5-0i, 6+6i, -10+100i)
```
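Base R provides accessor functions for the parts of complex numbers; a short sketch:

```r
y <- c(1+0i, 2+2i, 3+0i, 5-0i, 6+6i, -10+100i)

typeof(y)    # "complex"
Re(y)        # real parts: 1 2 3 5 6 -10
Im(y)        # imaginary parts: 0 2 0 0 6 100
Mod(3+4i)    # 5: the modulus (absolute value)
```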

* Raw \<raw>

The raw type vector is intended to hold raw bytes, e.g.:

```r
y <- as.raw(c(0x00, 0x1f, 0xff))  # printed as: 00 1f ff
```
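Typing `00` alone creates a double, so raw vectors are usually built with `as.raw()` or `charToRaw()`. A small base R sketch of converting between raw bytes, integers, and strings:

```r
y <- as.raw(c(0, 31, 255))

typeof(y)                        # "raw"
y                                # printed in hexadecimal: 00 1f ff
as.integer(y)                    # 0 31 255
charToRaw("lipid")               # bytes of the string: 6c 69 70 69 64
rawToChar(charToRaw("lipid"))    # "lipid"
```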

2. Recursive vectors:

* list

In R, lists can contain heterogeneous elements. A single list can store, for example, numeric vectors, integer vectors, strings, and matrices, as well as other lists.
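A minimal sketch of a heterogeneous, nested list. The element names and values (`id`, `replicates`, `conc`, `meta`) are hypothetical and serve only to illustrate the structure.

```r
sample_info <- list(
  id = "S01",                           # character
  replicates = 3L,                      # integer
  conc = c(1.2, 3.4, 5.6),              # double vector
  m = matrix(1:4, nrow = 2),            # matrix
  meta = list(group = "N", age = 54)    # a list nested inside the list
)

typeof(sample_info)      # "list"
length(sample_info)      # 5 top-level elements
sample_info[["conc"]]    # [[ ]] returns the element itself: 1.2 3.4 5.6
sample_info["conc"]      # [ ] returns a sub-list containing that element
sample_info$meta$group   # "N": access inside the nested list
```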

The vector types most often used for lipidomics and metabolomics data are character and double (numeric) vectors, together with recursive vectors (lists). In some cases, we may also need integer or logical vectors.
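As a sketch of how character and double vectors work together, the example below names a vector of hypothetical lipid concentrations with lipid identifiers (values are illustrative, not real measurements). It also shows that atomic vectors stay homogeneous: combining different types coerces all elements to the most flexible type (logical, then integer, double, and character).

```r
lipids <- c("PC 34:1", "PE 36:2", "SM 42:2")   # character vector of lipid names
conc <- c(12.5, 3.1, 7.8)                      # hypothetical concentrations (double)
names(conc) <- lipids

conc["PE 36:2"]            # 3.1, selected by name

typeof(c(1, "PC 34:1"))    # "character": the number is coerced to "1"
typeof(c(TRUE, 2L))        # "integer"
typeof(c(2L, 3.5))         # "double"
```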

## Other data types used in R for OMICs analysis

* matrix (matrices) - we can think of a matrix as an atomic vector with two-dimensional (rows and columns) shape information, so all of its elements share one type (e.g., a table of lipid or metabolite concentrations only).
* data frames (and tibbles) - lists of equal-length vectors (columns) that may be of different types (e.g., the whole lipidomics or metabolomics data set containing sample names, biological groups, clinical data, and concentrations of lipids/metabolites). By extracting *a column* of a data frame, we obtain *a vector*. In tidy data frames, each column represents one variable (a feature such as a lipid or metabolite concentration, gender, age, tumor grade, smoking status, etc.), each row represents one observation (e.g., one patient for whom all variables are collected in columns), and each cell holds a single value.
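The following base R sketch builds a small data frame of hypothetical lipidomics data (sample names, group labels, and concentrations are made up for illustration), extracts a column as a vector, and converts the concentration columns to a numeric matrix. `check.names = FALSE` keeps lipid names containing spaces and colons unchanged.

```r
# Hypothetical data: 4 samples, 2 lipids
df <- data.frame(
  Sample = c("S1", "S2", "S3", "S4"),
  Label = c("N", "T", "N", "PAN"),
  `PC 34:1` = c(12.5, 15.2, 11.9, 14.0),
  `SM 42:2` = c(7.8, 6.1, 8.3, 7.0),
  check.names = FALSE
)

is.list(df)                  # TRUE: a data frame is a list of equal-length columns
pc <- df[["PC 34:1"]]        # extracting one column gives a plain double vector

# Keep only the concentration columns, so the matrix has a single type
m <- as.matrix(df[, c("PC 34:1", "SM 42:2")])
rownames(m) <- df$Sample
dim(m)                       # 4 2
typeof(m)                    # "double"
```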

We will use both matrices and data frames (tibbles) when working with lipidomics and metabolomics data sets.

Further reading about data types in R:

{% embed url="<https://www.r-bloggers.com/2021/09/r-data-types/>" %}

{% embed url="<https://www.stat.berkeley.edu/~nolan/stat133/Fall05/lectures/>" %}
Class Notes by Deborah Nolan and Duncan Temple Lang (UC Berkeley). Select: Data Types and Structures in R.
{% endembed %}

## Factors in R

Factors \<fct> are categorical (grouping) variables in R. This data format is widely used in statistics, and in -omics data, factors typically serve as labels denoting biological groups. In our clinical data example, we will change the `Label` column from character to factor. The `Label` column indicates the biological group each sample belongs to, e.g., N - healthy volunteers, PAN - patients with pancreatitis, and T - patients with pancreatic cancer. A factor can take only a limited set of values, called levels. Internally, R stores a factor as a vector of integer codes together with its levels (the character labels), and the labels are displayed when the factor is printed. More information about factors can be found here:

{% embed url="<https://www.stat.berkeley.edu/~s133/factors.html>" %}
Statistics at UC Berkeley - further reading about R factors.
{% endembed %}

{% embed url="<https://stat.ethz.ch/R-manual/R-devel/doc/manual/R-intro.html#Factors>" %}
ETH Zurich - Introduction to R (about factors).
{% endembed %}
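A minimal sketch of converting a character `Label` column into a factor with base R `factor()`. The `clinical` data frame below is hypothetical and only mimics the group labels described above; setting `levels` explicitly fixes the group order (e.g., for plots and tables).

```r
# Hypothetical clinical data with a character Label column
clinical <- data.frame(
  Sample = c("S1", "S2", "S3", "S4", "S5"),
  Label = c("N", "T", "PAN", "N", "T")
)

clinical$Label <- factor(clinical$Label, levels = c("N", "PAN", "T"))

class(clinical$Label)        # "factor"
typeof(clinical$Label)       # "integer": factors are stored as integer codes
levels(clinical$Label)       # "N" "PAN" "T"
as.integer(clinical$Label)   # 1 3 2 1 3: the underlying codes
table(clinical$Label)        # number of samples per group: N 2, PAN 1, T 2
```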


