# Fundamental data structures

* **Integer (`int`)**: Whole numbers, e.g., `x = 10`
* **Floating Point (`float`)**: Decimal numbers, e.g., `y = 3.14`
* **String (`str`)**: Sequence of characters, e.g., `text = "Metabolites ID"`
* **Boolean (`bool`)**: Logical values, `True` or `False` (This can be useful to indicate if a sample or metabolite should be selected from a data table)
* **None (`NoneType`)**: `None` represents a null value, i.e., the absence of a value.
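As a minimal sketch of the types listed above, the snippet below assigns one value of each type and prints its type name. The variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical example values, one per basic type
n_samples = 10            # int
mass = 759.5778           # float
lipid_name = "PC 34:1"    # str
is_selected = True        # bool
annotation = None         # NoneType

# type(value).__name__ gives the name of the type as a string
for value in (n_samples, mass, lipid_name, is_selected, annotation):
    print(type(value).__name__, value)

# None is conventionally tested with "is" rather than "=="
print(annotation is None)
```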

### Lists in Python

A **list** is an ordered, mutable collection and the most commonly used built-in container in Python. Unlike R, which uses one-based indexing, Python uses **zero-based indexing**: the first element in a list is accessed with index `0`. Lists can hold heterogeneous data (values of different types in a single list).

**Creating a List**

Lists are defined using square brackets `[]` and elements are separated by commas:

```python
# List containing strings
lipids_list = ["Cholesterol", "PC 34:1", "PC 34:2", "TG 54:2"]

# List containing integers
int_list = [1, 100, 5, 4]

# List containing floating-point numbers (doubles)
float_list = [1.2, 3.5, 5.78, float('inf'), float('-inf'), float('nan')]

# List containing boolean values
bool_list = [True, False, True, True, False]

print(lipids_list)
print(int_list)
print(float_list)
print(bool_list)
```

Lists are highly flexible and allow modifications, including adding, removing, and modifying elements.
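The following sketch illustrates a few common ways to modify a list using built-in list methods. The list name `example_lipids` is hypothetical and kept separate from `lipids_list` so that the other examples on this page are unaffected.

```python
# Hypothetical list used only for this illustration
example_lipids = ["Cholesterol", "PC 34:1", "PC 34:2"]

example_lipids.append("TG 54:2")       # add an element to the end
example_lipids.insert(1, "PC 34:0")    # insert an element at index 1
example_lipids.remove("Cholesterol")   # remove the first matching element
last = example_lipids.pop()            # remove and return the last element

print(example_lipids)       # ['PC 34:0', 'PC 34:1', 'PC 34:2']
print(last)                 # TG 54:2
print(len(example_lipids))  # 3
```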

**Accessing Elements in a List**

Elements in a list can be accessed using indexing:

```python
first_element = lipids_list[0]  # Accessing first element
last_element = lipids_list[-1]  # Accessing last element
subset = lipids_list[1:4]  # Slicing from index 1 up to, but not including, index 4
print(first_element)
print(last_element)
print(subset)
```

An element at a given position can be overwritten with a new value:

```python
lipids_list[0] = "Ergosterol"  # Changing the first element in lipids_list
print(lipids_list)
```

### Tuples in Python

A **tuple** is similar to a list, but it is immutable (cannot be changed after creation). Tuples are defined using parentheses `()`.

```python
my_tuple = ("a", "b", "c")
num_tuple = (1, 2, 3, 4, 5)
```

**Accessing Elements in a Tuple**

Similar to lists, elements in a tuple can be accessed using indexing:

```python
first_element = my_tuple[0]
last_element = my_tuple[-1]
print(first_element)
print(last_element)
```
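To make the immutability concrete, the sketch below tries to overwrite a tuple element and catches the resulting `TypeError`. It also shows tuple unpacking and the trailing comma needed for a one-element tuple. The names `sample_info` and `single_item` are hypothetical.

```python
# Hypothetical (sample ID, group) pair
sample_info = ("S1", "Healthy")

# Assigning to a tuple element raises a TypeError
try:
    sample_info[1] = "Patient"
except TypeError as err:
    print("Cannot modify a tuple:", err)

# Unpacking assigns each element to its own variable
sample_id, group = sample_info
print(sample_id, group)

# A one-element tuple needs a trailing comma
single_item = ("S1",)
print(type(single_item).__name__)
```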

### Sets in Python

A **set** is an unordered collection of unique elements, defined using curly braces `{}`.

```python
my_set = {"a", "b", "c", "a"}
print(my_set)  # e.g. {'a', 'b', 'c'}: duplicates are removed; element order may vary
```
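Sets are convenient for comparing collections, for example lipids detected in two runs. The sketch below uses hypothetical set names and the built-in set operators. Note that an empty set must be created with `set()`, because `{}` creates an empty dictionary.

```python
# Hypothetical sets of lipids detected in two runs
detected_run1 = {"PC 34:1", "PC 34:2", "TG 54:2"}
detected_run2 = {"PC 34:2", "TG 54:2", "Cholesterol"}

shared = detected_run1 & detected_run2      # intersection: in both runs
either = detected_run1 | detected_run2      # union: in at least one run
only_run1 = detected_run1 - detected_run2   # difference: only in run 1

# sorted() returns a list, which gives a reproducible print order
print(sorted(shared))     # ['PC 34:2', 'TG 54:2']
print(len(either))        # 4
print(sorted(only_run1))  # ['PC 34:1']

# {} is an empty dict; use set() for an empty set
empty_set = set()
print(type({}).__name__, type(empty_set).__name__)  # dict set
```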

### Dictionaries in Python

A **dictionary** is a collection of key-value pairs, similar to named lists in R. It is defined using curly braces `{}`, with each key separated from its value by a colon `:`.

```python
mass_dict = {
    "PC 34:0": 761.5935,
    "PC 34:1": 759.5778,
    "PC 34:2": 757.5622,
    "PC 34:3": 755.5465
}
```

**Accessing Elements in a Dictionary**

Elements in a dictionary can be accessed using keys:

```python
mz_34_0 = mass_dict["PC 34:0"]  # Accessing the value associated with key 'PC 34:0'
mz_34_1 = mass_dict.get("PC 34:1")  # Another way to access a value safely
print(mz_34_0)
print(mz_34_1)
```

Note that accessing a value with square brackets raises a `KeyError` if the key is not present in the dictionary. The `.get()` method, by contrast, does not raise an error for a missing key; it returns `None` instead.

```python
mz_36_6 = mass_dict["PC 36:6"]  # Raises a KeyError because "PC 36:6" is not present in the dictionary
```

```python
mz_36_1 = mass_dict.get("PC 36:1")  # No error is raised; None is assigned to the variable
print(mz_36_1)  # output: None
```
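A few further dictionary operations are sketched below: supplying a default to `.get()`, adding a new key, testing for membership, and looping over key-value pairs. The dictionary name `masses` is hypothetical and the values are taken from `mass_dict` above.

```python
# Hypothetical dictionary with values copied from mass_dict
masses = {"PC 34:0": 761.5935, "PC 34:1": 759.5778}

# .get() accepts a second argument that is returned when the key is missing
missing = masses.get("PC 36:6", 0.0)
print(missing)  # 0.0

# Assigning to a new key adds the pair to the dictionary
masses["PC 34:2"] = 757.5622

# "in" tests whether a key is present
print("PC 34:2" in masses)  # True
print("PC 36:6" in masses)  # False

# .items() yields (key, value) pairs in insertion order
for lipid, mass in masses.items():
    print(lipid, mass)
```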

### Arrays in Python

Python has no built-in equivalent of R's vectors. Typed arrays are available through the standard library's `array` module or, far more commonly in scientific work, through `NumPy`.

```python
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
```

NumPy arrays store elements of a single data type, which makes numerical operations on large datasets faster and more memory-efficient than with lists. Like lists, they can be indexed with square brackets to access or overwrite the value at a specified position:

```python
first_element = arr[0] #access the first element of the array
print(first_element)

arr[0] = 10 #overwrite the first element of the array
print(arr)
```
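The main practical difference from lists is that arithmetic on a NumPy array is applied element-wise. The sketch below assumes NumPy is installed and uses a hypothetical array named `concentrations`.

```python
import numpy as np

# Hypothetical concentration values
concentrations = np.array([12.5, 15.3, 18.1])

# Arithmetic is applied to every element at once
doubled = concentrations * 2
log_values = np.log2(concentrations)

# A comparison returns a boolean array that can be used as a filter
mask = concentrations > 13
above_threshold = concentrations[mask]

print(doubled)
print(log_values)
print(mask)             # [False  True  True]
print(above_threshold)  # [15.3 18.1]
```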

### Other Data Types Used in Python for -Omics Analysis

* **NumPy Arrays:** NumPy arrays can be multi-dimensional, e.g., 2D arrays that represent matrices.
* **Pandas DataFrames:** Similar to R's data frames or tibbles, used for tabular data.
* **Pandas Series:** Equivalent to a single column in a DataFrame, similar to a vector in R.

```python
import pandas as pd

# Creating a DataFrame
data = {
    "Sample": ["S1", "S2", "S3"],
    "Lipid_Concentration": [12.5, 15.3, 18.1],
    "Group": ["Healthy", "Patient", "Healthy"]
}
df = pd.DataFrame(data)
```
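As a brief sketch of how such a table is typically used, the example below selects a single column (a Series) and filters rows with a boolean condition. It assumes pandas is installed and rebuilds the same table under the hypothetical name `example_df` so that it runs on its own.

```python
import pandas as pd

# Same data as above, rebuilt so the example is self-contained
example_df = pd.DataFrame({
    "Sample": ["S1", "S2", "S3"],
    "Lipid_Concentration": [12.5, 15.3, 18.1],
    "Group": ["Healthy", "Patient", "Healthy"],
})

# Selecting one column returns a Series
conc = example_df["Lipid_Concentration"]
print(type(conc).__name__)  # Series

# A boolean condition selects the matching rows
healthy = example_df[example_df["Group"] == "Healthy"]
print(healthy)

# shape gives (number of rows, number of columns)
print(example_df.shape)  # (3, 3)
```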

#### Factors in Python (Categorical Data)

In Python, categorical data is handled using Pandas' `Categorical` type, similar to factors in R.

```python
df["Group"] = df["Group"].astype("category")
```

Categorical variables are useful for grouping and statistical analysis.
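The sketch below shows what the conversion provides: the distinct categories, the integer codes stored behind them, and a grouped summary. It assumes pandas is installed; the DataFrame name `group_df` is hypothetical and reuses the values from the table above.

```python
import pandas as pd

# Hypothetical table reusing the values from the example above
group_df = pd.DataFrame({
    "Lipid_Concentration": [12.5, 15.3, 18.1],
    "Group": ["Healthy", "Patient", "Healthy"],
})
group_df["Group"] = group_df["Group"].astype("category")

# The .cat accessor exposes the categories and their integer codes
categories = group_df["Group"].cat.categories.tolist()
codes = group_df["Group"].cat.codes.tolist()
print(categories)  # ['Healthy', 'Patient']
print(codes)       # [0, 1, 0]

# Mean concentration per group
means = group_df.groupby("Group", observed=True)["Lipid_Concentration"].mean()
print(means)
```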

### Summary

In Python, lists, tuples, sets, and dictionaries are the core data structures. For handling large datasets, NumPy arrays and Pandas DataFrames are commonly used, especially in bioinformatics and -omics research. Categorical data can be represented using Pandas' `Categorical` type, aiding in statistical analysis and grouping.

Further reading on data structures in Python:

* [Python Lists and Tuples](https://docs.python.org/3/tutorial/datastructures.html)
* [NumPy for Scientific Computing](https://numpy.org/)
* [Pandas Documentation](https://pandas.pydata.org/docs/)

