Data normalization to the internal standards (advanced)
A part of data transformation & normalization
Normalizing to internal standards (IS) is crucial for accurate and reliable quantification of analytes in mass spectrometry. Ideally, we would use an IS for every analyte we measure; however, this is not feasible in large screening studies such as lipidomics. Instead, we use at least one deuterated IS for every lipid class we measure. Free software solutions for deisotoping and calculating lipid concentrations exist, e.g., LipidQuant. You can read more about it in the following article:
Internal standards serve as reference points, helping to minimize technical variability, compensate for matrix effects, and correct for losses of analytes during sample preparation and fluctuations in the instrument response. These standards are added in the same amount to every sample at the beginning of the sample preparation process. You can find more about the importance of normalizing signals to internal standards, e.g., on the website of Lipidomics Standards Initiative:
Or in the following articles:
In the following section, we provide simple R code based on the tidyverse collection of packages to normalize your data to IS.
First, we call the tidyverse collection, specify the path to the working directory, and load the example dataset:
The input dataset should contain analytes/lipids in columns and samples in rows, with the first column containing sample identifiers, as shown in the picture below.
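A minimal sketch of the loading step and of the expected input layout is given below. The lipid names, intensities, and the file name `example_dataset.csv` are hypothetical; the toy table is written to a temporary folder only so that the example runs on its own.

```r
library(tidyverse)

# Hypothetical toy dataset: samples in rows, analytes in columns,
# sample identifiers in the first column, internal standards marked with "_IS".
example <- tibble(
  Sample = c("S1", "S2", "S3"),
  `PC(33:1-d7)_IS` = c(1000, 1200, 900),
  `PC(34:1)` = c(5000, 6600, 4050),
  `PC(36:2)` = c(2500, 2400, 2700),
  `SM(36:2-d9)_IS` = c(800, 700, 1000),
  `SM(34:1)` = c(1600, 2100, 1500)
)

# Save the toy table so that there is a file to load.
# With real data, you would instead call setwd() with your own folder
# and pass your own file name to read_csv().
path <- file.path(tempdir(), "example_dataset.csv")
write_csv(example, path)

# Load the dataset; read_csv() keeps the lipid names exactly as written.
data <- read_csv(path, show_col_types = FALSE)
data
```

Reading the file with `read_csv()` rather than base `read.csv()` avoids the automatic renaming of columns such as `PC(34:1)`, which matters later when the lipid class is extracted from the name.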
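To work with the data further, we need to transform it into a tidy (long) table with one row per sample and analyte. We will use the gather() function for this (in current versions of tidyr, gather() is superseded by pivot_longer(), but it still works).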
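The reshaping step could look like the sketch below. The toy table and the column names `Sample`, `Lipid`, and `Intensity` are assumptions made for illustration, not names taken from the original dataset.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: samples in rows, analytes in columns.
data <- tibble(
  Sample = c("S1", "S2"),
  `PC(33:1-d7)_IS` = c(1000, 1200),
  `PC(34:1)` = c(5000, 6600),
  `SM(36:2-d9)_IS` = c(800, 700),
  `SM(34:1)` = c(1600, 2100)
)

# Stack all analyte columns into Lipid/Intensity pairs,
# keeping Sample as the identifier column.
data_long <- gather(data, key = "Lipid", value = "Intensity", -Sample)

# Equivalent call with the newer tidyr interface (only the row order differs):
# data_long <- pivot_longer(data, cols = -Sample,
#                           names_to = "Lipid", values_to = "Intensity")

data_long
```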
We will need additional information for our calculations: which analytes are internal standards and what lipid classes they belong to.
It's important to note that in this example, we expect to use one internal standard for every lipid class. However, there may be situations where we use more than one IS per group. In such cases, we must specify subgroups within the class and associate specific standards accordingly.
In this dataset, the names of all internal standards contain the pattern "_IS". We strongly encourage using a specific annotation pattern for internal standards during analysis to help identify them in large datasets. Here, we use the grep() function to find the internal standards and create a new column named "IS", which holds a logical TRUE/FALSE value indicating whether the analyte is an internal standard.
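One way to flag the internal standards is sketched below. It uses grepl(), the sibling of grep() that returns TRUE/FALSE directly, which is a substitution made here for brevity; the toy table and its column names are hypothetical.

```r
# Hypothetical long table with one row per sample and analyte.
data_long <- data.frame(
  Sample = c("S1", "S1", "S1", "S1"),
  Lipid = c("PC(33:1-d7)_IS", "PC(34:1)", "SM(36:2-d9)_IS", "SM(34:1)"),
  Intensity = c(1000, 5000, 800, 1600)
)

# grepl() returns one logical value per row: TRUE where the name contains "_IS".
# fixed = TRUE treats the pattern as plain text rather than a regular expression.
data_long$IS <- grepl("_IS", data_long$Lipid, fixed = TRUE)

# grep() with the same arguments returns the matching row numbers instead.
is_rows <- grep("_IS", data_long$Lipid, fixed = TRUE)

data_long
```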
The last part of the data preparation involves extracting information about the lipid class. This can be done in several ways, typically depending on the type of lipid annotation your lab uses. In this example, we use a simple approach and extract everything before the symbol "(". If you require a different approach, we recommend checking the chapter in this GitBook (Metabolites and Lipids Univariate Statistics in R -> Graphical representation of univariate statistics -> Lipid maps and acyl chain plots), which describes in detail how to extract annotations from lipid names.
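Extracting everything before "(" can be sketched with a single regular expression, as shown below. The lipid names are hypothetical, and the column name `LipClass` follows the name used later for the concentration table.

```r
# Hypothetical long table with lipid names in the form Class(details).
data_long <- data.frame(
  Lipid = c("PC(33:1-d7)_IS", "PC(34:1)", "SM(36:2-d9)_IS", "SM(34:1)"),
  Intensity = c(1000, 5000, 800, 1600)
)

# Remove the first "(" and everything after it; what remains is the lipid class.
# Names without "(" are left unchanged by sub().
data_long$LipClass <- sub("\\(.*$", "", data_long$Lipid)

data_long
```

Because the internal standard and the analytes of one class end up with the same `LipClass` value, this column can be used directly for grouping.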
After preparing our table, we can proceed with the calculations. First, we will prepare a function that divides the intensity of each analyte by the intensity of the corresponding IS. When the function does not find an internal standard, it leaves the intensity unchanged.
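A minimal sketch of such a function is shown below. It assumes that each group passed to it holds the rows of one lipid class in one sample, with the hypothetical columns `Intensity` and `IS`; stopping with an error when more than one IS is found is a design choice of this sketch, not something stated in the original text.

```r
normalize_intensity_IS <- function(group) {
  # Intensity of the internal standard(s) present in this group.
  is_intensity <- group$Intensity[group$IS]

  if (length(is_intensity) == 0) {
    # No internal standard found: leave the intensity as it was.
    group$Intensity_IS <- group$Intensity
  } else if (length(is_intensity) == 1) {
    # Exactly one internal standard: divide every intensity by it.
    group$Intensity_IS <- group$Intensity / is_intensity
  } else {
    # More than one IS per group needs subgroups, so stop instead of guessing.
    stop("More than one internal standard found in a group.")
  }
  group
}

# Hypothetical group with an internal standard.
pc_group <- data.frame(
  Lipid = c("PC(33:1-d7)_IS", "PC(34:1)"),
  Intensity = c(1000, 5000),
  IS = c(TRUE, FALSE)
)
pc_result <- normalize_intensity_IS(pc_group)

# Hypothetical group without an internal standard.
ce_group <- data.frame(Lipid = "CE(18:1)", Intensity = 300, IS = FALSE)
ce_result <- normalize_intensity_IS(ce_group)
```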
To perform the calculations, we split the data into a list of small groups, one for each combination of sample and lipid class.
Finally, we apply the prepared function normalize_intensity_IS() to every element of grouped_data, which gives the list data_normalized. We then use do.call() to call rbind() (which binds rows together) on this list, combining all the results into a single table.
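The split, apply, and combine steps can be sketched as follows. The toy data, the column names `Sample`, `Lipid`, and `Intensity`, and the compact version of normalize_intensity_IS() are assumptions made so that the example runs on its own.

```r
# Hypothetical long table, already annotated with IS and LipClass.
data_long <- data.frame(
  Sample = rep(c("S1", "S2"), each = 4),
  Lipid = rep(c("PC(33:1-d7)_IS", "PC(34:1)", "SM(36:2-d9)_IS", "SM(34:1)"), 2),
  Intensity = c(1000, 5000, 800, 1600, 1200, 6600, 700, 2100)
)
data_long$IS <- grepl("_IS", data_long$Lipid, fixed = TRUE)
data_long$LipClass <- sub("\\(.*$", "", data_long$Lipid)

# Compact version of the normalization function (one IS per group assumed).
normalize_intensity_IS <- function(group) {
  is_intensity <- group$Intensity[group$IS]
  group$Intensity_IS <- if (length(is_intensity) == 1) {
    group$Intensity / is_intensity
  } else {
    group$Intensity  # no IS found: keep the intensity as it was
  }
  group
}

# One list element per combination of sample and lipid class.
grouped_data <- split(
  data_long,
  list(data_long$Sample, data_long$LipClass),
  drop = TRUE
)

# Apply the function to every group, then bind the pieces back together.
data_normalized <- lapply(grouped_data, normalize_intensity_IS)
data_normalized <- do.call(rbind, data_normalized)
rownames(data_normalized) <- NULL

data_normalized
```

Setting `drop = TRUE` in split() discards combinations of sample and class that do not occur in the data, so the function is never called on an empty group.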
At this point, we have normalized the signal intensities. We can quickly check whether the function performed correctly by examining the Intensity_IS column in the data_normalized table, where the value for every internal standard should equal 1.
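This check can also be done programmatically, as in the short sketch below; the toy table stands in for the real data_normalized, and the name `check_ok` is hypothetical.

```r
# Hypothetical normalized table.
data_normalized <- data.frame(
  Lipid = c("PC(33:1-d7)_IS", "PC(34:1)", "SM(36:2-d9)_IS", "SM(34:1)"),
  IS = c(TRUE, FALSE, TRUE, FALSE),
  Intensity_IS = c(1, 5, 1, 2)
)

# Normalized values of the internal standards only.
is_values <- data_normalized$Intensity_IS[data_normalized$IS]

# all.equal() compares with a small numerical tolerance.
check_ok <- isTRUE(all.equal(is_values, rep(1, length(is_values))))
check_ok
```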
To calculate the concentration of analytes based on the IS concentration, we need to input additional data in the form of a .csv table. This table stores the concentration of every IS used in the batch. The concentration values are analysis-specific and should be confirmed with an analytical chemist.
This table should contain the columns LipClass and IS_conc. In the next step, we will merge the tables by the column LipClass to add the concentration information to every group. Then, we will multiply the column Intensity_IS by the newly added column IS_conc.
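The merge and multiplication could be sketched as below. The IS concentrations are made-up numbers, the table is built inline instead of being read from a .csv file, and the result column `Concentration` is a hypothetical name.

```r
# Hypothetical normalized table (one sample shown for brevity).
data_normalized <- data.frame(
  Sample = "S1",
  Lipid = c("PC(33:1-d7)_IS", "PC(34:1)", "SM(36:2-d9)_IS", "SM(34:1)"),
  LipClass = c("PC", "PC", "SM", "SM"),
  Intensity_IS = c(1, 5, 1, 2)
)

# Hypothetical IS concentrations; with real data this table would be read
# from the .csv file, and the values and units come from your analysis.
is_conc <- data.frame(
  LipClass = c("PC", "SM"),
  IS_conc = c(10, 5)
)

# Add the IS concentration to every row of the matching lipid class.
data_conc <- merge(data_normalized, is_conc, by = "LipClass")

# Concentration = normalized intensity x concentration of the IS.
data_conc$Concentration <- data_conc$Intensity_IS * data_conc$IS_conc

data_conc
```

By default, merge() keeps only lipid classes present in both tables; adding `all.x = TRUE` would keep classes without a listed IS concentration, with NA in the new columns.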
To prepare the output table, we select the columns we need, exclude the internal standards, reshape the data, and save the reshaped data to a CSV file.
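A possible version of this final step is sketched below. It assumes a wide output with samples in rows and lipids in columns, uses pivot_wider() (the newer counterpart of spread()) for the reshaping, and writes to a temporary folder; the column `Concentration` and the file name are hypothetical.

```r
library(dplyr)
library(tidyr)
library(readr)

# Hypothetical table with calculated concentrations.
data_conc <- data.frame(
  Sample = rep(c("S1", "S2"), each = 4),
  Lipid = rep(c("PC(33:1-d7)_IS", "PC(34:1)", "SM(36:2-d9)_IS", "SM(34:1)"), 2),
  IS = rep(c(TRUE, FALSE), 4),
  Concentration = c(10, 50, 5, 10, 10, 55, 5, 15)
)

output <- data_conc %>%
  filter(!IS) %>%                               # exclude internal standards
  select(Sample, Lipid, Concentration) %>%      # keep only the needed columns
  pivot_wider(names_from = Lipid, values_from = Concentration)

# Save the reshaped table; replace the path with your own output file.
out_path <- file.path(tempdir(), "normalized_concentrations.csv")
write_csv(output, out_path)

output
```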