# Using gtsummary to create publication-ready tables

## Basic gtsummary table

Besides visualizations, tables can be used to present the most interesting alterations in feature levels or to summarize your data set (e.g., clinical parameters). Check out the following example:

* D. Zhu et al. Lipidomics Profiling and Risk of Coronary Artery Disease in the BioHEART-CT Discovery Cohort. DOI: <https://doi.org/10.3390/biom13060917> - Table 1.

The gtsummary package can create beautiful, publication-ready tables filled with descriptive statistics for the most important features via the `tbl_summary()` function. Here, we will show you how to use gtsummary to prepare these tables. More information about gtsummary can be found here:

{% embed url="<https://www.danieldsjoberg.com/gtsummary/#summary-table>" %}
Introduction to the gtsummary package.
{% endembed %}

The gtsummary package works well with functions from the tidyverse collection. Let's assume that, based on the exploratory analysis, we already know that our lipidomics data set contains interesting alterations in long-chain sphingomyelin (SM) profiles in PDAC. We want to present these trends in the manuscript as a table. We decided to show the data as medians with interquartile ranges (IQR) and to apply a non-parametric test to compare all distributions (in this case, the Kruskal-Wallis test). A publication-ready table can be prepared with nearly a single line of code(!):

```r
# Installation of gtsummary:
install.packages("gtsummary")

# Calling library:
library(gtsummary)

# Investigate the tbl_summary():
?tbl_summary()

# Creating publication-ready table via tbl_summary:
data %>%
  select(`Label`, `SM 39:1;O2`, starts_with("SM 4")) %>%
  tbl_summary(by = `Label`) %>%
  add_p()
  
# Explanations:
# From the 'data' in the global environment, select columns:
# 1. `Label`, `SM 39:1;O2`,
# 2. All columns whose names start with 'SM 4'.
# 3. Pipe them to tbl_summary(), which groups by the 'Label' column and computes median with IQR.
# 4. Pipe the results to add_p(), which compares the groups using the Kruskal-Wallis test.
```

The result is this elegant table:

<figure><img src="/files/Eoblt8j07xaDkCSIM8gd" alt=""><figcaption><p>Table generated through tbl_summary (gtsummary library) comparing medians with IQR of sphingomyelin concentrations in plasma for healthy volunteers (N), patients with pancreatitis (PAN), and patients with pancreatic cancer (PDAC) - example.</p></figcaption></figure>

The tbl\_summary() function is very flexible in terms of customization. Suppose we would like to present the results as mean concentration with standard deviation, rounded to the first decimal place, and use ANOVA to compare the three means. Also, we do not particularly like the column name 'Characteristic', and we want to change it to 'SM species' in bold. We can use the following code:

```r
# Preparing table with mean (sd) and ANOVA for every lipid:
data %>%
  select(`Label`, `SM 39:1;O2`, starts_with("SM 4")) %>%
  tbl_summary(by = `Label`,
              statistic = all_continuous() ~ c("{mean} ({sd})"),
              digits = all_continuous() ~ 1) %>%
  add_p(all_continuous() ~ 'aov') %>%
  modify_header(label = "**SM species**")
```

The output:

<figure><img src="/files/LjIvzItYxtHS8EoUNOud" alt=""><figcaption><p>Table generated through tbl_summary (gtsummary library) comparing means with standard deviations of sphingomyelin concentrations in plasma for healthy volunteers (N), patients with pancreatitis (PAN), and patients with pancreatic cancer (PDAC) - example.</p></figcaption></figure>
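To see what kind of p-values the two tables above report, it can help to run the underlying tests directly. The following is only a sketch under stated assumptions: it uses the built-in `iris` data set (not the lipidomics data), and it assumes that the default three-group comparison and the 'aov' option of add\_p() correspond to the base R functions kruskal.test() and aov().

```r
# Illustration on the built-in iris data set (NOT the lipidomics data):
# compare one numeric variable across three groups.

# Kruskal-Wallis rank sum test (non-parametric):
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)
kw$p.value

# One-way ANOVA (parametric):
fit <- aov(Sepal.Length ~ Species, data = iris)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]  # p-value of the group effect
anova_p
```

Both tests answer the same question (do the groups differ?), but ANOVA assumes approximately normal data with similar variances, while the Kruskal-Wallis test works on ranks.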

The tbl\_summary() function is constructed in the following way:

<figure><img src="/files/do0JwqrcscuIIaCouA6q" alt=""><figcaption><p>Construction of the tbl_summary() function. In red frames - arguments expecting list().</p></figcaption></figure>

<figure><img src="/files/9WN0ExdIdFODdBK5tXQ8" alt=""><figcaption><p>How to use list() to create a summary table with tbl_summary() from gtsummary package.</p></figcaption></figure>

As the figure above shows, list() is not needed when only one operation is performed. In the example, only the type of `SM 39:1;O2` is changed to 'continuous2', so no list is required. When computing statistics, however, we first define a summary for all\_continuous() variables and then a separate one for `SM 39:1;O2`, which is now a 'continuous2' variable. In this case, both operations must be merged into one list. The figure also shows that the left side of each formula defines which variables are affected (red frame), and the right side defines how they are affected. To select variables, we can use a column name, e.g. `SM 39:1;O2`, or address all variables of the same type at once via all\_continuous(). The two sides are separated by a tilde symbol, "\~":

```r
all_continuous() ~ c("{mean}, {sd}")
```

Now, look at the right side. Each statistic is referenced by its function name in curly braces, e.g. {mean} or {sd}, inside a character string. The c() function is needed when several strings are supplied (e.g. one per row for continuous2 variables); for a single string it is optional. The string also defines how the statistics will be presented in the table, e.g. the form:

```r
c("{function_1} ({function_2})")
```

will result in:

**output\_1 (output\_2)**

for example:

```r
c("{mean} ({sd})")
```

is:

**mean value (standard deviation) -> 4.7 (1.7)**

While:

```r
c("{function_1}, {function_2}")
```

is:

**output\_1, output\_2**

for example:

```r
c("{mean}, {sd}")
```

is:

**mean value, standard deviation -> 4.7, 1.7**
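The two layouts can be mimicked in base R to make the idea concrete. This is only an illustration of the resulting text, not of how gtsummary works internally; the vector `x` contains hypothetical values, and sprintf() stands in for the string template.

```r
# Hypothetical concentrations (illustrative values only):
x <- c(3, 4, 5, 6, 7)

m <- mean(x)  # 5
s <- sd(x)    # about 1.58

# Layout corresponding to "{mean} ({sd})", one decimal place:
form_1 <- sprintf("%.1f (%.1f)", m, s)

# Layout corresponding to "{mean}, {sd}", one decimal place:
form_2 <- sprintf("%.1f, %.1f", m, s)

form_1  # "5.0 (1.6)"
form_2  # "5.0, 1.6"
```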

What functions can be used to compute summary statistics? According to the function documentation (?tbl\_summary()), there is a long list of summary statistics the tbl\_summary() can compute for you for continuous variables, e.g.:

* `{median}` median
* `{mean}` mean
* `{sd}` standard deviation
* `{var}` variance
* `{min}` minimum
* `{max}` maximum
* `{sum}` sum
* `{p##}` any integer percentile, where `##` is an integer from 0 to 100
* `{foo}` any function of the form `foo(x)` is accepted where `x` is a numeric vector

For the categorical variables you can obtain:

* `{n}` frequency
* `{N}` denominator, or cohort size
* `{p}` formatted percentage

Moreover, for categorical and continuous variables, statistics regarding the number of missing and non-missing entries and their proportions can also be presented:

* `{N_obs}` total number of observations
* `{N_miss}` number of missing observations
* `{N_nonmiss}` number of non-missing observations
* `{p_miss}` percentage of observations missing
* `{p_nonmiss}` percentage of observations not missing
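Several of the statistics listed above can be combined in one string. The sketch below assumes the `trial` example data set that ships with gtsummary (it is not the lipidomics data), and the object name `tbl_stats` is arbitrary.

```r
library(gtsummary)

# `trial` is an example data set shipped with gtsummary (NOT the lipidomics data).
# Median with 25th and 75th percentiles, plus the number of missing values:
tbl_stats <- tbl_summary(
  trial,
  include = c(age, marker),
  statistic = all_continuous() ~
    "{median} ({p25}, {p75}); missing: {N_miss} of {N_obs}",
  missing = "no"  # the separate 'Unknown' row is not needed here
)

tbl_stats
```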

**One additional important matter:** as you have seen in the example above, we can change the type of continuous variable - concentrations of SM 39:1;O2 stored in the column `SM 39:1;O2` to continuous2 variable. **For continuous2 variables, you can show summaries in two or more table rows, e.g. mean with standard deviation and range (min, max) in two separate rows.**
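As a text version of the idea, here is a minimal sketch of a continuous2 variable with two summary rows. It assumes the `trial` example data set from gtsummary (not the lipidomics data); `age` plays the role of `SM 39:1;O2`, and the object name `tbl_two_rows` is arbitrary.

```r
library(gtsummary)
library(dplyr)  # provides the %>% pipe

tbl_two_rows <- trial %>%
  tbl_summary(
    by = trt,
    include = c(age, marker),
    # One operation only, so no list() is needed:
    type = age ~ "continuous2",
    # Two operations, so they are merged into one list():
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      age ~ c("{mean} ({sd})", "{min}, {max}")  # one string per table row
    ),
    missing = "no"
  )

tbl_two_rows
```

In the printed table, `age` occupies a label row followed by two statistic rows, while `marker` stays on a single row.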

Additional summary statistics or statistical tests can also be computed via the add\_...() functions, e.g. add\_ci() can be used to compute confidence intervals, or add\_p() to compute the p-value from a statistical test.

The add\_p() was used in the examples above too. Statistical tests can be selected via **test** argument in the add\_p() function:

```r
# Changing test type in the add_p() function:
add_p(..., test = 
       list(all_categorical() ~ "chisq.test",
            all_continuous() ~ "wilcox.test"))
```

Let's assume that we want to compare in the table only medians of long-chain SM concentrations measured for volunteers and patients with pancreatic cancer using the Wilcoxon rank sum test. The concentrations in the table should be shown as median with interquartile range (IQR). Here is the code:

```r
# Adjusting the add_p() function, example:
data %>%
  filter(Label == 'N' | Label == 'T') %>%
  droplevels() %>%                             # Drop the now-empty factor level "PAN"
  select(`Label`, `SM 39:1;O2`, starts_with("SM 4")) %>%
  tbl_summary(by = `Label`) %>%
  add_p(all_continuous() ~ 'wilcox.test')
```

The output:

<figure><img src="/files/glXODE036et652wtfmjs" alt=""><figcaption><p>Table comparing median concentrations (median with IQR) of long chain SM in healthy volunteers (N) and patients with pancreatic cancer (T) and the Wilcoxon rank sum test for comparing samples of distributions.</p></figcaption></figure>
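The droplevels() step is easy to overlook, so here is a small base R sketch of what it does. The data frame `df` is hypothetical and only mimics the three group labels; subsetting with `[` plays the role of filter().

```r
# Hypothetical labels mimicking the three groups in the data set:
df <- data.frame(
  Label = factor(c("N", "PAN", "T", "N", "T", "PAN")),
  value = c(1.2, 3.4, 2.2, 1.9, 2.8, 3.1)
)

# Keep only two groups (base R equivalent of the filter() step):
df_two <- df[df$Label %in% c("N", "T"), ]

levels(df_two$Label)              # "N" "PAN" "T": the empty level is still there
levels(droplevels(df_two)$Label)  # "N" "T"
```

Filtering removes the rows but not the factor level, so functions that group by the factor may still see three groups unless the unused level is dropped.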

Now, one more example - let's assume we want to perform the Wilcoxon rank sum test for all continuous variables except for SM 39:1;O2 - here we want to perform the classic *t*-test. We need to apply the following modifications to our code:

```r
data %>%
  filter(Label == 'N' | Label == 'T') %>%
  droplevels() %>%
  select(`Label`, `SM 39:1;O2`, starts_with("SM 4")) %>%
  tbl_summary(by = `Label`) %>%
  add_p(test = 
          list(all_continuous() ~ 'wilcox.test',
               `SM 39:1;O2` ~ 't.test'),
        test.args = `SM 39:1;O2` ~ list(var.equal = TRUE))
```

The test.args argument passes var.equal = TRUE on to the function computing the *t*-test, so equal variances are assumed. Without it, Welch's test would be performed instead. The output:

<figure><img src="/files/kVsQDMarX7EtCdlzPfk5" alt=""><figcaption><p>Table comparing median concentrations (median with IQR) of long chain SM in healthy volunteers (N) and patients with pancreatic cancer (T) and the Wilcoxon rank sum test for comparing samples of distributions of all continuous variables except for SM 39:1;O2 - here the classic <em>t</em>-test was used.</p></figcaption></figure>
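The effect of var.equal can be checked directly with t.test() from base R. The sketch below uses the built-in `mtcars` data set (not the lipidomics data) purely as an illustration.

```r
# Built-in mtcars data (NOT the lipidomics data): mpg by transmission type (am).
student <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)  # classic two-sample t-test
welch   <- t.test(mpg ~ am, data = mtcars)                    # default: Welch's t-test

student$method
welch$method

student$parameter  # degrees of freedom: n - 2
welch$parameter    # non-integer, smaller degrees of freedom
```

The two variants generally give different degrees of freedom and therefore different p-values, which is why the choice should be stated explicitly.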

More examples of the application of the add\_p() function are presented below.

## Preparing gtsummary table for the complete patient information (advanced)

Now, let's use a different data set, where more clinical information is available. You can download it here:&#x20;

{% file src="/files/SycMVzhvAaFN0EDxapDS" %}
Lipidomics data set from the article by Robert Jirásko et al. published in *Cancers* (with minor adjustments). \
DOI: 10.3390/cancers14194622.
{% endfile %}

Read the data set into R as the tibble 'data.ccRCC', and correct the variable types (if necessary):

* `Label` should be a factor,
* `Gender` should be a factor,
* `Age` should be a numeric variable,
* `BMI` should be a numeric variable,
* `Type of tumor` should be a factor,
* `Tumor grade` should be a factor,
* All lipid concentrations should be numeric.

Now, we want to summarize patients' information in a gtsummary table:

```r
# Reading the data set into R:
data.ccRCC <- readxl::read_xlsx(file.choose())

# Adjusting column types:
data.ccRCC$Label <- as.factor(data.ccRCC$Label)
data.ccRCC$Gender <- as.factor(data.ccRCC$Gender)
data.ccRCC$`Type of tumor`<- as.factor(data.ccRCC$`Type of tumor`)
data.ccRCC$`Tumor grade` <- as.factor(data.ccRCC$`Tumor grade`)

# Creating summary table for patients' information:
data.ccRCC %>%
  select(`Label`,
         `Gender`,
         `Age`,
         `BMI`,
         `Type of tumor`,
         `Tumor grade`,
         `Collected samples`) %>%
  tbl_summary(by = `Label`,
              statistic = all_continuous() ~ c("{mean} ({sd})"),
              digits = all_continuous() ~ 1)
```

The simple tbl\_summary() function produces this output:

<figure><img src="/files/2ROzXV7XQSKQFCxDWCVl" alt=""><figcaption><p>Table summary of patient information from the study published by Jirásko et al.<br>DOI: <em>10.3390</em>/<em>cancers14194622.</em></p></figcaption></figure>

This table looks good, but it could benefit from further adjustments, e.g.:

* age and BMI provided only as mean and standard deviation could be characterized better, e.g., we could add at least the range of values for both parameters,
* the 'Characteristic' name of the first column is back,
* for `Age` and `BMI` (converted to continuous2 variables in the next step) - no statistical test was applied to compare the groups - the *t*-test should be applied here,
* also, we would be interested in whether Pearson’s Chi-squared test detects differences in the numbers of male and female participants (carefully! - it is a dichotomous variable),
* no tests should be performed for `Type of tumor`, `Tumor grade`, and `Collected samples`,
* if a statistical test is considered - we would also like to see the p-value corrected for multiple comparisons (Bonferroni correction),
* all labels should be in bold to clearly differentiate them from the rest of the entries,
* let's assume that we are not interested in the number of missing entries,
* for `Collected samples` - without a description of P, U, T - we can only guess which sample types these annotations refer to - we need to modify the footnote,
* for healthy volunteers - as we do not expect any entries in the `Type of tumor` or `Tumor grade` - we could remove the zero values.

Now, let's turn all these remarks into code step by step. Complementary to tbl\_summary(), the add\_...() and modify\_...() function families allow extending the table and modifying its content. We will use some of them here.

We will begin by adding the ranges to means and standard deviations for `Age` and `BMI`. We need to change the type of `Age` and `BMI` from continuous to continuous2, so additional rows in the table can be added for these variables. Then, in the c() we can add additional functions we would like to use for these variables and a form in which the values should appear in the table. The code after modifications:

```r
# Adding range to `Age` and `BMI`:
data.ccRCC %>%
  select(`Label`,
         `Gender`,
         `Age`,
         `BMI`,
         `Type of tumor`,
         `Tumor grade`,
         `Collected samples`) %>%
  tbl_summary(by = `Label`,
              type = list(`Age` ~ 'continuous2',
                          `BMI` ~ 'continuous2'),
              statistic = list(`Age` ~ c("{mean} ({sd})", "{min}, {max}"),
                               `BMI` ~ c("{mean} ({sd})", "{min}, {max}")),
              digits = all_continuous() ~ 1)
```

And the output:

<figure><img src="/files/B3c98LUtuVXaXRmrhuKw" alt=""><figcaption><p>Table summary of patient information from the study published by Jirásko et al.<br>DOI: <em>10.3390</em>/<em>cancers14194622 -</em> with range for age and BMI.</p></figcaption></figure>

In the next step, the 'Characteristic' header should be replaced with 'Information'. Here, the modify\_... family of functions is useful - modify\_header() to be precise. We need to access the header of the label column, change it, and make it bold:

```r
data.ccRCC %>%
  select(`Label`,
         `Gender`,
         `Age`,
         `BMI`,
         `Type of tumor`,
         `Tumor grade`,
         `Collected samples`) %>%
  tbl_summary(by = `Label`,
              type = list(`Age` ~ 'continuous2',
                          `BMI` ~ 'continuous2'),
              statistic = list(`Age` ~ c("{mean} ({sd})", "{min}, {max}"),
                               `BMI` ~ c("{mean} ({sd})", "{min}, {max}")),
              digits = all_continuous() ~ 1) %>% 
  modify_header(label = '**Information**')
```

The application of \*\* before and after the new header title will bold it. The output:

<figure><img src="/files/dVj7nzS6twhLmCF4KGfz" alt=""><figcaption><p>Table summary of patient information from the study published by Jirásko et al.<br>DOI: <em>10.3390</em>/<em>cancers14194622 -</em> with a new header - "Information".</p></figcaption></figure>

Next, the statistical test results should be added. For all continuous2 variables, we want the p-value from the *t*-test. For `Gender`, Pearson’s Chi-squared test should be performed. For the remaining variables - no statistical test should be performed. If we do not exclude these variables from the testing, the add\_p() function will select and perform tests automatically. The obtained p-values should also be presented with the Bonferroni correction.

Adding additional columns with statistics can be achieved through add\_... functions, here: add\_p() and add\_q(). In the add\_p() function, we need to specify the *t*-test for all continuous2 variables and Pearson’s Chi-squared test for all dichotomous variables, and what variables should be excluded from the testing; test types are specified in the following way:

```r
all_continuous2() ~ 't.test'
```

or

```r
all_dichotomous() ~ 'chisq.test'
```

If we want to specify more than one test type in add\_p(), we need to additionally use a **test** argument in the add\_p() function and add all tests as a list():

```r
add_p(..., test = list(<here specify all tests>))
```

for example:

```r
add_p(..., test = list(all_continuous2() ~ 't.test', all_dichotomous() ~ 'chisq.test'))
```

We need to characterize the type of *t*-test. In this case, we will use a classic two-sample *t*-test, so we need to add the argument **test.args** in the add\_p() function:

```r
add_p(..., test.args = all_continuous2() ~ list(var.equal = TRUE))
```

Finally, to exclude variables from testing we need to use the **include** argument, here:

```r
add_p(..., include = -c(`Type of tumor`, `Collected samples`, `Tumor grade`))
```

To add the corrected p-value, we pipe the output to add\_q() function. In the add\_q() function, we can select the correction method through the **method** argument. We select 'bonferroni':

```r
add_q(method = 'bonferroni')
```
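To get a feeling for what the Bonferroni correction does, the adjustment can be reproduced in base R with p.adjust(). The raw p-values below are hypothetical; to our understanding, add\_q() accepts the same method names as stats::p.adjust().

```r
# Hypothetical raw p-values from three tests:
p_raw <- c(0.01, 0.04, 0.5)

# Bonferroni: multiply each p-value by the number of tests and cap at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bonf  # 0.03 0.12 1.00
```

Note that the correction depends on how many p-values are adjusted together, so excluding variables from testing also changes the corrected values.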

In summary, the code will look like this:

```r
data.ccRCC %>%
  select(`Label`,
         `Gender`,
         `Age`,
         `BMI`,
         `Type of tumor`,
         `Tumor grade`,
         `Collected samples`) %>%
  tbl_summary(by = `Label`,
              type = list(`Age` ~ 'continuous2',
                          `BMI` ~ 'continuous2'),
              statistic = list(`Age` ~ c("{mean} ({sd})", "{min}, {max}"),
                               `BMI` ~ c("{mean} ({sd})", "{min}, {max}")),
              digits = all_continuous() ~ 1) %>% 
  modify_header(label = '**Information**') %>%
  add_p(include = -c(`Type of tumor`, `Tumor grade`, `Collected samples`),
        test = list(all_continuous2() ~ 't.test',
                    all_dichotomous() ~ 'chisq.test'),
        test.args = all_continuous2() ~ list(var.equal = TRUE)) %>%
  add_q(method = 'bonferroni')
```

And the output:

<figure><img src="/files/SUjbMmMhMlYpn4GRzEmq" alt=""><figcaption><p>Table summary of patient information from the study published by Jirásko et al.<br>DOI: <em>10.3390</em>/<em>cancers14194622 -</em> with statistical tests for <code>Gender</code>, <code>Age</code>, and <code>BMI</code>.</p></figcaption></figure>

In one step we will bold all labels (simply pipe output to bold\_labels()) and remove missing observations from the table (**missing** argument in tbl\_summary() set to 'no'):

```r
data.ccRCC %>%
  select(`Label`,
         `Gender`,
         `Age`,
         `BMI`,
         `Type of tumor`,
         `Tumor grade`,
         `Collected samples`) %>%
  tbl_summary(by = `Label`,
              type = list(`Age` ~ 'continuous2',
                          `BMI` ~ 'continuous2'),
              statistic = list(`Age` ~ c("{mean} ({sd})", "{min}, {max}"),
                               `BMI` ~ c("{mean} ({sd})", "{min}, {max}")),
              digits = all_continuous() ~ 1,
              missing = 'no') %>% 
  modify_header(label = '**Information**') %>%
  add_p(include = -c(`Type of tumor`, `Tumor grade`, `Collected samples`),
        test = list(all_continuous2() ~ 't.test',
                    all_dichotomous() ~ 'chisq.test'),
        test.args = all_continuous2() ~ list(var.equal = TRUE)) %>%
  add_q(method = 'bonferroni') %>%
  bold_labels()
```

The output:

<figure><img src="/files/6cUDCaMqjBhAwzca6SYr" alt=""><figcaption><p>Table summary of patient information from the study published by Jirásko et al.<br>DOI: <em>10.3390</em>/<em>cancers14194622 -</em> with labels in bold and no missing observations presented in the table.</p></figcaption></figure>

Next, we want to modify the footnote to add the description of "P", "T", and "U". This is the first more complex task. We need to reference a specific cell of the final table so that the description can be attached to it as a footnote. The cell of interest is the one containing the label 'Collected samples'. The function that helps here is modify\_table\_styling(). According to its documentation, its arguments give access to the tibble 'table\_body', which is later printed as the final, publication-ready table. Let's save the final output from tbl\_summary() as 'table'. A new object appears in the global environment; it is a list of 8 elements. Now we can type:

```r
table$table_body
```

The output:

<figure><img src="/files/TRqXaW4IufP1P8nvysk1" alt=""><figcaption><p>The table_body tibble. This tibble is used to create the publication-ready table.</p></figcaption></figure>
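The same inspection can be tried on a small, self-contained example. The sketch below assumes the `trial` example data set from gtsummary (not the ccRCC data); the object is called `table` only to mirror the text above.

```r
library(gtsummary)

# Save a gtsummary table as an object (using the `trial` example data):
table <- tbl_summary(trial, by = trt, include = c(age, grade))

# The object is a list; table_body is the tibble behind the printed table:
class(table$table_body)
names(table$table_body)   # includes 'variable', 'label', 'stat_1', 'stat_2', ...
table$table_body$label    # the entries shown in the first column of the table
```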

To reference the cell containing the label 'Collected samples' in the table\_body, we can indicate to the modify\_table\_styling() function that we are interested in the column `label`, and in this column, we need to find the row containing the **label** named 'Collected samples'. The argument **rows** takes a **predicate expression** that evaluates to TRUE or FALSE for every row, i.e. label == 'Collected samples' returns TRUE for the row of interest and FALSE otherwise. Here is the code:

```r
... %>%
modify_table_styling(columns = label,
                     rows = label == "Collected samples")
```

For the row where TRUE is returned (our cell of interest), the footnote will be updated with a string supplied via the **footnote** argument:

```r
... %>%
modify_table_styling(columns = label,
                     rows = label == "Collected samples",
                     footnote = "P - plasma, T - tissue, U - urine")
```
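How the predicate behaves can be checked in base R. The vector `label` below is a simplified, hypothetical stand-in for the `label` column of table\_body.

```r
# Simplified stand-in for the `label` column of table_body (hypothetical):
label <- c("Gender", "Age", "BMI", "Collected samples")

hit  <- label == "Collected samples"  # FALSE FALSE FALSE  TRUE
miss <- label == "Collected sample"   # all FALSE: the string must match exactly

hit
miss
```

If the string does not match the label exactly, no row is selected and no footnote is attached.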

The updated code to modify the footnote of our final table:

```r
data.ccRCC %>%
  select(`Label`,
         `Gender`,
         `Age`,
         `BMI`,
         `Type of tumor`,
         `Tumor grade`,
         `Collected samples`) %>%
  tbl_summary(by = `Label`,
              type = list(`Age` ~ 'continuous2',
                          `BMI` ~ 'continuous2'),
              statistic = list(`Age` ~ c("{mean} ({sd})", "{min}, {max}"),
                               `BMI` ~ c("{mean} ({sd})", "{min}, {max}")),
              digits = all_continuous() ~ 1,
              missing = 'no') %>% 
  modify_header(label = '**Information**') %>%
  add_p(include = -c(`Type of tumor`, `Tumor grade`, `Collected samples`),
        test = list(all_continuous2() ~ 't.test',
                    all_dichotomous() ~ 'chisq.test'),
        test.args = all_continuous2() ~ list(var.equal = TRUE)) %>%
  add_q(method = 'bonferroni') %>%
  bold_labels() %>%
  modify_table_styling(
    columns = label,
    rows = label == "Collected samples",
    footnote = "P - plasma, T - tissue, U - urine")
```

The output of this code:

<figure><img src="/files/TTy8tPE4zztj0mjvC1LV" alt=""><figcaption><p>Table summary of patient information from the study published by Jirásko et al.<br>DOI: <em>10.3390</em>/<em>cancers14194622 -</em> with a modified footnote.</p></figcaption></figure>

Finally, we would like to remove the zero entries in `Type of tumor` and `Tumor grade`. We will substitute these entries with a minus symbol "-", meaning - no observations. Now, we need to introduce changes in the table body. We can do it via modify\_table\_body() function. Based on the documentation, this function can be used together with dplyr functions (like arrange(), mutate(), etc.) to introduce changes in the table body in the following way:

```r
# General form:
modify_table_body(
  ~ .x %>%
    <dplyr_function>
)

# For example:
modify_table_body(
  ~ .x %>%
    arrange(variable)
)
```

Here, we will use it together with mutate() across all columns containing statistical summaries (all\_stat\_cols() from gtsummary). We need a tool that recognizes every string in these columns that starts with 0 and replaces the whole entry with "-". Such a function is gsub(). The gsub() function application is relatively simple:

```r
gsub(pattern, replacement, x)  # x: the character vector to search
```

The gsub() function understands regular expressions, also known as regex. Regular expressions are sequences of characters describing patterns in text. The phrase *every string starting with 0, followed by any number of any characters* is represented by the following regex: **^0.\*** (^ anchors the match to the start of the string, 0 is the literal digit, and .\* matches any characters up to the end of the string). Note that the dot is a wildcard here, not a literal decimal point, so every entry beginning with 0 is replaced.

Therefore, our gsub function can be modified to:

```r
gsub("^0.*", "-", x)
```
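Before applying the pattern to a table, it is worth testing it on a few strings. The cell contents below are hypothetical; the stricter pattern in the second call is an optional alternative, not part of the original workflow.

```r
# Hypothetical cell contents of a summary column:
cells <- c("0 (0%)", "12 (34%)", "10 (20%)", "0.5 (0.1)")

# Pattern from the text: everything that starts with 0 is replaced.
loose <- gsub("^0.*", "-", cells)
loose   # "-" "12 (34%)" "10 (20%)" "-"

# Stricter alternative: replace only cells that are exactly "0 (0%)".
strict <- gsub("^0 \\(0%\\)$", "-", cells)
strict  # "-" "12 (34%)" "10 (20%)" "0.5 (0.1)"
```

The loose pattern also replaces values such as "0.5 (0.1)", so it should only be used when no genuine statistic in the affected columns starts with 0.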

If we implement the gsub() function into mutate() and modify\_table\_body() we obtain:

```r
modify_table_body(
    ~.x %>% 
      mutate(across(all_stat_cols(), ~gsub("^0.*", "-", .x)))
    )
```

And merging it with our code into a final form:

```r
data.ccRCC %>%
  select(`Label`,
         `Gender`,
         `Age`,
         `BMI`,
         `Type of tumor`,
         `Tumor grade`,
         `Collected samples`) %>%
  tbl_summary(by = `Label`,
              type = list(`Age` ~ 'continuous2',
                          `BMI` ~ 'continuous2'),
              statistic = list(`Age` ~ c("{mean} ({sd})", "{min}, {max}"),
                               `BMI` ~ c("{mean} ({sd})", "{min}, {max}")),
              digits = all_continuous() ~ 1,
              missing = 'no') %>% 
  modify_header(label = '**Information**') %>%
  add_p(include = -c(`Type of tumor`, `Tumor grade`, `Collected samples`),
        test = list(all_continuous2() ~ 't.test',
                    all_dichotomous() ~ 'chisq.test'),
        test.args = all_continuous2() ~ list(var.equal = TRUE)) %>%
  add_q(method = 'bonferroni') %>%
  bold_labels() %>%
  modify_table_styling(
    columns = label,
    rows = label == "Collected samples",
    footnote = "P - plasma, T - tissue, U - urine") %>%
  modify_table_body(
    ~.x %>% 
      mutate(across(all_stat_cols(), ~gsub("^0.*", "-", .x)))
    )
```

Our final table looks like this:

<figure><img src="/files/sS3U8v95GyzaNBsLRmQJ" alt=""><figcaption><p>Table summary of patient information from the study published by Jirásko et al.<br>DOI: <em>10.3390</em>/<em>cancers14194622 -</em> final version.</p></figcaption></figure>

## Exporting the final table with a gt package

To export the final table from RStudio, we will need to install the package called 'gt'.

```r
# Installing gt package
install.packages('gt')

# Calling the library
library(gt)
```

We can create the following formats in our working directory:

* .html
* .png
* .pdf
* .tex, .rnw
* .rtf
* .docx

To save your table, add these lines of code:

```r
... %>%
  as_gt() %>%
  gt::gtsave(filename = 'table.png') # Under filename, specify the name and format
```
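A complete, self-contained export could look like the sketch below. It assumes the `trial` example data set from gtsummary and writes an .html file to a temporary folder; the file name and the `out_file` variable are arbitrary. As far as we know, the .png and .pdf formats additionally rely on the webshot2 package and a Chrome-based browser, so .html is the least demanding format to test.

```r
library(gtsummary)
library(dplyr)  # provides the %>% pipe

# Target file in a temporary folder (replace with your own path and file name):
out_file <- file.path(tempdir(), "table_example.html")

trial %>%
  tbl_summary(by = trt, include = c(age, grade)) %>%
  as_gt() %>%                       # convert the gtsummary table to a gt table
  gt::gtsave(filename = out_file)   # the file extension selects the format

file.exists(out_file)
```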

For our table from the previous example, we would add:

```r
data.ccRCC %>%
  select(`Label`,
         `Gender`,
         `Age`,
         `BMI`,
         `Type of tumor`,
         `Tumor grade`,
         `Collected samples`) %>%
  tbl_summary(by = `Label`,
              type = list(`Age` ~ 'continuous2',
                          `BMI` ~ 'continuous2'),
              statistic = list(`Age` ~ c("{mean} ({sd})", "{min}, {max}"),
                               `BMI` ~ c("{mean} ({sd})", "{min}, {max}")),
              digits = all_continuous() ~ 1,
              missing = 'no') %>% 
  modify_header(label = '**Information**') %>%
  add_p(include = -c(`Type of tumor`, `Tumor grade`, `Collected samples`),
        test = list(all_continuous2() ~ 't.test',
                    all_dichotomous() ~ 'chisq.test'),
        test.args = all_continuous2() ~ list(var.equal = TRUE)) %>%
  add_q(method = 'bonferroni') %>%
  bold_labels() %>%
  modify_table_styling(
    columns = label,
    rows = label == "Collected samples",
    footnote = "P - plasma, T - tissue, U - urine") %>%
  modify_table_body(
    ~.x %>% 
      mutate(across(all_stat_cols(), ~gsub("^0.*", "-", .x)))
    ) %>%
  as_gt() %>%
  gt::gtsave(filename = 'table_example.png')
```

The table will be saved in your working directory (wd). You can always check your current working directory in this way:

```r
# Checking your current working directory (wd):
getwd()
```

## Additional references

We highly recommend watching this lecture by Daniel D. Sjoberg:

{% embed url="<https://www.youtube.com/watch?ab_channel=DanielSjoberg&t=507s&v=tANo9E1SYJE>" %}
Lecture on gtsummary by Daniel D. Sjoberg
{% endembed %}

We also recommend reading and citing the paper published in The R Journal:

{% embed url="<https://journal.r-project.org/archive/2021/RJ-2021-053/index.html>" %}
Manuscript in The R Journal presenting gtsummary. DOI: 10.32614/RJ-2021-053.
{% endembed %}


