Constructing Frequency and Relative Frequency Distribution Tables with the gt and dplyr Packages

This article demonstrates how to construct frequency and relative frequency distribution tables with the gt and dplyr packages. The example uses the Boston Housing dataset from the MASS package, specifically the `medv` variable (the median value of owner-occupied homes in $1000s).

**Definition**

A **frequency distribution** is a table that shows classes or intervals of data entries with counts of the number of entries in each class. The **frequency** ***f*** of a class is the number of data entries in the class.
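As a minimal sketch of this definition, the base R functions `cut()` and `table()` can bin a small vector into classes and count the entries in each one. The `scores` vector and the class boundaries are hypothetical values chosen only for illustration; they are not part of the Boston data.

```r
# Hypothetical data, used only to illustrate the definition.
scores <- c(52, 67, 71, 75, 78, 83, 85, 88, 91, 97)

# Assign each score to a class (right-closed intervals of width 10).
classes <- cut(scores, breaks = c(50, 60, 70, 80, 90, 100))

# Count the entries in each class: this is the frequency f of every class.
freq <- table(classes)
freq
```

Each count is the frequency of its class, and the counts add up to the number of scores.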

The **relative frequency** of a class is the proportion or percentage of the data that falls in that class. To find the relative frequency of a class, divide the frequency ***f*** by the sample size ***n***:

\(\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n}\)

The **cumulative frequency** of a class is the sum of the frequency for that class and all previous classes. The cumulative frequency of the last class is equal to the sample size ***n***.
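The two formulas can be checked with a few lines of base R. The sketch below takes the ten class frequencies reported in this article's table for `medv` and derives the relative and cumulative frequencies from them. The column name `Cum.Freq` is an assumption made for this example; the article's own table does not include a cumulative column.

```r
# Class frequencies as reported in this article's table for medv.
freq <- c(22, 55, 85, 154, 84, 39, 29, 7, 10, 21)

n <- sum(freq)                   # Sample size.
rel_freq <- round(freq / n, 3)   # Relative frequency f / n, three decimals.
cum_freq <- cumsum(freq)         # Running total of the class frequencies.

# Cum.Freq is a hypothetical column name used only in this sketch.
data.frame(Freq = freq, Rel.Freq = rel_freq, Cum.Freq = cum_freq)
```

The last entry of `cum_freq` equals `n`, as the definition of cumulative frequency requires.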

```r
library(MASS)   # Provides the Boston housing data.
library(gt)     # Builds the display table.
library(dplyr)  # Provides rename(), mutate(), add_row() and the %>% pipe.

data(Boston)  # Load the Boston Housing data from the MASS package.

# Create equal-width classes; Sturges' formula sets the number of classes.
housing_data <- cut(Boston$medv, breaks = nclass.Sturges(Boston$medv))

# Count the entries per class, then add proportions and percentages,
# with proportions rounded to three decimal places.
house_data <- as.data.frame(table(housing_data)) %>%
  transform(relative = round(prop.table(Freq), digits = 3),
            Percentages = round(prop.table(Freq), digits = 3) * 100)

gt_data <- house_data %>%
  rename(Classes = "housing_data", `Rel. Freq` = "relative") %>%  # Rename the columns.
  mutate(Classes = as.character(Classes)) %>%  # Allow a "Total" label in this column.
  add_row(Classes = "Total", Freq = sum(.$Freq),  # Add the row of column totals.
          `Rel. Freq` = sum(.$`Rel. Freq`),
          Percentages = sum(.$Percentages)) %>%
  gt() %>%  # Initiate the gt table.
  tab_header(
    title = "Frequency and Relative Frequency Distribution Table(s)",
    subtitle = "Constructed with gt and dplyr Packages"  # Add the title and subtitle.
  ) %>%
  tab_options(heading.background.color = "darkgreen",  # Dark green heading background.
              column_labels.background.color = "grey",  # Grey column label background.
              table.width = "100%") %>%  # Maximize table width (100%).
  tab_style(
    style = list(
      cell_fill(color = "black"),  # Make the background of the last row black.
      cell_text(weight = "bold", color = "white")  # Bold white text.
    ),
    # Apply the style to the "Total" row only. Matching on the label avoids
    # exact comparisons against floating-point sums.
    locations = cells_body(columns = everything(), rows = Classes == "Total")
  ) %>%
  cols_align(align = "center", columns = everything()) %>%  # Center column values.
  tab_source_note(
    source_note = "From https://www.rengdatascience.io; by Alier Ëë Reng, 02/26/2019"
  )  # Add citation information.

gt_data  # Print the distribution table.
```

**Frequency and Relative Frequency Distribution Table(s)**

Constructed with gt and dplyr Packages

| Classes | Freq | Rel. Freq | Percentages |
|---|---|---|---|
| (4.96,9.5] | 22 | 0.043 | 4.3 |
| (9.5,14] | 55 | 0.109 | 10.9 |
| (14,18.5] | 85 | 0.168 | 16.8 |
| (18.5,23] | 154 | 0.304 | 30.4 |
| (23,27.5] | 84 | 0.166 | 16.6 |
| (27.5,32] | 39 | 0.077 | 7.7 |
| (32,36.5] | 29 | 0.057 | 5.7 |
| (36.5,41] | 7 | 0.014 | 1.4 |
| (41,45.5] | 10 | 0.020 | 2.0 |
| (45.5,50] | 21 | 0.042 | 4.2 |
| Total | 506 | 1.000 | 100.0 |

From https://www.rengdatascience.io; by Alier Ëë Reng, 02/26/2019
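The counts in this table come from base R's `table()` and `transform()`. As an alternative sketch, and not the pipeline used in this article, the same frequencies could be computed with dplyr verbs alone. The object names `k` and `freq_tbl` are assumptions introduced for this example.

```r
library(MASS)   # Provides the Boston data frame.
library(dplyr)  # Provides mutate(), count() and the %>% pipe.

# Number of classes from Sturges' formula: ceiling(log2(n) + 1).
k <- nclass.Sturges(Boston$medv)

freq_tbl <- Boston %>%
  mutate(Classes = cut(medv, breaks = k)) %>%   # Assign each home value to a class.
  count(Classes, name = "Freq") %>%             # Frequency f of each class.
  mutate(`Rel. Freq` = round(Freq / sum(Freq), 3),  # Relative frequency f / n.
         Percentages = `Rel. Freq` * 100)

freq_tbl
```

The result is an ordinary data frame, so it can be passed to `gt()` in the same way as `house_data`.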

For attribution, please cite this work as

Reng (2019, Feb. 27). Reng Data Science Institute: Frequency Distribution Table with a gt Package. Retrieved from https://www.rengdatascience.io/posts/2019-02-27-a-frequency-distribution-with-a-gt-package/

BibTeX citation

@misc{reng2019frequency, author = {Reng, Alier Ëë}, title = {Reng Data Science Institute: Frequency Distribution Table with a gt Package}, url = {https://www.rengdatascience.io/posts/2019-02-27-a-frequency-distribution-with-a-gt-package/}, year = {2019} }