Frequency Distribution Table with a gt Package

Constructing Frequency and Relative Frequency Distribution Tables with the gt and dplyr Packages.

Alier Ëë Reng https://www.quanfistatistics.io/
02-27-2019

The purpose of this article is to demonstrate how to construct both frequency and relative frequency distribution tables with the gt and dplyr packages. In this article, I will use the Boston Housing dataset from the MASS package (specifically medv, the median value of owner-occupied homes in $1000s).

Definition

A frequency distribution is a table that shows classes or intervals of data entries with counts of the number of entries in each class. The frequency f of a class is the number of data entries in the class.

A relative frequency of a class is the portion or percentage of the data that falls in that class. To find the relative frequency of a class, divide the frequency f by the sample size n.

\(\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n}\)
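As a quick check of the formula, here is a minimal sketch in base R. The class frequencies are copied from the table printed later in this article; the variable names (freq, n, rel_freq) are only illustrative.

```r
# Class frequencies copied from the distribution table in this article
# (Boston$medv split into 10 classes).
freq <- c(22, 55, 85, 154, 84, 39, 29, 7, 10, 21)

n <- sum(freq)                  # sample size
rel_freq <- round(freq / n, 3)  # relative frequency f / n for each class
percent <- rel_freq * 100       # the same values as percentages

print(data.frame(freq, rel_freq, percent))
```

Dividing by sum(freq) is what prop.table(freq) does for a vector, which is why the article's code can use prop.table() directly.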

The cumulative frequency of a class is the sum of the frequencies of that class and all previous classes. The cumulative frequency of the last class is equal to the sample size n.
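The definition above maps onto base R's cumsum(). The sketch below is an illustration rather than part of the article's pipeline; it reuses the class frequencies from the table printed later in this article, and cum_freq is a made-up name.

```r
# Class frequencies copied from the distribution table in this article.
freq <- c(22, 55, 85, 154, 84, 39, 29, 7, 10, 21)

cum_freq <- cumsum(freq)  # running total over the classes, in order

# The cumulative frequency of the last class equals the sample size n.
n <- sum(freq)
print(cum_freq)
print(cum_freq[length(cum_freq)] == n)
```

A column like this could be added inside transform() as cumFreq = cumsum(Freq), assuming the classes are already in ascending order (as they are after cut() and table()).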

Load the packages


library(MASS)
library(gt)
library(dplyr)
library(ggplot2)

data(Boston) # Load the Boston Housing data from the MASS package.

housing_data <- cut(Boston$medv, breaks = nclass.Sturges(Boston$medv)) # Bin medv into classes; the number of classes comes from Sturges' formula. cut() already returns a factor.
house_data <- as.data.frame(table(housing_data)) %>% # Count the entries in each class and store the counts in a data frame.
  transform(relative = round(prop.table(Freq), digits = 3), # Relative frequencies (f / n), rounded to three decimal places.
            Percentages = round(prop.table(Freq), digits = 3) * 100) # The same proportions expressed as percentages.

gt_data <- house_data %>% # Start from the frequency data frame.
  rename(Classes = "housing_data", `Rel. Freq` = "relative") %>% # Rename the columns.
  add_row(Classes = "Total", Freq = sum(.$Freq), # Append a "Total" row holding the column sums.
          `Rel. Freq` = sum(.$`Rel. Freq`),
          Percentages = sum(.$Percentages)) %>%
  gt() %>% # Initiate gt table.
  tab_header(
    title = "Frequency and Relative Frequency Distribution Table(s)",
    subtitle = "Constructed with gt and dplyr Packages" # Add the title and subtitle.
  ) %>% 
  tab_options(heading.background.color = "darkgreen", # Change the heading background color to dark green.
              column_labels.background.color = "grey", # Change columns background color to grey.
              table.width = "100%") %>% # Maximize table width (100%).
  tab_style(
    # cell_fill(), cell_text() and cells_body() replace cells_styles() and
    # cells_data() from early gt releases.
    style = list(
      cell_fill(color = "black"), # Make the background of the last row black.
      cell_text(weight = "bold", color = "white")), # Make the text bold and white.
    locations = cells_body(
      columns = everything(),
      rows = Classes == "Total") # Apply the style to the "Total" row only; this avoids exact comparisons on floating-point sums.
  ) %>% 
  cols_align(align = "center", # Center column values.
             columns = everything()) %>% 
  tab_source_note(
    source_note = "From https://www.rengdatascience.io; by Alier Ëë Reng, 02/26/2019"
    ) # Add citation information.
  

gt_data # Print the distribution table.
Frequency and Relative Frequency Distribution Table(s)
Constructed with gt and dplyr Packages
Classes       Freq   Rel. Freq   Percentages
(4.96,9.5]      22       0.043           4.3
(9.5,14]        55       0.109          10.9
(14,18.5]       85       0.168          16.8
(18.5,23]      154       0.304          30.4
(23,27.5]       84       0.166          16.6
(27.5,32]       39       0.077           7.7
(32,36.5]       29       0.057           5.7
(36.5,41]        7       0.014           1.4
(41,45.5]       10       0.020           2.0
(45.5,50]       21       0.042           4.2
Total          506       1.000         100.0
From https://www.rengdatascience.io; by Alier Ëë Reng, 02/26/2019
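
The ten classes in the table follow from Sturges' formula, which nclass.Sturges() applies to the number of observations. The sketch below reproduces that class count by hand for n = 506 (the total shown in the table); the vector x is a stand-in with the same length as Boston$medv, not the real data.

```r
n <- 506                   # number of observations (the "Total" row in the table)
k <- ceiling(log2(n) + 1)  # Sturges' formula for the number of classes

# nclass.Sturges() depends only on the length of its input, so a
# placeholder vector of length n gives the same answer as Boston$medv.
x <- seq_len(n)
k_builtin <- grDevices::nclass.Sturges(x)

print(c(by_hand = k, builtin = k_builtin))
```

When a single number is passed as breaks, cut() divides the range of the data into that many equal-width intervals, which is why every class in the table spans 4.5 units except for the slightly extended lower bound of the first one.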

Citation

For attribution, please cite this work as

Reng (2019, Feb. 27). Reng Data Science Institute: Frequency Distribution Table with a gt Package. Retrieved from https://www.rengdatascience.io/posts/2019-02-27-a-frequency-distribution-with-a-gt-package/

BibTeX citation

@misc{reng2019frequency,
  author = {Reng, Alier Ëë},
  title = {Reng Data Science Institute: Frequency Distribution Table with a gt Package},
  url = {https://www.rengdatascience.io/posts/2019-02-27-a-frequency-distribution-with-a-gt-package/},
  year = {2019}
}