Skip to contents

Introduction

Normalize logical columns

For data frames or tibbles that have character or factor columns storing logical data, as may happen when reading from a database, CSV, or Excel file, use normalize_logicals() to find and convert these columns to logical type. This is a nice one-liner in a dplyr pipe

library(dplyr, warn.conflicts = FALSE)
ugly_data <- tibble(
  logical_char = c("T", "F", "T"),
  logical_factor = factor(c("TRUE", "FALSE", "TRUE")),
  non_logical_char = c("a", "b", "c"),
  non_logical_factor = factor(c("x", "y", "z")),
  mixed_char = c("T", "F", "a"),
  mixed_factor = factor(c("TRUE", "FALSE", "x")),
  numeric_col = c(1.1, 2.2, 3.3)
)

ugly_data
#> # A tibble: 3 × 7
#>   logical_char logical_factor non_logical_char non_logical_factor mixed_char
#>   <chr>        <fct>          <chr>            <fct>              <chr>     
#> 1 T            TRUE           a                x                  T         
#> 2 F            FALSE          b                y                  F         
#> 3 T            TRUE           c                z                  a         
#> # ℹ 2 more variables: mixed_factor <fct>, numeric_col <dbl>
df <- ugly_data |> normalize_logicals()
#> Converted "logical_char" and "logical_factor" columns to
#> logical.
df
#> # A tibble: 3 × 7
#>   logical_char logical_factor non_logical_char non_logical_factor mixed_char
#>   <lgl>        <lgl>          <chr>            <fct>              <chr>     
#> 1 TRUE         TRUE           a                x                  T         
#> 2 FALSE        FALSE          b                y                  F         
#> 3 TRUE         TRUE           c                z                  a         
#> # ℹ 2 more variables: mixed_factor <fct>, numeric_col <dbl>

Auto-formatted datattables

auto_dt() uses guess_col_fmts() to determine the format of each column. You can provide pct_flags and curr_flags (character vectors) if you need to control the list of “signal” words that indicate a column is a percentage or currency.

tribble(
  ~product, ~weight, ~dollaz_earned, ~growth_pct,
  "Widget A", 13.53, 1023.21, 0.051,
  "Widget B", 22.61, 150.24, 0.103,
  "Widget C", 40.54, 502.26, 0.021,
  "Widget D", 34.21, 2000.95, 0.154
) |>
  mutate(product = as.factor(product)) |>
  auto_dt(numeric_digits = 1, pct_digits = 0, curr_flags = c("revenue", "dollaz"))

Quarterly breaks and labels

scales::label_date_short() is a great function for labeling dates in ggplot2, but unfortunately it can’t support quarterly breaks and labels out of the box.

mcrutils provides a set of functions to create quarterly breaks and labels for date scales in ggplot2. The breaks_quarters() function generates breaks for quarters, and label_quarters_short() generates minimal labels for these breaks in a two-line format (like scales::label_date_short()), labeling every quarter, but only including the year when it changes from the previous label.

library(ggplot2)

economics |>
  filter(date >= "2005-05-01", date <= "2007-03-01") |>
  ggplot(aes(date, pce)) +
  geom_line() +
  scale_x_date(
    breaks = breaks_quarters(),
    labels = label_quarters_short()
  ) +
  labs(title="Automatic Quarterly Breaks",
       subtitle ="with concise labels") +
  theme(panel.grid.minor.x = element_blank())

The automatic version of breaks_quarters() tries to return a reasonable number of breaks over a wide range of dates, down-sampling to semesters and years as needed.

economics |>
  filter(date >= "2005-05-01", date <= "2009-03-01") |>
  ggplot(aes(date, pce)) +
  geom_line() +
  scale_x_date(
    breaks = breaks_quarters(),
    labels = label_quarters_short()
  ) +
  labs(title="Switching to semesters for longer ranges",
       subtitle = "always labelling Q1/Q3, never Q2/Q4") +
  theme(panel.grid.minor.x = element_blank())

economics |>
  filter(date >= "2000-05-01", date <= "2010-03-01") |>
  ggplot(aes(date, pce)) +
  geom_line() +
  scale_x_date(
    breaks = breaks_quarters(),
    labels = label_quarters_short()
  ) +
  labs(title="Switching to yearly for very long ranges",
       subtitle = "rather silly") +
  theme(panel.grid.minor.x = element_blank())

With very long date ranges like this, you are likely better off switching from these quarterly functions to the more standard date breaks and labels in ggplot2.

You can force a fixed break width if quarters are desired regardless of the date range.

economics |>
  filter(date >= "2005-02-01", date <= "2008-12-01") |>
  ggplot(aes(date, pce)) +
  geom_line() +
  scale_x_date(
    breaks = breaks_quarters(width = "1 quarter"),
    labels = label_quarters_short()
  ) +
  labs(title="Fixed quarterly breaks") +
  theme(panel.grid.minor.x = element_blank())

Year-to-date helpers

mcrutils provides a handful functions that can be helpful in creating year-to-date analyses

With ytd_bounds(), quickly find the date bounds of the latest year-to-date period in vector of dates, possibly spanning multiple years:

c(
  "2023-01-01", "2023-06-15", "2023-12-31",
  "2024-01-01", "2024-03-15", "2024-07-15"
) |>
  as.Date() |>
  ytd_bounds()
#> [1] "2024-01-01" "2024-07-15"

is_ytd_comparable() is a logical vector that indicates whether the dates in a vector are within a year-to-date period relative to a given end_date.

Below we have 2.5 years of historical sales data ending on June 1, 2025. How were the sales in the comparable period of 2024 and 2023?

set.seed(123)
sales <- data.frame(
  date = seq(
    from = as.Date("2023-01-01"),
    to = as.Date("2025-06-01"),
    by = "month"
  ),
  amount = rpois(30, lambda = 100)
)

head(sales)
#>         date amount
#> 1 2023-01-01     94
#> 2 2023-02-01    111
#> 3 2023-03-01     83
#> 4 2023-04-01    101
#> 5 2023-05-01    117
#> 6 2023-06-01    104

sales |>
  group_by(year = lubridate::year(date)) |>
  summarise(annual_sales = sum(amount))
#> # A tibble: 3 × 2
#>    year annual_sales
#>   <dbl>        <int>
#> 1  2023         1199
#> 2  2024         1166
#> 3  2025          600

How were the year-to-date sales in the comparable period of 2024 and 2023?

(bounds <- ytd_bounds(sales$date))
#> [1] "2025-01-01" "2025-06-01"

sales |>
  filter(is_ytd_comparable(date, max(bounds))) |>
  group_by(year = lubridate::year(date)) |>
  summarise(ytd_sales = sum(amount))
#> # A tibble: 3 × 2
#>    year ytd_sales
#>   <dbl>     <int>
#> 1  2023       610
#> 2  2024       594
#> 3  2025       600

With py_dates() you can rollback a vector of dates to the same period in the previous year, moving any fictitious dates to the prior valid day.

c("2024-01-01", "2024-02-29", "2025-07-15") |>
  as.Date() |>
  py_dates()
#> [1] "2023-01-01" "2023-02-28" "2024-07-15"