Finding new wedding bops with {tidyclust} and {spotifyr}

rstats
tidymodels
Author

Mark Rieke

Published

August 20, 2022

Code
sysfonts::font_add_google("Roboto Slab")
showtext::showtext_auto()

ggplot2::theme_set(
  ggplot2::theme_minimal(base_family = "Roboto Slab", 
                         base_size = 14) +
    ggplot2::theme(plot.title.position = "plot",
                   plot.background = ggplot2::element_rect(fill = "white", color = "white"),
                   plot.title = ggtext::element_markdown(),
                   plot.subtitle = ggtext::element_markdown())
)

Last November, I (finally) popped the big question and proposed! Since then, my fiance and I have been diligently planning our wedding. We have most of the big-ticket items checked off (venue, catering, photography, etc.), but the wedding playlist still needs work. We’ve started putting one together on Spotify, but it feels like it’s come to a bit of a standstill. Currently, there’s a mix of zesty bops and tame songs on the playlist (we need to accommodate both our college friends and our grandparents!), but Spotify’s track recommender only wants to suggest tamer songs right now. Our goal is to have a full dance floor the entire night. To achieve this, we can use spotifyr and the new tidyclust package to pull in the current playlist, cluster the songs based on their features, and find new songs based on the bop cluster.

Code
library(tidymodels)
library(tidyclust)
library(spotifyr)

If you’d like to follow along, I’d recommend installing the development versions of parsnip and workflows, as some of the functionality that interacts with tidyclust isn’t yet on CRAN.
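
In practice, installing the development versions likely means installing from GitHub. A minimal sketch, assuming the remotes package and that both packages are hosted under the tidymodels GitHub organization (neither detail is stated above, so treat both as assumptions):

```r
# assumption: the development versions live under the tidymodels GitHub organization
# install.packages("remotes")  # uncomment if remotes isn't installed yet
remotes::install_github("tidymodels/parsnip")
remotes::install_github("tidymodels/workflows")
```

Once the relevant functionality reaches CRAN, a plain install.packages() call should be enough.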

Pulling in the playlist

spotifyr is an R interface to Spotify’s web API and gives access to a host of track features (you can follow this tutorial to get it set up). I’ll use the functions get_user_playlists() and get_playlist_tracks() to pull in songs that are currently on our wedding playlist (appropriately named “Ding dong”).
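
For reference, the setup typically amounts to storing the credentials of a Spotify developer app as environment variables. The sketch below follows the pattern in spotifyr's documentation. The id and secret are placeholders, and some functions may additionally ask you to log in through the browser.

```r
library(spotifyr)

# placeholders: substitute the client id and secret from your own Spotify developer app
Sys.setenv(SPOTIFY_CLIENT_ID = "your-client-id")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "your-client-secret")

# by default, spotifyr reads both environment variables when requesting a token
access_token <- get_spotify_access_token()
```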

Code
# get the songs that are currently on the wedding playlist
ding_dong <- 
  get_user_playlists("12130039175") %>%
  filter(name == "Ding dong") %>%
  pull(id) %>%
  get_playlist_tracks() %>% 
  as_tibble() %>%
  select(track.id, track.name, track.popularity) %>%
  rename_with(~stringr::str_replace(.x, "\\.", "_"))

ding_dong %>%
  slice_head(n = 10) %>%
  knitr::kable()
| track_id | track_name | track_popularity |
|:--|:--|--:|
| 5jkFvD4UJrmdoezzT1FRoP | Rasputin | 61 |
| 1D066zixBwqFYqBhKgdPzp | Fergalicious | 66 |
| 12jjuxN1gxlm29cqL5M6MW | I Got You | 62 |
| 2grjqo0Frpf2okIBiifQKs | September | 78 |
| 2RlgNHKcydI9sayD2Df2xp | Mr. Blue Sky | 76 |
| 6x4tKaOzfNJpEJHySoiJcs | Mambo No. 5 (a Little Bit of…) | 72 |
| 3n3Ppam7vgaVa1iaRUc9Lp | Mr. Brightside | 62 |
| 7Cp69rNBwU0gaFT8zxExlE | Ymca | 45 |
| 3Gf5nttwcX9aaSQXRWidEZ | Ride Wit Me | 72 |
| 3wMUvT6eIw2L5cZFG1yH9j | Country Grammar (Hot Shit) | 65 |

Spotify estimates quite a few features for each song in its catalog: speechiness (the presence of spoken words on a track), acousticness (how likely the track is to be acoustic), liveness (how likely the track was recorded live rather than in a studio), etc. We can use get_track_audio_features() to get the features for each song based on its track_id.

Code
# pull in track features of songs on the playlist
track_features <- 
  ding_dong %>%
  pull(track_id) %>%
  get_track_audio_features()

# join together
ding_dong <- 
  ding_dong %>%
  left_join(track_features, by = c("track_id" = "id"))
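
One caveat for longer playlists: Spotify's documentation lists a maximum of 100 ids per audio-features request, so a single call may not cover everything. The sketch below shows one way to split the ids into batches. `chunk_ids()` is a hypothetical helper (not part of spotifyr), and the ids are made up purely to show the grouping.

```r
# hypothetical helper: split a vector of track ids into groups of at most `size`
chunk_ids <- function(ids, size = 100) {
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# made-up ids, only to illustrate the grouping
fake_ids <- paste0("id_", 1:250)
chunks <- chunk_ids(fake_ids)

# number of ids in each batch
lengths(chunks)

# with credentials set up, each batch could then be requested separately, e.g.:
# track_features <- purrr::map_dfr(chunks, spotifyr::get_track_audio_features)
```

For a playlist under the limit, the single call shown earlier is all that's needed.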

In my case, I’m interested in the energy and valence (positivity) of each song, so I’ll select these variables to use in the cluster analysis.

Code
ding_dong %>%
  select(track_name, valence, energy) %>%
  slice_head(n = 10) %>%
  knitr::kable()
| track_name | valence | energy |
|:--|--:|--:|
| Rasputin | 0.966 | 0.893 |
| Fergalicious | 0.829 | 0.583 |
| I Got You | 0.544 | 0.399 |
| September | 0.979 | 0.832 |
| Mr. Blue Sky | 0.478 | 0.338 |
| Mambo No. 5 (a Little Bit of…) | 0.892 | 0.807 |
| Mr. Brightside | 0.240 | 0.918 |
| Ymca | 0.671 | 0.951 |
| Ride Wit Me | 0.722 | 0.700 |
| Country Grammar (Hot Shit) | 0.565 | 0.664 |

Clustering with tidyclust

Currently, the playlist covers a wide spectrum of songs. For new additions, I’m really only interested in songs similar to those in the top right corner of the chart below, with high energy and valence.

Code
# how are valence/energy related?
obj <- 
  ding_dong %>%
  ggplot(aes(x = valence,
             y = energy,
             tooltip = track_name)) + 
  ggiraph::geom_point_interactive(size = 3.5, alpha = 0.5) +
  scale_x_continuous(labels = scales::label_percent(accuracy = 1)) +
  scale_y_continuous(labels = scales::label_percent(accuracy = 1)) +
  labs(title = "The current wedding playlist",
       subtitle = "Hover over each point to see the song's name!")

ggiraph::girafe(
  ggobj = obj,
  options = list(
    ggiraph::opts_tooltip(opacity = 0.8,
                          css = "background-color:gray;color:white;padding:2px;border-radius:2px;font-family:Roboto Slab;"),
    ggiraph::opts_hover(css = "fill:#1279BF;stroke:#1279BF;cursor:pointer;")
  )
)

Broadly, the songs on the current playlist fall into three generic categories: high energy and valence, low energy, or low valence (songs with low energy and valence will fall into one of the “low” categories). Rather than manually assign categories, we can use tidyclust to cluster the songs into three groups using the k-means algorithm.
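
For contrast, a manual assignment might look like the sketch below, using the ten songs from the table above. The 0.6 cutoff and the order in which the rules are checked are arbitrary assumptions for illustration, which is exactly the kind of hand-tuning clustering avoids.

```r
# valence and energy for the ten songs shown in the table above
songs <- data.frame(
  track_name = c("Rasputin", "Fergalicious", "I Got You", "September",
                 "Mr. Blue Sky", "Mambo No. 5 (a Little Bit of…)",
                 "Mr. Brightside", "Ymca", "Ride Wit Me",
                 "Country Grammar (Hot Shit)"),
  valence = c(0.966, 0.829, 0.544, 0.979, 0.478, 0.892, 0.240, 0.671, 0.722, 0.565),
  energy  = c(0.893, 0.583, 0.399, 0.832, 0.338, 0.807, 0.918, 0.951, 0.700, 0.664)
)

# hypothetical cutoff: anything below it counts as "low"
cutoff <- 0.6

# check valence first, then energy; everything else is high on both
songs$category <- ifelse(
  songs$valence < cutoff, "low valence",
  ifelse(songs$energy < cutoff, "low energy", "high energy and valence")
)

# how many songs land in each hand-made category
table(songs$category)
```

Moving the cutoff or swapping the order of the checks changes which songs land where, so the boundaries are a judgment call.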

There’s some great documentation on the tidyclust site, but to get started, we’ll categorize the songs on the current playlist by “fitting” a k-means model (using the stats engine under the hood).

Code
# create a clustering obj
set.seed(918)
ding_dong_clusters <- 
  k_means(num_clusters = 3) %>%
  fit(~ valence + energy,
      data = ding_dong) 
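
The stats engine refers to base R's stats::kmeans(). As a rough sketch of what happens under the hood, the same kind of fit can be run directly on the ten songs shown in the tables above. With so few songs the clusters won't match the full playlist, and `nstart = 10` is an addition of this sketch, not something taken from the fit above.

```r
# valence and energy for the ten songs shown earlier
songs <- data.frame(
  valence = c(0.966, 0.829, 0.544, 0.979, 0.478, 0.892, 0.240, 0.671, 0.722, 0.565),
  energy  = c(0.893, 0.583, 0.399, 0.832, 0.338, 0.807, 0.918, 0.951, 0.700, 0.664)
)

# k-means starts from random centers, so set a seed for reproducibility
set.seed(918)

# assumption: nstart = 10 tries several random starts and keeps the best one
km <- stats::kmeans(songs[, c("valence", "energy")], centers = 3, nstart = 10)

# one row per cluster: the average valence and energy of its songs
km$centers

# cluster label for each song, in the same order as the rows of `songs`
songs$cluster <- factor(km$cluster)
table(songs$cluster)
```

The cluster numbers themselves are arbitrary labels, so which number corresponds to the bops has to be read off the centers.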

Code
pal <- MetBrewer::MetPalettes$Egypt[[1]]

obj <- 
  ding_dong_clusters %>%
  augment(ding_dong) %>%
  ggplot(aes(x = valence,
             y = energy,
             color = .pred_cluster,
             tooltip = track_name)) +
  ggiraph::geom_point_interactive(size = 3.5, alpha = 0.75) +
  scale_x_continuous(labels = scales::label_percent(accuracy = 1)) +
  scale_y_continuous(labels = scales::label_percent(accuracy = 1)) +
  theme(legend.position = "none") +
  labs(title = "Clusters in the current playlist",
       subtitle = glue::glue("Clustered into",
                             "**{riekelib::color_text('zesty bops',pal[1])}**,",
                             "**{riekelib::color_text('angsty bangers', pal[3])}**,",
                             "and",
                             "**{riekelib::color_text('mellow jams', pal[2])}**",
                             .sep = " ")) +
  MetBrewer::scale_color_met_d("Egypt")

ggiraph::girafe(
  ggobj = obj,
  options = list(
    ggiraph::opts_tooltip(opacity = 0.8,
                          use_fill = TRUE,
                          css = "color:white;padding:2px;border-radius:2px;font-family:Roboto Slab;"),
    ggiraph::opts_hover(css = "fill:#1279BF;stroke:#1279BF;cursor:pointer;")
  )
)