Last November, I (finally) popped the big question and proposed! Since then, my fiance and I have been diligently planning our wedding. While we have most of the big-ticket items checked off (venue, catering, photography, etc.), one area that still needs work is the wedding playlist. We’ve started putting one together on Spotify, but it feels like it’s come to a bit of a standstill. Currently, there’s a mix of zesty bops and tame songs on the playlist (we need to accommodate both our college friends and our grandparents!), but Spotify’s track recommender only wants to suggest tamer songs right now. Our goal is to have a full dance floor the entire night. To achieve this, we can use spotifyr and the new tidyclust package to pull in the current playlist, cluster the songs based on their features, and find new songs based on the bop cluster.
If you’d like to follow along, I’d recommend installing the development versions of parsnip and workflows, as some of the functionality that interacts with tidyclust isn’t yet on CRAN.
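As a rough sketch, the development versions could be installed along these lines. This assumes both packages are hosted under the tidymodels GitHub organization and that the remotes package is used as the installer; neither detail is spelled out above, so adjust as needed.

```r
# Assumption: development versions live at github.com/tidymodels
# install.packages("remotes")  # uncomment if remotes is not installed yet
remotes::install_github("tidymodels/parsnip")
remotes::install_github("tidymodels/workflows")
```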
Pulling in the playlist
spotifyr is an R interface to Spotify’s web API and gives access to a host of track features (you can follow this tutorial to get it set up). I’ll use the functions get_user_playlists() and get_playlist_tracks() to pull in songs that are currently on our wedding playlist (appropriately named “Ding dong”).
Code
# packages used throughout (dplyr, tibble, stringr, and ggplot2 come with tidyverse)
library(tidyverse)
library(spotifyr)

# get the songs that are currently on the wedding playlist
ding_dong <-
  get_user_playlists("12130039175") %>%
  filter(name == "Ding dong") %>%
  pull(id) %>%
  get_playlist_tracks() %>%
  as_tibble() %>%
  select(track.id, track.name, track.popularity) %>%
  rename_with(~stringr::str_replace(.x, "\\.", "_"))

# preview the first ten tracks
ding_dong %>%
  slice_head(n = 10) %>%
  knitr::kable()
| track_id | track_name | track_popularity |
|:--|:--|--:|
| 5jkFvD4UJrmdoezzT1FRoP | Rasputin | 61 |
| 1D066zixBwqFYqBhKgdPzp | Fergalicious | 66 |
| 12jjuxN1gxlm29cqL5M6MW | I Got You | 62 |
| 2grjqo0Frpf2okIBiifQKs | September | 78 |
| 2RlgNHKcydI9sayD2Df2xp | Mr. Blue Sky | 76 |
| 6x4tKaOzfNJpEJHySoiJcs | Mambo No. 5 (a Little Bit of…) | 72 |
| 3n3Ppam7vgaVa1iaRUc9Lp | Mr. Brightside | 62 |
| 7Cp69rNBwU0gaFT8zxExlE | Ymca | 45 |
| 3Gf5nttwcX9aaSQXRWidEZ | Ride Wit Me | 72 |
| 3wMUvT6eIw2L5cZFG1yH9j | Country Grammar (Hot Shit) | 65 |
Spotify estimates quite a few features for each song in its catalog: speechiness (the presence of words on a track), acousticness (whether or not a song includes acoustic instruments), liveness (whether the track was recorded live or in a studio), etc. We can use get_track_audio_features() to get the features for each song based on its track_id.
Code
# pull in track features of songs on the playlist
track_features <-
  ding_dong %>%
  pull(track_id) %>%
  get_track_audio_features()

# join together
ding_dong <-
  ding_dong %>%
  left_join(track_features, by = c("track_id" = "id"))
In my case, I’m interested in the energy and valence (positivity) of each song, so I’ll select these variables to use in the cluster analysis.
Currently, the playlist covers a wide spectrum of songs. For new songs on the playlist, I’m really just interested in songs similar to those in the top right corner of the chart below, with high energy and valence.
Code
# how are valence/energy related?
obj <-
  ding_dong %>%
  ggplot(aes(x = valence,
             y = energy,
             tooltip = track_name)) +
  ggiraph::geom_point_interactive(size = 3.5, alpha = 0.5) +
  scale_x_continuous(labels = scales::label_percent(accuracy = 1)) +
  scale_y_continuous(labels = scales::label_percent(accuracy = 1)) +
  labs(title = "The current wedding playlist",
       subtitle = "Hover over each point to see the song's name!")

ggiraph::girafe(
  ggobj = obj,
  options = list(
    ggiraph::opts_tooltip(
      opacity = 0.8,
      css = "background-color:gray;color:white;padding:2px;border-radius:2px;font-family:Roboto Slab;"
    ),
    ggiraph::opts_hover(css = "fill:#1279BF;stroke:#1279BF;cursor:pointer;")
  )
)
Broadly, the songs on the current playlist fall into three generic categories: high energy and valence, low energy, or low valence (songs with both low energy and low valence will fall into one of the “low” categories). Rather than manually assign categories, we can use tidyclust to cluster the songs into three groups using the k-means algorithm.
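To make the idea concrete before bringing in tidyclust, here is a minimal sketch that calls base R's stats::kmeans() directly. The data frame is a made-up stand-in for the playlist (the song names and the valence and energy values are invented for illustration), and picking the "bop" cluster as the one whose center has the largest valence plus energy is my own assumption, not something the playlist data dictates.

```r
# Toy stand-in for the playlist: all values below are invented for illustration
set.seed(42)
songs <- data.frame(
  track_name = paste("song", 1:9),
  valence = c(0.90, 0.85, 0.95, 0.20, 0.15, 0.25, 0.80, 0.85, 0.90),
  energy  = c(0.90, 0.95, 0.85, 0.85, 0.90, 0.80, 0.20, 0.15, 0.25)
)

# Cluster on valence and energy into three groups;
# nstart = 25 reruns the algorithm from several random starts and keeps the best fit
fit <- kmeans(songs[, c("valence", "energy")], centers = 3, nstart = 25)
songs$cluster <- fit$cluster

# Assumption: the "bop" cluster is the one whose center has the highest
# combined valence and energy (rows of fit$centers are indexed by cluster id)
bop_cluster <- which.max(rowSums(fit$centers))
bops <- songs$track_name[songs$cluster == bop_cluster]

print(fit$centers)
print(bops)
```

The cluster numbers themselves are arbitrary labels, which is why the sketch looks up the bop cluster from the centers rather than hard-coding an id.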
There’s some great documentation on the tidyclust site, but to get started, we’ll categorize the songs on the current playlist by “fitting” a k-means model (using the stats engine under the hood).