Last November, I (finally) popped the big question and proposed! Since then, my fiance and I have been diligently planning our wedding. While we have most of the big-ticket items checked off (venue, catering, photography, etc.), one area that still needs work is the wedding playlist. We’ve started putting together a playlist on Spotify, but it feels like it’s come to a bit of a standstill. Currently, there’s a mix of zesty bops and tame songs on the playlist (we need to accommodate both our college friends and our grandparents!), but Spotify’s track recommender only wants to suggest tamer songs right now. Our goal is to have a full dance floor the entire night. To achieve this, we can use spotifyr and the new tidyclust package to pull in the current playlist, cluster the songs based on their features, and find new songs based on the bop cluster.
library(tidymodels)
library(tidyclust)
library(spotifyr)
Pulling in the playlist
spotifyr is an R interface to Spotify’s web API and gives access to a host of track features (you can follow this tutorial to get it set up). I’ll use the functions get_user_playlists() and get_playlist_tracks() to pull in the songs that are currently on our wedding playlist (appropriately named “Ding dong”).
# get the songs that are currently on the wedding playlist
ding_dong <-
  get_user_playlists("12130039175") %>%
  filter(name == "Ding dong") %>%
  pull(id) %>%
  get_playlist_tracks() %>%
  as_tibble() %>%
  select(track.id, track.name, track.popularity) %>%
  rename_with(~stringr::str_replace(.x, "\\.", "_"))
| track_id | track_name | track_popularity |
|---|---|---|
| 12jjuxN1gxlm29cqL5M6MW | I Got You | 65 |
| 2RlgNHKcydI9sayD2Df2xp | Mr. Blue Sky | 80 |
| 6x4tKaOzfNJpEJHySoiJcs | Mambo No. 5 (a Little Bit of…) | 77 |
| 3Gf5nttwcX9aaSQXRWidEZ | Ride Wit Me | 76 |
| 3wMUvT6eIw2L5cZFG1yH9j | Country Grammar (Hot Shit) | 70 |
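To see what the final rename_with() step is doing, here is a minimal sketch on a made-up tibble. The toy_tracks data and its values are invented for illustration; only the column-name pattern mirrors the columns selected in the code above. One thing worth knowing is that stringr::str_replace() only swaps the first period in each name, which is enough for these three columns.

```r
library(dplyr)
library(tibble)

# toy stand-in for the API response; the values are invented
toy_tracks <- tibble(
  track.id = c("a1", "b2"),
  track.name = c("Song A", "Song B"),
  track.popularity = c(65L, 80L)
)

# replace the first "." in each column name with "_"
toy_renamed <- toy_tracks %>%
  rename_with(~ stringr::str_replace(.x, "\\.", "_"))

names(toy_renamed)

# a name with two periods keeps its second one
two_dots <- stringr::str_replace("track.album.name", "\\.", "_")
two_dots
```

If you ever select nested columns with more than one period, stringr::str_replace_all() would be the safer choice.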
Spotify estimates quite a few features for each song in their catalog: speechiness (the presence of spoken words on a track), acousticness (whether or not a song includes acoustic instruments), liveness (whether the track is live or studio-recorded), etc. We can use get_track_audio_features() to get the features for each song based on its track ID.
# pull in track features of songs on the playlist
track_features <-
  ding_dong %>%
  pull(track_id) %>%
  get_track_audio_features()

# join together
ding_dong <-
  ding_dong %>%
  left_join(track_features, by = c("track_id" = "id"))
In my case, I’m interested in the energy and valence (positivity) of each song, so I’ll select these variables to use in the cluster analysis.
| track_name | valence | energy |
|---|---|---|
| I Got You | 0.544 | 0.399 |
| Mr. Blue Sky | 0.478 | 0.338 |
| Mambo No. 5 (a Little Bit of…) | 0.892 | 0.807 |
| Ride Wit Me | 0.722 | 0.700 |
| Country Grammar (Hot Shit) | 0.565 | 0.664 |
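The join and the column selection can be tried without API access. The sketch below uses two small hand-typed tibbles (toy_playlist and toy_features are hypothetical names; the IDs, valence, and energy values are copied from the tables above) and keeps only the columns used for clustering.

```r
library(dplyr)
library(tibble)

# two songs from the playlist, keyed by track_id
toy_playlist <- tibble(
  track_id = c("12jjuxN1gxlm29cqL5M6MW", "2RlgNHKcydI9sayD2Df2xp"),
  track_name = c("I Got You", "Mr. Blue Sky")
)

# audio features keyed by id, deliberately in a different row order
toy_features <- tibble(
  id = c("2RlgNHKcydI9sayD2Df2xp", "12jjuxN1gxlm29cqL5M6MW"),
  valence = c(0.478, 0.544),
  energy = c(0.338, 0.399)
)

# match track_id to id, then keep the clustering variables
toy_joined <- toy_playlist %>%
  left_join(toy_features, by = c("track_id" = "id")) %>%
  select(track_name, valence, energy)

toy_joined
```

Because the join matches on the ID rather than on row position, the order in which the features come back doesn't matter.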
Currently, the playlist covers a wide spectrum of songs. For new songs on the playlist, I’m really just interested in songs similar to others in the top right corner of the below chart with high energy and valence.
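A chart like this could be drawn with ggplot2. The sketch below is an assumption about the layout (valence on the x-axis, energy on the y-axis, both on a 0 to 1 scale) and uses only the five songs listed above, with one title shortened; toy_chart_data and vibe_chart are hypothetical names.

```r
library(ggplot2)

# valence and energy for the five songs listed above
toy_chart_data <- data.frame(
  track_name = c("I Got You", "Mr. Blue Sky", "Mambo No. 5",
                 "Ride Wit Me", "Country Grammar (Hot Shit)"),
  valence = c(0.544, 0.478, 0.892, 0.722, 0.565),
  energy = c(0.399, 0.338, 0.807, 0.700, 0.664)
)

# scatter plot with both axes fixed to the full 0-1 range
vibe_chart <- ggplot(toy_chart_data, aes(x = valence, y = energy)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  labs(x = "Valence", y = "Energy")

vibe_chart
```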
Broadly, there are three generic categories that the songs on the current playlist fall into: high energy and valence, low energy, or low valence (songs with low energy and valence will fall into one of the “low” categories). Rather than manually assigning categories, we can use tidyclust to cluster the songs into three groups using the k-means algorithm.
# create a clustering obj
set.seed(918)
ding_dong_clusters <-
  k_means(num_clusters = 3) %>%
  fit(~ valence + energy, data = ding_dong)
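To get a feel for what the fitted object contains, here is a small sketch on invented data. The toy_songs values are made up to form three loose groups, and the object names are hypothetical. It uses tidyclust's extract_centroids() and extract_cluster_assignment() helpers to look at the cluster centers and at which cluster each song landed in.

```r
library(tidymodels)
library(tidyclust)

# invented valence/energy values in three loose groups
toy_songs <- tibble(
  valence = c(0.90, 0.85, 0.80, 0.30, 0.35, 0.25, 0.30, 0.40, 0.35),
  energy  = c(0.90, 0.95, 0.85, 0.30, 0.25, 0.35, 0.85, 0.80, 0.90)
)

set.seed(918)
toy_fit <- k_means(num_clusters = 3) %>%
  fit(~ valence + energy, data = toy_songs)

# one row per cluster with its mean valence and energy
toy_centroids <- extract_centroids(toy_fit)
toy_centroids

# one row per song with the cluster it was assigned to
toy_assignments <- extract_cluster_assignment(toy_fit)
toy_assignments
```

Since k-means starts from randomly chosen centers, the cluster labels can differ between runs, which is why the seed is set before fitting.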
As expected, the majority of songs in the current playlist fall into the bop cluster. Let’s explore this cluster in more detail with a custom metric, vibe, defined as the sum of a song’s valence and energy.
# assign to clusters
ding_dong_vibes <-
  ding_dong_clusters %>%
  augment(ding_dong) %>%
  select(track_name, valence, energy, .pred_cluster) %>%
  mutate(vibe = valence + energy)

# what are songs with the biggest vibe?
ding_dong_vibes %>%
  arrange(desc(vibe)) %>%
  slice_head(n = 10) %>%
  knitr::kable()
| track_name | valence | energy | .pred_cluster | vibe |
|---|---|---|---|---|
| She Bangs - English Version | 0.858 | 0.950 | Cluster_1 | 1.808 |
| Take on Me | 0.876 | 0.902 | Cluster_1 | 1.778 |
| The Legend of Chavo Guerrero | 0.913 | 0.858 | Cluster_1 | 1.771 |
| Can’t Hold Us (feat. Ray Dalton) | 0.847 | 0.922 | Cluster_1 | 1.769 |
| Timber (feat. Ke$ha) | 0.788 | 0.963 | Cluster_1 | 1.751 |
| Shake It Off | 0.942 | 0.800 | Cluster_1 | 1.742 |
As expected, when arranging by vibe, the top songs are all a part of the first cluster. And they are, indeed, a vibe.
Compare that with the second cluster, whose songs are generally lower energy (I’d personally disagree with Spotify ranking Mr. Blue Sky and Single Ladies as “low energy,” but most others make sense).
ding_dong_vibes %>%
  filter(.pred_cluster == "Cluster_2") %>%
  arrange(vibe) %>%
  slice_head(n = 10) %>%
  knitr::kable()
| track_name | valence | energy | .pred_cluster | vibe |
|---|---|---|---|---|
| Mr. Blue Sky | 0.478 | 0.338 | Cluster_2 | 0.816 |
| Single Ladies (Put a Ring on It) | 0.272 | 0.584 | Cluster_2 | 0.856 |
| Low (feat. T-Pain) | 0.304 | 0.609 | Cluster_2 | 0.913 |
| I Got You | 0.544 | 0.399 | Cluster_2 | 0.943 |
| Wake Up in the Sky | 0.367 | 0.578 | Cluster_2 | 0.945 |
| Summer, Highland Falls - Live at the Bayou, Washington, D.C. - July 1980 | 0.452 | 0.544 | Cluster_2 | 0.996 |
| Take Me Out | 0.527 | 0.663 | Cluster_2 | 1.190 |
| Country Grammar (Hot Shit) | 0.565 | 0.664 | Cluster_2 | 1.229 |
Finally, the third cluster mostly contains songs with low valence but relatively high energy.
ding_dong_vibes %>%
  filter(.pred_cluster == "Cluster_3") %>%
  arrange(vibe) %>%
  slice_head(n = 10) %>%
  knitr::kable()
| track_name | valence | energy | .pred_cluster | vibe |
|---|---|---|---|---|
| Titanium (feat. Sia) | 0.301 | 0.787 | Cluster_3 | 1.088 |
| All Night (feat. Knox Fortune) | 0.392 | 0.777 | Cluster_3 | 1.169 |
| Shout, Pts. 1 & 2 | 0.416 | 0.866 | Cluster_3 | 1.282 |
| Club Can’t Handle Me (feat. David Guetta) | 0.473 | 0.869 | Cluster_3 | 1.342 |
| Body (feat. Brando) | 0.582 | 0.764 | Cluster_3 | 1.346 |
| Levels - Radio Edit | 0.464 | 0.889 | Cluster_3 | 1.353 |
Now that I have the songs in the current playlist sorted by cluster, let’s pull in some new songs and assign them to the appropriate cluster!
Adding new songs
To go searching for new songs, we’ll start by casting a wide net, then narrow the search with some of the get_*() functions from spotifyr. I’ll start by using get_categories() to explore the categories available in Spotify.
get_categories() %>%
  as_tibble() %>%
  select(id, name) %>%
  slice_head(n = 10) %>%
  knitr::kable()
I don’t really want to play country music or R&B during the wedding, so I’ll filter to a few categories before using get_category_playlists() to pull in the featured playlists available in each category.
# pull in playlist ids
playlists <-
  get_categories() %>%
  as_tibble() %>%
  filter(id %in% c("toplists", "hiphop", "pop", "rock", "summer")) %>%
  pull(id) %>%
  map_dfr(get_category_playlists) %>%
  as_tibble() %>%
  select(id, name, description) %>%
  distinct(id, .keep_all = TRUE)

playlists %>%
  slice_head(n = 10) %>%
  knitr::kable()
| id | name | description |
|---|---|---|
| 37i9dQZF1DXcBWIGoYBM5M | Today’s Top Hits | Steve Lacy is on top of the Hottest 50! |
| 37i9dQZF1DX0XUsuxWHRQd | RapCaviar | Music from Drake, Offset and 42 Dugg. |
| 37i9dQZF1DXcF6B6QPhFDv | Rock This | The latest from Panic! At The Disco along with the Rock songs you need to hear today. |
| 37i9dQZF1DX4dyzvuaRJ0n | mint | The world’s biggest dance hits. Cover: Zedd & Maren Morris |
| 37i9dQZF1DX1lVhptIYRda | Hot Country | Today’s top country hits of the week, worldwide! Cover: Tyler Hubbard |
| 37i9dQZF1DX10zKzsJ2jva | Viva Latino | Today’s top Latin hits, elevando nuestra música. Cover: Anitta, Maluma. |
| 37i9dQZF1DX4SBhb3fqCJd | Are & Be | The pulse of R&B music today. Cover: Tink |
| 37i9dQZEVXbLRQDuF5jeBp | Top 50 - USA | Your daily update of the most played tracks right now - USA. |
| 37i9dQZEVXbMDoHDwVN2tF | Top 50 - Global | Your daily update of the most played tracks right now - Global. |
| 37i9dQZEVXbLiRSasKsNU9 | Viral 50 - Global | Your daily update of the most viral tracks right now - Global. |
There are a lot of playlists in playlists, so I’ve gone through and selected a few that I’m interested in exploring further.
selected_playlists <- c(
  "Today's Top Hits", "mint", "Top 50 - US", "Top 50 - Global",
  "Viral 50 - US", "Viral 50 - Global", "New Music Friday", "Most Necessary",
  "Internet People", "Gold School", "Hot Hits USA", "Pop Rising",
  "teen beats", "big on the internet", "Party Hits", "Mega Hit Mix",
  "Pumped Pop", "Hit Rewind", "The Ultimate Hit Mix", "00s Rock Anthems",
  "Summer Hits", "Barack Obama's Summer 2022 Playlist",
  "Summer Hits of the 10s", "Family Road Trip"
)
With this shorter list of playlists, I can pull in all the songs that appear on each with get_playlist_tracks(). Some songs may appear on multiple playlists, so we’ll only look at unique songs by track_id. I’ve already pulled in features for songs currently on the playlist, so we can filter those out as well. Finally, get_track_audio_features() limits queries to a maximum of 100 songs, so we’ll select the top 100 most popular songs within the sample.
new_songs <-
  playlists %>%
  filter(name %in% selected_playlists) %>%
  pull(id) %>%
  map_dfr(get_playlist_tracks) %>%
  as_tibble()

new_songs <-
  new_songs %>%
  select(track.id, track.name, track.popularity) %>%
  rename_with(~stringr::str_replace(.x, "\\.", "_")) %>%
  distinct(track_id, .keep_all = TRUE) %>%
  arrange(desc(track_popularity)) %>%
  filter(!track_id %in% ding_dong$track_id) %>%
  slice_head(n = 100)
| track_id | track_name | track_popularity |
|---|---|---|
| 2tTmW7RDtMQtBk7m2rYeSw | Quevedo: Bzrp Music Sessions, Vol. 52 | 100 |
| 6Sq7ltF9Qa7SNFBsV5Cogx | Me Porto Bonito | 99 |
| 1IHWl5LamUGEuP4ozKQSXZ | Tití Me Preguntó | 97 |
| 4LRPiXqCikLlN15c3yImP7 | As It Was | 96 |
| 6xGruZOHLs39ZbVccQTuPZ | Glimpse of Us | 96 |
| 0mBP9X2gPCuapvpZ7TGDk3 | Left and Right (Feat. Jung Kook of BTS) | 94 |
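If you wanted to keep more than 100 songs, one possible workaround is to split the IDs into batches of at most 100 and query each batch in turn. The sketch below only demonstrates the batching on made-up IDs (toy_ids and id_batches are hypothetical names). The commented-out line shows where the API call would go, assuming spotifyr is loaded and authenticated.

```r
# made-up track IDs standing in for a longer list of songs
toy_ids <- sprintf("track_%03d", 1:250)

# split into consecutive batches of at most 100 IDs each
id_batches <- split(toy_ids, ceiling(seq_along(toy_ids) / 100))

# number of IDs in each batch
lengths(id_batches)

# with spotifyr loaded and authenticated, each batch could then be queried:
# all_features <- purrr::map_dfr(id_batches, get_track_audio_features)
```

For this post, the top 100 most popular songs is plenty, so a single query does the job.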
Now let’s assign these 100 new songs to the clusters we found earlier based on their valence and energy!
new_song_features <-
  new_songs %>%
  pull(track_id) %>%
  get_track_audio_features()

new_songs <-
  new_songs %>%
  left_join(new_song_features, by = c("track_id" = "id"))

new_songs_clustered <-
  ding_dong_clusters %>%
  augment(new_songs) %>%
  select(track_name, valence, energy, .pred_cluster) %>%
  mutate(vibe = valence + energy)
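From here, one way to narrow the candidates down to the bop cluster is to pick the cluster whose centroid has the highest valence plus energy, rather than hard-coding a label like "Cluster_1". This is a sketch of that idea on invented data, not necessarily how the final playlist was chosen; toy_current, toy_new, bop_cluster, and toy_bops are all hypothetical names.

```r
library(tidymodels)
library(tidyclust)

# invented "current playlist" songs in three loose groups
toy_current <- tibble(
  valence = c(0.90, 0.85, 0.80, 0.30, 0.35, 0.25, 0.30, 0.40, 0.35),
  energy  = c(0.90, 0.95, 0.85, 0.30, 0.25, 0.35, 0.85, 0.80, 0.90)
)

set.seed(918)
toy_fit <- k_means(num_clusters = 3) %>%
  fit(~ valence + energy, data = toy_current)

# label of the cluster whose centroid has the highest valence + energy
bop_cluster <- extract_centroids(toy_fit) %>%
  mutate(vibe = valence + energy) %>%
  slice_max(vibe, n = 1, with_ties = FALSE) %>%
  pull(.cluster) %>%
  as.character()

# invented candidate songs to assign to the fitted clusters
toy_new <- tibble(
  track_name = c("Candidate A", "Candidate B", "Candidate C"),
  valence = c(0.88, 0.20, 0.35),
  energy  = c(0.92, 0.30, 0.88)
)

# keep only the candidates assigned to the bop cluster, best vibe first
toy_bops <- toy_fit %>%
  augment(toy_new) %>%
  mutate(vibe = valence + energy) %>%
  filter(as.character(.pred_cluster) == bop_cluster) %>%
  arrange(desc(vibe))

toy_bops
```

Looking the label up from the centroids keeps the filter working even if a different seed shuffles which cluster gets which number.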