Spotify Rewrapped (Python, JSON, Tableau)


This page contains a walkthrough of the theory and process used in this data analysis project. You can find a summary presentation of the findings on Canva. The source code for the script used to pull data from the Spotify API is available on GitHub.


My Spotify Wrapped for 2023 was disappointing. Although the statistics were accurate, my top songs came from a pair of playlists I listen to when I need to be productive. Similarly, 3 of my top 5 artists resulted from my decision to listen to their entire discographies a single time. There was little insight to be gleaned from this recap.

Spotify Wrapped screenshot

Spotify allows you to request a copy of your listening history, so I decided to make some tweaks and see what I could learn by digging into the data myself. With a bit of cleaning and filtering, I suspected I could tell a more accurate story about my year in music: what I listened to when I was listening for the sake of listening.


Getting Started

I began by going to the privacy section of my Spotify account. At the bottom of that page, there is a section where you can request your data. The default selection provides various account information (shown below) plus your streaming history for the previous year. You also have the option to request the streaming history for the entire lifetime of your account and any technical information Spotify has logged about your account.

The basic data option lists a 5-day preparation time, whereas the other options indicate it can take 30 days before your data is compiled. In my experience, I received the basic data in less than 72 hours and the extended data in 9 days, so the lead times vary.

Screenshot of Spotify account screen where you can request your personal data.

The streaming history provided for the last year only includes limited information about each track:

Basic Streaming History
{
    "endTime" : "2023-06-20 17:23",
    "artistName" : "Royal & the Serpent",
    "trackName" : "ONE NATION UNDERDOGS",
    "msPlayed" : 212459
},
		

While Spotify's Search API does allow you to search for information using artist and track names, the results seem... *ahem* ...spotty at best when dealing with more obscure artists. Now, I'm far from a hipster, but over 75% of the songs I listened to in 2023 scored in the lower 50% of the popularity index. When I tried searching the API by name, it frequently returned the wrong album. Fortunately, the extended streaming history files contain much more information, as you can see below:

Extended Streaming History
{
	"ts":"2023-06-20T00:40:44Z",
	"username":"------",
	"platform":"windows",
	"ms_played":212459,
	"conn_country":"US",
	"ip_addr_decrypted":"--.--.--.--",
	"user_agent_decrypted":"unknown",
	"master_metadata_track_name":"ONE NATION UNDERDOGS",
	"master_metadata_album_artist_name":"Royal & the Serpent",
	"master_metadata_album_album_name":"RAT TRAP 1: the blueprint",
	"spotify_track_uri":"spotify:track:0rlcL9H8ZJUEDQ3N0tmTZf",
	"episode_name":null,
	"episode_show_name":null,
	"spotify_episode_uri":null,
	"reason_start":"trackdone",
	"reason_end":"trackdone",
	"shuffle":false,
	"skipped":false,
	"offline":false,
	"offline_timestamp":1687221431,
	"incognito_mode":false
},	
		

Of particular interest is the spotify_track_uri field, which includes the song's track_id. With this bit of information, you can directly query the track and get a considerable amount of information about the album on which it appears, as well as on the performer:

Track API Call
{  
	"album": {    
		"album_type": "single",
		"total_tracks": 2,
		"external_urls": {
			"spotify": "https://open.spotify.com/album/0otIBMv5zMV7ikDvfWCD3r"
		},    
		"href": "https://api.spotify.com/v1/albums/0otIBMv5zMV7ikDvfWCD3r",
		"id": "0otIBMv5zMV7ikDvfWCD3r",
		"images": [
			{
				"url": "https://i.scdn.co/image/ab67616d0000b2738217406bb3ac7ef2ef3ef5cf",
				"height": 640,
				"width": 640
			},
			{
				"url": "https://i.scdn.co/image/ab67616d00001e028217406bb3ac7ef2ef3ef5cf",
				"height": 300,
				"width": 300      
			},      
			{        
				"url": "https://i.scdn.co/image/ab67616d000048518217406bb3ac7ef2ef3ef5cf",
				"height": 64,
				"width": 64
			}
		],
		"name": "RAT TRAP 1: the blueprint",
		"release_date": "2023-05-26",
		"release_date_precision": "day",
		"type": "album",
		"uri": "spotify:album:0otIBMv5zMV7ikDvfWCD3r",
		"artists": [
			{
				"external_urls": {
					"spotify": "https://open.spotify.com/artist/64EHXDoln95lnccszdPum0"
				},
				"href": "https://api.spotify.com/v1/artists/64EHXDoln95lnccszdPum0",
				"id": "64EHXDoln95lnccszdPum0",
				"name": "Royal & the Serpent",
				"type": "artist",
				"uri": "spotify:artist:64EHXDoln95lnccszdPum0"
			}
		],    
		"is_playable": true  
	},  
	"artists": [    
		{      
			"external_urls": {
				"spotify": "https://open.spotify.com/artist/64EHXDoln95lnccszdPum0"
			},      
			"href": "https://api.spotify.com/v1/artists/64EHXDoln95lnccszdPum0",      
			"id": "64EHXDoln95lnccszdPum0",      
			"name": "Royal & the Serpent",      
			"type": "artist",      
			"uri": "spotify:artist:64EHXDoln95lnccszdPum0"    
		}  
	],  
	"disc_number": 1,  
	"duration_ms": 212459,  
	"explicit": true,  
	"external_ids": {    
		"isrc": "USAT22304514"  
	},  
	"external_urls": {
		"spotify": "https://open.spotify.com/track/0rlcL9H8ZJUEDQ3N0tmTZf"  
	},  
	"href": "https://api.spotify.com/v1/tracks/0rlcL9H8ZJUEDQ3N0tmTZf",  
	"id": "0rlcL9H8ZJUEDQ3N0tmTZf",  
	"is_playable": true,  
	"name": "ONE NATION UNDERDOGS",  
	"popularity": 47,  
	"preview_url": "https://p.scdn.co/mp3-preview/ba2d2fb20f661aa77eda7bdc81779d8b03728c38?cid=1b3fa221d08447568e2e084645e7b152",  
	"track_number": 2,  
	"type": "track",  
	"uri": "spotify:track:0rlcL9H8ZJUEDQ3N0tmTZf",  
	"is_local": false
}
		

In addition to the earlier information, this now gives us the artist and album IDs, the album release date, and the popularity of the song, along with several other pieces of information that aren't relevant to my current analysis. Now, using the artist ID, we can do an API request for the artist, providing us with the following response:

Artist API Call
{
	"external_urls": {
		"spotify": "https://open.spotify.com/artist/64EHXDoln95lnccszdPum0"
	},
	"followers": {
		"href": null,
		"total": 224222
	},
	"genres": [
		"alt z",
		"indie electropop",
		"modern alternative pop"
	],
	"href": "https://api.spotify.com/v1/artists/64EHXDoln95lnccszdPum0",
	"id": "64EHXDoln95lnccszdPum0",
	"images": [
		{
			"height": 640,
			"url": "https://i.scdn.co/image/ab6761610000e5ebce8b127d23f4bc04d7195e31",
			"width": 640
		},
		{
			"height": 320,
			"url": "https://i.scdn.co/image/ab67616100005174ce8b127d23f4bc04d7195e31",
			"width": 320
		},
		{
			"height": 160,
			"url": "https://i.scdn.co/image/ab6761610000f178ce8b127d23f4bc04d7195e31",
			"width": 160
		}
	],
	"name": "Royal & the Serpent",
	"popularity": 57,
	"type": "artist",
	"uri": "spotify:artist:64EHXDoln95lnccszdPum0"
}	
		

With the artist's genres, I now have everything necessary for my own version of Spotify Wrapped!
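The lookup chain described above (track URI, then track ID, then the track and artist endpoints) can be sketched in a few lines. This is an illustration rather than the project's actual script: the helper names are hypothetical, and the endpoint paths are taken from the href fields in the responses shown above.

```python
API_BASE = "https://api.spotify.com/v1"  # base URL as seen in the "href" fields above


def track_id_from_uri(uri):
    """Return the ID from a 'spotify:track:<id>' URI, or None for anything else."""
    if not uri:
        return None
    parts = uri.split(":")
    if len(parts) == 3 and parts[0] == "spotify" and parts[1] == "track":
        return parts[2]
    return None


def track_url(track_id):
    """URL for a single-track lookup."""
    return f"{API_BASE}/tracks/{track_id}"


def artist_url(artist_id):
    """URL for a single-artist lookup."""
    return f"{API_BASE}/artists/{artist_id}"


def artist_ids_from_track(track):
    """Collect the artist IDs from a track object shaped like the response above."""
    return [artist["id"] for artist in track.get("artists", [])]
```

Podcast episodes have a null spotify_track_uri in the extended history, which is why the parser returns None instead of raising an error.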


Wrangling the Data

While it is easy to process JSON files as-is with Python, I elected to convert my source data into a CSV containing only the information I need. This makes it easier to clean and transform the data if any issues arise later: Python can parse JSON quickly, but humans can scan a 5-field CSV much faster than a 21-field JSON file. To facilitate this, I wrote a Python script that extracts the timestamp, the artist, album, and track names, and the track ID, and writes them to a CSV. The script excludes songs that I listened to for less than 30 seconds, songs that I skipped, and songs I listened to in Incognito Mode. (Note: The only fields I need for the next step are the timestamp and track ID, but including the names makes it easier to locate records when troubleshooting.)
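As a minimal sketch of this filtering step (not the script itself), the logic might look like the following. It assumes each extended history file is a JSON array of entries shaped like the sample above. The function names and CSV column headers are hypothetical, and skipping entries without a track URI (such as podcast episodes) is an extra assumption on top of the three filters described.

```python
import csv
import json

MIN_MS_PLAYED = 30_000  # 30 seconds, the cutoff described above


def keep_play(entry):
    """True for plays that pass the filters: long enough, not skipped, not incognito."""
    return (
        entry.get("spotify_track_uri") is not None  # assumption: ignore non-track entries
        and entry.get("ms_played", 0) >= MIN_MS_PLAYED
        and not entry.get("skipped")
        and not entry.get("incognito_mode")
    )


def to_row(entry):
    """Reduce one history entry to the five fields kept in the CSV."""
    track_id = entry["spotify_track_uri"].split(":")[-1]
    return [
        entry["ts"],
        entry["master_metadata_album_artist_name"],
        entry["master_metadata_album_album_name"],
        entry["master_metadata_track_name"],
        track_id,
    ]


def history_to_csv(json_path, csv_path):
    """Read one history file and write the filtered plays to a CSV."""
    with open(json_path, encoding="utf-8") as src:
        entries = json.load(src)
    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["timestamp", "artist", "album", "track", "track_id"])
        writer.writerows(to_row(e) for e in entries if keep_play(e))
```

Using csv.writer rather than joining strings by hand means artist or track names that contain commas are quoted correctly.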

There is a Python library named Spotipy that would have expedited this process, but as my goal is to advance my general knowledge of Python and web API usage, I chose to code my script from scratch.

The first step was to obtain an access token for the Spotify API. Spotify offers more than one authorization flow, and the right one depends on whether you're only using public data or need to access a user's profile (for instance, to analyze their playlists). As I already have the "private" data in the JSON files, I don't need to access my account directly, so I opted for the simpler app-only flow, which Spotify's documentation calls the Client Credentials flow. After providing my client ID and client secret, which I obtained by registering my app in Spotify's developer console, I was given an access token I could use to call the API.
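A token request of this kind can be sketched with nothing but the standard library. This is a hedged illustration, not the code used in the project: it assumes the Client Credentials grant against Spotify's token endpoint, the function names are hypothetical, and urllib stands in for whatever HTTP client the original script used.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Build the POST request for a Client Credentials token (no network access here)."""
    credentials = f"{client_id}:{client_secret}".encode("ascii")
    headers = {
        # The ID and secret are sent as HTTP Basic credentials.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, headers=headers, method="POST")


def fetch_access_token(client_id, client_secret):
    """Send the request and return the access token string from the JSON response."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

The returned token is then sent on later API calls in an "Authorization: Bearer <token>" header. Keeping the request construction separate from the network call makes the header logic easy to check without real credentials.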

There are 2194 plays in my filtered listening history. I could request data for each of these individually, but Spotify limits the number of requests you can make within a 30-second window, and it doesn't publish the exact threshold. If you exceed that limit, you'll receive a 429 (Too Many Requests) error and will be blocked from accessing the API for a period of time. Fortunately, Spotify allows you to request up to 50 track IDs at a time, which cuts the number of requests from 2194 to 44, quite a difference.

Of course, while I listened to 2194 songs in 2023, I didn't listen to 2194 different songs. The actual number of unique IDs was 1894. While those 300 repeated plays wouldn't have a noticeable impact on bandwidth, I still wanted to avoid making duplicate requests for the sake of writing scalable code that would be as efficient as possible with larger datasets.

Screenshot of some of the Python code discussed in this section.

To facilitate this, I first iterated through the CSV, checking each row to see whether the track ID was already in the track_ids list and adding it if it wasn't. I then iterated through track_ids, adding each ID to a second list, id_list. Each time a track was added, I checked the length of id_list. When it reached 50 tracks, or when my row_count equaled the total number of rows in the file, I sent the list of IDs to the Spotify API, which returned a set of JSON objects containing the track data.
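The deduplicate-then-batch idea can be sketched compactly. This is an alternative formulation rather than the loop described above: the helper names are hypothetical, and it uses dict.fromkeys for deduplication instead of checking list membership row by row.

```python
BATCH_SIZE = 50  # maximum number of track IDs per request, as noted above


def unique_in_order(ids):
    """Drop repeated IDs while keeping first-seen order."""
    return list(dict.fromkeys(ids))


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items; the last may be shorter."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def tracks_url(id_batch):
    """Build the multi-track lookup URL for one batch of IDs."""
    return "https://api.spotify.com/v1/tracks?ids=" + ",".join(id_batch)
```

With 1894 unique IDs this yields 38 batches, the last holding 44 IDs. Because dict lookups take constant time, the deduplication stays linear in the number of plays, whereas an "is it already in the list?" check grows quadratically, which matters for the larger datasets mentioned earlier.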

I stored the track data in a Python dictionary (track_dict) where the key was the track ID and the value was a list containing the artist name, artist ID, album name, album ID, track name, track ID, release date, date precision, track length, track popularity, and whether or not the track was explicit.
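As a sketch of how such a dictionary could be built from track objects shaped like the Track API response shown earlier: the function names are hypothetical, taking the first listed artist as the primary artist is an assumption, and so is skipping null entries (which the multi-track endpoint can return for IDs it cannot resolve).

```python
def track_to_row(track):
    """Flatten one track object into the list of fields described above."""
    album = track["album"]
    artist = track["artists"][0]  # assumption: the first listed artist is the primary one
    return [
        artist["name"],
        artist["id"],
        album["name"],
        album["id"],
        track["name"],
        track["id"],
        album["release_date"],
        album["release_date_precision"],
        track["duration_ms"],
        track["popularity"],
        track["explicit"],
    ]


def build_track_dict(tracks):
    """Map track ID -> flattened row, skipping any null entries."""
    return {t["id"]: track_to_row(t) for t in tracks if t is not None}
```

For a batched response, the list to pass in would be the value of the response's "tracks" key.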

Once the dictionary was assembled, I iterated through the CSV file a second time, this time pulling the timestamp and the track ID. For each row, I looked up the track ID in track_dict and added the full set of data to a list I called master_list. I then iterated through master_list, assembling each list into a string and writing it to a new CSV file I named 'extended-output.csv'.
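The second pass can be sketched as a lookup per play followed by a CSV write. This is an illustration under assumptions: the input CSV is taken to have header columns named timestamp and track_id (hypothetical names), the function names are invented, and csv.writer is used in place of hand-assembled strings.

```python
import csv


def extend_rows(plays, track_dict):
    """Yield [timestamp, *track details] for each play whose track ID is in track_dict."""
    for play in plays:
        details = track_dict.get(play["track_id"])
        if details is not None:
            yield [play["timestamp"], *details]


def write_extended(plays_csv, track_dict, out_csv="extended-output.csv"):
    """Read the plays CSV, attach the track details, and write the combined rows."""
    with open(plays_csv, newline="", encoding="utf-8") as src, \
         open(out_csv, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(extend_rows(csv.DictReader(src), track_dict))
```

Plays whose track ID is missing from the dictionary are silently dropped here; logging them instead would be a reasonable variation when troubleshooting.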

Finally, I wrote a similar script to acquire the artist's popularity and list of associated genres. Rather than add this information to each track listing, I created a separate CSV file that could be joined with 'extended-output.csv' based on the artist ID.


Conducting Analysis

With all of the data in place, I began my analysis.

Artists

I started by looking at artists by total number of songs played. The top two artists were Green Day (100 songs) and Stone Temple Pilots (71 songs), which matches Spotify's findings. The third and fourth artists were Bill Withers (59) and Royal & the Serpent (56), which is reversed from my Wrapped top artists, where Royal & the Serpent was ranked #3 and Bill Withers #4. Spotify doesn't publish the criteria used to produce its rankings, but they don't appear to be based strictly on the number of songs played by the artist. They also aren't based on the number of distinct songs played, because I listened to 59 different Bill Withers songs, but only 40 Royal & the Serpent songs.
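The two measures compared here, total plays and distinct songs per artist, are simple to compute side by side. A minimal Python sketch follows, assuming the plays are available as (artist name, track ID) pairs; the function name is hypothetical and this is not necessarily how the figures above were produced.

```python
from collections import Counter, defaultdict


def artist_counts(plays):
    """Return (total plays per artist, distinct tracks per artist).

    `plays` is an iterable of (artist_name, track_id) pairs, one per play.
    """
    total = Counter()
    distinct = defaultdict(set)
    for artist, track_id in plays:
        total[artist] += 1
        distinct[artist].add(track_id)
    return total, {artist: len(ids) for artist, ids in distinct.items()}
```

Calling total.most_common(5) on the first result gives a top-five ranking by plays, which can then be compared against the distinct-track counts for the same artists.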

Perhaps it takes into account the period of time over which you listened to a particular artist. I first listened to Royal & the Serpent at the beginning of March and most recently listened to her at the end of August. On the other hand, I listened to Bill Withers' entire discography once over the course of 3 days in July. Another possibility is that guest appearances are counted. There are six songs I listened to (a total of 15 times) on which Royal & the Serpent was a guest artist; these would not have been included in the previous counts, as she was not the primary artist. Knowing my listening habits, I would expect Royal & the Serpent to be ranked higher than Bill Withers; I'm just not sure what criteria Spotify used to reach that conclusion.

The #5 ranking is also interesting. Spotify chose Leo, an artist who does heavy metal covers of non-metal songs, for the final spot on my Top Artists list. However, my analysis shows that I listened to more Metallica songs (33) than Leo songs (31). Going by distinct tracks makes the divide even wider, as I played 32 different Metallica songs, but only 23 Leo songs. This suggests that Spotify places greater weight on songs that get replayed multiple times, which makes sense. Two of Leo's covers, Turn Down for What (DJ Snake & Lil Jon) and ABCDEFU (GAYLE), are on my workout playlist, so they were played several times each. Three songs from that playlist also made my top 5 songs list, with the other two being from the playlist I listen to when doing dishes. This could be the reason Leo edged out Metallica for the final spot on Spotify's list.

But, as I mentioned, I don't feel like this list is really representative of my listening habits. Over two days in mid-May, I listened to Green Day's discography from Dookie (their third album) to American Idiot (their eighth), which accounts for the majority of my plays of the band. I suspect I did this while playing Minecraft and not really paying attention, because until I started this analysis, I wasn't even consciously aware that Green Day had an album titled Shenanigans, let alone remember listening to it. Aside from that, I listened to Dookie again in September, along with a few of the 4-track demos from the album's 30th anniversary re-release. I also heard the new singles they released near the end of the year whenever they showed up in my Discover Weekly and Release Radar playlists.

As for Stone Temple Pilots, I listened to their discography over the course of 3 days in May, and never again in 2023. I did the same with Bill Withers a couple of months later. Both artists are great, but their inclusion skews a portrait of my entire year of listening. This also holds true for Metallica, whose early albums I listened to at the beginning of November. Leo is slightly different, as I listened to several of his songs during a couple of different times of the year, but the majority of the times I listened to him, it was as part of my workout playlist.

If you exclude those artists from the analysis, my top ten most played artists are:

  1. Royal & the Serpent (56)
  2. Ashnikko (28)
  3. Delicious Friction (26)
  4. Everclear (18)
  5. GAYLE (18)
  6. Medusa (17)
  7. Violent Femmes (17)
  8. CHINCHILLA (15)
  9. Dolly Parton (14)
  10. Banshee (14)

This feels much more representative of my listening habits across all of 2023. If I were guessing with no data, I would have had Royal & the Serpent and Ashnikko at the top, though I'm not sure they would be in the same order. If I were going purely off how often I remembered listening to them, I might have gone with Ashnikko, because I would have failed to consider that every time I listened to Ashnikko while on vacation in June, it was on my girlfriend's account, not mine, so those plays are not included in my statistics. Either way, you get a much more characteristic portrait of me from this list than from Spotify's.

Songs

So what about individual songs? Going strictly by number of plays, the data matches my Wrapped exactly. As I noted, all of these songs come from a pair of playlists that serve specific purposes. In fact, my top 13 most listened tracks, and 16 of the top 20, come from these two playlists. Excluding those playlists produces a completely different list of songs.

Ashnikko and Royal & the Serpent both appear on it, which makes sense as they're my top artists. Brooke Candy is also there. I believe Nuts was my top song of 2022. If I'm being honest, I don't remember Always by niquo. I just listened to it again and it's a chill instrumental song that I likely discovered on a lofi hip hop focus/study playlist and put on repeat because it helped me relax. There are actually 24 songs tied for 5th place. I awarded the final spot on the list to Ramon Ayala by Giovannie and the Hired Guns because it was the only song in that group that I also listened to several times on the radio.

My top five most played songs when not listening to playlists for specific purposes are:

  1. You Make Me Sick! by Ashnikko
  2. Nuts by Brooke Candy
  3. Always by niquo
  4. ONE NATION UNDERDOGS by Royal & the Serpent
  5. Ramon Ayala by Giovannie and the Hired Guns

Again, I feel this paints a much more authentic picture of my yearly listening than my Spotify Wrapped. And rather than simply being what I want it to be, it's determined by data and systematic reasoning.

Genres

So what about genres? Spotify lists Alt Z as my main genre for the second (maybe third) year in a row. Rock comes in second. The first is a very specific genre that includes my top two artists, the second is about as generic as you can get. The other three genres Spotify highlighted in my recap were Escape Room, Dance Pop, and Dark Pop. I don't disagree with any of this. Escape Room is closely linked to Alt Z, and I feel like most songs in those genres could also be categorized as Dance Pop and/or Dark Pop. But what does the data say?

Well, it's complicated. For starters, genres are assigned to artists, not songs. Secondly, an artist can have anywhere from 0 to 15 genres assigned to them. (Perhaps more; 15 was the most of anyone in my listening history.) And the genres are listed alphabetically, not in a quantifiable order. I decided to assign each genre a score based on how often it appeared and how many genres the artist was categorized under. If the artist only had 1 genre, that genre received 5 points. If there were 2 genres, each received 4 points. With 3 genres, each received 3 points, and with 4 genres, each received 2 points. If there were 5 or more genres listed, they each received a single point.
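The point scale above reduces to a one-line rule: 6 minus the number of genres, with a floor of 1. A minimal sketch of the scoring follows, with hypothetical function names. The description doesn't say whether points are tallied once per play or once per artist, so the function simply scores whatever genre lists it is given and leaves that choice to the caller.

```python
from collections import Counter


def genre_points(genre_count):
    """Points each genre earns for an artist with `genre_count` genres.

    1 genre -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 or more -> 1, none -> 0.
    """
    if genre_count <= 0:
        return 0
    return max(1, 6 - genre_count)


def score_genres(genre_lists):
    """Sum points per genre over an iterable of genre lists."""
    scores = Counter()
    for genres in genre_lists:
        points = genre_points(len(genres))
        for genre in genres:
            scores[genre] += points
    return scores
```

For example, an artist tagged with "alt z", "indie electropop", and "modern alternative pop" contributes 3 points to each of those genres, while an artist tagged only with "alt z" contributes 5.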

Based on this scoring method, Alt Z was still my #1 genre. Lo-Fi Jazzhop came in second according to my analysis, which makes sense since I did listen to a lot of playlists designed to promote focus. For some reason, however, Spotify didn't include it in my top genres. Escape Room came in third in both my rankings and Spotify's. According to my analysis, Bubblegrunge and Indie Electropop rounded out the top 5, whereas Spotify gave those spots to Dance Pop and Dark Pop. Dark Pop comes in sixth in my scoring method, so not too far off, and I could see Indie Electropop being grouped in with Dance Pop.

Pie chart showing the top 10 genres I listened to in 2023.

The only major point of disagreement between the lists is Spotify's inclusion of the incredibly generic Rock genre. My analysis eliminated most Green Day and Stone Temple Pilots plays from consideration, which probably explains Rock's lower ranking. For what it's worth, Rock comes in 14th out of 397 genres in my analysis, and alternative rock and modern alternative rock rank 7th and 9th, respectively.

Though Spotify's genre rankings are closer than the song & artist rankings, my analysis still does a better job of painting a representative picture.


(Re)Wrap Up

Image listing my top 5 artists of 2023.

Image listing my top 5 songs of 2023.

So there you have it, my year in music. At least on Spotify. This doesn't include the songs on my mp3 player, the car radio, or live concerts, of course, as that data isn't available or doesn't exist. But I do a majority of my music listening on Spotify, so this is the best overall representation of what I listened to in 2023. Here's hoping for an audibly fruitful 2024.


Conclusion

While it was interesting to develop new insights into my musical listening habits, the biggest takeaway from this project was examining the difference between accuracy and representation. It is important to consider not just what the data says, but why it says it. What outside factors have influenced the story told by the data, and how does that affect the knowledge you can glean from it? Much like art, data can be interpreted in different ways. Unlike art, however, not every interpretation is equally valid. It is important to consider which interpretation will best address your needs, while being sure to avoid cherry-picking data that "proves" a predetermined outcome.