sTeamRex - A Backlog Recommendation Engine

(Python, Pandas, JSON)


This page contains a walkthrough of the theory and process used in this project. The source code for the script used to pull data from the Steam API and generate recommendations is available on GitHub.


Between Steam Sales and Humble Bundles, many PC gamers find themselves with an extensive backlog of unplayed games. Even if you resolve to play games you already own, it can be difficult to choose what to play next. If you've purchased several Humble Bundles, you may have several games you know nothing about. Or perhaps you bought a game because it was cheap during a Steam sale, but five years later you don't even remember what it's about. You might even have some games that were gifted to you as a joke that you don't want to play by mistake.

Distracted Boyfriend meme depicting ME looking at a STEAM SALE and ignoring MY BACKLOG.

To make this decision easier, I dug into the Steam API and developed a script that looks at the games you've spent the most time playing and then scans through your unplayed games in search of the closest matches.


Getting Started

The Steam API is only loosely documented and can change without warning, so working out your next step isn't always straightforward. Even when you figure something out, there's no guarantee it will still work later that day. When I started this project, the JSON returned when you requested a user's games included the title of the game and the URL of the game's icon. A few days later, both fields were removed. This meant that an additional API call was required for each game just to get the title.

User Game Data
{
	"appid" : 107100,
	"playtime_forever" : 5499,
	"playtime_windows_forever" : 0,
	"playtime_mac_forever" : 0,
	"playtime_linux_forever" : 0,
	"playtime_deck_forever" : 0,
	"rtime_last_played" : 1558831382,
	"playtime_disconnected" : 0
},
			

I'm curious how many people actually play the same game on more than one operating system. It doesn't seem useful enough to include in the default API response. Of course, that information is only provided for your own account. For other users, you just get the app ID and their total playtime, no breakdown by OS or last played date.

Other User's Game Data
{
	"appid": 4000,
	"playtime_forever": 225
}	
			

The platform-specific times would be useful for people who play across multiple platforms, but I suspect the majority of gamers stick to one OS. However, the appid and total playtime are enough to put together a recommendation engine when combined with the per-game information that can be obtained from the API.

Game Data
{
	"107100":{
		"success":true,"data":{
			"type":"game",
			"name":"Bastion",
			"steam_appid":107100,
			"required_age":0,
			"is_free":false,
			"controller_support":"full",

			*** TRUNCATED ***

			"developers":["Supergiant Games"],
			"publishers":["Supergiant Games"],

			*** TRUNCATED ***

			"categories":[
				{
					"id":2,
					"description":"Single-player"
				},
				{
					"id":22,"description":"Steam Achievements"
				},
				{
					"id":28,"description":"Full controller support"
				},
				{
					"id":29,"description":"Steam Trading Cards"
				},
				{
					"id":23,"description":"Steam Cloud"
				},
				{
					"id":25,"description":"Steam Leaderboards"
				},
				{
					"id":41,"description":"Remote Play on Phone"
				},
				{
					"id":42,"description":"Remote Play on Tablet"
				},
				{
					"id":43,"description":"Remote Play on TV"
				},
				{
					"id":62,"description":"Family Sharing"
				}
			],
			"genres":[
				{
					"id":"1",
					"description":"Action"
				},
				{
					"id":"23",
					"description":"Indie"
				},
				{
					"id":"3",
					"description":"RPG"
				}
			],
			"screenshots":[
				*** TRUNCATED ***
			],
			"movies":[
				{
				*** TRUNCATED ***
				}
			],
			"recommendations":{
				"total":27136
			},
			"achievements":{
				"total":24,
				*** TRUNCATED ***
			},
			"release_date":{
				"coming_soon":false,
				"date":"Aug 16, 2011"
			},

			*** TRUNCATED ***

			"ratings":{
				"esrb":{
					"rating":"e10",
					"descriptors":"Animated Blood - Fantasy Violence - Use of Alcohol and Tobacco"
				},
			*** TRUNCATED ***
			}
		}
	}
}
			

The key information from the JSON object above is the game's title and genres. There are other fields that could prove useful, such as age rating, controller support, and whether the game supports achievements and trading cards. If I were building a robust commercial application, I would include those features, but as this is a practice exercise, I'm going to keep it simple for the initial version.
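As a rough illustration of pulling those two fields out of a response shaped like the one above, here is a minimal sketch. The function name extract_title_and_genres is hypothetical and not taken from the project's source, and treating a false "success" flag as "no usable data" is an assumption.

```python
def extract_title_and_genres(app_details, appid):
    """Return (title, [genre descriptions]) from a response shaped like the sample above."""
    # The response is keyed by the app ID as a string.
    entry = app_details.get(str(appid), {})
    # Assumption: a false "success" flag means no usable data came back.
    if not entry.get("success"):
        return None, []
    data = entry.get("data", {})
    genres = [genre["description"] for genre in data.get("genres", [])]
    return data.get("name"), genres


# Trimmed copy of the Bastion sample shown above.
sample = {
    "107100": {
        "success": True,
        "data": {
            "name": "Bastion",
            "genres": [
                {"id": "1", "description": "Action"},
                {"id": "23", "description": "Indie"},
                {"id": "3", "description": "RPG"},
            ],
        },
    }
}

print(extract_title_and_genres(sample, 107100))
# ('Bastion', ['Action', 'Indie', 'RPG'])
```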


Building the Engine

The first step is to obtain a list of the user's games and play times. This is facilitated through a simple API call to IPlayerService/GetOwnedGames with the user's SteamID. (If you don't have the user's SteamID, it can be obtained by sending a request to ISteamUser/ResolveVanityURL with their custom username.) This will provide you with a JSON list of appIDs for every game the user owns, along with a playtime_forever field that lists how many minutes they have played each game.

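import json

import requests as r


def get_users_games(api_key, steam_id):
    # Request every game owned by steam_id, with total playtime in minutes.
    url = f"http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key={api_key}&steamid={steam_id}&format=json"
    headers = {
         "Accept": "application/json",
         "Content-Type": "application/json",
    }

    myreq = r.get(url, headers=headers)
    content = myreq.content

    # Decode the raw response body into a Python dictionary.
    return json.loads(content)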
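For the vanity-name lookup mentioned above, a minimal sketch might look like the following. The helper names are hypothetical, and the response shape (a "response" object carrying "success" and "steamid") reflects my understanding of the ResolveVanityURL endpoint rather than anything shown in the project, so treat it as an assumption to verify against a live call.

```python
from urllib.parse import urlencode


def build_vanity_url(api_key, vanity_name):
    """Build the lookup URL for a custom profile name (hypothetical helper)."""
    base = "https://api.steampowered.com/ISteamUser/ResolveVanityURL/v0001/"
    return f"{base}?{urlencode({'key': api_key, 'vanityurl': vanity_name})}"


def parse_vanity_response(payload):
    """Return the SteamID string from a decoded response, or None if no match."""
    response = payload.get("response", {})
    # Assumption: success == 1 signals a match; other values signal failure.
    if response.get("success") == 1:
        return response.get("steamid")
    return None


print(build_vanity_url("KEY", "example_name"))
print(parse_vanity_response({"response": {"success": 1, "steamid": "123"}}))
```

Keeping the URL building and the parsing separate from the network call makes both pieces easy to check without spending any of the daily request allowance.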

Once this data has been obtained, it is time to pull the listing of each game. To facilitate this, I first load the JSON object and iterate through the appids, making the API request for each game's info. This data goes into an object that includes the 'title', 'appid', 'playtime', 'release_date', 'genres', and 'tags'.

games_df.loc[len(games_df)] = api.get_game_data(game_data)

Every 100 games, I initiate a five-minute rest to avoid being rate limited. While Steam's 100,000 requests-per-day limit is well documented, the short-term limits are not publicized. Without the rest periods, requests begin timing out after a couple hundred calls.
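The pause described above can be sketched roughly as follows. This is not the project's actual loop: fetch_with_rests and its parameters are hypothetical names, and the sleep function is injectable only so the behaviour can be checked without waiting five minutes.

```python
import time


def fetch_with_rests(appids, fetch_one, batch_size=100, rest_seconds=300, sleep=time.sleep):
    """Call fetch_one for each app ID, resting after every full batch."""
    results = []
    for count, appid in enumerate(appids, start=1):
        results.append(fetch_one(appid))
        # Rest after each full batch, but skip the pause after the final item.
        if count % batch_size == 0 and count < len(appids):
            sleep(rest_seconds)
    return results


# Demonstration with a stand-in fetcher and a fake sleep that records pauses.
pauses = []
data = fetch_with_rests(list(range(250)), lambda appid: {"appid": appid}, sleep=pauses.append)
print(len(data), pauses)
# 250 [300, 300]
```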

Once all of the game data has been acquired, I calculate the average playtime of all games. If that result equals zero, it is highly likely that the user has chosen to keep their playtime private, a possibility even if the games list is public. As the engine bases its recommendations on playtime, nothing more can be achieved, so the script ceases execution.

if games_df['playtime'].mean() == 0:
    print(f'{steam_id} has set their play time to private, no recommendations can be made.')
    exit()

If playtime is available, the next step is deciding how many games to consider when generating recommendations. I decided to use 10 games if the user has fewer than 100 games, or 15, 20, or 25 games if the user owns 250, 500, or 1000+ games respectively. The n most-played games are then copied into a new dataframe, through which I iterate, counting the genres and tags of all of these games. Next, I separate out the tags that appear most often and those that appear least often, with the remaining genres and tags grouped by whether their counts fall above or below the midpoint. This yields four ranks.
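One way to read the selection and ranking steps above is sketched below. Several details are assumptions on my part: 250, 500, and 1000 are treated as lower bounds, libraries of 100 to 249 games fall back to 10 (the text does not say), the midpoint is taken as halfway between the highest and lowest counts, and the rank names are made up for illustration.

```python
from collections import Counter


def games_to_consider(library_size):
    """Pick how many top games to analyse (thresholds as lower bounds: an assumption)."""
    if library_size >= 1000:
        return 25
    if library_size >= 500:
        return 20
    if library_size >= 250:
        return 15
    return 10  # Assumption: anything under 250 games uses 10.


def rank_labels(label_lists):
    """Sort genres or tags into four ranks by how often they appear."""
    counts = Counter(label for labels in label_lists for label in labels)
    if not counts:
        return {}
    highest, lowest = max(counts.values()), min(counts.values())
    midpoint = (highest + lowest) / 2  # Assumption: midpoint of the count range.
    ranks = {}
    for label, count in counts.items():
        if count == highest:
            ranks[label] = "most"
        elif count == lowest:
            ranks[label] = "least"
        elif count >= midpoint:
            ranks[label] = "upper"
        else:
            ranks[label] = "lower"
    return ranks


top_games = [
    ["Action", "Indie", "RPG"],
    ["Action", "Indie", "RPG"],
    ["Action", "Indie"],
    ["Action", "Casual"],
]
print(games_to_consider(686))
# 20
print(rank_labels(top_games))
# {'Action': 'most', 'Indie': 'upper', 'RPG': 'lower', 'Casual': 'least'}
```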

Once the scoring criteria have been established, I create a new dataframe that contains all games with less than 30 minutes of playtime. Iterating through this dataframe, I assign a score based on each genre and tag as determined by their positioning in the four ranks. Then I query the appreviews API to acquire review scores for the game, multiplying the calculated score by the percentage of positive reviews. If the game does not have review scores available in the API, I scrape the game's page and parse it with BeautifulSoup. Finally, I sort the unplayed dataframe by the final score and save the results to a CSV file.
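The scoring step might be sketched along these lines. The rank weights are purely hypothetical, since the text does not give the values used, and plain Python lists stand in for the dataframe to keep the example self-contained. The total_positive and total_reviews arguments are assumed to come from the review summary returned by the appreviews API.

```python
# Hypothetical weights per rank; the project's actual values are not stated.
RANK_WEIGHTS = {"most": 4, "upper": 3, "lower": 2, "least": 1}


def score_game(labels, ranks, total_positive, total_reviews):
    """Sum the rank weights of matching labels, scaled by the share of positive reviews."""
    base = sum(RANK_WEIGHTS[ranks[label]] for label in labels if label in ranks)
    if total_reviews == 0:
        # No review data: the article falls back to scraping the store page here.
        return None
    return base * (total_positive / total_reviews)


ranks = {"Action": "most", "Indie": "upper", "RPG": "lower", "Casual": "least"}
unplayed = [
    ("Game B", ["RPG", "Casual"], 50, 100),
    ("Game A", ["Action", "Indie", "Puzzle"], 75, 100),
]

scored = [(title, score_game(labels, ranks, pos, total)) for title, labels, pos, total in unplayed]
# Highest score first, mirroring the final sort of the unplayed games.
recommendations = sorted(scored, key=lambda pair: pair[1], reverse=True)
print(recommendations)
# [('Game A', 5.25), ('Game B', 1.5)]
```

Labels that never appeared in the most-played games (such as "Puzzle" here) simply contribute nothing in this sketch; whether the project penalises them instead is not stated.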


Results

My ten most-played games are:

  1. Alan Wake
  2. Papo & Yo
  3. Guacamelee! Super Turbo Championship Edition
  4. Contrast
  5. Goat Simulator
  6. Psychonauts
  7. Elder Scrolls V, The: Skyrim
  8. Just Cause
  9. Guacamelee! Gold Edition
  10. Rogue Legacy

According to Steam, I've logged over 380 hours in Alan Wake. I do like Alan Wake, but I'm pretty sure I didn't spend 15 full days playing it. This time must include all of the idle time where I left the game running but wasn't actively playing it. I suspect it's a similar situation for Papo & Yo, Contrast, Psychonauts, and Just Cause. Guacamelee! likely suffers not only from idle time being logged but also from two different versions of the same game appearing in the list. (I was able to confirm that my last save in Psychonauts was after just under 12 hours of gameplay. Unless I've repressed the memory of a 150-hour boss fight, this is probably closer to my actual playtime.) Goat Simulator and Rogue Legacy's times are probably accurate. Skyrim's recorded time actually understates my total, because it doesn't include the 100+ hours I put into the game on console before a glitch caused me to switch to the PC version.

While the playtimes may or may not be accurate, it is fair to say that I really enjoyed all of these games, so any recommendations based on them should be valid. Let's check out the recommendations based on the data as-is.

My top ten recommendations are:

  1. Pikuniku
  2. Brothers - A Tale of Two Sons
  3. Pumpkin Jack
  4. Super Daryl Deluxe
  5. Trine 4: The Nightmare Prince
  6. Supraland
  7. Nobody Saves the World
  8. Omno
  9. Fable - The Lost Chapters
  10. Darkwood

Fable immediately stands out because it's a game I definitely enjoy. However, since I played it on console, it registers as unplayed on Steam. As for the other nine games, the only one I purchased individually was Darkwood. The rest likely came from Humble Bundles, and I'm really only familiar with two of them.

So how can I determine if my engine is providing good results? I can check the genres and tags to ensure the scores are being calculated accurately. My top genres are:

  1. Action (6)
  2. Adventure (6)
  3. Indie (6)
  4. Casual (2)
  5. RPG (2)
  6. Simulation (1)

And my top tags are:

  1. Singleplayer (10)
  2. Adventure (9)
  3. Action (8)
  4. Third Person (7)
  5. Great Soundtrack (6)
  6. Indie (6)
  7. Platformer (6)
  8. Atmospheric (5)
  9. Funny (5)
  10. Story Rich (5)

Of course, that presupposes that the genre and tag scoring system will provide meaningful results. To test that theory, I'll have to play the games. What a chore.



Pikuniku

I was aware that I owned Pikuniku courtesy of a Humble Bundle, but I didn't know much about it aside from the graphics style making me think of a fusion of Baba Is You and Night in the Woods.

All four of its genres match my top four genres. Of its twenty tags, seven of them are in my top ten. Twelve of the other thirteen tags also match with my ten most played games. The only outlier is dystopian, which could accurately be applied to half of my top games. From a scoring standpoint, it certainly matches, but how does it play?

Well, after an hour of game time, I pronounce it an excellent recommendation. It's nothing like any of my most played games, but I very much enjoyed it. Luck? Or does the engine uncover an intangible link in the tags and genres?



Brothers - A Tale of Two Sons

This game has been on my to-play list for... ten years, apparently, since I purchased it during the Summer Sale of 2014. Nine of the twenty tags match my top ten, and nine more match at least one of my most played games. The only tags that aren't found in my top games are beautiful and walking simulator. The former is obviously subjective, though there are a few games on my list to which I think it applies. The latter is a category I enjoy, but games within it are generally short and thus wouldn't rank high on playtime.

As a twin-stick adventure game in which you control two characters at the same time, it's quite different from anything on my list. Ignoring that mechanic, though, there are a couple of games it is quite similar to, including my favorite game of all time, Bastion. Based on the short time I played it, this seems to be a very good recommendation.



Pumpkin Jack

This game arrived in a Humble Bundle, but I never paid it any mind. The game I was interested in from that bundle was SuperHot: Mind Control Delete. Nine of my top ten tags match, and nineteen of its twenty tags match overall. The "new" tag this time is villain protagonist. Fable lets you play as an evil character if you so desire. It could also be argued that Pilgor in Goat Simulator is a villain. And in Just Cause, you play a CIA operative overthrowing a foreign government. I'm no stranger to playing as the bad guy.

A few of my top games feel similar to Pumpkin Jack from a gameplay perspective. My first impression is that it's what you would get if Tim Burton had been working for Rare in the N64 days. It seems to be another quality recommendation.



Super Daryl Deluxe

This game entered my library with the February 2019 Humble Monthly bundle, meaning it's been neglected for 5 years. I never once considered playing it. I didn't even know what sort of game it was, and now that I've played it for a bit, I'm still not entirely sure. It's an RPG beat 'em up that looks like a point-and-click adventure and gives off Napoleon Dynamite vibes.

Ninety percent tag match, with hand-drawn and crowdfunded as outliers. I've played many hand-drawn games, but the work involved means they're usually shorter. I've only played through the first section of the game and it has nothing in common with my most-played games, but it does seem akin to several titles I've beaten on Game Pass, so I'll count this as another successful recommendation.



The Rest

As previously mentioned, I'm a big fan of Fable. Darkwood is a game I bought after watching someone stream it, so I'm confident it's a good recommendation. I've played the first Trine game and enjoyed it, so Trine 4 is probably a strong recommendation as well. After playing ~20 minutes of Supraland, I'll certainly be putting it into my rotation. Nobody Saves the World is from the developers of Guacamelee!, so I've no doubt it will be enjoyable. Omno is a game I know nothing about, but it looks like the kind of exploratory adventure I love, so I'll have to delve into it as well.


Conclusion

It appears the recommendation engine is a success. There are certainly several ways it could be refined, which I might look into in the future, but it currently works as intended and provides good recommendations.

I currently own 686 games on Steam that I have never played, plus another 76 that I've played less than 20 minutes. Quite a few of these are bound to be worth more time than I have to give, so hopefully this project will help me make a dent in that backlog.