Microsoft TrueSkill and the Art of Gaming Statistics
Anyone who has spent any amount of time gaming online knows that a player’s volume of trash talk is often inversely proportional to actual skill: those who brag the most tend to have the least to back it up. Depending on the game, however, actually quantifying and comparing players’ skill is no easy matter. Ranking players in games where one opponent faces another is relatively straightforward. Rating systems that have been around for decades can accurately and effectively gauge one player’s abilities against another’s, even if the two have never actually met.
However, when you throw 32 gamers into a room to compete simultaneously, or introduce team-based gameplay that makes a competitor’s win ratio as dependent on his teammates’ skill as on his own, things get much more complicated. This is where Microsoft’s new TrueSkill ranking system comes into play.
Ring of Inspiration
Microsoft’s Thore Graepel and Ralf Herbrich aren’t your typical researchers. Members of Microsoft’s Machine Learning and Perception group in the UK, the two are tasked with studying various aspects of computer thinking and reasoning. Gaming is a natural application.
Back in 2004, Thore and Ralf were involved in the beta test of Bungie’s highly anticipated Halo 2, the sequel to the incredibly successful title that drove much of the Xbox console’s success. Halo 2 introduced some interesting concepts for ranking players in multiplayer gaming. Although it certainly wasn’t the first game to attempt to rank its players against each other, it featured some innovative matchmaking.
Instead of simply letting you join any multiplayer game you wanted, the game automatically fit you into matches or created new ones based on your skill relative to your opponents, as well as other criteria such as what type of game you wanted to play and how many opponents you desired. However, the experience-based system, which required many online matches to establish a player’s ranking, wasn’t well received by all. Thore and Ralf in particular identified some shortcomings:
During our long nights of testing this game, we wished that the system would converge faster to our "true" skills. With our background in machine learning, we started to work on the problem from a machine learning perspective. Three months later, we had the first version of TrueSkill developed, which was then tested extensively on data gathered both in the Halo 2 Beta testing and after the launch of Halo 2 on Xbox Live (November 2004).