E-Sports analytics

How data science can help you play Clash Royale better

Use EDA, statistical models, and machine learning to up your Clash game

Harry Cheng
9 min read · Apr 19, 2022

For a video presentation of the project, click here.

Clash Royale is one of the most successful mobile games. Released 6 years ago, it is still popular. Players (typically 1 vs 1) battle each other with a customized deck of 8 cards. Playing a card costs “elixir”, and the goal is to destroy the opponent’s towers. (Gameplay shown below)

Source: https://tenor.com/view/clash-royal-gif-7410796

Here is a video introduction to Clash Royale.

Combining my passion for data science and Clash Royale, I decided to do a Clash Royale analytics project that uses data to answer some of the most important questions about understanding and playing the game better.

This article serves as a summary of the project: a high-level overview in plain English, focusing mainly on the methodology used and the results of the analysis. If you are interested in the details of the project, you can check out this YouTube video.

The entire project is written in Python. All of my code can be found in this GitHub repository.

Now without further ado, let’s get down to business!

Problem definition

The core problem we need to solve here is: “How do we play Clash Royale better?”

It is a very broad issue with countless possible approaches, some supported by data, some not. To organize our thoughts and possible solutions, it helps to break the topic down and generate an issue tree.

Created by author

The first thing I did with the issue tree was cut off the “Techniques” branch: it could be very relevant, but there is no data available to support that analysis.

With only the “Decks” branch left, I narrowed the scope of the question down to: “How do we play Clash Royale better by picking better decks?” To answer that, I describe a deck along three dimensions: Card Level (players can upgrade their cards’ levels in the game), Deck Stats (attack, speed, hit points, etc.), and Deck Composition (# of melee units, sky units, tanks, etc.). I will do some exploration of these features before putting all of them into the model.

After further breaking down the three dimensions, brainstorming, and taking data availability into consideration, I listed the 4 key questions to answer in this project:

  • Q1: How much does card level contribute to winning?
  • Q2: What’s the meta?
    — Is there a super OP/broken champion?
    — What are the popular cards?
    — How do people usually pair up champions?
  • Q3: What are the countering/synergetic relationships between champions?
  • Q4: How do we predict the outcome of a game based on players’ decks?
    — What are the relevant features?
    — Given two decks, who will win?

After answering all the questions above, we will hopefully have clear guidance on which champions to choose, what to pair them with, and which champions counter them, and we can then actually test our deck with the predictive model.

Now I will walk you through the approaches I used to answer the above questions and the results I got. In plain English, of course :)

Data Gathering

Given the scope of the questions, I need match-level data with information on deck composition, card levels, and the match result (win/lose).

The ideal source for such data is Clash Royale’s official API. I chose it because it is very reliable and up to date.

Clash Royale API

I used Python’s requests library to interact with the API. More on this in my YouTube video.
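As a rough sketch of what such a request might look like, the snippet below only builds the URL and headers for a player's battle log. The endpoint path, the `build_battlelog_request` helper, and the placeholder tag and token are assumptions for illustration, not the code used in the project.

```python
from urllib.parse import quote

API_BASE = "https://api.clashroyale.com/v1"  # assumed base URL of the official API


def build_battlelog_request(player_tag: str, token: str) -> tuple[str, dict]:
    """Return the URL and headers for a player's recent battles.

    Hypothetical helper: player tags start with '#', which has to be
    percent-encoded (as %23) before it goes into the URL path.
    """
    url = f"{API_BASE}/players/{quote(player_tag, safe='')}/battlelog"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers


# Placeholder tag and token, not real credentials.
url, headers = build_battlelog_request("#ABC123", "YOUR_API_TOKEN")
print(url)

# With the third-party requests library installed, the call would then be:
#   import requests
#   battles = requests.get(url, headers=headers, timeout=10).json()
```

Keeping the URL building separate from the network call makes it easy to loop over many player tags and to check the encoding without hitting the API.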

I ended up gathering ~2M rows of data (accidentally), but only used ~20,000 samples for EDA and ~12,000 samples for model training, validation, and testing.

Q1: How much does Card Level contribute to winning?

Answer first: A lot, but the effect shrinks as your trophy count (think ranking) increases.

Strong linear association

After running a logistic regression of match outcome on the average card level difference, I found that if your cards are on average 1 level higher than your opponent’s, your chance of winning that match is 97% higher than without the card level advantage. This is crazy!

However, most of us don’t have such a crazy advantage, and correlation does not imply causation.
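To put that number in perspective, here is a small worked example. It assumes the 97% figure is an odds ratio of about 1.97 per level of average advantage, which is the usual way a logistic regression coefficient is read; that reading, and the helper name, are assumptions for this sketch.

```python
def win_probability(baseline_prob: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline win probability."""
    baseline_odds = baseline_prob / (1 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


ODDS_RATIO_PER_LEVEL = 1.97  # assumed reading of "97% higher"

# Start from an even match-up (50%) and add 1, 2, or 3 levels of advantage.
for level_advantage in (1, 2, 3):
    p = win_probability(0.5, ODDS_RATIO_PER_LEVEL ** level_advantage)
    print(f"+{level_advantage} level(s): {p:.1%} chance of winning")
```

Under that assumption, an otherwise even match-up becomes roughly a 66% chance of winning with a one-level advantage.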

Q2: What’s the meta?

Meta here refers to the common practice/popular usage of cards.

Answer first: The game is pretty well balanced (No exceptionally strong champions), but some cards are more popular than others. It is also common practice to put specific cards together.

In this part I segmented all cards into 17 detailed categories, and used charts as well as association rule mining to find insights that describe the current meta. Some of the outputs are shown below. (All of them come from a random sample of 20,000 matches.)

Above we can see that win rate is pretty much the same for all cards, but some cards are extremely popular and others rarely used.

Popularity of melee units

Valkyrie is all-time meta for sure…

Association rule mining

Above are the top 5 results of association rule mining for advanced players. The higher the value in the “Lift” column, the stronger the association between the two cards. We can see that Goblin Barrel and Princess are extremely popular together (I also have a deck with both and didn’t even notice). The other pairs here are also popular combinations, since people think they create synergies. Whether they ACTUALLY have synergies is what the next part finds out.
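For readers unfamiliar with lift, here is a minimal sketch of how it can be computed for card pairs. Lift is the share of decks containing both cards divided by the product of each card's individual share, so a value above 1 means the pair shows up together more often than independent choice would suggest. The toy decks and the `pair_lift` helper are hypothetical and not taken from the project.

```python
from collections import Counter
from itertools import combinations

# Toy decks (hypothetical, shortened to 3 cards each for readability).
decks = [
    {"Goblin Barrel", "Princess", "Knight"},
    {"Goblin Barrel", "Princess", "Rocket"},
    {"Golem", "Baby Dragon", "Knight"},
    {"Golem", "Night Witch", "Rocket"},
]


def pair_lift(decks):
    """Return {(card_a, card_b): lift} for every pair seen in the same deck."""
    n = len(decks)
    card_counts = Counter(card for deck in decks for card in deck)
    pair_counts = Counter(
        pair for deck in decks for pair in combinations(sorted(deck), 2)
    )
    return {
        (a, b): (count / n) / ((card_counts[a] / n) * (card_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


lifts = pair_lift(decks)
for pair, lift in sorted(lifts.items(), key=lambda item: -item[1])[:3]:
    print(pair, round(lift, 2))
```

Pairs are stored in alphabetical order, so a lookup looks like `lifts[("Goblin Barrel", "Princess")]`.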

Q3: What are the countering/synergetic relationships between champions?

I define a countering relationship as: if X has a lower win rate when faced with Y, then Y counters X. Likewise, a synergetic relationship is defined as: if X and Y together have a higher win rate than X alone, then X and Y create synergy.

And to test if the lower/higher win rate is significant, I used the famous Chi-Square Test of Independence.

That is, given a champion, I run the test on every possible pairing with another card and list the most significant countering/synergetic cards for that champion.

Take one of my favorite champions, Golem, for example.
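A single one of these tests can be sketched as follows, on a 2x2 table of wins and losses for two groups of matches. The counts are hypothetical, the helper name is an assumption, and no continuity correction is applied; the project's own implementation may differ.

```python
import math


def chi_square_2x2(wins_a, losses_a, wins_b, losses_b):
    """Chi-square test of independence on a 2x2 win/loss table.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    observed = [[wins_a, losses_a], [wins_b, losses_b]]
    total = wins_a + losses_a + wins_b + losses_b
    row_totals = [sum(row) for row in observed]
    col_totals = [wins_a + wins_b, losses_a + losses_b]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, for which the chi-square
    # tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: group A = card X facing card Y, group B = card X otherwise.
stat, p = chi_square_2x2(wins_a=40, losses_a=60, wins_b=550, losses_b=450)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

A small p-value says the win rate differs between the two groups by more than chance would explain; the direction (counter or synergy) comes from comparing the two win rates themselves.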

Golem (source: https://clashofclans.fandom.com/wiki/Golem)

Using my methodology, I generated the following tables describing Golem’s top 5 counters and synergetic cards:

Counter table

A is when Golem is played against a deck with the countering champion; B is when Golem is played against a deck without it. The p-value is the result of the Chi-Square test.

Synergy table

A is when Golem is played with the synergetic champion, B is when Golem is played without the synergetic champion.

If you play Clash, you might find the result a bit weird. While it’s a no-brainer that P.E.K.K.A and Inferno Tower counter Golem, how is Executioner the best counter for Golem?

This is because the table captures not only the direct counters of a champion but also the indirect ones. P.E.K.K.A and Inferno Tower directly counter Golem, while Executioner counters the cards that are usually paired with Golem (usually flying units).

The same goes for the synergy table: all types of synergies are captured.

Q4: How do we predict the outcome of a game based on players’ decks?

It is pretty obvious that this is a predictive modeling question. The model needed is a typical binary classification machine learning model.

ML algorithm used: Many were tested, but I ultimately landed on Logistic Regression!

Features used: Deck stats (avg. hit points, avg. damage, avg. attack radius, avg. card level, etc.), narrowed down from 60+ to just 10 features

Performance: ~60% accuracy in predicting match result. ~0.68 AUC.

Below is a simple summary of the model building process.

Chart created by author

As you can see from the chart above, a lot of experiments and decision making went into the process of building the model. For the purpose of this article, let’s just focus on the result. Follow my upcoming model building article, where I explain every step of the model building process in fine detail.

Below is the confusion matrix of the model when tested with test data. (I split the data into train, validation and test.)

From the matrix, we can calculate some metrics:

  • Accuracy: 59%, meaning 59% of the predictions are correct.
  • Precision: 73%, meaning that of the matches the model classified as “Won”, 73% were actually won.
  • Recall: 24%, meaning the model only correctly identified 24% of all winning matches.

In other words, the model has a pretty strict standard for identifying winning matches. If the model tells you “you are winning”, you very likely will. But expect the model to tell you a bunch of times that you’re losing, and 44% of those will be false alarms.

This kind of unbalanced result usually comes from unbalanced samples, but the training and validation samples I used were perfectly balanced (50% win, 50% lose). Also, since the purpose of the model is to identify winning decks, it helps to be a bit stricter on the result.

However, I still plotted a ROC curve and played with the probability threshold a little to see other possibilities.
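The arithmetic behind these figures can be sketched as follows. The counts are illustrative, chosen per 1,000 test matches so that they roughly reproduce the reported metrics; they are not the actual confusion matrix from the project.

```python
# Illustrative counts per 1,000 test matches (an assumption, not the real matrix).
tp, fn = 116, 367  # actual wins:   predicted "Won" / predicted "Lost"
fp, tn = 43, 474   # actual losses: predicted "Won" / predicted "Lost"

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)           # how often a "Won" prediction is right
recall = tp / (tp + fn)              # share of actual wins the model catches
false_alarm_share = fn / (fn + tn)   # share of "Lost" predictions that were wins

print(f"accuracy={accuracy:.0%}, precision={precision:.0%}, "
      f"recall={recall:.0%}, false alarms={false_alarm_share:.0%}")
```

The last quantity is the share of “you’re losing” predictions that turn out to be wins, which is where the false-alarm figure comes from.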

ROC Curve, a bit right skewed as expected

I found out that if I lower the probability threshold to 0.25, meaning the model classifies a sample as “won” when the winning probability predicted is just over 25% (default is 50%), the accuracy can be elevated to 62%, but precision will drop to 62%.
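The mechanics of moving the threshold can be sketched on a handful of made-up predictions. The probabilities, labels, and the `evaluate` helper are hypothetical; the point is only to show how accuracy and precision shift as the cut-off is lowered.

```python
def evaluate(probs, labels, threshold):
    """Return (accuracy, precision) when predicting "won" for prob > threshold."""
    preds = [p > threshold for p in probs]
    tp = sum(pred and y == 1 for pred, y in zip(preds, labels))
    fp = sum(pred and y == 0 for pred, y in zip(preds, labels))
    correct = sum(pred == (y == 1) for pred, y in zip(preds, labels))
    accuracy = correct / len(labels)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return accuracy, precision


# Toy predicted win probabilities and true outcomes (1 = won), made up for illustration.
probs = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.55, 0.60, 0.70, 0.90]
labels = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]

for threshold in (0.5, 0.25):
    acc, prec = evaluate(probs, labels, threshold)
    print(f"threshold={threshold}: accuracy={acc:.0%}, precision={prec:.0%}")
```

On this toy data, lowering the threshold catches more of the actual wins, so accuracy goes up, at the cost of letting some losses through as “won”, so precision goes down.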

Now let’s take a look at what the model says about the strongest deck! For the sample below, the model’s predicted probability of winning is 98.5%… and indeed it won.

You can see that the winning deck has a ridiculously high card level advantage, its champions are generally faster (speed diff = 5), and it has units that deal high damage to towers.

So if you have this kind of deck, there is basically no way you can lose.

Conclusion

I hope this project provided some useful insights into Clash Royale, at both the macro and the micro level.

Again, this article only represents a snapshot of the entire project. Check out my YouTube Video if you want to know more!

And although I tried to be as unbiased and exhaustive as possible, this project is still far from perfect. All comments are welcome!

Give me some claps and click follow if you want to see more original side projects like this :)

If you are also passionate about data science and business analytics, or if you are a recruiter in the field, or you just want to talk to me, you are welcome to connect with me on LinkedIn.

See you next time!

Harry Cheng

Research Intern @ MBB Consulting. Data Science enthusiast.