Fact check: Does bronze benching work?


In a recent Reddit post, Arlington69 claims to have evidence that bronze benching works. Allegedly, having a bench full of low-rated bronze players means that you will get easier opposition in game modes like FUT Seasons and Weekend League. But does Arlington69’s data really prove that bronze benching works? I decided to investigate.

A brief introduction to bronze benching

As you probably know, bronze benching means using a bench full of bronze players in an attempt to trick the game or the opponent into believing that your team is worse than it really is.

Due to the way FIFA calculates the squad overall rating (OVR), adding low-rated bronze players as subs lowers the overall team rating considerably. In the example below, I lower the OVR rating of a full TOTS starting 11 to a mere 84 by using the lowest-rated bronze players as subs, thereby bringing my team’s OVR on par with the 84-rated team on the right.

In some game modes of earlier editions of FUT, players could see each other’s OVR ratings during matchmaking. Naturally, people were more likely to accept a match if the opposing team was 84-rated than if it was 91-rated. Hence, bronze benching not only made it easier to find opponents, but often gave you a competitive edge as well.

Today, people mostly play game modes where the OVR rating isn’t shown before the match starts. Additionally, EA has stated that the OVR rating has no impact on matchmaking in game modes like FUT Seasons, and presumably Weekend League as well.

Yet, there are players who still use bronze benches because they believe that FUT’s matchmaking algorithm matches them based on the OVR rating. If they are right, my 84-rated TOTS/bronze bench will be more likely to get matched up against the 84-rated squad on the right than the much scarier 91-rated TOTS squad on the left. And that obviously would seem more than a trifle unfair.

There are, however, a couple of obvious reasons to be skeptical about the idea that bronze benching leads to easier opponents, at least when it comes to game modes like FUT Seasons and Weekend League.

As mentioned, EA Sports has officially stated that FUT Seasons is about bringing your best team to the pitch. I can’t come up with any reason why EA would lie about this. After all, gamers buy packs because they hope to find rare, high-rated players who can either be sold at a high price or improve their teams directly. If you were matched based on OVR rating, you would always face similarly rated opponents, so a stronger squad would give you no real advantage, and there would be little incentive to buy packs. So, in a nutshell, OVR-based matchmaking would undermine EA’s pack-selling business. If anything, EA has an incentive to ensure that spending money on packs is worthwhile.

Still, bronze-benching believers like Arlington69 argue that EA uses OVR-based matchmaking as a way of making matches more even. But when you think about it, that claim is utter nonsense. EA is already in full control of the stats. There is no reason to build a system of 30+ stats that creates performance differences between players, only to neutralize those differences through a matchmaking mechanism.

If EA wanted to level the playing field with regard to team stats, the easiest solution would be to simply get rid of stats, which they haven’t done.

But all of the above is, of course, purely hypothetical. Arlington69’s evidence appears convincing, so I decided to take a closer look to see whether his claim holds up under scrutiny.

Arlington69’s bronze benching study

The basis of Arlington69’s claim is a dataset covering his own matches.

He recorded his own squad rating and his opponent’s squad rating in each of 670 matches. His sample included FUT Seasons, Weekend League and Daily Knockout Tournament matches.

He presents his results in the two charts below.

On the left, we see the distribution of rating gaps between Arlington69’s squad and his opponents’ squads. The blue, bell-shaped curve resembles what statisticians call a normal distribution: the opposing squad ratings are roughly normally distributed around Arlington69’s own squad rating.

We see that the most common rating gap is 0, although rating gaps as large as +/- 7 rating points occur. Hence, the most common opponent has the same squad rating as Arlington69’s squad, but other scenarios happen frequently: a rating gap of 0 occurred in around 20 % of the matches, while absolute rating gaps of 1 or more occurred in the remaining 80 %.

At a glance, this actually doesn’t appear to support Arlington69’s claim. It is quite evident that he often got opponents whose squads were rated differently from his own. This could, however, be a product of availability: FIFA could be picking the best available fit with regard to squad rating.
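The gap statistics quoted above (the share of matches with a gap of exactly 0 versus an absolute gap of 1 or more) are simple frequency counts. The Python sketch below shows one way to compute them from per-match ratings. The ten matches are invented toy values, not Arlington69’s data, and `gap_shares` is a hypothetical helper name.

```python
from collections import Counter


def gap_shares(own_ratings, opp_ratings):
    """Share of matches for each rating gap (opponent rating minus own rating)."""
    gaps = [opp - own for own, opp in zip(own_ratings, opp_ratings)]
    counts = Counter(gaps)
    return {gap: counts[gap] / len(gaps) for gap in sorted(counts)}


# Invented toy sample of ten matches, NOT Arlington69's data.
own = [84, 84, 83, 85, 84, 82, 84, 83, 84, 85]
opp = [84, 86, 83, 82, 85, 84, 81, 84, 84, 87]

shares = gap_shares(own, opp)
share_zero = shares.get(0, 0.0)
print(shares)
print(f"gap of 0: {share_zero:.0%}, absolute gap of 1 or more: {1 - share_zero:.0%}")
```

For this toy sample, the sketch reports a gap of 0 in 30 % of the matches and an absolute gap of 1 or more in the other 70 %.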

The interesting part of Arlington69’s analysis, however, is the chart on the right.

To the right, we see a chart labeled “Distribution of opponents based on my rating”. Here, Arlington69 has plotted the rating distribution of opponent squads for each OVR rating level he used. Each colored curve represents the matches Arlington69 played with a squad of a specific OVR rating. As an example, the yellow curve shows the distribution when using an 83-rated squad, and so forth.

Although the pattern is less clear than in the chart on the left, the colored bell-shaped curves tend to be centered around Arlington69’s own squad ratings. For instance, the yellow curve peaks sharply at exactly 83, which was Arlington69’s own squad rating for the matches represented in that curve.

The result is indeed surprising, considering that EA claims that it is all about bringing your best squad to the pitch: at a glance, it would seem that the most likely opposing squad is a squad with the same rating as yours, no matter what squad you are using. This is exactly what we would expect to see if the game uses OVR-based matchmaking.

The question, however, is whether the results presented by Arlington69 actually deviate from what we would expect to see if the game doesn’t use OVR-based matchmaking.

What normal looks like

Let me start by establishing that few FIFA players are stupid/masochistic enough to take a 65-rated squad into action, while few are wealthy enough to run a 95+ squad. In fact, the vast majority of players use squads close to the average. If we were to survey all squads currently in use in online matches, we would most likely see squad ratings normally distributed around a certain average.

Therefore, under a matchmaking algorithm that ignores OVR, a player using an average squad would see opposing squad ratings normally distributed around his own squad rating. So, provided that Arlington69 was using an average squad, the chart on the left looks perfectly normal.
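This argument can be checked with a small simulation. The Python sketch below assumes, purely for illustration, that squad ratings in the player base are normally distributed with mean 84 and standard deviation 1.5, and that matchmaking picks opponents at random without looking at OVR. A player with an average (84-rated) squad then sees rating gaps centered on 0, while a player with an 81-rated squad sees gaps centered around +3, even though the matchmaking is identical.

```python
import random
import statistics

random.seed(1)

# Illustrative assumption: squad ratings across the player base are roughly
# normal with mean 84 and standard deviation 1.5 (not measured data).
POPULATION_MEAN = 84.0
POPULATION_SD = 1.5


def random_opponent_rating():
    """OVR-blind matchmaking: the opponent is a random player from the pool."""
    return round(random.gauss(POPULATION_MEAN, POPULATION_SD))


def rating_gaps(my_rating, n_matches=10_000):
    """Opponent rating minus own rating for n simulated matches."""
    return [random_opponent_rating() - my_rating for _ in range(n_matches)]


average_gaps = rating_gaps(84)  # a player with an average squad
low_gaps = rating_gaps(81)      # a player with a below-average squad

print("84-rated squad: mean gap", round(statistics.mean(average_gaps), 2),
      "most common gap", statistics.mode(average_gaps))
print("81-rated squad: mean gap", round(statistics.mean(low_gaps), 2))
```

The point of the design is that the opponent’s rating never depends on `my_rating`; the centered curve for the average player comes entirely from the shape of the player pool.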

Time matters

But what about the chart on the right, then? It is quite indisputable that Arlington69 got lower rated opponents when using a lower rated squad himself and vice versa. How could this possibly be normal?

There is a third factor to consider here – namely time. Arlington69’s sample consists of 670 matches. And you obviously don’t play 670 matches in a matter of days. In fact, Arlington69 played his matches over several months.

During that time frame, he gradually improved his squad, and so did his average opponent.

Hence, a possible explanation for the results presented in the chart on the right is that each of the colored curves in fact represents matches played at different points in time.

The mere fact that his own OVR rating appears to be correlated with the opponent OVR rating doesn’t in itself allow us to conclude that those two things are causally connected, right?

Image: False cause fallacy (source: https://medium.com/@groksock/causation-vs-correlation-a6f940759d11)

A look at the raw data

The above statements are so far mostly hypothetical. But I am able to back them up with facts taken from Arlington69’s own data.

Arlington69 kindly provided access to his raw data set. Although the sample doesn’t contain match dates, it happens to be divided into a pre-patch and a post-patch section. The patch in question was Title Update 6, released on January 24th, 2018. Therefore, we know for certain that all pre-patch matches were played before January 24th, whereas post-patch matches were played after that date.

With that tiny piece of information, we can run a simple test of our hypothesis: namely, that Arlington69’s OVR rating improved over time at roughly the same pace as his opponents’ OVR ratings.

To test the hypothesis, all we need to do is compare Arlington69’s average squad rating with his opponents’ average squad rating before and after the patch.

The results can be seen below:

Average squad rating | Pre patch | Post patch
Own rating | 83.0 | 84.5
Opponent rating | 82.8 | 84.7

Between pre patch and post patch, Arlington69’s own average squad rating improved from 83.0 to 84.5 (Δ +1.5), while the average opposing squad rating improved from 82.8 to 84.7 (Δ +1.9). In other words, Arlington69’s squad improved at roughly the same pace as his opponents’ squads.

Of course, this observation doesn’t rule out that Arlington69’s squad rating is causally connected with the average opponent squad rating.

But it does fit very well with the hypothesis that both variables grew because of a third variable, namely the general growth in squad ratings over time as improved items are released. In addition, keep in mind that the same factors that allowed Arlington69 to improve his squad applied to all of his opponents as well.

While we so far can’t rule out that bronze benching works, we can establish with certainty that Arlington69’s experiment doesn’t prove that it does. As long as there is another plausible explanation for the results, we don’t have proof.

So, does bronze benching work?

But we are not done yet.

In the table below, I calculated average opponent squad ratings for each rating level used by Arlington69. I also divided the calculation into a pre patch and a post patch section and included 95 % confidence intervals.

Own squad rating | Pre patch: avg. opp. squad (95 % CI) | Pre patch: matches | Post patch: avg. opp. squad (95 % CI) | Post patch: matches
81 | 82.4 +/- 0.8 | 47 | N/A | 1
82 | 82.9 +/- 0.4 | 110 | 82.9 +/- 0.5 | 41
83 | 82.9 +/- 0.3 | 61 | 84.7 +/- 0.9 | 23
84 | 82.9 +/- 0.3 | 148 | 84.7 +/- 0.5 | 99
85 | 83.2 +/- 0.7 | 32 | 84.7 +/- 0.5 | 53
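The article doesn’t state how the confidence intervals were computed. A common choice, assumed here, is the normal approximation: the mean plus or minus 1.96 standard errors. The Python sketch below applies it to a hypothetical list of opponent ratings for a single table cell; `mean_with_ci95` is a hypothetical helper name, and for small samples a t-based interval would be slightly wider.

```python
import math
import statistics


def mean_with_ci95(ratings):
    """Mean and half-width of an approximate 95 % confidence interval.

    Assumption: normal approximation (1.96 * standard error of the mean),
    using the sample standard deviation.
    """
    mean = statistics.mean(ratings)
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, 1.96 * std_err


# Hypothetical opponent ratings for one table cell (invented, not real data).
sample = [82, 83, 84, 83, 82, 84, 83, 83]
mean, half_width = mean_with_ci95(sample)
print(f"{mean:.1f} +/- {half_width:.1f}")
```

For this toy sample, the sketch prints 83.0 +/- 0.5. Because the half-width shrinks with the square root of the number of matches, cells with few matches (such as the 23 post-patch matches with an 83-rated squad) get wider intervals.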

What you should notice above is that, no matter what squad Arlington69 used pre patch, the average opposing squads were rated approximately 82.9. And no matter what squad he used post patch, the average opposing squads were rated around 84.7.

In other words, once we account for the patch, Arlington69’s own squad rating isn’t correlated with the opposing squad rating. The only factor that appears to influence the rating of the opposing squads is whether the matches were played pre patch or post patch.

82-rated squads appear to stand out as an exception, but that has a natural explanation: a closer look at the sample sheets indicates that the 82-rated matches were played in close succession, just before and just after the patch. Therefore, the average opposing squad ratings didn’t differ between those two measurements.
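The analysis behind the table amounts to a stratified average: group the matches by period and by own squad rating, then average the opponent ratings within each group. If OVR played no role in matchmaking, the averages should be roughly flat within a period and shift only between periods. The Python sketch below shows the grouping on a handful of invented records; `opponent_means_by_stratum` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean


def opponent_means_by_stratum(matches):
    """Average opponent rating for each (period, own squad rating) cell."""
    cells = defaultdict(list)
    for period, own, opp in matches:
        cells[(period, own)].append(opp)
    return {key: mean(opps) for key, opps in sorted(cells.items())}


# Invented records: (period, own squad rating, opponent squad rating).
matches = [
    ("pre", 82, 83), ("pre", 82, 82), ("pre", 84, 83), ("pre", 84, 82),
    ("post", 82, 85), ("post", 82, 84), ("post", 84, 84), ("post", 84, 85),
]

table = opponent_means_by_stratum(matches)
for (period, own), avg_opp in table.items():
    print(f"{period:>4}  own {own}  avg. opponent {avg_opp:.1f}")
```

In this toy data, 82- and 84-rated squads get the same average opponent within each period (82.5 pre patch, 84.5 post patch), so only the period changes the result.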

Conclusion

Although Arlington69 ends up concluding that bronze benching works, a more thorough analysis of his data leads to the exact opposite conclusion: Bronze benching has no impact on what opponents you are matched up against in the game modes included in his sample.

The graphic representation of the entire sample below is perhaps the simplest way to illustrate why I arrive at that conclusion.

Especially in the pre-patch section, we see that Arlington69’s use of different squads had no impact on the ratings of the opposing squads. We also see that later on (after the patch on January 24th), there is a gradual improvement in the squads, but this applies both to Arlington69 and to his opponents. The most likely reason is the release of TOTS and other special items. This is perhaps what (mis)led him to his conclusion, but aside from the fact that both variables grow over time, they clearly aren’t causally connected.

On a last note, I should point out that the sample mixes different game modes, and different game modes could mean different matchmaking methods. However, if, for example, Weekend League used squad-rating-based matchmaking while Seasons most likely doesn’t, we would probably still have seen an effect in the data. Thus, I consider it most likely that bronze benching doesn’t work in any of the game modes included here.
