Fact check: Does bronze benching work?

In a recent Reddit post, Arlington69 claims to have evidence confirming that bronze benching works. Allegedly, using a bronze bench means that you will get easier opposing squads even in game modes like seasons and weekend league. Does bronze benching really work? We decided to investigate.

An introduction to bronze benching

Bronze benching is the not-so-noble art of pretending that your team is worse than it really is. Due to the way FIFA calculates the squad overall rating, adding low rated bronze players as subs will lower the overall team rating considerably. As seen in the example below, a full TOTS team with a 91-rated starting 11 becomes 84-rated when adding the lowest rated bronze players as subs.

People use bronze benches because they believe that FUT’s matchmaking algorithm uses the squad’s overall rating as a parameter when picking your next opponent. If this assumption is correct, my centre team below will be more likely to get matched up against the squad on the right than the much scarier squad on the left, which of course would make a big difference.

There are, however, a couple of very good reasons to be skeptical about the notion that bronze benching leads to easier opponents in game modes like FUT seasons and Weekend League.

First and foremost, it contradicts official information released by EA Sports officials. On earlier occasions, EA has stated explicitly that FUT seasons is about bringing your best team to the pitch.

Second, it’s difficult to imagine a viable motive for EA. EA makes millions selling packs. If teams were matched based on overall rating, the advantage of owning better players would be eliminated, which in turn would make it meaningless to spend money on packs or buy players in general. I’m aware that some players believe that EA uses OVR-based matchmaking to make matches fairer, but that argument doesn’t make sense. If EA felt that a team full of world class players gave you too much of an advantage, a much simpler and cheaper solution would be to make the performance gap between various cards smaller.

Having said that, Arlington69’s evidence appeared convincing. At the same time, the mere fact that something doesn’t make sense doesn’t mean that it cannot be real. The chemistry glitch didn’t make a whole lot of sense, and it was still a very real feature until it was patched. So, despite my immediate skepticism, I decided to give Arlington69’s analysis a sanity check.

Arlington69’s bronze benching study

Arlington69 recorded his and his opponent’s squad rating over the duration of 670 matches. His sample included FUT seasons, Weekend League and Daily Knockout Tournament matches.

His conclusions are largely based on the charts below, which are supported by a bit of explanatory text in the original Reddit post.

In the chart to the left, “Average Distribution of opponents rating compared to mine”, he plotted the difference in stat points between his squad and the opposing squad. He notes that 75 % of his opponents were within 2 stat points of his own squad.

The most interesting chart is however the chart on the right, titled “Distribution of opponents based on my rating”. In this chart, he breaks his sample into five graphs – one per squad rating level he has used. The visual representation shows that Arlington69 got lower rated opponents when he used his 82 rated squads (light blue) than when he used his 86 rated squads (dark blue). That would indeed seem to suggest that the game picks lower rated opponent squads when you use a lower rated squad yourself, which ultimately would imply that bronze benching will get you easier opponents as well.

There are however a couple of problems with this analysis.

Reliability and validity

A cornerstone principle in science is that research methods need to be reliable and valid. Reliability means that the sample(s) must be sufficiently large to ensure that a repetition of the study won’t lead to a different conclusion. Validity means that our methods need to measure the right things. Arlington69’s experiment has serious flaws in both respects. With regard to reliability, it’s a problem that no statistical tests were conducted. Especially when Arlington69 divides his sample into subsections, statistical inaccuracy should be a concern. The biggest concern, however, is the validity of the applied method, as I will explain below.

The results fit with the opposite conclusion as well

In the complete population of FIFA players, few are stupid/masochistic/etc. enough to take a 65-rated squad into action while few will be wealthy enough to run a 95+ squad. Therefore, the huge majority of players use average squads.

And because of that, it’s likely that a survey of all squads used in online matches at any given time would reveal that squad ratings are normally distributed around said average. Under those circumstances, completely random matchmaking means that a player using an average squad (like most of us) will see opposing squads that are normally distributed around his own squad’s rating.

In other words, the chart on the left looks exactly as we would expect in a scenario where the game picks random opponents, meaning that average opponent squad rating isn’t dependent on your own squad rating. It is therefore not possible to conclude anything about whether opponent squad rating depends on own squad rating based on the chart on the left.
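The argument can be illustrated with a small simulation. This is only a sketch under assumed numbers: the population mean of 83 and spread of 2 rating points are made up for illustration and are not taken from Arlington69’s data or from anything EA has published. Opponents are drawn completely at random, without ever looking at the player’s own rating.

```python
import random

# Assumed numbers for illustration only: squads online are taken to be
# normally distributed around 83 with a spread of 2 rating points.
POPULATION_MEAN = 83
POPULATION_SD = 2.0


def random_opponents(n_matches, rng):
    """Draw opponents purely at random; the player's own rating is never consulted."""
    return [round(rng.gauss(POPULATION_MEAN, POPULATION_SD)) for _ in range(n_matches)]


def share_within(own_rating, opponents, points=2):
    """Fraction of opponents whose rating is within `points` of own_rating."""
    close = sum(1 for opp in opponents if abs(opp - own_rating) <= points)
    return close / len(opponents)


rng = random.Random(2018)  # fixed seed so the run is repeatable
opponents = random_opponents(10_000, rng)

share_average_squad = share_within(83, opponents)  # player with an average squad
share_weak_squad = share_within(78, opponents)     # player with a far-below-average squad

print(f"Average squad (83): {share_average_squad:.0%} of opponents within 2 points")
print(f"Weak squad (78):    {share_weak_squad:.0%} of opponents within 2 points")
```

Under these assumptions, the player with the average squad sees most opponents close to his own rating even though matchmaking ignores it, while the player with the weak squad does not. A chart showing that most opponents are close to your own rating therefore only tells us that your squad is close to the average.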

Time matters

But why, then, did Arlington69 find that he got lower rated opponents when using a lower rated squad, as shown in the colorful chart on the right? A likely reason is that his sample is based on matches played over several months, meaning that not only his own squad but also the average opposing squad improved between match #1 and match #670.

A detail that Arlington69 clearly hasn’t considered is that most players improve their squads over the year as prices drop due to the release of new special cards. This means that if matchmaking is completely random, we would expect the average opponent’s squad to be higher rated in May than it was in November.

In other words, the chart on the right looks exactly as we would expect it to look if matchmaking was random.
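To make the point concrete, here is a minimal simulation sketch. All numbers are assumptions chosen for illustration (a typical squad rating drifting from 82 to 85 over 670 matches, and the amount of noise); none of them come from the real data. The opponent is drawn without any reference to the player’s own squad, so matchmaking is random by construction.

```python
import random
from statistics import mean


def simulate_season(n_matches=670, seed=24):
    """Random matchmaking while every squad, ours included, improves over time."""
    rng = random.Random(seed)
    matches = []
    for i in range(n_matches):
        progress = i / (n_matches - 1)        # 0.0 at the first match, 1.0 at the last
        typical_rating = 82 + 3 * progress    # assumed drift in typical squad rating
        own = round(typical_rating + rng.gauss(0, 0.5))
        # The opponent depends on the date only, never on `own`.
        opponent = round(rng.gauss(typical_rating, 1.5))
        matches.append((own, opponent))
    return matches


matches = simulate_season()

# Group by own rating, the same way the chart on the right does.
avg_opp_low_own = mean(opp for own, opp in matches if own <= 83)
avg_opp_high_own = mean(opp for own, opp in matches if own >= 85)

print(f"Avg. opponent when own squad <= 83: {avg_opp_low_own:.1f}")
print(f"Avg. opponent when own squad >= 85: {avg_opp_high_own:.1f}")
```

Even though the simulated matchmaking never looks at the player’s own squad, the lower rated squads end up with lower rated opponents, simply because they were used earlier in the year.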

A look at the raw data

The problem I raise here is not purely hypothetical.

Arlington69 kindly provided access to his raw data, and although the sample doesn’t contain match dates, it happens to be divided into a pre patch and a post patch section. The patch in question is the kick off glitch patch (title update 6) released January 24th 2018. Therefore, we know for certain that all pre patch matches were played before January 24th whereas post patch matches were played after that date.

And when I compare the average ratings of squads used by Arlington69 and his opponents pre and post patch, I see exactly what I expected above:

                  Pre patch   Post patch
Own rating        83.0        84.5
Opponent rating   82.8        84.7

What we see above is that between pre patch and post patch, Arlington69’s own average squad rating grew from 83.0 to 84.5, whereas the opposing squad average grew from 82.8 to 84.7.

This observation doesn’t rule out that his own squad rating is causally connected with average opponent squad rating, meaning that bronze benching would work. But it does fit very well with the hypothesis that both variables grew because of a third variable, namely the general growth in squad ratings over time as improved items are released. After all, it would be quite strange if the very reasons that allowed Arlington69 to improve his squad weren’t present for his opponents. Given these circumstances, we obviously can’t conclude that bronze benching works based on the analysis carried out by Arlington69.

Does bronze benching work?

Arlington69’s analysis is far from bulletproof, but his data set is systematic and large. And because of that, his effort hasn’t been in vain in terms of reaching a conclusion in regards to whether bronze benching works or not.

In the table below, I have inserted average opponent squad ratings for each rating level used by Arlington69 on both sides of the patch. I also included 95 % confidence intervals.

            Pre patch                    Post patch
Own squad   Avg. opp. squad   Matches    Avg. opp. squad   Matches
81          82.4 +/- 0.8      47         N/A               1
82          82.9 +/- 0.4      110        82.9 +/- 0.5      41
83          82.9 +/- 0.3      61         84.7 +/- 0.9      23
84          82.9 +/- 0.3      148        84.7 +/- 0.5      99
85          83.2 +/- 0.7      32         84.7 +/- 0.5      53

What we see is that, no matter what squad Arlington69 used pre patch, the average opposing squads were rated approximately 82.9. And no matter what squad he used post patch, the average opposing squads were rated around 84.7.
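For readers who want to build intervals like these from their own match logs, the sketch below shows one common way to do it, using the normal approximation (mean +/- 1.96 standard errors). This is an assumption about method, not a description of the exact calculation behind the table, and the opponent ratings in the example are made up.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci(ratings, z=1.96):
    """Return (mean, half-width) of an approximate 95 % confidence interval.

    Uses the normal approximation: half-width = z * s / sqrt(n),
    where s is the sample standard deviation.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two matches to estimate the spread")
    return mean(ratings), z * stdev(ratings) / sqrt(n)


# Hypothetical opponent ratings from eight matches (not taken from the real sample).
example_ratings = [82, 83, 84, 83, 82, 84, 83, 83]
avg, half_width = mean_with_ci(example_ratings)
print(f"{avg:.1f} +/- {half_width:.1f}")
```

The half-width shrinks with the square root of the number of matches, which is why rows with few matches have the widest intervals.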

In other words, absolutely nothing suggests that your own squad rating influences the opposing squad rating. The only factor that does influence opposing squad ratings is time, because squads tend to improve over time.
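As a quick cross-check, the rows of the table can be compared against the two reference values mentioned above (82.9 pre patch and 84.7 post patch). The sketch below copies the numbers from the table and lists every own-squad rating whose interval does not cover the reference value; the function name and the choice of this particular check are mine, for illustration only. The post patch row for 81-rated squads is left out because it has a single match and no interval.

```python
# Rows copied from the table: (own rating, avg. opponent rating, CI half-width, matches)
PRE_PATCH = [
    (81, 82.4, 0.8, 47),
    (82, 82.9, 0.4, 110),
    (83, 82.9, 0.3, 61),
    (84, 82.9, 0.3, 148),
    (85, 83.2, 0.7, 32),
]
POST_PATCH = [
    (82, 82.9, 0.5, 41),
    (83, 84.7, 0.9, 23),
    (84, 84.7, 0.5, 99),
    (85, 84.7, 0.5, 53),
]


def rows_outside(rows, reference):
    """Own ratings whose confidence interval does not contain `reference`."""
    return [own for own, avg, half_width, _ in rows if abs(avg - reference) > half_width]


pre_outliers = rows_outside(PRE_PATCH, 82.9)
post_outliers = rows_outside(POST_PATCH, 84.7)

print("Pre patch rows not covering 82.9:", pre_outliers)
print("Post patch rows not covering 84.7:", post_outliers)
```

With the table’s numbers, every pre patch interval covers 82.9, and the only post patch row that does not cover 84.7 is the one for 82-rated squads.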

82-rated squads appear to stand out as an exception, but that has a natural explanation: a closer look at the sample sheets reveals that 82-rated matches pre patch were played close to each other, meaning that the average opposing squad ratings didn’t differ significantly between those two measurements.

Conclusion

Although Arlington69 ends up concluding that bronze benching works, a more thorough analysis of his data leads to the exact opposite conclusion: Bronze benching has no impact on what opponents you are matched up against in the game modes included in his sample.

The graphic representation of the entire sample below is perhaps the simplest way to illustrate why I arrive at that conclusion.

Especially when we look at the pre patch section, we see that Arlington69’s use of different squads had no impact on the ratings of the opposing squads. We also see that later in the year (after the patch on Jan 24th), there is a gradual improvement in the squads, but this applies both to Arlington69 and his opponents. The most likely reason is the release of TOTS and other special items. This is perhaps what (mis)led him to his conclusion, but aside from the fact that both variables grow, they clearly aren’t causally connected.

On a final note, I need to state that the sample mixes different game modes. Different game modes mean different matchmaking methods. However, it is likely that we would have seen an effect if, for example, Weekend League used squad rating based matchmaking, even though Seasons most likely doesn’t. Thus, I consider it most likely that bronze benching doesn’t work in any of the game modes included here.
