An explanation of our methods and why we emphasize the stats we do. In general, we try to combine the best available statistics with more “intuitive” methods that are actually hiding sound analytical concepts beneath the surface.
The Sandlot Test
The “Sandlot Test” is far and away our least rigorous and most subjective form of analysis, but we think there’s tremendous value in engaging in it. When comparing two players, the Sandlot Test asks: “If I were choosing players for my team and I had first pick, which of these two players would I choose?”
This is an informal way to engage in “forced preference” or “forced choice” decision-making, a method of clarifying a complex equation by focusing on the most significant criteria. By reducing the analysis to the question “which player will give me the greatest chance to win?”, it’s a qualitative way of trying to answer the same question that WAR tries to answer quantitatively.
The Who’s Who Test
In the “Who’s Who Test” the reader is presented with a statistical comparison of comparable players (position, era, etc.) and challenged to identify which stat line belongs to which player.
While this is a seemingly informal form of analysis, it’s essentially a “nearest neighbor” analysis in reverse: by presenting the reader with close statistical comparisons without identifying information, it prevents existing perceptions from coloring conclusions. Whereas the Sandlot Test builds from statistical specifics to a qualitative conclusion, the Who’s Who Test strips away the qualitative narratives surrounding players and leaves only the purely quantitative analysis behind.
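The “nearest neighbor” idea behind the Who’s Who Test can be sketched in a few lines of Python. This is only an illustration, not the process we actually use to choose comparisons: the player labels and stat lines are made-up placeholders, and the choice of stats (AVG, OBP, SLG) and the standardization step are assumptions.

```python
import math

# Hypothetical stat lines (AVG, OBP, SLG). These are made-up placeholders,
# not real players.
PLAYERS = {
    "Player A": (0.300, 0.380, 0.520),
    "Player B": (0.298, 0.375, 0.515),
    "Player C": (0.250, 0.310, 0.400),
    "Player D": (0.270, 0.350, 0.450),
}


def standardize(players):
    """Rescale each stat to mean 0 and standard deviation 1 so that no
    single stat dominates the distance calculation."""
    columns = list(zip(*players.values()))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return {
        name: tuple((x - m) / s for x, m, s in zip(line, means, stds))
        for name, line in players.items()
    }


def nearest_neighbor(target, players):
    """Return the name of the player whose stat line is closest to the target's."""
    scaled = standardize(players)
    others = (name for name in scaled if name != target)
    return min(others, key=lambda name: math.dist(scaled[target], scaled[name]))


print(nearest_neighbor("Player A", PLAYERS))
```

In the Who’s Who Test the names would be hidden from the reader; the code only shows how “closest stat line” can be defined once a set of stats is chosen.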
How We Use Defensive Metrics
Properly measuring defensive performance is currently one of the great challenges of baseball analysis. In the near future, the FIELDf/x system will track the movement of every player on every play, providing a wealth of information for analysis, but for right now, the information available is more limited. The data get even sparser when going back more than a few years, making sound analysis of past defensive performance very difficult.
Because of this, we are very careful with our use of defensive stats. We’d rather forgo an area of analysis than use shaky info to reach a false conclusion. We’ll only be using defensive stats when all the various metrics are in agreement or when a player’s defense is so important to the analysis that we’d be doing greater harm by ignoring it.
Even when we do use defensive metrics, we’ll always be very explicit about how they’re being used and what possible “blind spots” might exist to color or change our conclusions.
Why use WAR v. Other Stats?
For an explanation of the stat itself, see the glossary entry on the different versions and how they’re calculated.
Ah, WAR – the only thing in baseball argued about more than balls and strikes. WAR is the attempt to create a baseball Grand Unified Theory. Because the calculation of the stat is the most complex of any advanced stat, and because there are multiple versions that each measure the various aspects a bit differently, a single player can end up with multiple WAR totals.
Because of this, people often dismiss WAR as a “made-up” stat, when what is really going on is that it’s so complicated that most people cannot easily recreate it on their own, and so it becomes overly abstract and inaccessible (it’s often called a “black box” statistic because of this). For some, the fact that batting average is the same whether it’s on ESPN.com or MLB.com, and that they can pull out a calculator and double-check the number, makes that stat more “real.”
To this I’d say two things. One, WAR is trying to capture every single contribution a player makes in every single aspect of the game – of course it’s going to be complicated! Anyone who’s played the game knows there are so many little ways that a player can help their team win that trying to measure all of them is going to be a gigantic task.
Two, the argument for WAR’s value comes down to what exactly the point of stats is. We don’t keep stats because they provide numbers that we can then “play with” – we keep stats because they reflect what happens on the field. Stats give fans a way to quantify the game that’s actually played, and the better a stat reflects excellent performance, the better that stat is. That’s why so many MLB front offices now have “stat geeks” crunching numbers: those guys help the team win, and at the end of the day, that’s what matters – it’s all about the W.
To illustrate this point, here’s a list of the top ten teams in wins in 2014, 2004 and 1994 (the strike-shortened year that, among other things, killed the then-first-place Expos) cross-referenced against the top ten teams in rWAR and fWAR, as well as in more traditional stats such as batting average, runs scored, and team ERA.
Take a look at how often the top ten teams in wins match up to the top ten teams in WAR and how much less they match with the top teams in batting average or ERA.
If you were trying to predict which teams won the most in a given year, WAR would be a lot more useful than runs scored or team ERA, wouldn’t it?
The fancy way to say it is that WAR has a higher “correlation coefficient” with team wins than other stats do – the simple way to say it is that WAR does a much better job of capturing all the things that help teams win than other stats do.
A team can have mediocre pitching and still win (e.g., 2004 Yankees), and a team can hit like crazy and still lose (essentially every Rockies team ever), but a team with a high total WAR? Well, they’ve got players that are contributing across the board, and we can see how all of those contributions show up in the win column.
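For readers who want to see what “correlation coefficient” and “top ten overlap” mean in practice, here is a minimal Python sketch. The six teams and all of their numbers are invented purely for illustration, so the output says nothing about real seasons; only the method (Pearson correlation, plus counting how many teams appear in both top-N lists) carries over.

```python
import math

# Made-up numbers for six hypothetical teams, for illustration only.
# Each entry: (wins, team WAR, team batting average).
TEAMS = {
    "Team A": (98, 52.0, 0.262),
    "Team B": (92, 46.5, 0.271),
    "Team C": (85, 38.0, 0.255),
    "Team D": (79, 33.5, 0.268),
    "Team E": (70, 24.0, 0.259),
    "Team F": (64, 17.5, 0.249),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def top_n(teams, column, n):
    """Names of the n teams with the highest value in the given column."""
    ranked = sorted(teams, key=lambda name: teams[name][column], reverse=True)
    return set(ranked[:n])


wins, war, avg = (list(col) for col in zip(*TEAMS.values()))

r_war = pearson(wins, war)  # how closely team WAR tracks wins
r_avg = pearson(wins, avg)  # how closely batting average tracks wins

# How many of the top three teams in wins are also top three in each stat?
overlap_war = len(top_n(TEAMS, 0, 3) & top_n(TEAMS, 1, 3))
overlap_avg = len(top_n(TEAMS, 0, 3) & top_n(TEAMS, 2, 3))

print(f"wins vs WAR: r = {r_war:.2f}, top-3 overlap = {overlap_war}")
print(f"wins vs AVG: r = {r_avg:.2f}, top-3 overlap = {overlap_avg}")
```

A coefficient near 1 means the stat rises and falls almost in lockstep with wins; a coefficient near 0 means it tells you very little about them.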
Why use wRC+ v. Other Stats?
One of the great advantages that advanced statistical analysis gives us is the ability to figure out precisely how much a given act on the field contributes to a team’s win (or loss). Intuitively, most fans know that a double is not “worth” twice as much as a single in terms of helping a team win, but without a whole lot of data, we can’t know exactly how much more that double is worth. Well, now we have that “whole lot of data,” and in the case of run production, it’s used to create the stat known as wRC+ (see here for a definition of the stat).
When examining a player’s performance, we’ll always look at all the available stats (in particular, OPS+ and wOBA), but we’ll often treat wRC+ as a sort of “King Stat” as far as offense is concerned. The reason is that wRC+ takes into account all of the various outcomes of a plate appearance and weighs them according to how much they correlate to the production of runs. In addition, because the stat takes into account the league-average offensive stats for each specific year, the result is useful for comparing players from different eras.
wRC+ therefore gives an easy-to-use stat that accounts for all offensive production and for the league standards of the era in which the player played, and presents that number in a form easily understood by any fan (even if the calculations aren’t easy to do in your head).
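The two ideas described above (weighting each outcome by its run value, then scaling against the league average) can be sketched as follows. To be clear, this is not the actual wRC+ calculation, which uses weights recalculated every season and also adjusts for ballparks. The weights, the player line, and the league rate below are rough placeholders chosen only to show the mechanics.

```python
# Illustrative run-value weights per event. Real weights are recalculated
# every season; these are rough placeholders, not official values.
WEIGHTS = {"BB": 0.69, "1B": 0.89, "2B": 1.27, "3B": 1.62, "HR": 2.10}


def weighted_value_per_pa(events, plate_appearances):
    """Sum the weighted value of each outcome and divide by plate appearances."""
    total = sum(WEIGHTS[event] * count for event, count in events.items())
    return total / plate_appearances


def simple_offense_index(player_rate, league_rate):
    """Scale a player's rate so that 100 means league average."""
    return 100 * player_rate / league_rate


# Hypothetical player season and hypothetical league-average rate.
player_events = {"BB": 60, "1B": 100, "2B": 30, "3B": 3, "HR": 25}
player_rate = weighted_value_per_pa(player_events, plate_appearances=600)
league_rate = 0.320  # assumed league average, for illustration only

player_index = simple_offense_index(player_rate, league_rate)
print(f"index = {player_index:.0f} (100 = league average)")
```

Note that in these placeholder weights a double (1.27) is worth well under twice a single (0.89), which is the point made earlier about intuition versus data.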
Why use FIP & xFIP?
Ever watch a game and think “Oh man, if our defense didn’t suck, this pitcher could have a shutout going”? Ever wonder how much St. Louis Cardinals pitchers in the 1980s benefited from having Ozzie Smith playing behind them? Or how much Giants pitchers benefited from Willie Mays prowling center field?
Well, defense-independent pitching metrics try to answer that question by only looking at the things that a pitcher has within his direct control. There are two major variations of this stat that we use here: Fielding Independent Pitching (FIP) and Expected Fielding Independent Pitching (xFIP), and we go into more detail on the calculations in the glossary.
These stats strip away almost all of the results from action after the ball is put into play (i.e., all the stuff “out of the pitcher’s hands”) by focusing primarily on strikeouts, walks, and home runs (the three primary results that are just about the contest between the pitcher and the batter).
These stats have been shown to be a better predictor of a pitcher’s future performance than the previous season’s ERA, which suggests that they better reflect the pitcher’s actual ability and performance, and minimize the effects of a bad defense or bad luck. For example, a pitcher who gives up ten singles with the bases empty can easily give up no runs at all, while a pitcher who gives up ten singles with a runner on third gives up ten runs. In both cases, the pitcher’s performance was essentially the same: “the pitcher gave up ten singles.”
Because they help to strip away the effect of events outside the pitcher’s control, they can help to paint a fuller and clearer picture of a pitcher’s performance.
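As a rough sketch of how these two stats are put together, here is the commonly published form of the FIP formula in Python, with xFIP swapping actual home runs for an expected total based on fly balls. The league constant (3.10) and the league home-run-per-fly-ball rate (10%) are assumed placeholder values that change from season to season, and the pitcher line is hypothetical; the glossary remains the reference for the versions we actually use.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    """Fielding Independent Pitching from home runs, walks, hit batters,
    and strikeouts. The constant puts FIP on the same scale as ERA; 3.10 is
    an assumed placeholder, since the real value varies by season."""
    numerator = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return numerator / innings + constant


def xfip(fly_balls, league_hr_per_fb, walks, hit_by_pitch, strikeouts,
         innings, constant=3.10):
    """Expected FIP: replaces actual home runs with the number expected from
    the pitcher's fly balls at the league home-run-per-fly-ball rate."""
    expected_home_runs = fly_balls * league_hr_per_fb
    return fip(expected_home_runs, walks, hit_by_pitch, strikeouts,
               innings, constant)


# Hypothetical pitcher season, for illustration only.
season_fip = fip(home_runs=20, walks=50, hit_by_pitch=5,
                 strikeouts=200, innings=200)
season_xfip = xfip(fly_balls=150, league_hr_per_fb=0.10, walks=50,
                   hit_by_pitch=5, strikeouts=200, innings=200)

print(f"FIP = {season_fip:.2f}, xFIP = {season_xfip:.2f}")
```

In this made-up line the pitcher allowed more home runs (20) than his fly-ball total would predict (15), so xFIP comes out lower than FIP, hinting that some of those home runs may have been bad luck.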