Jake Arrieta Will Win the NL Cy Young, but Clayton Kershaw Should

Now that the Cubs’ Jake Arrieta has notched his 20th win of the season, many are pegging him as the favorite for the National League Cy Young award. We think he’ll win, but we also think he shouldn’t. Clayton Kershaw has been the best pitcher in the National League, in some ways even better than he’s ever been.

The narrative surrounding the 2015 NL Cy Young award has been interesting. Many writers have been framing it as a two-pitcher race between Jake Arrieta, the Chicago Cubs’ emerging ace, and the Los Angeles Dodgers’ #2 starter (#1B?), Zack Greinke. What makes that interesting is that neither of them is likely the best pitcher in the National League. Clayton Kershaw is.

ESPN’s debate on the award reduces Kershaw to a footnote in the half of the article dedicated to Greinke. Sports Illustrated declares that “For all intents and purposes, this is a three-way tie” and then talks about Kershaw for one half of one paragraph.

Some more traditional outlets don’t mention Kershaw at all and even Bleacher Report, a site that generally moves beyond the basic counting stats, makes it a contest of Arrieta v. Greinke…and oh yeah, Kershaw.

Why? Well, Arrieta’s got those 20 wins, and he’s got his “moment” game (the no-hitter where he “redefined dominance”), and Kershaw seems to have slipped from the lofty perch he’s inhabited the past few seasons.

Is Arrieta actually better than Kershaw, though? The best way to answer that is to get below the ‘surface stats’ and look at exactly how a pitcher helps his team win.

The Contest of Pitcher v. Batter (and the Stats That Measure It)

One of the things we love about baseball is that in many ways it’s an individual sport that is only won by team play. The core competition at the heart of every play is the battle between the hitter and the pitcher. No matter what happens on the field, after every play things reset and the game returns to the pitcher and the hitter.

Within the pitcher v. hitter contest, there are only a few outcomes directly under a pitcher’s “control”. Of all the possible outcomes of a plate appearance, only strikeouts, walks, hit-by-pitches, and home runs do not involve the pitcher’s teammates on defense. Because of this, a number of stats have been developed to try to measure the pitcher’s contributions by focusing on those four outcomes.

Fielding Independent Pitching (FIP) is one of the most common. Whereas ERA measures outcomes that involve the pitcher’s defense (whether good or bad), FIP takes the four outcomes above (HR, BB, HBP, and Ks), weights them according to their relationship to runs allowed, and then scales the result like ERA (meaning a number that would be good for ERA is also good for FIP, and league average FIP is equal to league average ERA in a given season).
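As a rough sketch of how that weighting and scaling works, the Python below uses the commonly published FIP weights (13 for home runs, 3 for walks and hit-by-pitches, -2 for strikeouts). Those weights, the function names, and every number in the example are assumptions for illustration rather than figures from this article; the league totals in particular are invented.

```python
def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """Constant that makes league-average FIP equal league-average ERA."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip


def fip(hr, bb, hbp, k, ip, constant):
    """FIP from the four defense-independent outcomes.

    ip must be true decimal innings (200 1/3 innings is 200.333, not 200.1).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical league totals (invented for this example).
LEAGUE = dict(lg_era=3.90, lg_hr=2400, lg_bb=7000, lg_hbp=800,
              lg_k=18500, lg_ip=21600.0)
C = fip_constant(**LEAGUE)

# Hypothetical pitcher: lots of strikeouts, few walks and home runs.
example = fip(hr=15, bb=40, hbp=5, k=250, ip=220.0, constant=C)
print(f"constant = {C:.2f}, FIP = {example:.2f}")
```

Note that hits allowed on balls in play never enter the calculation, which is the whole point: swapping the shortstop changes nothing here.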

The concept behind FIP (and why it’s a better measure of a pitcher’s performance than ERA) is demonstrated easily by a hypothetical:

Say I have a team with a league average shortstop. If I were to replace that league average shortstop with Ozzie Smith in the prime of his career, that team’s defense would improve and every pitcher’s ERA would likely go down. However, the pitchers didn’t actually get better (the shortstop did), so that improvement in ERA would be misleading. FIP addresses that issue by removing defense-dependent outcomes.

Expected Fielding Independent Pitching (xFIP) takes the concept behind FIP and tries to adjust for seasonal variations in HR/FB, the home run to fly ball percentage, which measures how often fly balls turn into home runs for a particular player. League average is usually between 10-11%. Most pitchers average between 9-12% for their careers, while hitters have a much larger span, with power hitters reaching rates of 15-20% or more and slap hitters averaging as low as 1-2%. A large deviation from a player’s career norm is a red flag that he’s getting either lucky or unlucky over the course of a specific season.

A pitcher generally gives up home runs at a roughly constant rate over his career (some pitchers are a bit better than league average, some worse), but that rate can fluctuate greatly in a single season before regressing to the mean over time. xFIP attempts to control for that by calculating FIP with the home run total based on a league average HR/FB rate rather than the pitcher’s actual rate. Like FIP, it is scaled like ERA.

Another hypothetical shows why that’s done: In a given season a pitcher may only give up 40 long fly balls. Because that’s a relatively small sample size, there can be fluctuations in results that aren’t related to skill. For instance, in one season, 10 of those fly balls fall just a few feet short of the wall, but in the next, the wind is blowing out more often and those 10 land just a few feet into the stands. That’s a big difference in home run total (say 25 v. 35), but in reality, the pitcher has been consistent with 40 long fly balls per season. xFIP seeks to adjust for that.
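The hypothetical above can be sketched in a few lines of Python. This assumes the commonly published FIP weights and treats xFIP as FIP with home runs replaced by fly balls times a league HR/FB rate. The constant, the league rate, and the pitcher’s line are all made-up values for illustration.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """FIP with the commonly published weights (an assumption here)."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, bb, hbp, k, ip, constant, lg_hr_per_fb):
    """FIP with actual home runs replaced by fly balls * league HR/FB rate."""
    expected_hr = fly_balls * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical inputs; none of these come from the article.
CONSTANT = 3.10        # assumed FIP constant
LG_HR_PER_FB = 0.105   # assumed league HR/FB rate, inside the usual 10-11% range
same_skill = dict(bb=50, hbp=5, k=180, ip=200.0, constant=CONSTANT)

# Two seasons with identical fly balls, walks, and strikeouts; only the
# number of fly balls that cleared the wall changes (25 v. 35).
fip_short = fip(hr=25, **same_skill)
fip_long = fip(hr=35, **same_skill)
xfip_both = xfip(fly_balls=250, lg_hr_per_fb=LG_HR_PER_FB, **same_skill)

print(f"FIP with 25 HR: {fip_short:.2f}")
print(f"FIP with 35 HR: {fip_long:.2f}")
print(f"xFIP in both seasons: {xfip_both:.2f}")
```

The two FIP values differ even though the pitcher did nothing differently, while xFIP is the same for both seasons because it only sees the fly ball count.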

The ESPN article above dismisses FIP and xFIP because “People can argue that FIP or xFIP are better measures of a pitcher’s efforts in a vacuum, but when was the last baseball game that got played in one of those?” I’m not sure I understand what Mark Saxon’s conception of the game is if he thinks strikeouts and walks occur “in a vacuum” rather than “on the field in the foundational contest of a baseball game.”

Pictured: Clayton Kershaw and Chipper Jones on a baseball field in a stadium filled with air.

Saxon’s contention is that “a pitcher’s value to his team is in actual run prevention, not expected run prevention” and we wouldn’t disagree for a second. The thing is, the pitcher himself primarily prevents runs through the things directly under his control (i.e., the things measured by FIP and xFIP), not what his team does after the ball is put in play. The whole purpose of FIP is that it actually measures the things the pitcher directly does to prevent runs, not just whether a run was scored while the pitcher was on the mound.

The true test of any stat is how closely it matches up with what actually happens on the field. At the end of the day, FIP and xFIP are far better predictors of future pitcher performance than ERA is, which strongly (very strongly) suggests that they are better measures of what makes pitchers successful or unsuccessful.

Other Useful Stats When Evaluating Pitchers

One valid criticism of FIP and xFIP is that pitchers do actually affect balls that are put into play in small ways. For instance, strikeout pitchers generally induce weaker contact, so even when balls are put into play against a strikeout pitcher they’re more likely to turn into outs than they would be against another pitcher. Ground balls turn into hits more often than fly balls, but the more ground balls a pitcher induces, the easier it is for his defense to turn them into outs, and they also produce more double plays.

Skill-Interactive ERA (SIERA) is a statistic that attempts to adjust for just those factors. It takes FIP and adds in calculations for how pitchers do affect balls in play and how those balls in play turn into outs (or runs); for example, ground balls turn into hits more often than fly balls, but become extra base hits less often. It’s also adjusted for park effects. It’s only a bit more predictive than FIP or xFIP, but it’s more comprehensive and takes into account more of what happens on the field (both hallmarks of “good” stats).

A final stat that is useful for evaluating pitchers is Game Score. Game Score is a metric developed by Bill James to measure the quality of a pitcher’s start. Here’s how it’s calculated:

Start with 50 points.

Add one point for each out recorded, so three points for every complete inning pitched.
Add two points for each inning completed after the fourth.
Add one point for each strikeout.

Subtract four points for each earned run allowed.
Subtract two points for each hit allowed.
Subtract two points for each unearned run allowed.
Subtract one point for each walk.

A start with a game score of 60 is considered a quality start. A game score of 70 is great, 80 is excellent, 90 is incredible, and 100 is legendary. For reference, only 12 nine-inning games in MLB history have had a game score of 100 or more (three of those were perfect games and four others were no-hitters). Only fifteen pitchers in history have at least 10 games with game scores of 90 or more.
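The scoring rules listed above translate almost directly into code. This is a minimal sketch of those rules as stated in this article; the function name and the sample pitching line are hypothetical.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Game Score following the rules listed above."""
    innings_completed = outs // 3
    score = 50
    score += outs                                # one point per out
    score += 2 * max(0, innings_completed - 4)   # bonus after the fourth inning
    score += strikeouts
    score -= 4 * earned_runs
    score -= 2 * hits
    score -= 2 * unearned_runs
    score -= walks
    return score


# Hypothetical start: 7 innings, 8 strikeouts, 4 hits, 1 earned run, 1 walk.
sample = game_score(outs=21, strikeouts=8, hits=4,
                    earned_runs=1, unearned_runs=0, walks=1)
print(sample)  # 72
```

Under these rules a nine-inning perfect game starts from 87 (50 + 27 outs + 10 bonus points) before strikeouts, so reaching 100 requires 13 or more strikeouts on top of perfection.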

Game Score gives us a much fuller picture of how a pitcher pitched and how the final outcome was achieved. It also goes a decent ways toward answering our question: If Kershaw’s been so much better than Arrieta in the pitcher v. batter contest, why don’t his ‘surface stats’ look better?

Kershaw’s Been Unlucky

The first thing to note is how little help the Dodgers offense has given Kershaw. Kershaw has only received 3.84 runs per game, which is 13th lowest in the NL. Arrieta has received 4.13 runs per game in support, which is exactly league average (though below the Cubs’ per game average) and almost 1/3 of a run better than Kershaw.

Through September 24th, Kershaw had pitched 31 games, and 23 of those 31 have received a game score of 60 or more (74%). In those 23 games, Kershaw received a no decision in six and took the loss in three more. That’s 23 quality starts, and Kershaw has only 14 wins to show for them.

In those nine games, he gave up one earned run five times, two earned runs twice, and three earned runs twice. That’s 63 innings with an ERA of 2.14 and 78 Ks against 10 BBs, and all he has to show for it is an 0-3 record.
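For anyone who wants to check the arithmetic, here is a short sketch using only the figures quoted above (the variable names are ours):

```python
# Earned runs across the nine starts: five with 1, two with 2, two with 3.
earned_runs = 5 * 1 + 2 * 2 + 2 * 3
innings = 63

era = 9 * earned_runs / innings   # earned runs per nine innings
k_per_bb = 78 / 10                # strikeouts per walk in those starts

print(earned_runs, round(era, 2), k_per_bb)
```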

Arrieta, on the other hand, has only three starts where he has put up a game score of 60 or better and come away without a win. If the Cubs were converting Arrieta’s quality starts into wins at the rate Kershaw’s Dodgers have, his 19 quality starts would translate into only about 11.6 wins instead of the 16 wins that Arrieta’s actually received in those games.
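The conversion is simple proportional scaling. A quick sketch with the counts quoted in this section (variable names are illustrative) lands between 11.5 and 11.6 wins:

```python
# Wins per quality start (game score of 60+), from the counts in the text.
kershaw_rate = 14 / 23   # 14 wins in 23 quality starts
arrieta_rate = 16 / 19   # 16 wins in 19 quality starts

# Arrieta's 19 quality starts converted at Kershaw's rate.
arrieta_at_kershaw_rate = kershaw_rate * 19

print(f"Kershaw: {kershaw_rate:.1%}, Arrieta: {arrieta_rate:.1%}")
print(f"Arrieta's wins at Kershaw's rate: {arrieta_at_kershaw_rate:.2f}")
```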

Once we move beyond the raw win total, Kershaw’s case becomes quite clear.

Kershaw’s Been Better (and it’s not actually that close)

Despite what clickbait listicles might suggest without actually comparing Arrieta to other pitchers, he hasn’t been the best pitcher in the National League this year. Yes, he has the most wins and yes, he has a lower ERA than Kershaw (though not as low as Greinke’s), but those are basically the only two stats where Arrieta is better than Kershaw.

As of today (September 26th), both pitchers have pitched 31 games. Kershaw has more innings pitched than Arrieta (despite that same clickbait listicle arguing that one major point for Arrieta’s value is that he’s “been more durable” because he’s pitched more innings; yes, that was true when he had more starts, good math).

Kershaw has struck out more hitters by a wide margin (281 to 220) and walked fewer (41 to 48). As we discussed above when we talked about FIP/xFIP/SIERA, those are the two most significant outcomes completely within the pitcher-batter contest, and Kershaw’s better at both.

Those raw numbers translate into Kershaw striking out almost two and a half more hitters per nine innings (11.50 to 9.17), walking about a third of a hitter fewer per nine (1.68 to 2.00), and having a K/BB ratio that’s almost 50% better than Arrieta’s (6.85 to 4.58). In fact, Kershaw’s the only pitcher in the National League with a strikeout rate over 30%, posting a crazy 33.1% rate. That means that 1/3 of the hitters who come to the plate against Kershaw walk back to the dugout without even putting the ball into fair territory.
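Those rate stats are straightforward to reproduce from the raw totals. The sketch below uses the strikeout and walk totals quoted above and treats Kershaw’s innings as exactly 220 (the figure from the leaderboard later in this article, taken here as a round-number assumption). Arrieta’s innings total isn’t quoted, so only his K/BB ratio is computed.

```python
def per_nine(count, innings):
    """Scale a counting stat to a per-nine-innings rate."""
    return 9 * count / innings


# Kershaw: 281 K, 41 BB, innings assumed to be exactly 220.
kershaw_k9 = per_nine(281, 220.0)
kershaw_bb9 = per_nine(41, 220.0)
kershaw_k_per_bb = 281 / 41

# Arrieta: 220 K, 48 BB.
arrieta_k_per_bb = 220 / 48

print(f"Kershaw K/9 {kershaw_k9:.2f}, BB/9 {kershaw_bb9:.2f}, "
      f"K/BB {kershaw_k_per_bb:.2f}")
print(f"Arrieta K/BB {arrieta_k_per_bb:.2f}")
print(f"Kershaw's K/BB is {kershaw_k_per_bb / arrieta_k_per_bb - 1:.0%} higher")
```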

That 33.1% rate isn’t just the best in the National League this season, it’s the best in MLB (Chris Sale has a 32.3% strikeout rate, the only other pitcher over 30%). And not only is that 33.1% rate the best in MLB this season, it’s the best rate in MLB since Randy Johnson posted a 37.4% rate in 2001. It’s also the best rate of Kershaw’s career.

There have only been eight seasons in MLB history where a pitcher has posted a 33+% strikeout rate while pitching at least 150 innings. Five of those eight belong to Randy Johnson, two of them belong to Pedro Martinez, and one of them belongs to Kerry Wood. Kershaw is currently on pace to join them with the ninth such season, and of those other eight seasons, only two saw the pitcher post a walk rate within a percentage point of Kershaw’s 4.8% or a K:BB ratio within 1.5 of Kershaw’s 6.85.

Those two seasons, the only other two in history where a pitcher did what Kershaw is doing this season? Pedro Martinez in 1999 & 2000, aka the two most dominant pitching seasons in MLB history.

Now, Kershaw hasn’t quite been at that level this season. Pedro posted strikeout rates of 37.5% and 34.5% with walk rates of 4.4% and 3.9%, resulting in K:BB ratios of 8.46 and 8.88 (the 2nd & 3rd highest ratios for any season with 200+ Ks), and Kershaw’s “only” putting up 33.1%, 4.8%, and a 6.85 K:BB ratio. However, when you’re using the phrase “Only the 1999 & 2000 version of Pedro Martinez had seasons with better numbers”, that’s saying something.

Kershaw is on Top of Almost Every Leaderboard

Arrieta is tops in wins and second in ERA and WHIP. Greinke is tops in ERA, WHIP, and (currently tied for) second in wins. Other than those three stats, it’s hard to find one where Kershaw isn’t first. Kershaw leads the National League in:

Innings pitched (220, several folks right behind)
Average Game Score (67.2, Arrieta’s 2nd with 66.5)
Strikeouts (281, Max Scherzer is 2nd with 249)
K/9 (11.50, Scherzer is 2nd at 10.59)
K% (33.1%, Scherzer is 2nd at 29.6%)
K-BB% (28.3%, Scherzer is 2nd at 25.9%). This is strikeout percentage minus walk percentage, probably the best measure of a pitcher’s combined strikeout skill and control. When comparing pitchers with significantly different walk rates, it reveals a pitcher’s power/control combination better than K:BB ratio does, since that ratio can be skewed by particularly good control even with middling strikeout numbers.
FIP (2.11, Arrieta is 2nd at 2.45) & xFIP (2.17, Arrieta is 2nd at 2.70)
FIP- (55, Arrieta is 2nd at 62) and xFIP- (56, Arrieta is 2nd at 69). These are FIP and xFIP adjusted for park effects and indexed to league average: 100 is league average and every 1 point deviation from 100 is a percentage point better or worse than league average. E.g., an FIP- of 90 means the pitcher’s FIP was 10% better than league average, and an FIP- of 110 means it was 10% worse.
SIERA (2.31, Scherzer is 2nd at 2.77)
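To illustrate why K-BB% can tell a different story than K:BB ratio, here is a small sketch. Kershaw’s percentages are the ones quoted in this article; the “control artist” is a hypothetical pitcher invented for the comparison.

```python
def k_minus_bb(k_pct, bb_pct):
    """Strikeout percentage minus walk percentage."""
    return k_pct - bb_pct


def k_to_bb(k_pct, bb_pct):
    """Strikeout-to-walk ratio, computed here from the two percentages."""
    return k_pct / bb_pct


kershaw = (33.1, 4.8)          # K%, BB% as quoted in the article
control_artist = (18.0, 2.0)   # hypothetical: middling Ks, elite control

for name, line in (("Kershaw", kershaw), ("Control artist", control_artist)):
    # Ratios come from rounded percentages, so Kershaw's differs
    # slightly from the 6.85 you get from raw totals (281 / 41).
    print(f"{name}: K-BB% {k_minus_bb(*line):.1f}, K:BB {k_to_bb(*line):.2f}")
```

The hypothetical control artist wins on K:BB ratio despite striking out barely half as many hitters, while K-BB% keeps Kershaw comfortably ahead.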

Essentially, of all the stats that measure the things actually under the pitcher’s control, Kershaw is the best in the league by a wide margin. The only stat where he isn’t is walk rate, where his 4.8% is 6th in the league. In those stats where Arrieta or Greinke lead, Kershaw is 6th in wins with 15, 3rd in ERA (2.25 to Greinke’s 1.65 and Arrieta’s 1.88), and 3rd in WHIP (0.91 to Greinke’s 0.85 and Arrieta’s 0.90).

By the way, how incredible was Max Scherzer’s first half if, even after his second-half swoon, his name’s mentioned so often above? I’d bet second in the league in a ton of stats with a 2.98 ERA (2.96 FIP and 3.05 xFIP) is the type of “disappointing season” pretty much every pitcher in MLB would take (except maybe Kershaw).

Well, What About Greinke?

Greinke was the putative favorite for a good chunk of the season. Running off his scoreless innings streak, putting up a microscopic ERA, allowing fewer baserunners than any other pitcher: as we said above, a lot of writers are framing this purely as a race between Arrieta and Greinke. Thing is, Greinke’s been the third, maybe fourth, maybe even fifth or sixth best pitcher in the National League this year.

He’s got the gaudy win percentage (18-3 record), he’s got the teeny-tiny ERA (1.65), and he’s got the lowest WHIP in the majors driven by his excellent walk rate (4.6%, 4th in the NL). Problem is, he’s got little other than those three stats and a whole lot of luck.

Greinke has posted a BABIP of .231; that’s his lowest career mark by 36 points and a full 68 points below his career average. (BABIP is batting average on balls in play: the percentage of balls put into play, either by a hitter or by a pitcher’s opponents, that turn into hits. League average is about .300, and a significant deviation from a player’s career average is often a red flag that he’s getting either lucky or unlucky over the course of a single season.) That’s a ridiculously lucky season. He’s also posted a HR/FB rate of 7.1%, the second lowest of his career and well below his career average of 9.1% (league average this season is 11.4%, so Greinke’s generally better than average, but not this much better).
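To give a feel for what a BABIP gap like that means in hits, here is a rough sketch. It uses the standard BABIP formula, (H - HR) / (AB - K - HR + SF). The sample stat line and the 550 balls in play are made-up numbers, and the .299 career mark is simply .231 plus the 68 points mentioned above.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play (standard formula)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, only to show the formula in use.
sample = babip(hits=150, home_runs=15, at_bats=750, strikeouts=185, sac_flies=5)
print(f"sample BABIP: {sample:.3f}")

# What 68 points of BABIP is worth over an assumed 550 balls in play.
BALLS_IN_PLAY = 550            # hypothetical workload
season, career = 0.231, 0.299  # .299 = .231 + .068, per the text
extra_hits = round((career - season) * BALLS_IN_PLAY)
print(f"hits avoided versus career norm: about {extra_hits}")
```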

That’s why even though his ERA is insanely low at 1.65, his FIP is 2.77, his xFIP is 3.27, and his SIERA is 3.31; when we remove the effect of his defense and luck and only look at the things he can control, he’s right in the middle of the top-10 pack (behind both Kershaw and Arrieta). If we look at Greinke’s other stats, the idea that he’s been the best pitcher in the National League starts to melt away.

Greinke’s 11th in strikeouts (185 to Arrieta’s 220 and Kershaw’s 281), he’s 18th in strikeouts per nine innings (8.02), and he’s 15th in strikeout rate (23.5%). By virtue of his low walk totals he still posts a K/BB ratio of 5.14, but that’s good for 6th in the National League and is still 1.7 behind Kershaw. His FIP is 5th in the NL, his xFIP is 10th (which reflects the luck in his HR/FB rate), and his SIERA is 8th.

Greinke’s one of the best pitchers in the National League this year. He just hasn’t been the best.

Why Arrieta Will Win Anyway

Short answer: Because voters are tired of giving the Cy Young to Kershaw.

Longer answer: Because voters are tired of giving the award to Kershaw (3 of the past 4 years, and he might have deserved it in the 4th as well), and because Kershaw appears to have slipped some (even though he hasn’t), and because there’s a pitcher in the NL who appears as good (but isn’t), and because that pitcher has his “moment” game (his August no-hitter), and because that pitcher has 20+ wins.

I found it fitting that Cubs manager Joe Maddon compared Arrieta to David Price in 2012 because the “Price over Verlander” vote seems to mirror this one in many ways.

In 2012, Verlander was better than Price in most statistical categories, but it was close. In addition, Verlander was coming off of his Cy Young + MVP season (just like Kershaw), and had gone from “insanely better than everyone else” to “just better than everyone else”. Further, Price, like Arrieta, had reached the “magical” 20-win mark.

Despite statheads (and even some mainstream writers) everywhere celebrating Felix Hernandez’s 2010 Cy Young as the “death of the win” as a significant statistical measure, it’s still got cachet. In fact, for all the talk, 7 of the 9 most recent Cy Young awards not given to Felix Hernandez were given to a pitcher with 20 or more wins (and the only two times it didn’t happen, there were no 20-win pitchers in that league).

Obviously, sometimes the pitcher with the most wins is actually the best pitcher in the league, but that’s the case far less often than Cy Young voting would suggest and it’s not the case this year in the National League. Arrieta is good, almost as good as Kershaw, and if he wins it won’t be a travesty or anything. In the grand history of Cy Young votes, it’ll actually be a pretty good one. But it’ll still be giving the award to the second best pitcher in the league instead of the best.

That guy right there is the best, if we hadn’t made that clear yet.