There was quite a bit of chatter in the press last year (spring of ’14) about an academic study published in the journal Management Science entitled “Seeing Stars: Matthew Effects and Status Bias in Major League Baseball,” by Jerry Kim and Brayden King. The study analyzes Pitch f/x data for every pitch, across all of the 2008 and 2009 Major League games, at which the batter did not swing. The data covers 4,914 games, 313,774 at-bats, and 756,848 pitches (non-swinging pitches only). That’s over three-quarters of a million pitches.
Kim & King’s study was not about calling balls and strikes. It focused instead on our unconscious biases that come into play when we make judgments. They simply used three-quarters of a million ball/strike decisions to test their hypothesis that subjective factors can affect these otherwise objective decisions.
So they looked at Pitch f/x data to see how often umpires mistakenly called a ball a strike and vice versa. The error rate, according to their analysis, is 14.7%. So on roughly one in every seven pitches, MLB umpires incorrectly undervalue a pitcher’s performance (call a strike a ball) or overvalue it (call a ball a strike). When you consider that the average Major League game has close to 300 pitches, and if roughly half of those are called pitches, then we’re talking about roughly 22 mistakes each game.
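The arithmetic in this paragraph is easy to check. Below is a minimal back-of-the-envelope sketch: the 14.7% rate and the 756,848-pitch total come from the study, while the 300 pitches per game and the 50% share of called pitches are the rough assumptions stated above, not measured values.

```python
# Back-of-the-envelope check of the error-rate arithmetic.
# From the study: the 14.7% error rate and the 756,848 non-swinging pitches.
# Rough ASSUMPTIONS from the paragraph: ~300 pitches per game, ~half of them called.

ERROR_RATE = 0.147
TOTAL_CALLED = 756_848
PITCHES_PER_GAME = 300
CALLED_SHARE = 0.5


def one_in_n(rate: float) -> float:
    """Express an error rate as 'one miss every N called pitches'."""
    return 1 / rate


def mistakes_per_game(pitches: int, called_share: float, rate: float) -> float:
    """Expected missed calls in a single game under the assumptions above."""
    return pitches * called_share * rate


# Expected missed calls across every non-swinging pitch in the data set.
total_mistakes = TOTAL_CALLED * ERROR_RATE

if __name__ == "__main__":
    print(f"one miss every {one_in_n(ERROR_RATE):.1f} called pitches")
    print(f"about {mistakes_per_game(PITCHES_PER_GAME, CALLED_SHARE, ERROR_RATE):.0f} misses per game")
    print(f"about {total_mistakes:,.0f} misses across the two seasons")
```

At 14.7%, the rate works out to one miss every 6.8 called pitches (closer to one in seven than one in eight), and across all 756,848 pitches to roughly 111,000 missed calls.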
But even that number is somewhat skewed because, as we learn, baseball superstars (what the authors call “high status players”) are frequently the beneficiaries of these mistakes. Think of pitchers like Greg Maddux and Felix Hernandez, or hitters in the mold of Ted Williams and Pete Rose.
This error rate seems awfully high. Umpire instructors typically say that if you miss four to six pitches, you’ve had a pretty good game. These numbers, then, don’t line up well with umpires’ impression of their own fallibility. That said, it’s hard to argue with the data. And Kim and King are not the first to come up with this error rate for MLB umpires. A number of studies (I’ll talk about some of them in future posts) have also arrived at an error rate between 14 and 15%, or roughly one in every seven pitches.
But Kim & King were interested in performance biases, not umpire error rates. So what they did next was to take these roughly 111,000 umpire “errors” (14.7% of 756,848 pitches) and analyze how they correlate with factors like right/left-handedness, the race of the pitcher, home team versus visitor, the current count, the stage of the game, and so forth. And this is where their study gets really interesting, because the umpire error rate changes pretty significantly when some of these factors (but not all) come into play.
What is the Matthew Effect?
Before going further, let’s ask: just what is the Matthew Effect, and what’s it got to do with crappy calls?
In a nutshell, the Matthew Effect holds that performers from whom we expect superior performance (an All-Star pitcher, for example, or a league-leading batter) tend to be judged more favorably than performers of whom there is no such expectation. In short, we have an unconscious bias in favor of those who’ve performed well historically. Sociologists sum up the Matthew Effect with the phrase “the rich get richer and the poor get poorer.” In education, researchers use it to describe how good readers tend to pull further ahead while poor readers fall further behind.
While the Matthew Effect plays a large role in the fields of education and sociology, it has interesting applications in sports as well. Who, for example, hasn’t complained that basketball superstars like Michael Jordan, Larry Bird, and Magic Johnson receive more favorable treatment than others on foul calls? Or that Greg Maddux and Randy Johnson got more strikes on the corner than, say, Danny Darwin or Bill Singer? (Remember them? Didn’t think so.)
Well, Kim and King set out to measure this Matthew Effect (what they call “status bias”) by analyzing the decision-making of MLB umpires calling balls and strikes. In their own words, their goal was “to observe the difference in a pitch’s objective quality and in its perceived quality as judged by the umpire.” Three-quarters of a million pitches later, they have some pretty interesting results.
What did the study reveal?
The study revealed a lot. It revealed that there is an expectation that high status pitchers will throw more strikes and that high status hitters are better at evaluating the quality of a pitch as it approaches the plate. Of course, having that expectation is just common sense. We all have it, more or less. What’s not common sense is what happens to our judgments (unconsciously) while swimming in the stew of our expectations. So here are a few of the study’s more interesting outcomes.
- They showed that the home team pitcher experiences a nearly eight percent advantage over visiting pitchers with respect to the “over-recognition” mistakes (that is, mistakenly calling a true ball a strike).
- There are nearly five percent more over-recognition mistakes in the ninth inning than in the first. That’s odd, isn’t it? You would expect the opposite – that is, that accuracy would increase (and the error rate decrease) over the life of the game. But I suppose that’s why we collect the data.
- Here’s my favorite: Umpires are more accurate with right-handed batters than lefties. In fact, left-handed batters have a 41 percent greater likelihood of getting a mistaken call than right-handed batters. Of course, these mistakes include both over-recognition errors (balls called strikes) and under-recognition errors (strikes called balls), so many of them (the authors don’t say how many) cancel each other out. To me, though, it’s incredibly interesting that there is such a vast difference here. We know from experience that our view with a lefty at the plate is different from our view with a righty, but I’m surprised that the difference in viewpoint results in such a large difference in the error rate.
- No surprise here: The closer the pitch to the edge of the zone, the greater the likelihood of a mistake. We all know that. And I’ll have more to say about this in the next section of this post.
- Here’s where it gets good (and this shouldn’t be surprising, though many of us will deny it): The count (balls and strikes) at the time of a given pitch has a huge influence on the error rate. The odds of a mistaken strike call are 62 percent lower with an 0-2 count; with a 3-0 count, on the other hand, the likelihood of a mistaken strike call is 49 percent higher. This certainly validates the notion that the zone shrinks on 0-2 and expands on 3-0.
- Several pitcher-related factors influence the error rate, including the number of years a pitcher has been in the major leagues and his reputation for control (or wildness). (For control/wildness, the researchers used a pitcher’s base-on-balls percentage.) A pitcher’s ethnicity, it turned out, had no effect on the error rate, but a pitcher’s status (measured, somewhat arbitrarily, by the number of All-Star Game appearances) had a very definite impact. Pitchers with five All-Star appearances enjoyed a 15 percent greater chance of getting mistaken strike calls (over the baseline), and a 16 percent advantage over non-All-Star pitchers in getting such favorable calls.
- The “under-recognition” errors (that is, a true strike called a ball) follow a similar pattern: high-status pitchers are less likely to experience them. Across all pitchers, the baseline under-recognition error rate was about 19 percent; for pitchers with five All-Star appearances, it drops two full percentage points. That doesn’t sound like much until you count each pitcher’s total number of pitches in a season and do the math.
- The Matthew Effect works for batters, too. Again using the number of All-Star appearances as a proxy for status, high-status batters get a 1.3% bump for each All-Star appearance in each of the two error types. Combined, that is a 2.6% (nearly three percent) advantage for each of a batter’s All-Star appearances, so a five-time All-Star has (statistically) roughly a 13% advantage in combined under- and over-recognition errors. Of course, a high-status batter facing a high-status pitcher (who is getting his own favorable errors) is going to have some of this effect nullified.
- Finally, Kim and King analyzed error rates for 81 MLB umpires who called more than 1,500 pitches over the two seasons and (as you’d expect) detected patterns for individual umpires. The image below is taken directly from the article and is, unfortunately, difficult to read. But it nevertheless shows that MLB umpires vary noticeably from one another. The predominant tendency (shown in the upper left quadrant) is to over-recognize high status while under-recognizing low status. That’s tough on the rookies. Only a handful of MLB umpires over-recognize low status.
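Several figures in the list above are easier to judge once they are turned into concrete numbers. The Python sketch below is a hedged illustration, not the study’s model: the baseline over-recognition rate (10%) and the season workload (600 true strikes taken against one starter) are HYPOTHETICAL values chosen only for illustration, and reading the 3-0 “49 percent higher likelihood” as an odds ratio of 1.49 is an assumption. The batter figure simply adds the 1.3% per-appearance bump across both error types, the same additive reading used in the list.

```python
# Hedged sketch: putting numbers on a few of the effects listed above.
# Values marked HYPOTHETICAL are illustrative assumptions, not figures
# reported by Kim & King.


def apply_odds_ratio(p: float, odds_ratio: float) -> float:
    """Scale the odds p / (1 - p) by odds_ratio and convert back to a probability."""
    odds = p / (1 - p)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


def season_under_misses(true_strikes_taken: int, rate: float) -> float:
    """Expected true strikes called balls across a season at a given error rate."""
    return true_strikes_taken * rate


def batter_advantage(appearances: int, per_type: float = 1.3, types: int = 2) -> float:
    """Percent advantage, adding the per-appearance bump across both error types."""
    return appearances * per_type * types


# HYPOTHETICAL: chance that a true ball is called a strike on a neutral count.
BASELINE_OVER = 0.10
# "62 percent lower" odds on 0-2 -> odds ratio 0.38.
# "49 percent higher" on 3-0, ASSUMED here to be an odds ratio of 1.49.
p_0_2 = apply_odds_ratio(BASELINE_OVER, 0.38)
p_3_0 = apply_odds_ratio(BASELINE_OVER, 1.49)

# HYPOTHETICAL workload: true strikes taken against one starter in a season.
TRUE_STRIKES_TAKEN = 600
baseline_misses = season_under_misses(TRUE_STRIKES_TAKEN, 0.19)  # ~19% baseline
all_star_misses = season_under_misses(TRUE_STRIKES_TAKEN, 0.17)  # two points lower

if __name__ == "__main__":
    print(f"mistaken strike, 0-2: {p_0_2:.3f}, 3-0: {p_3_0:.3f} (baseline {BASELINE_OVER})")
    print(f"strikes saved per season: {baseline_misses - all_star_misses:.0f}")
    print(f"five-time All-Star batter: {batter_advantage(5):.1f}% combined")
```

One thing the sketch makes visible: a 62 percent cut in the odds is not quite a 62 percent cut in the probability. At the hypothetical 10% baseline, the chance of a mistaken strike on 0-2 falls to about 4%, and the gap between the odds change and the probability change widens as the baseline rises.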
So what does this all mean?
Your initial reaction to all of this information might be a rather jaw-dropping recognition that umpires are really bad at calling balls and strikes. That was my first reaction, and that reaction was the impetus for my deciding to write this post: to basically cop to our fallibility behind the plate. Because if the pros are missing roughly one in seven pitches, what does that suggest about the rest of us? I’m suddenly very grateful that Pitch f/x is not installed at my fields.
But in the roughly two weeks that I’ve been working on this post, I’ve softened. And my softening has to do with the fourth item in the list above: the closer the pitch is to the edge of the strike zone, the higher the likelihood of an error. In other words, it’s all about the corners.
In their study, the researchers used a variable they called “distance,” which measures how many inches a given pitch is from the border of the strike zone. They go on to report that “…the relationship between distance and over-recognition exhibits a non-linear relationship with a rapid decline in the odds of mistake as distance increases” (p. 27). What they mean by “non-linear relationship” is that the error rate does not decline steadily, by the same amount for each additional inch, as the distance from the edge of the zone increases. Rather, the error rate drops very rapidly as the distance begins to increase. This means that there is a measurable error rate within an inch or two of the corners, but the errors decline very significantly by the time you get three or four inches from the corner.
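To see what a “non-linear” decline looks like, here is a toy Python comparison of a straight-line decline and an exponential one. Exponential decay is just one possible non-linear shape, assumed here for illustration; the study’s actual functional form may differ, and every parameter below is HYPOTHETICAL, not an estimate from the data.

```python
import math

# HYPOTHETICAL parameters chosen only to illustrate the shape of each curve;
# none of them are estimates from Kim & King's data.
EDGE_ERROR = 0.45      # assumed miss rate for a pitch right on the border
DECAY_PER_INCH = 0.9   # assumed exponential decay rate, per inch
LINEAR_SLOPE = 0.09    # assumed per-inch drop for the straight-line comparison


def nonlinear_error(distance_in: float) -> float:
    """Error rate that falls off exponentially with distance from the border."""
    return EDGE_ERROR * math.exp(-DECAY_PER_INCH * distance_in)


def linear_error(distance_in: float) -> float:
    """Error rate that drops by the same amount every inch, floored at zero."""
    return max(0.0, EDGE_ERROR - LINEAR_SLOPE * distance_in)


if __name__ == "__main__":
    for d in range(6):
        print(f"{d} in: non-linear {nonlinear_error(d):.3f}  linear {linear_error(d):.3f}")
```

With these made-up numbers, the exponential curve loses more than half its error rate in the first inch and is nearly flat by three or four inches, while the linear one gives up the same amount every inch.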
And what does that mean? Well, it means what we all know already: the corners are soft. Pitches off the plate are pretty easy to call. Pitches in the meat of the zone are pretty easy to call. And all of us get almost all of those right almost all of the time. But the corners are different, especially pitches that catch two edges at once (up and in, down and in, up and away, down and away). In fact, I would guess (the data doesn’t tell us) that most of the under-recognition errors (true strikes called balls) are at those four corners, where the strike zone gets rounded just a tiny bit.
Here’s another look at Pitch f/x data, this time on MLB called pitches. This is from The Hardball Times, and the article (written by Jon Roegele) focuses on the expanding strike zone (particularly at the bottom). But for our purposes the point is the obvious rounding of the zone at the four corners. That, as you can plainly see, is where a great many (probably most) of the under-recognition errors occur. Even the “expanded” 2014 strike zone shows the rounding effect.
So what about the over-recognition errors? Where are those errors coming from? Well, I can’t say for certain, but I suspect they’re on the outside of the zone, a ball to a ball-and-a-half out, roughly six to eight inches above and below the vertical center of the zone. In other words, right around the belt, but just off the plate.
So the MLB umpires we watch and model ourselves on aren’t so damn bad after all. Or so I say.