April 10, 2002

Fun With CORREL
Three years ago, I wrote an article for this page suggesting that the correlation between wins and payroll isn't nearly as high as most people believe, and is in fact relatively insignificant. As usual, the reaction to that article was an assortment of laughing, jeering, heckling, name-calling and accusations of mental illness and/or drug abuse.

Last year, I attempted to prove my theory by comparing the number of wins between the top third and bottom third of MLB teams in terms of payroll. In that study, I showed that the difference between those two groups has historically been insignificant, except in 1998 and 1999, when revenue sharing removed the incentive for teams with low payrolls to compete. Scot Zook countered with a study of his own, comparing the number of wins between the top three and bottom three teams by payroll. His study showed a very high correlation. I've since realized that studies of this type can produce drastically different results depending on where you draw the line between "big market" and "small market."

Recently, I made a discovery that makes studies like these much more accurate, much more meaningful, and much easier. All this time, I've been talking about "correlation," but I haven't really thought about what that word means. Correlation is actually a statistical term, measured with a specific mathematical formula (the Pearson correlation coefficient) that looks like this:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $x_i$ and $y_i$ are team $i$'s payroll and wins, and $\bar{x}$ and $\bar{y}$ are the averages across all $n$ teams.

Fortunately, you don't have to be Will Hunting to use this formula. All you need is a version of Microsoft Excel that includes the handy-dandy "CORREL" function, which determines the true correlation between two sets of numbers in a matter of seconds.

A correlation of -1.0 would indicate a perfect negative correlation (in other words, teams with high payrolls win fewer games than teams with low payrolls). A value of +1.0 would indicate a perfect positive correlation (teams with high payrolls win more games than teams with low payrolls). A value of 0 would indicate no correlation whatsoever. For the purposes of this article, I treat any correlation between -.50 and +.50 as statistically meaningless. With a sample size of only 30 teams, for example, even a correlation of +.55 or -.55 can be generated purely by random chance.

Given that explanation, here is a graph showing the correlation between payroll and wins for every season since 1977:

This graph is interesting, to say the least. As fans, we've been led to believe by Selig and his minions (and assorted media toadies) that competitive imbalance exists, and that it is a new phenomenon caused by escalating player salaries. But this graph shows that if this correlation exists, it isn't anything new. The correlation between wins and payroll was just as high in 1977 and 1978 as it was in 1998 and 1999. And for the most part, there simply hasn't been a significant correlation at all over the past 25 years. In fact, over that span, the correlation between wins and payroll has been statistically significant in only seven seasons (and I'm being generous by including 1995's .55 correlation and 1979's .58 correlation).

If this "problem" is so terrible and so urgent that it requires an immediate, drastic restructuring of baseball's economic system, why has the correlation between wins and payroll been lower over the last two seasons than during the rest of the past decade? The correlation between wins and payroll appears to be a rolling wave, not a sustained or increasing "problem."

If I had to theorize about why this correlation behaves like a rolling wave instead of a steady trend, I would guess that it is due to paradigm shifts in the way baseball executives build and maintain their rosters. 1977 was the dawn of the free agency era, and not many teams understood the immediate financial benefit of adding free agents to their rosters. But once more teams accepted this revolutionary new concept, the playing field evened out, and teams that spent more than others no longer held as much of an advantage.

The late '90s saw a drastic spike in this correlation, as revenue sharing began to reward teams that kept their revenues to a minimum. To lower their revenues, teams slashed payroll beyond reasonable limits and made decisions based solely on payroll, and on-field performance suffered as a result.

Then, a few years ago, the Oakland A's introduced the baseball world to a few revolutionary new concepts (well, new to Major League Baseball, but not to Bill James disciples) on how to build a winning ballclub without spending a ton of money. As with most things in life, success breeds imitation, and we're now witnessing several clubs (San Diego, Toronto, Seattle, San Francisco and Houston) that are following the Oakland model of success, resulting in a lower correlation between wins and payroll. As time rolls on, I suspect we'll see more teams adopt the Oakland model until it is no longer advantageous to follow it, and new revolutionary ideas will be needed to regain the advantage.

So, what does all of this prove? Is there a correlation between wins and payroll? Statistically, yes, there is. Is it a significant correlation? Statistically, it is not.

For as long as I have argued this point, I have also been saying that all Major League teams ride an ever-rolling "wave of prosperity" in terms of wins and payroll. This statement, too, has been met with ridicule and abuse time and time again. I also tried to prove this theory by presenting anecdotal evidence of teams like the Indians, Braves and Mets that have enjoyed sustained periods of both success and failure over the past twenty years. Now, for the first time, with the help of our handy-dandy CORREL function, I present irrefutable statistical proof of my theory.
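For readers who would rather see the arithmetic than trust a spreadsheet, here is a minimal Python sketch of the calculation CORREL performs (the Pearson correlation coefficient), plus two helpers. `correl_by_season` computes one correlation per season, as in the season-by-season payroll chart, and `chance_of_correlation` estimates how often two completely unrelated lists of numbers produce a correlation at least as strong as a given cutoff. The function names, the `(season, payroll, wins)` input layout and the use of normally distributed random numbers are my own assumptions for illustration, not how the original charts were produced.

```python
import math
import random
from collections import defaultdict


def correl(xs, ys):
    """Pearson correlation of two equal-length lists (the statistic CORREL returns)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of the products of each pair's deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squared deviations
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(ss_x * ss_y)


def correl_by_season(rows):
    """rows: hypothetical (season, payroll, wins) tuples, one per team per season.
    Returns {season: correlation of payroll with wins}, one CORREL per season."""
    payrolls = defaultdict(list)
    wins = defaultdict(list)
    for season, payroll, team_wins in rows:
        payrolls[season].append(payroll)
        wins[season].append(team_wins)
    return {s: correl(payrolls[s], wins[s]) for s in sorted(payrolls)}


def chance_of_correlation(threshold, n_teams=30, trials=5000, seed=1):
    """Share of simulated seasons in which two unrelated random lists of
    n_teams numbers still produce |correlation| >= threshold."""
    rng = random.Random(seed)  # fixed seed so results are repeatable
    hits = 0
    for _ in range(trials):
        fake_payrolls = [rng.gauss(0, 1) for _ in range(n_teams)]
        fake_wins = [rng.gauss(0, 1) for _ in range(n_teams)]
        if abs(correl(fake_payrolls, fake_wins)) >= threshold:
            hits += 1
    return hits / trials
```

Calling `chance_of_correlation(0.55)` estimates how often pure chance produces a correlation of .55 or stronger among 30 teams; set `n_teams` to 26 or 28 for seasons before 1998, when the league had fewer clubs. For reference, a textbook two-tailed significance test at the 5% level puts the cutoff for 30 teams at roughly .36, so `chance_of_correlation(0.36)` should come out near 0.05; the ±.50 line used in this article is a stricter, informal rule of thumb. Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient as `correl` here.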
Below is a graph showing the correlation between 2001 wins and the number of wins in each previous season, back to 1977:

As you can see, the correlation drops below significant levels immediately. The correlation between 2001 wins and 2000 wins is just .50, right on the borderline of being significant. It takes only four years for the correlation to drop to the neighborhood of zero, which indicates no correlation whatsoever. In other words, whether a team is winning today has essentially no correlation with whether it was winning four years ago. By 1994, the correlation begins to venture into negative territory, signifying that the teams that are winning today tended to be the teams that were losing back in 1994. From there, the numbers hover between -.55 and +.05.

What does this mean? It means that, despite the popular opinion of the majority of cynical baseball fans living in places like Kansas City and Minneapolis, the same teams do not win every year. Teams that are winning today were losing ten years ago, and teams that are losing today are likely to be winning ten years from now.

Lastly, let's do the same study as we did above, but for payroll:

My theory of a "rolling wave of prosperity" applied to both wins and payroll, and this graph seems to support that theory, though to a lesser degree than with wins. I expected to see the correlation dip below zero after six or seven years, just as it did with wins, but that is not the case. The correlation stopped being significant around 1994 and settled into the -.06 to .34 range beginning in 1992. This means that the teams spending the most money today have no significant correlation with the teams that were spending the most money 10, 15 and 25 years ago. Today's big spenders are not, however, yesterday's cheapskates.

In conclusion, the CORREL function has proven that my theories from three years ago were right on the mark. There is little significant correlation between wins and payroll, and there is evidence of a rolling wave of prosperity in both wins and payroll throughout recent Major League history.
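The year-over-year charts above compare one season's numbers with each earlier season's. Here is a minimal Python sketch of that calculation. It assumes wins (or payrolls) are stored in a hypothetical `{season: {team: value}}` mapping. As a further assumption, since the article doesn't say how this was handled, it simply drops any team missing from either season (for example, Arizona and Tampa Bay before their 1998 debut). The `correl` helper is repeated so the block stands on its own.

```python
import math


def correl(xs, ys):
    """Pearson correlation of two equal-length lists (the statistic CORREL returns)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def lagged_correlations(stat_by_season, base_season):
    """stat_by_season: hypothetical {season: {team: value}} for wins or payroll.
    Returns {earlier_season: correlation with base_season}, matching teams by
    name and using only teams present in both seasons (an assumption)."""
    base = stat_by_season[base_season]
    result = {}
    for season in sorted(stat_by_season, reverse=True):
        if season >= base_season:
            continue  # only compare against earlier seasons
        earlier = stat_by_season[season]
        shared = sorted(set(base) & set(earlier))  # drop teams missing from either season
        if len(shared) < 3:
            continue  # two points always give +1 or -1, which says nothing
        result[season] = correl([base[t] for t in shared],
                                [earlier[t] for t in shared])
    return result
```

Running it once on wins and once on payrolls, with 2001 as `base_season`, would give the two series plotted above. Matching by team name is the simplest choice; a real data set would need a consistent franchise key so that renamed or relocated clubs line up across seasons.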