Sunday, November 15, 2009

More Marathon analysis

Several weeks ago I posted about the results of a half marathon that I recently ran. Since then, the marathon season has run its biggest races of the fall, including Chicago, New York, and DC's Marine Corps. I haven't had the pleasure of running any of these trials (I'm hoping for a debut next fall). I've been encouraged to see that more analysis has come from watching those races.

One of the most passed-along pieces was this post by Paul Kedrosky of Infectious Greed asking why the winning finish times at New York have been less volatile over time than those at Boston. The post poses a great question but offers little in the way of an answer. I've discussed this disparity with several runners (including one who finished among the top 400 male runners at Boston this year) and several hypotheses have struck us.

I. Weather. The New York and Boston marathons are run at different times of the year, and weather may be in more flux in the Boston spring than in the New York fall.
II. Competing Races. The New York marathon competes with Chicago, DC, and Detroit for runners. Boston competes with London; in 2010 the two are a week apart. As a result, New York may attract a wider international field and Boston may have a more American appeal.
III. Prize money disparity. I've checked and this explanation seems unlikely. Boston paid out $150,000 to winning runners and New York paid out $130,000 to winners. Unless the evolution of the two races' prizes has been different, I'd expect Boston, with the higher prize, to have the less volatile times, as it should consistently attract top talent.

Analysis of this type is a matter of testing, evaluating, and rejecting. There are lots of possible reasons, but probably only a few will hold up to rigorous analysis.

Speaking of rigor, I've done a bit more work on the half-marathon I ran and owe readers an update. I originally provided a simple linear regression on gender and age. I tested other specifications and did not get any more significant explanatory power. Well, at least not statistically. Here's a case where common sense needs to play an important role. The model that I originally reported implies that the youngest runner should be the fastest, for example that an 8-year-old boy should best a 21-year-old man. This makes little sense. Instead, the model should, and now does, include a variable for the square of age.
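
For anyone curious what adding the squared term looks like in practice, here is a minimal sketch in Python. It is not the code behind the regression in this post: the ages and finish times are made up, and the function names fit_quadratic and solve_linear_system are purely illustrative. It fits time = b0 + b1*age + b2*age^2 by ordinary least squares using the normal equations.

```python
# Sketch only: least-squares fit of time = b0 + b1*age + b2*age^2.
# All data and names here are hypothetical, not taken from the race results.

def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build the augmented matrix [A | b] without mutating the inputs.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_quadratic(ages, times):
    """Return [b0, b1, b2] minimizing squared error of b0 + b1*age + b2*age^2."""
    features = [[1.0, float(a), float(a) * a] for a in ages]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(3)] for i in range(3)]
    xty = [sum(f[i] * t for f, t in zip(features, times)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Made-up finish times (minutes) generated from a known U-shaped curve,
# so the fit should recover the coefficients 120, -0.9, and 0.02.
ages = [10, 20, 30, 40, 50, 60, 70]
times = [120.0 - 0.9 * a + 0.02 * a * a for a in ages]
b0, b1, b2 = fit_quadratic(ages, times)
```

A positive b2 together with a negative b1 is what produces the U shape: predicted times fall with age at first and then rise. With real data one would normally lean on a statistics package rather than hand-rolled elimination; the point here is only to show where the squared term enters.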

The new regression is shown below.
This regression is only for men. The straight line is the original formulation. The upward-sloping curve is the regression that includes age squared. This new model has two important implications. First, it estimates that a man should run his fastest time at 23.4 years of age, eliminating the problem of superfast pre-teens. Second, at older ages, beginning around age 55, the curve rises much faster than the line. The squared model indicates that above age 23 each additional year of age adds more to the finish time than the year before it did. As a result, the move from age 60 to 61 is much more costly than the move from 24 to 25.
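
Both implications follow from the shape of a quadratic, and a short Python sketch makes the arithmetic concrete. The coefficients below are hypothetical; they are not the estimates from this regression and were chosen only so that the minimum lands at the 23.4 years quoted above. The fastest age is where the slope b1 + 2*b2*age equals zero, which gives age = -b1 / (2*b2).

```python
# Hypothetical coefficients for time = B0 + B1*age + B2*age**2 (minutes).
# These are NOT the post's estimates; they are picked so the minimum
# falls at 23.4 years, matching the figure quoted in the text.
B0, B1, B2 = 110.0, -0.936, 0.02


def predicted_time(age):
    """Predicted finish time in minutes under the assumed quadratic model."""
    return B0 + B1 * age + B2 * age ** 2


def fastest_age():
    """Age at which the curve bottoms out: solve B1 + 2*B2*age = 0."""
    return -B1 / (2 * B2)


def cost_of_one_more_year(age):
    """Extra minutes predicted when a runner goes from `age` to `age + 1`."""
    return predicted_time(age + 1) - predicted_time(age)


peak = fastest_age()
early = cost_of_one_more_year(24)   # moving from 24 to 25
late = cost_of_one_more_year(60)    # moving from 60 to 61
```

Under these assumed coefficients, the year from 24 to 25 costs about 0.04 minutes while the year from 60 to 61 costs about 1.48 minutes. The real magnitudes depend on the fitted coefficients, but the pattern of a growing penalty with each year past the minimum holds for any quadratic with a positive squared term.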

Also plotted on the figure is my finish time. In my modest defense, I'd like to say that the models are only best fits. The full data show that I'm still within a dense cloud of finish times. I'm taking all this as an incentive for improvement rather than a reason to retire.
