As a motivating example of Simpson’s Paradox, Ken Ross, in his 2004 book A Mathematician at the Ballpark, presents the case of the batting averages of David Justice and Derek Jeter in the mid-1990s. In each year examined, Justice had the better batting average:
| Year | David Justice | Derek Jeter |
| --- | --- | --- |
| 1995 | .253 | .250 |
| 1996 | .321 | .314 |
| 1997 | .329 | .291 |
So who had the better batting average over the whole period from 1995 to 1997? Contrary to the result for each individual year, the answer is Derek Jeter, at .300 to Justice’s .298.
Simpson’s Paradox occurs when a trend that appears in each group of data disappears or reverses when the groups are aggregated. In the example above, Jeter had just 48 at bats in 1995, so his .250 average represented just 12 hits, while Justice’s .253 average represented 104 hits in 411 at bats. Both players’ averages were considerably lower in 1995 than in the other two years, so that weak year weighs heavily on Justice’s combined average and barely at all on Jeter’s. Add the fact that Justice had only 140 at bats in 1996, his second-best year, while Jeter had more than 500 in both 1996 and 1997, and the combined totals make more intuitive sense.
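The arithmetic can be checked with a short Python sketch. Only Jeter’s 12 hits in 48 at bats, Justice’s 104 hits in 411 at bats, and Justice’s 140 at bats in 1996 come from the text above; the remaining hit and at-bat counts are assumed from commonly cited season totals and were chosen because they reproduce the averages quoted here. The names `season_totals`, `batting_average`, and `combined_average` are illustrative.

```python
# (hits, at_bats) per season.
# From the text: Jeter 1995 (12, 48), Justice 1995 (104, 411),
# and Justice's 140 at bats in 1996.
# Assumed (commonly cited totals, not stated in the text): all other counts.
season_totals = {
    "Justice": {1995: (104, 411), 1996: (45, 140), 1997: (163, 495)},
    "Jeter": {1995: (12, 48), 1996: (183, 582), 1997: (190, 654)},
}


def batting_average(hits, at_bats):
    """Batting average for a single season: hits divided by at bats."""
    return hits / at_bats


def combined_average(seasons):
    """Average over several seasons: total hits divided by total at bats."""
    total_hits = sum(hits for hits, _ in seasons.values())
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    return total_hits / total_at_bats


if __name__ == "__main__":
    for year in (1995, 1996, 1997):
        justice = batting_average(*season_totals["Justice"][year])
        jeter = batting_average(*season_totals["Jeter"][year])
        print(f"{year}: Justice {justice:.3f}, Jeter {jeter:.3f}")

    for player, seasons in season_totals.items():
        print(f"1995-1997 combined: {player} {combined_average(seasons):.3f}")
```

Under these assumed totals, Justice leads in every single season, yet the combined figures come out at .300 for Jeter and .298 for Justice. Note that `combined_average` sums hits and at bats before dividing; averaging the three yearly averages directly would give a different (and wrong) answer.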
An underlying idea behind this paradox is that a rate statistic (e.g. X as a percentage of Y), examined alone, hides data that matters for aggregation. In the example, batting average (the ratio of hits to at bats) hides the number of at bats in each season, which acts as a confounding variable when the data is grouped by year. That count is exactly what determines how much each year contributes when the years are aggregated.
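One way to make this concrete is to write the multi-year average as a weighted mean of the yearly averages. The notation is an assumption introduced here, not taken from the source: $H_y$ is a player's hits in year $y$ and $AB_y$ is that player's at bats in year $y$.

```latex
\mathrm{AVG}_{\text{combined}}
  = \frac{\sum_{y} H_y}{\sum_{y} AB_y}
  = \sum_{y} \underbrace{\frac{AB_y}{\sum_{y'} AB_{y'}}}_{\text{weight of year } y}
    \cdot \underbrace{\frac{H_y}{AB_y}}_{\text{average in year } y}
```

The weights depend on each player's own at bats, so two players can have very different weights on the same year. A player can therefore win every yearly comparison and still lose the combined one if his weight is concentrated in his weaker years.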
Edward Hugh Simpson received a degree in mathematics from Queen’s University Belfast in 1942, at the age of 19. With the Second World War under way, Simpson immediately joined the famous cryptanalysis operation at Bletchley Park, working first on Italian codes and then on the Japanese JN-25 cipher. After the war, Simpson enrolled in postgraduate studies at Cambridge. His 1946 paper, which established a formulation of the paradox, was published in the Journal of the Royal Statistical Society in 1951. After his postgraduate studies, Simpson joined the civil service and held posts in various UK government departments until his retirement in 1982.
