Simpson's Paradox in College Basketball Shooting Statistics
by Darci L. Kracht
Department of Mathematical Sciences
Kent State University
Kent, OH 44242 USA
MAA MathFest 2003 Contributed Paper Session
"Advances in Recreational Mathematics"
Summary:
In 1992, the American Mathematical Monthly published a short note by Richard J. Friedlander about two actual occurrences of Simpson's Paradox in major league baseball batting averages. Simpson's Paradox is an apparently rare phenomenon in which the direction of a statistical association is reversed when a third variable is considered. I have used Friedlander's examples for a number of years in my undergraduate courses and have often wondered how he managed to find them. When teaching a course called "Explorations in Modern Mathematics" two years ago, I decided to investigate for myself.
That semester, we had discussed this phenomenon at some length and so I wished to put a related question on the final exam. I happened to have both a baseball player and a basketball player in the class. Since baseball season wasn't yet over, I began looking at the basketball statistics. After several hours of laborious computations, I was thrilled to discover the following occurrence of Simpson's Paradox.
Kent State Men's Basketball: 2000-2001 Conference Games Only (18 games)

Shot type | Trevor Huffman: Made | Trevor Huffman: Attempted | Trevor Huffman: Average | Bryan Bedford: Made | Bryan Bedford: Attempted | Bryan Bedford: Average |
---|---|---|---|---|---|---|
Two-pointers | 57 | 127 | 0.449 | 13 | 30 | 0.433 |
Three-pointers | 35 | 100 | 0.350 | 0 | 1 | 0.000 |
All field goals | 92 | 227 | 0.405 | 13 | 31 | 0.419 |
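The reversal in this table can be checked with exact arithmetic. The following is a minimal Python sketch, not the author's program; the function names (`average`, `combined`, `is_simpson_reversal`) are illustrative assumptions, and a reversal is assumed here to require strict inequalities in every category.

```python
from fractions import Fraction

# (made, attempted) pairs copied from the table above.
huffman = {"two": (57, 127), "three": (35, 100)}
bedford = {"two": (13, 30), "three": (0, 1)}


def average(made, attempted):
    """Exact shooting average as a fraction."""
    return Fraction(made, attempted)


def combined(player):
    """Pool all shot types into one (made, attempted) pair."""
    made = sum(m for m, _ in player.values())
    attempted = sum(a for _, a in player.values())
    return made, attempted


def is_simpson_reversal(a, b):
    """True if `a` beats `b` in every category but trails when pooled.

    Assumption: strict inequalities; ties do not count as a reversal.
    """
    wins_each = all(average(*a[k]) > average(*b[k]) for k in a)
    loses_pooled = average(*combined(a)) < average(*combined(b))
    return wins_each and loses_pooled


print(is_simpson_reversal(huffman, bedford))  # True
print(combined(huffman), combined(bedford))   # (92, 227) (13, 31)
```

Huffman leads in both shot types (57/127 > 13/30 and 35/100 > 0/1), yet Bedford leads when the shots are pooled (13/31 > 92/227), because almost all of Bedford's attempts are the easier two-pointers.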
Subsequently, I found several more examples of Simpson's Paradox when statistics for two-point and three-point field goals are considered separately and combined. (See some on my web page.) This makes sense because three-pointers are considerably more difficult than two-pointers and, furthermore, certain players (guards) shoot many more three-pointers than others (centers).
However, I continued to wonder about the frequency of Simpson's Paradox in general. I decided to look into the same data set: shooting percentages in the 2000-2001 men's conference games. Using a C++ program this time, I considered two-pointers, three-pointers, and free throws, separately and combined. Here I found four occurrences of Simpson's Paradox in addition to the one described above. (This makes 5 paradoxes in 220 comparisons.) Finally, I partitioned the 18 games into two nonempty sets in all possible ways and computed each player's averages in each set and in all games combined. The number of examples of Simpson's Paradox for each pair of subset sizes is given in the following tables.
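The partition experiment described above might be sketched as follows. This is a Python illustration under stated assumptions, not the C++ program mentioned in the text: the per-game data are not part of this summary, so the `toy` data set is hypothetical, all function names are invented for the example, a paradox is assumed to require strict inequalities, and comparisons in which a player has no attempts in a subset are assumed to be counted but never flagged.

```python
from fractions import Fraction
from itertools import combinations


def pooled(games, indices):
    """Pool (made, attempted) over the selected games; None if no attempts."""
    made = sum(games[i][0] for i in indices)
    attempted = sum(games[i][1] for i in indices)
    return Fraction(made, attempted) if attempted else None


def two_subset_partitions(n):
    """Yield each split of range(n) into two nonempty subsets exactly once."""
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            # For equal halves, keep only the half containing game 0,
            # so a split and its mirror image are not both counted.
            if 2 * size == n and 0 not in subset:
                continue
            rest = tuple(i for i in range(n) if i not in subset)
            yield subset, rest


def is_paradox(a, b, part1, part2):
    """True if one player leads in both subsets but trails over all games."""
    everything = part1 + part2
    a1, a2, a_all = (pooled(a, p) for p in (part1, part2, everything))
    b1, b2, b_all = (pooled(b, p) for p in (part1, part2, everything))
    if any(x is None for x in (a1, a2, b1, b2)):
        return False  # assumption: an undefined average never gives a paradox
    return ((a1 > b1 and a2 > b2 and a_all < b_all)
            or (a1 < b1 and a2 < b2 and a_all > b_all))


def count_paradoxes(players):
    """Return (paradoxes, comparisons) over all player pairs and partitions."""
    n_games = len(next(iter(players.values())))
    paradoxes = comparisons = 0
    for part1, part2 in two_subset_partitions(n_games):
        for a, b in combinations(players.values(), 2):
            comparisons += 1
            paradoxes += is_paradox(a, b, part1, part2)
    return paradoxes, comparisons


# Hypothetical two-game data: (made, attempted) per game for each player.
toy = {"A": [(4, 5), (3, 15)], "B": [(11, 15), (0, 5)]}
print(count_paradoxes(toy))  # (1, 1)
```

In the toy data, player A leads in each game (4/5 > 11/15 and 3/15 > 0/5) but trails overall (7/20 < 11/20), so the single partition of the two games yields one paradox in one comparison.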
Kent State University Men's Basketball
2000-2001 Conference Games Only
For 11 players and 18 games (partitioned into two subsets):
All Field Goals

Subset sizes | Partitions | Paradoxes | Comparisons | Paradox Rate |
---|---|---|---|---|
1 and 17 | 18 | 0 | 990 | 0.00000 % |
2 and 16 | 153 | 6 | 8415 | 0.07130 % |
3 and 15 | 816 | 29 | 44880 | 0.06462 % |
4 and 14 | 3060 | 101 | 168300 | 0.06001 % |
5 and 13 | 8568 | 287 | 471240 | 0.06090 % |
6 and 12 | 18564 | 643 | 1021020 | 0.06298 % |
7 and 11 | 31824 | 1143 | 1750320 | 0.06530 % |
8 and 10 | 43758 | 1582 | 2406690 | 0.06573 % |
9 and 9 | 24310 | 906 | 1337050 | 0.06776 % |
Two-Pointers Only

Subset sizes | Partitions | Paradoxes | Comparisons | Paradox Rate |
---|---|---|---|---|
1 and 17 | 18 | 0 | 990 | 0.00000 % |
2 and 16 | 153 | 2 | 8415 | 0.02377 % |
3 and 15 | 816 | 23 | 44880 | 0.05125 % |
4 and 14 | 3060 | 109 | 168300 | 0.06477 % |
5 and 13 | 8568 | 308 | 471240 | 0.06536 % |
6 and 12 | 18564 | 677 | 1021020 | 0.06631 % |
7 and 11 | 31824 | 1249 | 1750320 | 0.07136 % |
8 and 10 | 43758 | 1752 | 2406690 | 0.07280 % |
9 and 9 | 24310 | 952 | 1337050 | 0.07120 % |
Three-Pointers Only

Subset sizes | Partitions | Paradoxes | Comparisons | Paradox Rate |
---|---|---|---|---|
1 and 17 | 18 | 0 | 990 | 0.00000 % |
2 and 16 | 153 | 0 | 8415 | 0.00000 % |
3 and 15 | 816 | 0 | 44880 | 0.00000 % |
4 and 14 | 3060 | 1 | 168300 | 0.00059 % |
5 and 13 | 8568 | 0 | 471240 | 0.00000 % |
6 and 12 | 18564 | 8 | 1021020 | 0.00078 % |
7 and 11 | 31824 | 17 | 1750320 | 0.00097 % |
8 and 10 | 43758 | 22 | 2406690 | 0.00091 % |
9 and 9 | 24310 | 17 | 1337050 | 0.00127 % |
Free Throws Only

Subset sizes | Partitions | Paradoxes | Comparisons | Paradox Rate |
---|---|---|---|---|
1 and 17 | 18 | 2 | 990 | 0.20202 % |
2 and 16 | 153 | 15 | 8415 | 0.17825 % |
3 and 15 | 816 | 69 | 44880 | 0.15374 % |
4 and 14 | 3060 | 201 | 168300 | 0.11943 % |
5 and 13 | 8568 | 499 | 471240 | 0.10589 % |
6 and 12 | 18564 | 952 | 1021020 | 0.09324 % |
7 and 11 | 31824 | 1448 | 1750320 | 0.08273 % |
8 and 10 | 43758 | 1833 | 2406690 | 0.07616 % |
9 and 9 | 24310 | 963 | 1337050 | 0.07202 % |
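The Partitions and Comparisons columns can be cross-checked from the problem size alone. The sketch below assumes, consistently with the listed numbers, that each comparison is one unordered pair of the 11 players under one partition, and that a 9 and 9 split is counted once rather than twice; the helper names are illustrative, not from the original program.

```python
from math import comb

PLAYERS, GAMES = 11, 18
PAIRS = comb(PLAYERS, 2)  # 55 unordered pairs of players


def partitions(size, games=GAMES):
    """Number of splits of `games` into subsets of `size` and `games - size`."""
    count = comb(games, size)
    # Equal halves are interchangeable, so each such split appears twice.
    return count // 2 if 2 * size == games else count


def rate_percent(paradoxes, comparisons):
    """Paradox rate as a percentage of all comparisons."""
    return 100 * paradoxes / comparisons


for size in range(1, GAMES // 2 + 1):
    p = partitions(size)
    print(f"{size} and {GAMES - size}: {p} partitions, {p * PAIRS} comparisons")

# Example: the 9 and 9 row of the All Field Goals table (906 paradoxes).
print(f"{rate_percent(906, partitions(9) * PAIRS):.5f} %")  # 0.06776 %
```

For the 9 and 9 split this gives C(18,9)/2 = 24310 partitions, the value consistent with the 1337050 comparisons listed (24310 × 55). The partition counts over all nine rows sum to 2^17 − 1 = 131071, the number of ways to split 18 games into two nonempty subsets.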