Simpson's Paradox in College Basketball Shooting Statistics

by Darci L. Kracht, Department of Mathematical Sciences, Kent State University, Kent, OH 44242, USA

Summary:

In 1992, the American Mathematical Monthly published a short note by Richard J. Friedlander about two actual occurrences of Simpson's Paradox in major league baseball batting averages. Simpson's Paradox is an apparently rare phenomenon in which the direction of a statistical association is reversed when a third variable is considered. I have used Friedlander's examples for a number of years in my undergraduate courses and have often wondered how he managed to find them. When teaching a course called "Explorations in Modern Mathematics" two years ago, I decided to investigate for myself.

That semester, we had discussed this phenomenon at some length and so I wished to put a related question on the final exam. I happened to have both a baseball player and a basketball player in the class. Since baseball season wasn't yet over, I began looking at the basketball statistics. After several hours of laborious computations, I was thrilled to discover the following occurrence of Simpson's Paradox.

Kent State Men's Basketball: 2000-2001 Conference Games Only (18 games)
                   Trevor Huffman            Bryan Bedford
                   Made  Att.   Pct.         Made  Att.   Pct.
Two-pointers         57   127  0.449           13    30  0.433
Three-pointers       35   100  0.350            0     1  0.000
All field goals      92   227  0.405           13    31  0.419
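The reversal in this table can be checked with exact arithmetic. The following is a minimal sketch in Python; the function and variable names are illustrative, and the test used here (one player strictly ahead in every category yet strictly behind overall) is an assumed working definition of the paradox.

```python
from fractions import Fraction

# (made, attempted) pairs copied from the table above
huffman = {"two": (57, 127), "three": (35, 100)}
bedford = {"two": (13, 30), "three": (0, 1)}

def pct(made, attempted):
    """Exact shooting percentage as a fraction."""
    return Fraction(made, attempted)

def combined(player):
    """Percentage over all categories pooled together."""
    made = sum(m for m, _ in player.values())
    att = sum(a for _, a in player.values())
    return Fraction(made, att)

def is_simpson_reversal(a, b):
    """True if one player leads in every category but trails overall."""
    a_wins_all = all(pct(*a[k]) > pct(*b[k]) for k in a)
    b_wins_all = all(pct(*b[k]) > pct(*a[k]) for k in a)
    return (a_wins_all and combined(a) < combined(b)) or \
           (b_wins_all and combined(b) < combined(a))

print(is_simpson_reversal(huffman, bedford))  # True
```

Huffman leads in both categories, but 100 of his 227 attempts are the harder three-pointers, while Bedford took only one, so the pooled percentages come out in the opposite order.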

Subsequently, I found several more examples of Simpson's Paradox when statistics for two-point and three-point field goals are considered separately and combined. (See some on my web page.) This makes sense because three-pointers are considerably more difficult than two-pointers and, furthermore, certain players (guards) shoot many more three-pointers than others (centers).

However, I continued to wonder about the frequency of Simpson's Paradox in general. I decided to look into the same data set: shooting percentages in the 2000-2001 men's conference games. Using a C++ program this time, I considered two-pointers, three-pointers, and free-throws, separately and combined. Here I found four occurrences of Simpson's Paradox in addition to the one described above. (This makes 5 paradoxes in 220 comparisons.) Finally, I partitioned the 18 games into two nonempty sets in all possible ways, computed each player's averages in each set, and then computed them for all games combined. The number of examples of Simpson's Paradox is given in the following tables.
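The exhaustive search over game partitions can be sketched as follows. This is a Python illustration, not the C++ program mentioned above, whose details are not given here. The per-game shooting lines are hypothetical toy data, the function names are invented for the example, and two conventions are assumptions: inequalities are strict, and a split in which either player has no attempts in one part is skipped.

```python
from fractions import Fraction
from itertools import combinations

def rate(games, idx):
    """Shooting percentage over the games in idx, or None with no attempts."""
    made = sum(games[i][0] for i in idx)
    att = sum(games[i][1] for i in idx)
    return Fraction(made, att) if att else None

def reverses(a, b, subset):
    """True if one player leads in both parts of the split but trails overall."""
    n = len(a)
    rest = [i for i in range(n) if i not in subset]
    ra = [rate(a, subset), rate(a, rest), rate(a, range(n))]
    rb = [rate(b, subset), rate(b, rest), rate(b, range(n))]
    if None in ra or None in rb:
        return False  # assumption: skip splits with an undefined percentage
    a_leads = ra[0] > rb[0] and ra[1] > rb[1] and ra[2] < rb[2]
    b_leads = rb[0] > ra[0] and rb[1] > ra[1] and rb[2] < ra[2]
    return a_leads or b_leads

def count_reversals(a, b, k):
    """Count splits into k and n-k games that show a reversal."""
    n = len(a)
    subsets = combinations(range(n), k)
    if 2 * k == n:
        # An even split appears twice as a subset; keep the one with game 0.
        subsets = (s for s in subsets if 0 in s)
    return sum(reverses(a, b, s) for s in subsets)

# Hypothetical (made, attempted) lines for two players over four games.
player_a = [(4, 5), (5, 5), (3, 10), (3, 10)]
player_b = [(8, 10), (8, 10), (1, 5), (1, 5)]

print(count_reversals(player_a, player_b, 2))  # 1: games {0, 1} versus {2, 3}
```

In the toy data, player A shoots 9 of 10 against B's 16 of 20 in the first two games and 6 of 20 against B's 2 of 10 in the last two, yet trails 15 of 30 to 18 of 30 overall.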

Kent State University Men's Basketball 2000-2001 Conference Games Only

For 11 players and 18 games (partitioned into two subsets):

All Field Goals
Split       Partitions  Paradoxes  Comparisons  Percent
1 and 17            18          0          990  0.00000 %
2 and 16           153          6         8415  0.07130 %
3 and 15           816         29        44880  0.06462 %
4 and 14          3060        101       168300  0.06001 %
5 and 13          8568        287       471240  0.06090 %
6 and 12         18564        643      1021020  0.06298 %
7 and 11         31824       1143      1750320  0.06530 %
8 and 10         43758       1582      2406690  0.06573 %
9 and 9          24310        906      1337050  0.06776 %
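The partition and comparison counts in each row follow from elementary counting: there are C(18, k) ways to split the 18 games into parts of size k and 18 - k (halved for the 9-and-9 split, where each partition would otherwise be counted twice), and each partition is examined for all C(11, 2) = 55 pairs of players. A short Python sketch, with illustrative names, reproduces them:

```python
from math import comb

PLAYERS, GAMES = 11, 18
pairs = comb(PLAYERS, 2)  # 55 ways to choose two players

def splits(k, n=GAMES):
    """Number of ways to split n games into parts of size k and n - k."""
    ways = comb(n, k)
    # For an even split, each partition is counted twice by comb(n, k).
    return ways // 2 if 2 * k == n else ways

for k in range(1, GAMES // 2 + 1):
    print(f"{k} and {GAMES - k}: {splits(k)} partitions, "
          f"{splits(k) * pairs} comparisons")
```

The percentage in each row is then the number of paradoxes divided by the number of comparisons, times 100; for example, 906 / 1337050 is about 0.06776 %.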

Two-Pointers Only
Split       Partitions  Paradoxes  Comparisons  Percent
1 and 17            18          0          990  0.00000 %
2 and 16           153          2         8415  0.02377 %
3 and 15           816         23        44880  0.05125 %
4 and 14          3060        109       168300  0.06477 %
5 and 13          8568        308       471240  0.06536 %
6 and 12         18564        677      1021020  0.06631 %
7 and 11         31824       1249      1750320  0.07136 %
8 and 10         43758       1752      2406690  0.07280 %
9 and 9          24310        952      1337050  0.07120 %

Three-Pointers Only
Split       Partitions  Paradoxes  Comparisons  Percent
1 and 17            18          0          990  0.00000 %
2 and 16           153          0         8415  0.00000 %
3 and 15           816          0        44880  0.00000 %
4 and 14          3060          1       168300  0.00059 %
5 and 13          8568          0       471240  0.00000 %
6 and 12         18564          8      1021020  0.00078 %
7 and 11         31824         17      1750320  0.00097 %
8 and 10         43758         22      2406690  0.00091 %
9 and 9          24310         17      1337050  0.00127 %

Free-Throws Only