I acquired all of the Division I team data since 2002, and from it we can observe some fascinating trends and relationships. This is a multipart series exploring some of that data.
In Part 23 I discussed how PPS and PPM could be used to predict Winning Percentage. The Measure of Fit (R-squared) figures for those equations were outstanding, 0.98 and higher. Predicting success in a one-and-done tournament is a much riskier game, however.
In order to compare methods, all of the stats since 2002 were used to create multiple regression analyses to estimate the round number reached for each of the 1,401 tournament teams in the period. (Because offensive rebounding figures have only been available from the NCAA since 2015, PPMDIF was calculated using 543 tournament teams.)
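For readers who want to try a fit like this themselves, here is a minimal sketch. It assumes the regressions were fit through the origin, since the equations in this post show no intercept term; that is my reading of them, not something stated in the data. The function name `fit_through_origin` and the data are made up for illustration, and the R-squared here is the uncentered version that many statistics packages report when no intercept is fitted.

```python
import numpy as np

def fit_through_origin(X, y):
    """Least-squares fit of y = X @ coef with no intercept term."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ coef
    # Uncentered R-squared: compares residuals to the raw sum of squares of y,
    # not to the variance around the mean of y.
    r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum(y ** 2)
    return coef, r2

# Synthetic stand-in data (NOT real team stats), just to exercise the function.
rng = np.random.default_rng(0)
em = rng.uniform(0, 35, size=200)    # stand-in for an efficiency margin
sos = rng.uniform(-5, 15, size=200)  # stand-in for a strength of schedule
tr = 0.12 * em + 0.02 * sos + rng.normal(0, 0.8, size=200)

coef, r2 = fit_through_origin(np.column_stack([em, sos]), tr)
print("coefficients:", coef)
print("uncentered R-squared:", r2)
```

One thing to keep in mind: an uncentered R-squared tends to run higher than the familiar centered one on the same data, so the two are not directly comparable.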
Here are the resulting regressions {R-Squared}:
- TR = 18.84511 * PPSDIF Rsqr=0.692
- TR = 13.3857 * PPMDIF Rsqr=0.720
- TR = (13.32646 * PPMDIF) + (0.022699 * KPSOS) Rsqr=0.724
- TR = (0.118572 * KPEm) + (0.01895 * KPSOS) Rsqr=0.771
TR = Tournament Round Reached (Round of 64 = 1, National Champion = 7)
PPSDIF = Smith Points Per Possession Differential
PPMDIF = Modern Points Per Possession Differential
KPEm = KenPom Efficiency Margin
KPSOS = KenPom Strength of Schedule (NCSOSAdjEm)
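To make the four equations concrete, here is a small sketch that plugs one team's numbers into each of them. The coefficients are copied from the regressions in this post; the function name `predict_rounds` and the sample inputs are hypothetical and do not describe any real team.

```python
def predict_rounds(ppsdif, ppmdif, kpem, kpsos):
    """Estimated Tournament Round Reached (TR) under each of the four regressions."""
    return {
        "smith": 18.84511 * ppsdif,
        "modern": 13.3857 * ppmdif,
        "modern_sos": 13.32646 * ppmdif + 0.022699 * kpsos,
        "kenpom": 0.118572 * kpem + 0.01895 * kpsos,
    }

# Hypothetical inputs, chosen only for illustration.
est = predict_rounds(ppsdif=0.20, ppmdif=0.25, kpem=25.0, kpsos=5.0)
for name, tr in est.items():
    print(f"{name:>10}: TR = {tr:.2f}")
```

Remember the scale: an estimate near 1 means an exit in the Round of 64, and an estimate near 7 means a national title.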
First of all, this is for entertainment purposes only. Second of all, these R-squared values leave a lot of room for error. When R-squared is below 0.800, you have to be really careful!
That said, it appears that while Smith's Points Per Possession figure alone is the best for predicting Winning Percentage, it pales in comparison to the Modern method or KenPom data when it comes to predicting tournament success. The fourth equation, which has the highest R-squared (0.771), is the best combination of stats I have found so far for predicting tournament success. You can plug the current KenPom ratings into this equation, but the resulting order is pretty much the same as the KenPom order itself. Therefore, filling out a bracket straight from the KenPom ratings is probably better than going with straight chalk. (Don't ask me for advice on filling out brackets. I get worse at it every year, it seems.)
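A rough way to see why the order barely changes: in the fourth equation, one point of KPEm carries about as much weight as six points of KPSOS (0.118572 / 0.01895 is roughly 6.26), so the schedule term only reshuffles teams whose efficiency margins are already close. The sketch below uses four made-up teams (the names and numbers are hypothetical) to compare the two orderings.

```python
# Coefficients from the fourth regression in this post.
KPEM_COEF = 0.118572
KPSOS_COEF = 0.01895

# Hypothetical teams: (name, KPEm, KPSOS). Not real ratings.
teams = [
    ("Team A", 30.0, 2.0),
    ("Team B", 27.0, 10.0),
    ("Team C", 22.0, -3.0),
    ("Team D", 21.5, 8.0),
]

def eq4(kpem, kpsos):
    """Estimated round reached from the KenPom-based regression."""
    return KPEM_COEF * kpem + KPSOS_COEF * kpsos

# Order by efficiency margin alone, then by the regression estimate.
by_kpem = [name for name, em, sos in sorted(teams, key=lambda t: t[1], reverse=True)]
by_eq4 = [name for name, em, sos in sorted(teams, key=lambda t: eq4(t[1], t[2]), reverse=True)]

print("KPEm points per KPSOS point:", round(KPEM_COEF / KPSOS_COEF, 2))
print("Order by KPEm:      ", by_kpem)
print("Order by equation 4:", by_eq4)
```

In this toy example only the two teams with nearly identical efficiency margins trade places; the rest of the order is untouched.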
Next up: Points Per Possession against Winning Percentage (Smith vs. Modern)