Description
A unique and highly practical system for identifying good and bad bets at the major Southern California Thoroughbred racetracks is created and analyzed. A probability model is built for each individual race as a function of odds, Morning Line, each horse's past performances, current trainer and jockey, and miscellaneous factors that depend on the type of race. A continuous numerical performance estimator, "Perf", is used as the response variable in the regression analysis. After new estimates for Perf are obtained, Monte Carlo methods are used to calculate the probability of each horse finishing 1st, 2nd, 3rd, or 4th. Horses are then grouped according to odds, and reports are generated to analyze results and calculate Expected Values. To find the numerous hidden factors and patterns that occur only under specific conditions, numerous subsets of races and horses are analyzed using hundreds of covariates. A Baseline of probabilities is created using a simple model based mainly on a horse's odds. The final model probabilities, obtained from the estimated regression equation, are then compared to the baseline probabilities. Those that differ significantly are separated into two groups: horses whose estimated probabilities are higher than the baseline's are considered "Overlays" (profitable bets), while those whose estimated probabilities are lower than the baseline's are "Underlays" (unprofitable bets). Each group is displayed in the odds-based report format. The data cover 10 3/4 years of horse racing, with 8 3/4 years set up as the "Regression" Dataset and the two most recent complete years as the "Testing" Dataset. Of primary interest is the profitability, or Expected Value, of each group. In sum, various parameters and wagering options are analyzed for their positive or negative effects on profitability.
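As a rough illustration of the Monte Carlo step and the Overlay/Underlay comparison described above, the following is a minimal Python sketch, not the author's implementation. It makes several assumptions that the abstract does not state: a higher Perf means a better performance, race-day variation around each estimated Perf is Gaussian with a common standard deviation (`noise_sd`), odds are quoted as X-to-1, and "differ significantly" is stood in for by a fixed, hypothetical `margin`. All function and parameter names are invented for the example.

```python
import random


def simulate_finish_probabilities(perf, noise_sd=1.0, n_sims=10000, seed=0, places=4):
    """Estimate P(horse finishes in position k) for k = 1..places.

    perf: dict mapping horse name -> estimated Perf (assumed: higher is better).
    noise_sd: assumed standard deviation of Gaussian noise around each Perf.
    """
    rng = random.Random(seed)
    horses = list(perf)
    places = min(places, len(horses))
    counts = {h: [0] * places for h in horses}
    for _ in range(n_sims):
        # Draw one simulated performance per horse.
        draws = {h: rng.gauss(perf[h], noise_sd) for h in horses}
        # Rank horses from best to worst simulated performance.
        order = sorted(horses, key=draws.get, reverse=True)
        for pos in range(places):
            counts[order[pos]][pos] += 1
    return {h: [c / n_sims for c in counts[h]] for h in horses}


def win_expected_value(p_win, odds_to_one):
    """Expected net return per 1 unit staked on a win bet at X-to-1 odds."""
    return p_win * odds_to_one - (1.0 - p_win)


def classify(p_model, p_baseline, margin=0.02):
    """Compare model and baseline probabilities.

    margin is a hypothetical stand-in for the significance criterion.
    """
    if p_model > p_baseline + margin:
        return "Overlay"
    if p_model < p_baseline - margin:
        return "Underlay"
    return "Neither"


# Hypothetical five-horse race with made-up Perf estimates.
example_perf = {"A": 2.0, "B": 1.0, "C": 0.5, "D": 0.0, "E": -0.5}
example_probs = simulate_finish_probabilities(example_perf, n_sims=5000, seed=1)
for horse, p in example_probs.items():
    print(horse, [round(x, 3) for x in p])
```

Each list printed for a horse holds its simulated probabilities of finishing 1st through 4th. The win probability could then be passed to `classify` alongside an odds-based baseline probability, and to `win_expected_value` with the horse's odds, to mimic the kind of grouping and Expected Value reporting the abstract describes.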