TL;DR
- What: Day-ahead probabilities for NHL & NBA markets (moneyline, totals, Asian handicaps).
- Why: To learn, to showcase my engineering & machine-learning craft, and to test a production-grade sports modeling pipeline.
- How: Time-series machine-learning + strict MLOps.
What does "day-ahead" mean?
Data cutoff: I aim to publish predictions one day before the actual game day.
This means the predictions will not include data updates that occur closer to game time; this delay is intentional and helps protect my edge.
Motivation
This project started as a curiosity: can a well-engineered machine learning system consistently match or outperform market closing lines using only on-court/on-ice performance data? The general consensus is that this is extremely difficult, and because backtests often rely on simplifying assumptions, any claim of edge must be treated probabilistically.
My primary goal is to learn and explore: time-series modeling, feature engineering, and execution strategy optimization under realistic constraints. This dashboard acts as a transparent, versioned prediction system where anyone can evaluate model quality over time.
Methodology (High-level)
I avoid disclosing exact features. Below is the general shape of the system.
- Targets: Home/away team scoring distributions. Market probabilities (moneyline, totals, spread/puck line) are derived from these distributions (a sketch of this derivation follows the list).
- Feature families (not specifics):
- Team form, schedule density, pace/tempo, box scores, play-by-play data.
- Lagged scores and various time-aware rolling averages (see the feature sketch after this list).
- No market data used.
- Modeling: A fast kernel approximation maps the features into a richer non-linear space to capture interactions and smoothness; classical machine-learning estimators are then fit on that representation (sketched below).
- Validation: Walk-forward cross-validation with strict leakage checks and time-safe joins (also part of the sketch below).
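
To make the targets bullet concrete, here is a minimal sketch of deriving market probabilities from home/away scoring distributions. The independent Poisson assumption, the specific lines, and the score cap are illustrative only; they are not the production model.

```python
import numpy as np
from scipy.stats import poisson

def market_probs(home_mean, away_mean, total_line=6.5, puck_line=-1.5, max_goals=15):
    """Derive market probabilities from independent Poisson scoring
    distributions (an illustrative assumption, not the real model)."""
    goals = np.arange(max_goals + 1)
    joint = np.outer(poisson.pmf(goals, home_mean),   # P(home scores i)
                     poisson.pmf(goals, away_mean))   # P(away scores j)
    h, a = np.meshgrid(goals, goals, indexing="ij")   # score grids

    return {
        "home_ml": joint[h > a].sum(),                     # home wins
        "away_ml": joint[h < a].sum(),                     # away wins
        "draw": joint[h == a].sum(),                       # regulation tie
        "over": joint[(h + a) > total_line].sum(),         # total over the line
        "home_covers": joint[(h - a) > -puck_line].sum(),  # home -1.5 puck line
    }

print(market_probs(3.2, 2.7))
```

For the feature families, the key engineering constraint is that rolling aggregates may only use information available before each game. A minimal pandas sketch of a time-aware rolling average; the column names (team, game_date, goals_for) are hypothetical placeholders.

```python
import pandas as pd

def add_rolling_form(games: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    """Add a time-aware rolling average of goals scored per team.
    Column names are hypothetical placeholders."""
    games = games.sort_values(["team", "game_date"])
    # shift(1) ensures a game's own result never feeds its own feature
    games[f"gf_roll_{window}"] = (
        games.groupby("team")["goals_for"]
             .transform(lambda s: s.shift(1).rolling(window, min_periods=3).mean())
    )
    return games
```

The modeling and validation bullets combine roughly as below: a random-feature kernel approximation feeding a classical estimator, evaluated on forward-chaining splits. RBFSampler, Ridge, the synthetic data, and TimeSeriesSplit are stand-ins for the undisclosed production feature map, estimators, and walk-forward scheme.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error

# Synthetic, chronologically ordered data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 25))
y = rng.poisson(lam=3.0, size=2000).astype(float)

model = make_pipeline(
    StandardScaler(),
    RBFSampler(gamma=0.1, n_components=500, random_state=0),  # fast kernel map
    Ridge(alpha=1.0),                                          # classical estimator
)

# Walk-forward evaluation: train on the past, score the next block of games
for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=5).split(X)):
    model.fit(X[train_idx], y[train_idx])
    mae = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
    print(f"fold {fold}: MAE={mae:.3f}")
```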
Versioning & data
- One JSON per game in `docs/data/predictions/`, named `YYYY-MM-DD_HOME_AWAY.json`.
- Matching results JSON per game in `docs/data/results/`, published post-match, usually the day after the game is played.
- CI builds manifests and aggregated metrics on deploy (a consumption sketch follows this list).
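
As an illustration of how the per-game files can be consumed, here is a rough sketch that pairs prediction and result files by their shared filename and computes a Brier score. The JSON field names (home_win_prob, home_win) are hypothetical; check the files in the repo for the actual schema.

```python
import json
from pathlib import Path

PRED_DIR = Path("docs/data/predictions")
RES_DIR = Path("docs/data/results")

scores = []
for pred_path in sorted(PRED_DIR.glob("*.json")):  # YYYY-MM-DD_HOME_AWAY.json
    res_path = RES_DIR / pred_path.name            # matching results file
    if not res_path.exists():                      # result not published yet
        continue
    pred = json.loads(pred_path.read_text())
    res = json.loads(res_path.read_text())
    # Field names below are hypothetical placeholders for the real schema
    p = pred["home_win_prob"]
    outcome = 1.0 if res["home_win"] else 0.0
    scores.append((p - outcome) ** 2)

if scores:
    print(f"Brier score over {len(scores)} games: {sum(scores) / len(scores):.4f}")
```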
Simplified flowchart
A[Daily ETL] --> B[Feature Build]
B --> C[PostgreSQL Storage]
C --> D[Model Predict (day-ahead)]
D --> E[Post-Processing]
E --> F[Artifacts: JSON, charts, etc.]
F --> G[Publish new page version]
Data access & use
- All JSONs that power this site live in the repository and are versioned. If you want the data, please clone or pull the repo rather than scraping this site.
- Paths: `docs/data/predictions/` and `docs/data/results/`. CI publishes/updates them on deploy.
- Polite-use policy: no scraping, crawling, or hot-linking of JSON endpoints. If you need programmatic access, sync the repo locally or vendor the files.
- License: data is provided under CC BY-NC 4.0 (non-commercial); contact me for commercial use.
Why? Scrapers add load and can break consumers when structure changes; git history is stable, verifiable, and bandwidth-friendly.
Disclaimers
This site is for informational and entertainment purposes only. It is not investment or betting advice.
Follow & updates
Release notes and model changes are announced here: