Building Your Own Fantasy Projection Model: A Practitioner's Reference
Most projection models fail not because the builder lacked math skills, but because they made architectural decisions in the first hour that quietly distorted every number that followed. This reference covers the structural components of a from-scratch fantasy projection model — what goes in, how the pieces connect causally, where the real tradeoffs live, and what the most persistent misconceptions cost in practice. The scope covers redraft-season modeling for the major North American professional sports leagues, with NFL examples used as the primary illustration throughout.
- Definition and scope
- Core mechanics or structure
- Causal relationships or drivers
- Classification boundaries
- Tradeoffs and tensions
- Common misconceptions
- Checklist or steps (non-advisory framing)
- Reference table or matrix
Definition and scope
A fantasy projection model is a structured quantitative system that converts real-world player performance inputs into expected fantasy point totals for a defined future period — typically a single game, a week, or a full season. The output is always conditional: projected against a specific opponent, in a specific scoring format, under a specific set of assumed game conditions.
The scope matters more than most builders acknowledge. A model that projects NFL wide receivers for 0.5-PPR scoring in a standard redraft league is a fundamentally different instrument from one designed for best ball projections, where variance and ceiling are weighted differently than floor. Conflating these use cases produces outputs that are technically correct and practically useless.
The operational boundary of a well-scoped model includes four declared parameters: (1) the sport and position group being modeled, (2) the scoring format and its exact point-per-stat values, (3) the projection horizon (single game vs. rest-of-season), and (4) the data vintage — meaning how recent the inputs are and how the model handles information decay. The scoring format impact on projections is substantial enough that a tight end who ranks 8th in standard scoring can rank 4th in full-PPR, based solely on reception volume earning a full point per catch.
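The scoring-format parameter can be made concrete with a minimal sketch. The per-stat point values and the sample stat line below are illustrative assumptions, not values from any specific league:

```python
# Minimal sketch: the same stat line scored under three formats.
# All point values and the sample stat line are illustrative assumptions.

def fantasy_points(stat_line, scoring):
    """Sum of each stat in the line times the format's per-stat point value."""
    return sum(stat_line.get(stat, 0) * pts for stat, pts in scoring.items())

STANDARD = {"rec_yds": 0.1, "rec_td": 6, "receptions": 0.0}
HALF_PPR = {"rec_yds": 0.1, "rec_td": 6, "receptions": 0.5}
FULL_PPR = {"rec_yds": 0.1, "rec_td": 6, "receptions": 1.0}

# A target-heavy, low-yardage tight end: 7 catches, 55 yards, 0 TD.
te_line = {"receptions": 7, "rec_yds": 55, "rec_td": 0}

print(fantasy_points(te_line, STANDARD))  # 5.5
print(fantasy_points(te_line, HALF_PPR))  # 9.0
print(fantasy_points(te_line, FULL_PPR))  # 12.5
```

The same game more than doubles in value between standard and full-PPR, which is why the format must be declared before any data collection begins.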
Core mechanics or structure
Every projection model, regardless of sport or sophistication, rests on three structural layers.
Layer 1 — Baseline rate estimation. This is the historical performance rate for a player at a given role and opportunity level. For an NFL running back, this means rushing yards per carry, receptions per target, and touchdowns per red-zone carry — computed over a trailing window, typically 16 to 32 games, with older data discounted. The statistical inputs for fantasy projections that feed this layer determine the ceiling on model accuracy before any adjustment is applied.
Layer 2 — Opportunity projection. Rate statistics are worthless without a volume estimate. A receiver averaging 12 yards per target needs an expected target count before the model can produce fantasy point estimates. Opportunity variables include projected snap percentage, target share within the offense, and overall team offensive volume — itself derived from implied game totals. Vegas lines and fantasy projections are the canonical input here: an implied team total of 28 points versus 21 points represents a 33% higher projected scoring environment before any player-level adjustment is applied.
Layer 3 — Adjustment factors. These are modifiers applied to the rate-times-volume baseline: matchup difficulty, weather, injury status, pace of play, and coaching tendencies. Each adjustment should be expressed as a percentage modifier, not an additive constant, so its effect scales proportionally with the baseline. A 12% negative matchup adjustment deducts about 2.2 points from a player projected for 18 but only 1.2 from a player projected for 10; a flat -2.2-point deduction applied across all volume levels over-penalizes low-volume players.
The three layers multiply together — they don't add. A mistake in Layer 1 compounds through Layers 2 and 3. This is why backtesting projection accuracy against historical results is the only honest way to identify which layer is producing systematic error.
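The multiplicative structure of the three layers can be sketched as a single pipeline. Every number below (the rate, the volume, and the adjustment percentages) is an assumed placeholder, not real player data:

```python
# Sketch of the rate × volume × adjustment pipeline described above.
# All inputs are illustrative assumptions, not real player data.

def project_points(rate_per_opportunity, projected_opportunities, adjustments):
    """Layer 1 (rate) × Layer 2 (volume) × Layer 3 (multiplicative modifiers)."""
    baseline = rate_per_opportunity * projected_opportunities
    for modifier in adjustments:          # e.g. 0.88 = 12% negative matchup
        baseline *= modifier
    return baseline

# Hypothetical WR: 1.8 fantasy points per target, 10 projected targets,
# a 12% negative matchup adjustment and a 5% pace boost.
points = project_points(1.8, 10, adjustments=[0.88, 1.05])
print(round(points, 2))  # 16.63
```

Because the layers multiply, a 10% error in the Layer 1 rate propagates as a 10% error in the final output no matter how accurate Layers 2 and 3 are, which is the compounding behavior the paragraph above describes.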
Causal relationships or drivers
The causal chain inside a projection model runs from team-level context down to player-level output, not the other way around. A player doesn't generate opportunity — a scheme and a game script do.
The primary causal drivers, ranked by explanatory power in published NFL research, flow as follows: (1) offensive line quality and its effect on rushing success rate, (2) quarterback skill as measured by completion percentage over expected (CPOE), a metric tracked by NFL Next Gen Stats, (3) target share stability within an offensive system, and (4) individual skill as expressed through yards after contact, yards after catch, or separation rate. Snap count and target share data sits at the junction of team context and individual opportunity — it's the single variable that most reliably predicts week-to-week point production when held against historical norms.
Causal confusion is the source of most model-building errors. Touchdowns, for example, are largely a function of red-zone opportunity and red-zone efficiency — two variables that are themselves partially independent and partially correlated. A player with a 28% red-zone target share who scores 0 touchdowns across 4 games is experiencing variance, not a true signal change. Treating that absence as a downward revision to the player's baseline rate is a classic regression error, precisely the instinct that regression to the mean in fantasy addresses directly.
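The variance claim can be checked with simple binomial arithmetic. The 20% per-target touchdown conversion rate and the 2-targets-per-game volume below are assumed for illustration only:

```python
# How likely is a 0-TD stretch purely by chance?
# Assumes a hypothetical 20% TD conversion rate per red-zone target
# and 2 red-zone targets per game over 4 games (8 targets total).
p_td_per_target = 0.20
targets = 8

p_zero_tds = (1 - p_td_per_target) ** targets
print(round(p_zero_tds, 3))  # 0.168
```

Roughly one stretch in six produces zero touchdowns even when the underlying conversion rate has not moved at all, which is why the absence alone does not justify revising the baseline rate downward.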
Classification boundaries
Projection models split into three meaningful categories based on methodology:
Deterministic models produce a single point estimate per player. Simple, interpretable, fast to compute. The tradeoff is that a single number conceals all distributional information — a player projected at 18 points might have a tight distribution (14–22) or a wildly wide one (2–34), and deterministic outputs cannot tell the difference.
Probabilistic models produce a distribution — mean, standard deviation, and ideally a full percentile curve. The projection confidence intervals this enables are qualitatively more useful for decisions involving risk, like DFS lineup construction or trade evaluation.
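A minimal distributional output can be sketched with a normal distribution, though production systems typically use right-skewed or simulated distributions since fantasy points are floored near zero. The mean and standard deviation below are assumptions:

```python
# Sketch: turning a mean/sigma projection into a percentile curve.
# A normal distribution is a simplification -- real fantasy point
# distributions are right-skewed and bounded below near zero.
from statistics import NormalDist

projection = NormalDist(mu=18.0, sigma=5.5)  # assumed mean and spread

for pct in (10, 25, 50, 75, 90):
    print(f"p{pct}: {projection.inv_cdf(pct / 100):.1f}")
```

Two players with identical means but different sigmas produce visibly different percentile curves here, which is exactly the information a deterministic point estimate conceals.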
Machine learning models use pattern recognition across large historical datasets to generate predictions without explicit hand-coded causal assumptions. The machine learning in fantasy projections page covers the specific architectures in use. The short version: ensemble tree methods (gradient boosting, random forests) consistently outperform neural networks on tabular sports data where sample sizes are in the thousands of player-seasons, not the millions.
Tradeoffs and tensions
Complexity versus interpretability. A model with 47 adjustment variables may produce marginally better mean projections than a 12-variable model, while becoming impossible to audit when it outputs a number that looks wrong. The comparing projection systems literature suggests that simpler models fail more transparently — which is often worth more than the last 2% of accuracy.
Recency weighting versus sample stability. Heavier recency weights capture true change faster — a receiver who just inherited a target share spike due to injury gets updated quickly. But they also amplify noise: a 3-game hot streak that reflects variance, not skill, gets overweighted. The right weighting function depends on what the model is projecting and how stable that variable is year-over-year. Sample size and projection reliability sets the empirical floor for how many data points each variable class requires before a recency-weighted signal is meaningful.
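One common family of weighting functions is exponential decay by game recency. The half-life below is an assumed tuning parameter, not an established constant, and illustrates the tension: a short half-life reacts fast but amplifies noise.

```python
# Sketch: exponentially decayed recency weights over a trailing window.
# half_life (in games) is a tuning assumption, not an established constant.

def recency_weights(n_games, half_life=8.0):
    """Weight game i (0 = most recent) by 0.5 ** (i / half_life), normalized to sum to 1."""
    raw = [0.5 ** (i / half_life) for i in range(n_games)]
    total = sum(raw)
    return [w / total for w in raw]

weights = recency_weights(16)
print(round(weights[0] / weights[15], 2))  # most recent vs. oldest game: 3.67
```

Halving the half-life roughly squares that ratio, so the decay parameter is the direct lever on the recency-versus-stability tradeoff described above.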
Specificity versus generalizability. A model tuned tightly to one position group — say, starting pitchers — will outperform a generalist model on that position. But it requires separate architecture for every position, multiplying maintenance burden by 8 to 12 position groups across major sports.
Common misconceptions
Misconception: More data inputs always improve accuracy. Adding correlated inputs (e.g., both air yards and average depth of target, which measure nearly the same phenomenon) doesn't add information — it adds multicollinearity that can inflate standard errors and destabilize coefficients. Each input should explain variance not already explained by existing variables.
Misconception: Accuracy is measured by hit rate on individual projections. A model that projects every player at the position median will achieve a high percentage of "close" projections while being useless for ranking and differentiation. Mean absolute error across a full week of projections, benchmarked against a naive baseline (e.g., trailing 4-week average), is the correct performance metric.
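The MAE-versus-naive-baseline comparison can be sketched directly. The projection and actual arrays below are fabricated for illustration:

```python
# Sketch: mean absolute error of a model vs. a naive trailing-average baseline.
# All numbers are fabricated for illustration only.

def mae(projections, actuals):
    """Mean absolute error across a slate of projections."""
    return sum(abs(p - a) for p, a in zip(projections, actuals)) / len(actuals)

actuals  = [22.4, 8.1, 15.0, 11.3, 19.7]
model    = [19.0, 10.2, 14.1, 13.0, 17.5]  # model's weekly projections
baseline = [16.5, 12.0, 12.5, 14.8, 15.0]  # trailing 4-week averages

print(round(mae(model, actuals), 2))     # model MAE: 2.06
print(round(mae(baseline, actuals), 2))  # baseline MAE: 4.1
```

A model only earns its complexity if the first number is reliably smaller than the second across full slates, not on cherry-picked weeks.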
Misconception: Matchup adjustments are large. Across published NFL and NBA fantasy projections research, defensive matchup quality accounts for roughly 5 to 15 percent of explained variance in weekly fantasy output, depending on position. Volume and role — the opportunity layer — dwarf matchup in predictive weight.
Misconception: Injury-adjusted projections require medical expertise. The meaningful variable is not the injury diagnosis — it's the expected snap percentage and role change conditional on the player being active. The injury adjustments in projections framework handles this through opportunity-down models, not medical classifications.
Checklist or steps (non-advisory framing)
The following sequence represents the structural build order for a first-generation projection model:
- Define the output unit. Declare the scoring format, point values per stat, and projection horizon before any data collection begins.
- Identify the primary opportunity variable for each position (targets, carries, snaps, plate appearances, minutes).
- Collect historical opportunity data for a minimum of 3 seasons at the player-game level, not player-season aggregates.
- Compute baseline rate statistics using a weighted trailing window — common implementations weight the most recent 8 games at 2x relative to games 9–32.
- Build the opportunity projection from team-level volume estimates (implied totals, pace data) before attaching player-level share assumptions.
- Code adjustment factors as percentage multipliers, not additive constants, applied in sequence to the rate × volume baseline.
- Establish a backtesting framework against at least one prior full season before the model is used prospectively. This is the single step most first-time builders skip.
- Benchmark against a naive baseline — if the model doesn't outperform the trailing 4-week average by measurable MAE reduction, the complexity isn't earning its cost.
- Document every assumption explicitly: weighting scheme, data sources, adjustment magnitude, update cadence.
- Set an update schedule — projection models that aren't updated for injury news, depth chart changes, and role shifts within 24 hours of new information lose their edge disproportionately fast. See projection update schedule for cadence norms.
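The weighted trailing window in the rate-computation step can be sketched as a step-weight average. The 2x-for-8-games scheme matches the common implementation mentioned above; the game log itself is fabricated:

```python
# Sketch of the step-weighted trailing window: most recent 8 games at 2x
# the weight of games 9-32. Game log values are fabricated.

def weighted_rate(per_game_values, recent_games=8, recent_weight=2.0):
    """Weighted average where per_game_values[0] is the most recent game."""
    weights = [recent_weight if i < recent_games else 1.0
               for i in range(len(per_game_values))]
    total = sum(weights)
    return sum(v * w for v, w in zip(per_game_values, weights)) / total

# Hypothetical yards-per-carry log, most recent first (16 games).
ypc_log = [5.1, 4.8, 5.4, 4.2, 4.9, 5.0, 4.6, 5.2,
           3.9, 4.1, 4.0, 4.3, 3.8, 4.2, 4.4, 4.0]
print(round(weighted_rate(ypc_log), 2))  # 4.63
```

An unweighted mean of the same log lands noticeably lower, so the step weights pull the rate toward the player's recent form, which is the intended behavior of the build step.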
Reference table or matrix
Model Architecture Comparison by Use Case
| Model Type | Best Use Case | Accuracy Ceiling | Interpretability | Maintenance Burden |
|---|---|---|---|---|
| Deterministic (rate × volume) | Season-long redraft ranking | Moderate | High | Low |
| Probabilistic (distributional) | DFS lineup optimization | High | Moderate | Moderate |
| Ensemble ML (gradient boosting) | Large-sample positional modeling | Highest | Low | High |
| Naive baseline (trailing average) | Benchmarking only | Low | Very High | Minimal |
| Hybrid (ML + manual adjustment) | In-season week-to-week | High | Moderate | High |
Key Variable Importance by Position (NFL)
| Position | Primary Variable | Secondary Variable | Matchup Weight |
|---|---|---|---|
| QB | Implied team total | CPOE (NFL NGS) | 5–8% |
| RB | Snap share + carry share | Team rushing rate | 8–12% |
| WR | Target share | Air yards per route | 10–15% |
| TE | Target share | Route participation rate | 8–13% |
| K | Implied game total | Field goal attempt rate | 3–6% |
The floor and ceiling projections framework maps directly to the probabilistic row in this table — it's a distributional output repackaged for practical lineup decisions.
For a broader orientation to how projection systems are structured and evaluated, the Fantasy Projection Lab home provides the reference context for where position-specific models sit within the full projection architecture.