Data Sources Used in Fantasy Projection Lab Models

The accuracy of any projection system lives or dies by what it feeds on. Fantasy Projection Lab pulls from a layered stack of public and licensed data streams — play-by-play logs, injury reports, environmental feeds, and betting market signals — and the choices made at this input layer shape every number that reaches a roster decision. This page details what those sources are, how they interact inside the modeling pipeline, and where the meaningful tradeoffs begin.

Definition and scope

A "data source" in the context of fantasy projections is any structured feed, database, or real-time signal that the model uses as an input variable. This is a broader category than most fantasy managers realize. It encompasses historical box score data going back multiple seasons, real-time snap count and target share logs (covered in depth at Snap Count and Target Share Data), injury and practice participation designations, stadium and weather conditions, and the implied totals embedded in NFL and NBA betting markets.

The scope matters because different sources answer different questions. Historical performance data answers "what has this player done?" Market signals answer "what does the aggregated sharp-money crowd expect to happen?" Weather feeds answer "under what physical conditions?" Injury feeds answer "will this player be on the field at all, and at what capacity?" No single source can answer all four questions, which is why a robust projection model requires all of them simultaneously.

How it works

Data enters the projection models through a structured ingestion process with five primary source categories:

  1. Play-by-play and box score databases — Historical game logs from the NFL, NBA, MLB, and NHL, typically sourced from league-licensed data or public repositories like nflfastR, which provides granular play-level data for NFL games dating back to 1999. Variables extracted here include target share, air yards, snap percentage, rushing attempt distribution, and usage rate splits.

  2. Injury and practice participation reports — Official league injury designations (Questionable, Doubtful, Out, IR) published by teams ahead of each game. These feed directly into injury adjustments in projections, where a player carrying a "Questionable" tag may see a 15–25% probabilistic reduction applied to their baseline projection depending on position and injury type.

  3. Betting market lines and totals — Vegas-implied game totals and team point spreads sourced from public sportsbooks. As detailed in Vegas Lines and Fantasy Projections, a team's implied team total is one of the strongest single predictors of offensive volume for skill-position players in a given week.

  4. Weather and environmental data — Temperature, wind speed, and precipitation data from meteorological APIs, particularly relevant for outdoor NFL stadiums. Wind speeds above 15 mph have a statistically measurable negative effect on passing volume, as documented in referenced sports analytics research.

  5. Usage and tracking metrics — Advanced data including route participation rates, average depth of target, and defensive coverage assignments, sourced from providers like Pro Football Reference and Basketball Reference for historical context.
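To make category 3 concrete: a game total and point spread can be split into per-team implied totals with simple arithmetic. The sketch below uses the standard sportsbook convention (a negative spread means that team is favored); the function name is illustrative, not part of any actual pipeline.

```python
def implied_team_totals(game_total: float, home_spread: float) -> tuple[float, float]:
    """Split a game total into per-team implied totals.

    home_spread follows sportsbook convention: negative when the
    home team is favored (e.g. -6.5 means home favored by 6.5).
    """
    home = game_total / 2 - home_spread / 2
    away = game_total / 2 + home_spread / 2
    return home, away

# A 47.5 total with the home team favored by 6.5 implies
# roughly 27.0 points for the home team and 20.5 for the road team.
```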

Common scenarios

In-season weekly projections draw most heavily from sources 2 through 5 — injury reports, market lines, weather, and recent usage trends. A running back's projection for Week 9, for instance, might be anchored to a 3-week rolling snap count average, adjusted downward if he is listed as Questionable with a knee designation, and adjusted upward if his team's implied total sits at 28.5 points or higher.
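That Week 9 example can be sketched as a baseline plus multiplicative adjustments. The specific discount (20%, inside the 15–25% band cited above) and the 5% high-total boost are illustrative assumptions, not the site's actual parameters.

```python
def rolling_baseline(recent_points: list[float], window: int = 3) -> float:
    """Average fantasy points over the last `window` games."""
    recent = recent_points[-window:]
    return sum(recent) / len(recent)

def weekly_projection(baseline: float,
                      questionable: bool,
                      implied_total: float,
                      injury_discount: float = 0.20,   # assumption: midpoint of the 15-25% band
                      high_total_boost: float = 1.05,  # assumption: illustrative bump
                      total_threshold: float = 28.5) -> float:
    """Adjust a rolling-average baseline for injury status and game environment."""
    proj = baseline
    if questionable:
        proj *= (1 - injury_discount)
    if implied_total >= total_threshold:
        proj *= high_total_boost
    return proj
```

Usage: a back averaging 14.0 points over his last three games, tagged Questionable, in a game with a 29-point implied total, projects to 14.0 × 0.80 × 1.05 ≈ 11.8 points.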

Preseason and dynasty projections flip the weighting. With no current-week injury report and no active betting line to consult, historical play-by-play databases and aging curves carry more weight. The difference between in-season and preseason methodologies is substantial enough that In-Season vs. Preseason Projections treats them as functionally separate modeling problems.
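An aging-curve adjustment of the kind mentioned above is typically a lookup table of age multipliers applied to a historical baseline. The multipliers below are invented for illustration (a generic running back shape peaking in the mid-20s), not the site's actual curve.

```python
# Illustrative age multipliers for a running back aging curve (assumption).
AGE_CURVE_RB = {23: 1.02, 24: 1.04, 25: 1.03, 26: 1.00, 27: 0.95, 28: 0.88, 29: 0.80}

def age_adjusted(baseline_points: float, age: int) -> float:
    """Scale a historical baseline by an age multiplier; ages outside
    the table fall back to the nearest tabulated age."""
    nearest = min(AGE_CURVE_RB, key=lambda a: abs(a - age))
    return baseline_points * AGE_CURVE_RB[nearest]
```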

DFS single-game slates represent the most data-intensive scenario, because small sample variance matters more in a one-game context. These projections lean harder on target-share micro-data and Vegas alternate lines than season-long models typically require.

Decision boundaries

Not all data sources are created equal, and the most consequential modeling decisions involve knowing when to trust one signal over another.

Recency vs. sample size is the central tension. A receiver who logged a 35% target share across 3 games is not the same signal as one who averaged 22% over 16 games. The Sample Size and Projection Reliability framework governs how aggressively recent data is weighted against longer historical baselines.
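One common way to resolve this tension is a shrinkage blend: weight each sample by its game count, with an extra multiplier on recent games. The `recency_mult` parameter is an illustrative assumption, a stand-in for whatever weighting the Sample Size and Projection Reliability framework prescribes.

```python
def blended_rate(recent_rate: float, recent_n: int,
                 season_rate: float, season_n: int,
                 recency_mult: float = 2.0) -> float:
    """Shrink a small recent sample toward the longer baseline.

    Each sample is weighted by its game count; recency_mult (assumption)
    lets recent games count more per game than older ones.
    """
    w_recent = recent_n * recency_mult
    w_season = season_n
    return (recent_rate * w_recent + season_rate * w_season) / (w_recent + w_season)
```

For the receivers above, a 35% share over 3 games blended against a 22% share over 16 games lands at roughly 25.5%: meaningfully above the season baseline, but far short of treating the hot stretch as the new true rate.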

Market signals vs. statistical models diverge most sharply when a team's offensive line undergoes a sudden change — something betting markets often price in faster than box score databases reflect. In these cases, the model uses implied team totals as a leading indicator rather than a lagging one.

Injury designation reliability varies by team. Practice reports are self-reported by NFL franchises, and the NFL's injury reporting policy (governed by the NFL Game Operations Manual) requires disclosure but does not enforce uniform standards of precision. Some franchises are historically more transparent than others, which introduces a source-reliability variable at the data collection stage rather than just the modeling stage.

Understanding what the statistical inputs for fantasy projections actually represent — and where each input carries uncertainty — is what separates projections used as thinking tools from projections used as false precision. The numbers on the Fantasy Projection Lab home page are only as trustworthy as the data layers underneath them.