How to Measure Projection Accuracy: MAE, RMSE, and Bias

Projection systems live and die by their numbers, but the metrics used to evaluate those numbers are where the real accounting happens. Mean Absolute Error, Root Mean Square Error, and bias are the three workhorses of projection accuracy measurement — each capturing something different about how a system succeeds or fails. This page covers how each metric is calculated, what it actually reveals, where the metrics diverge, and the specific ways they can mislead when applied carelessly.


Definition and scope

A projection system makes a numeric prediction — say, 18.4 fantasy points for a running back on a given week. The actual outcome is 11.2. The error is 7.2 points. Simple enough. What gets complicated is how those individual errors get aggregated across hundreds of players and weeks into a single diagnostic number that tells you something useful about system performance.

Mean Absolute Error (MAE) is the average of the absolute values of all individual errors. Every error counts equally, regardless of sign or magnitude beyond its face value.

Root Mean Square Error (RMSE) squares each error before averaging, then takes the square root of the result. That squaring step punishes large errors disproportionately — a 14-point miss contributes four times as much to the squared-error average as a 7-point miss.

Bias — sometimes called Mean Error or Mean Signed Deviation — is the average of raw (non-absolute) errors. It tells you whether a system systematically overshoots or undershoots, and by how much, on average.

All three metrics are described in detail in the foundational statistics literature, including NIST's Engineering Statistics Handbook, which covers residual analysis and prediction error in applied contexts.

The scope of these metrics in fantasy sports spans every projection format covered at Fantasy Projection Lab: single-game DFS totals, season-long weekly projections, rest-of-season cumulative estimates, and preseason drafting tools. The metrics apply wherever a quantitative prediction can be compared to a verified outcome.


Core mechanics or structure

Calculating MAE requires three operations: subtract the projected value from the actual value for each observation, take the absolute value of each difference, then average those absolute values across the full sample.

MAE = (1/n) × Σ |actual_i − projected_i|

If a system projects 20 players in a week and the absolute errors are: 2, 4, 6, 1, 8, 3, 5, 2, 7, 4, 3, 6, 2, 9, 1, 4, 5, 3, 6, 2 — the MAE is the sum of those values divided by 20. In this illustrative set that sum is 83, producing an MAE of 4.15 fantasy points.

Calculating RMSE follows a similar path but squares each error first:

RMSE = √[(1/n) × Σ (actual_i − projected_i)²]

Using the same errors above, squaring each (4, 16, 36, 1, 64, 9, 25, 4, 49, 16, 9, 36, 4, 81, 1, 16, 25, 9, 36, 4) produces a sum of 445. Divided by 20 gives 22.25. The square root is approximately 4.72. RMSE is higher than MAE here — it always is when errors are not all identical — because the squaring step amplifies the larger misses.
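The arithmetic for both metrics can be verified with a few lines of Python, using the 20 illustrative absolute errors from the worked example:

```python
import math

# The 20 illustrative absolute errors from the worked example above
errors = [2, 4, 6, 1, 8, 3, 5, 2, 7, 4, 3, 6, 2, 9, 1, 4, 5, 3, 6, 2]

mae = sum(errors) / len(errors)                              # 83 / 20 = 4.15
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))  # sqrt(445 / 20)

print(f"MAE  = {mae:.2f}")
print(f"RMSE = {rmse:.2f}")
```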

Calculating bias drops the absolute value step entirely:

Bias = (1/n) × Σ (projected_i − actual_i)

Note the sign convention: if projected exceeds actual, the bias is positive (systematic overestimation). Positive bias means a system runs hot; negative bias means it runs cold.
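Because bias keeps the sign, it needs projected and actual values rather than absolute errors. A minimal sketch with hypothetical projected/actual pairs (illustrative values only):

```python
# Hypothetical projected and actual scores (illustrative values only)
projected = [18.4, 12.0, 22.5, 9.8, 15.1]
actual    = [11.2, 14.3, 20.0, 10.5, 12.6]

# Bias = mean of (projected − actual); positive means the system runs hot
bias = sum(p - a for p, a in zip(projected, actual)) / len(projected)
print(f"Bias = {bias:+.2f}")  # +1.84 → systematic overestimation
```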


Causal relationships or drivers

MAE and RMSE do not move in isolation — they respond to specific structural properties of the data and the projection methodology.

Variance in player outcomes is the single largest driver of both metrics. A position group with wide outcome dispersion — like wide receivers, where a single spectacular game can swing 30+ points — will naturally produce higher MAE and RMSE than tight ends in a lean scoring environment. This makes cross-position comparisons of raw accuracy metrics almost meaningless without normalization.
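The variance driver can be isolated with a small simulation. The distributions and parameters below are illustrative assumptions, not real scoring data: both pools are projected without bias, and only the outcome spread differs, yet both error metrics rise with dispersion.

```python
import math
import random

random.seed(42)  # reproducible illustration

def mae_rmse(errors):
    """Return (MAE, RMSE) for a list of signed errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    return mae, rmse

n = 10_000
# Unbiased projections in both pools; only the outcome dispersion differs
narrow = [random.gauss(0, 4) for _ in range(n)]  # tight-end-like spread
wide   = [random.gauss(0, 9) for _ in range(n)]  # wide-receiver-like spread

print("narrow pool: MAE %.2f, RMSE %.2f" % mae_rmse(narrow))
print("wide pool:   MAE %.2f, RMSE %.2f" % mae_rmse(wide))
```

The wide pool shows higher MAE and RMSE despite an identical (unbiased) projection methodology, which is why raw cross-position comparisons mislead.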

Sample composition shapes what RMSE reveals. If a system is tested only on projected starters with high floors, RMSE will appear artificially low because the extreme-outcome tail — injured players, surprise benching, weather-related zeroes — is underrepresented. Those tail events are exactly what RMSE is most sensitive to.

Structural bias often originates in the training data or regression assumptions. Systems trained on historical averages tend toward positive bias for high-variance players (because extreme negative outcomes are underweighted) and negative bias for breakout candidates (because history lacks analogues). Injury adjustments in projections, covered in detail at injury-adjustments-in-projections, are one of the primary levers for correcting downward bias on players returning from absence.

Scoring format shifts the scale of every error. A full-point-per-reception format inflates total point projections for pass-catching backs and receivers, which mechanically increases the magnitude of expected errors. A fair evaluation requires holding scoring format constant, a topic explored at scoring-format-impact-on-projections.


Classification boundaries

The three metrics belong to different diagnostic categories, and conflating them is a genuine source of confusion.

MAE is an error magnitude metric. It answers: on average, how far off is the system, expressed in the same units as the original projection (fantasy points)?

RMSE is an outlier-sensitivity metric. It answers: how does the system perform on its worst misses? Two systems with identical MAEs can have dramatically different RMSEs if one produces occasional catastrophic errors and the other does not.
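The identical-MAE, diverging-RMSE case is easy to demonstrate with two hypothetical error sets:

```python
import math

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

steady = [5, 5, 5, 5]    # four moderate misses
blowup = [1, 1, 1, 17]   # mostly small misses, one catastrophe

print(mae(steady), mae(blowup))    # both 5.0
print(rmse(steady), rmse(blowup))  # 5.0 vs ≈ 8.54
```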

Bias is a direction metric. It answers: does the system systematically lean in one direction? A system with an MAE of 5.0 and a bias of +4.8 is not a randomly erring system — it is almost always wrong in the same direction, which has different strategic implications than a system with MAE of 5.0 and a bias of +0.1.

These classifications matter when evaluating systems for different purposes. Backtesting projection accuracy requires all three metrics because none alone tells the complete story.


Tradeoffs and tensions

MAE vs. RMSE: what do large errors cost?

The choice between MAE and RMSE as a primary benchmark is not arbitrary — it encodes a value judgment about large errors. RMSE treats a 14-point miss as four times as damaging as a 7-point miss (because 14² = 196 vs. 7² = 49). MAE treats it as exactly twice as damaging. Neither is objectively correct. In season-long leagues where a single catastrophic miss on an injured star might lose a week but doesn't destroy a season, MAE may be more representative. In single-game DFS, where roster construction failures can cost entry fees on high-stakes contests, the RMSE perspective — penalizing blowup errors heavily — may be more relevant.

Bias and the MAE paradox

A system can show strong MAE while hiding severe bias. Imagine a system that overprojects skill players by exactly 6 points every week and underprojects defensive players by exactly 6 points. The absolute errors are all 6, but the bias by position would be ±6.0. Aggregated across positions, the bias might appear near zero. This is why bias must be calculated at the position or player-type level, not just at the portfolio level.
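A toy sketch of that cancellation, using hypothetical errors of exactly ±6 points per position group:

```python
# Hypothetical signed errors (projected − actual):
# skill players run +6 every week, defenses run −6
skill_errors   = [6.0] * 10
defense_errors = [-6.0] * 10

def bias(errors):
    return sum(errors) / len(errors)

print(bias(skill_errors))                   # +6.0 — runs hot
print(bias(defense_errors))                 # -6.0 — runs cold
print(bias(skill_errors + defense_errors))  # 0.0 — portfolio level hides both
```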

Sample size sensitivity

RMSE is more unstable under small samples than MAE. A single catastrophic error in a 20-game sample can raise RMSE by 0.5 to 1.0 points with almost no effect on MAE. Sample size and projection reliability examines the minimum observation thresholds before accuracy metrics become stable.
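That instability can be illustrated by taking the 20 illustrative errors from earlier and swapping the largest for a hypothetical 25-point blowup; in this sketch RMSE moves nearly three times as far as MAE:

```python
import math

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# The 20 illustrative errors from the worked example above
base = [2, 4, 6, 1, 8, 3, 5, 2, 7, 4, 3, 6, 2, 9, 1, 4, 5, 3, 6, 2]
hit = list(base)
hit[13] = 25  # replace the worst miss (9) with a hypothetical 25-point blowup

print(f"MAE:  {mae(base):.2f} -> {mae(hit):.2f}")
print(f"RMSE: {rmse(base):.2f} -> {rmse(hit):.2f}")
```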


Common misconceptions

"Lower RMSE always means a better projection system."
Not across different player pools. A system evaluated only on quarterbacks — high-floor, moderate-variance players — will show lower RMSE than an identical methodology applied to wide receivers. RMSE is not a universal quality score; it is a conditional one.

"Bias near zero means no systematic error."
Portfolio-level bias near zero can mask opposing biases that cancel out. A system that overprojects running backs and underprojects tight ends by equal amounts will show aggregate bias close to zero while failing systematically at both positions. Always disaggregate bias by position, scoring tier, and matchup type.

"MAE and RMSE should always agree directionally."
They do move in the same direction under most conditions, but a system that eliminates moderate errors while occasionally producing enormous ones can show improving MAE alongside worsening RMSE. The divergence is a signal worth investigating, not an artifact to dismiss.

"Projection accuracy can be measured with a single season of data."
A single NFL season produces 272 regular-season games (32 teams playing 17 games each under the league's current schedule structure), but the player-week observation count varies dramatically by position. Kickers might generate 50 to 60 usable samples; top tight ends might generate 30 to 40. Statistical stability in RMSE typically requires 150+ observations at the position level before the metric is reliable.


Checklist or steps

The following sequence describes how accuracy measurement proceeds for a projection system evaluation:

  1. Define the observation unit — player-week, player-game, or season total — and apply it consistently across the entire evaluation set.
  2. Establish the scoring format — standard, half-PPR, or full-PPR — and confirm all projected and actual values use the same format.
  3. Calculate the signed error for each observation: projected score minus actual score, preserving the sign so that positive errors indicate overestimation (MAE and RMSE use the absolute value or square, so the sign order does not affect them).
  4. Compute MAE: take absolute values of all errors, sum them, divide by observation count.
  5. Compute RMSE: square all errors, sum them, divide by observation count, take the square root.
  6. Compute bias: sum raw (signed) errors, divide by observation count. Positive result indicates overestimation.
  7. Disaggregate all three metrics by position group — do not stop at the aggregate level.
  8. Identify the top 5% of absolute errors (the largest misses) and examine whether they cluster by player type, game environment, or projection input category.
  9. Compute the RMSE-to-MAE ratio: a value above 1.3 typically indicates that a small number of large outlier errors is driving most of the RMSE penalty.
  10. Document the sample size per position before drawing conclusions — any position with fewer than 100 observations warrants explicit uncertainty labeling.
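Steps 3 through 9 can be sketched as a single evaluation function. This is a minimal illustration under assumptions: the (position, projected, actual) observation schema and the sample values are hypothetical, not part of any particular projection toolkit.

```python
import math
from collections import defaultdict

def summarize(errors):
    """MAE, RMSE, bias, and RMSE/MAE ratio for one list of signed errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    bias = sum(errors) / n
    ratio = rmse / mae if mae else float("nan")
    return {"n": n, "mae": mae, "rmse": rmse, "bias": bias, "rmse_mae": ratio}

def evaluate(observations):
    """observations: iterable of (position, projected, actual) tuples.

    Signed error is projected − actual, so positive bias = overestimation.
    Metrics are reported per position group (step 7) and in aggregate."""
    by_position = defaultdict(list)
    for position, projected, actual in observations:
        by_position[position].append(projected - actual)
    report = {pos: summarize(errs) for pos, errs in by_position.items()}
    report["ALL"] = summarize([e for errs in by_position.values() for e in errs])
    return report

# Hypothetical sample: (position, projected, actual)
sample = [
    ("RB", 18.4, 11.2), ("RB", 12.1, 14.0), ("WR", 15.6, 22.3),
    ("WR", 9.2, 4.5), ("TE", 8.8, 8.1), ("TE", 11.0, 6.4),
]
for pos, stats in evaluate(sample).items():
    print(pos, {k: round(v, 2) for k, v in stats.items()})
```

Per step 10, any position whose reported `n` falls below the chosen threshold would carry an explicit uncertainty label before conclusions are drawn.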

Reference table or matrix

| Metric | Formula | Units | Sensitive to large errors? | Sign information? | Primary diagnostic use |
| --- | --- | --- | --- | --- | --- |
| MAE | Mean of absolute errors | Fantasy points | No — all errors weighted equally | No — absolute values only | Average miss magnitude |
| RMSE | Square root of the mean squared error | Fantasy points | Yes — large errors amplified by squaring | No — squared values are positive | Outlier error severity |
| Bias (Mean Error) | Mean of signed errors (projected − actual) | Fantasy points | No — arithmetic mean | Yes — positive = overestimation | Systematic directional error |
| MAE/RMSE ratio | MAE ÷ RMSE | Dimensionless | N/A | N/A | Outlier concentration: near 1.0 = uniform errors; below 0.7 = outlier-dominated |
| Relative MAE | MAE ÷ mean actual score | Percentage | No | No | Cross-position or cross-format comparison |

For a broader treatment of how accuracy metrics interact with system design choices, the comparing-projection-systems reference covers head-to-head evaluation frameworks. The projection confidence intervals page addresses how error distributions can be translated into probabilistic ranges rather than point estimates.


References