You may have seen a new visualization on our site recently that breaks down benchmarks for 10 key Orioles players for the 2026 season. We don't pull these benchmarks out of thin air: we built a simple projection system grounded in historical player data to set reasonable targets and projections for the upcoming season. Note that these benchmarks come from an in-house model that is separate from our team ELO projections. The player system generates pre-season and in-season forecasts for these ten "core" players on the Orioles roster and combines three layers: a Marcel-style weighted baseline, Statcast quality-of-contact adjustments for batters, and age curve modeling. This article walks through each layer, explains the inputs, documents our backtesting process, discusses the system's limitations, and attributes the data sources.
A Brief History of Marcel
Marcel is a projection system created by sabermetrician Tom Tango in 2004. The name comes from "Marcel the Monkey" — the idea being that the method is so simple a monkey could do it. Tango designed Marcel as a minimum-competence baseline: weight the last three years of stats, regress toward the league average, and apply an age adjustment. That's it.
The beauty of Marcel is that this dead-simple approach is surprisingly hard to beat. More complex systems (ZiPS, Steamer, PECOTA) layer in pitch-level data, minor league translations, and machine learning, but they only marginally outperform Marcel on aggregate accuracy. Tango's original research demonstrated that the gap between a naive weighted average and a sophisticated model is far smaller than most people assume — and that the gap between no projection and Marcel is enormous.
Our system follows Marcel's core philosophy while adding a Statcast enhancement layer for batters where quality-of-contact metrics provide a genuine signal above the baseline.
For more on the original Marcel methodology, see Tom Tango's original research.
How the Batter Projections Work
The 5/4/3 Weighting
The foundation of every batter projection is a weighted average of the last three seasons. The most recent season gets a weight of 5, the year before that gets 4, and the year before that gets 3. This means roughly 42% of the projection comes from last year, 33% from two years ago, and 25% from three years ago.
Within each season, stats are further weighted by plate appearances. The composite weight for each season is recency_weight * PA, so a player who had 600 PA last year and 77 PA two years ago (due to injury) won't have the injury-shortened season dominate his projection. This is traditional Marcel behavior — small-sample seasons contribute proportionally less.
A season needs at least 50 PA to be included at all. If a player has no qualifying seasons, the system returns a league-average projection.
Input Stats
The system tracks 14 statistics for each historical season, grouped into traditional stats, plate discipline metrics, and Statcast metrics:
Stat | What It Measures |
|---|---|
AVG (Batting Average) | Hits per at-bat. The oldest measure of hitting ability. |
OBP (On-Base Percentage) | How often a batter reaches base, including walks and HBP. |
SLG (Slugging Percentage) | Total bases per at-bat. Measures raw power. |
ISO (Isolated Power) | SLG minus AVG. Isolates extra-base hit power from batting average. |
wOBA (Weighted On-Base Average) | An all-in-one offensive metric that weights each outcome (single, double, walk, etc.) by its actual run value. The stat most central to the projection. |
wRC+ (Weighted Runs Created Plus) | wOBA scaled to a 100 baseline and adjusted for park and league. A wRC+ of 120 means 20% better than league average. |
K% (Strikeout Rate) | Strikeouts per plate appearance. |
BB% (Walk Rate) | Walks per plate appearance. |
BABIP (Batting Average on Balls in Play) | AVG on batted balls excluding home runs. Measures a blend of skill and luck — extreme values tend to regress. |
WAR (Wins Above Replacement) | The single-number estimate of total value in wins. Treated as a counting stat (scales with playing time). |
Barrel% (Barrel Rate) | Percentage of batted balls with the ideal combination of exit velocity and launch angle. Barrels produce a batting average over .500 and a slugging percentage over 1.500. |
Hard% (Hard-Hit Rate) | Percentage of batted balls with an exit velocity of 95+ mph. |
EV (Exit Velocity) | Average speed of the ball off the bat, in mph. |
Launch Angle | Average vertical angle of the ball off the bat, in degrees. |
How the Weighted Average Works
For rate stats (AVG, OBP, SLG, K%, BB%, ISO, BABIP, wOBA, Barrel%, Hard%, EV, Launch Angle), the system computes a PA-weighted average across seasons:
For each season i:
composite_weight[i] = recency_weight[i] * PA[i]
weighted_stat = sum(stat[i] * composite_weight[i]) / sum(composite_weight[i])
wRC+ uses the same PA-weighted approach. WAR, as a counting stat, uses only the recency weights (5/4/3) without PA weighting.
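The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the `Season` class and function names are hypothetical, and it assumes a skipped season (under 50 PA) does not shift the recency weights of the remaining seasons.

```python
from dataclasses import dataclass

RECENCY_WEIGHTS = (5, 4, 3)  # most recent season first
MIN_PA = 50                  # seasons below this are excluded


@dataclass
class Season:
    pa: int      # plate appearances
    woba: float  # any rate stat is handled the same way
    war: float   # counting stat


def weighted_rate(seasons, stat):
    """Recency * PA weighted average; `seasons` is ordered most recent first."""
    num = den = 0.0
    for weight, season in zip(RECENCY_WEIGHTS, seasons):
        if season.pa < MIN_PA:
            continue  # assumption: weights stay tied to the season's recency
        composite = weight * season.pa
        num += getattr(season, stat) * composite
        den += composite
    return num / den if den else None  # None -> caller falls back to league average


def weighted_counting(seasons, stat):
    """Recency-only weighted average, used for WAR."""
    num = den = 0.0
    for weight, season in zip(RECENCY_WEIGHTS, seasons):
        if season.pa < MIN_PA:
            continue
        num += getattr(season, stat) * weight
        den += weight
    return num / den if den else None


# Hypothetical hitter: 600 PA last year, an injury-shortened 77 PA, then 550 PA.
history = [Season(600, 0.350, 4.0), Season(77, 0.280, 0.3), Season(550, 0.330, 3.0)]
print(round(weighted_rate(history, "woba"), 3))     # 0.339
print(round(weighted_counting(history, "war"), 2))  # 2.52
```

The 77 PA season carries a composite weight of 308 against 3,000 for the most recent year, so it barely moves the wOBA. It still counts at its full recency weight of 4 for WAR.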
How the Pitcher Projections Work
Same 5/4/3 Weighting, IP-Weighted
Pitcher projections use the same 5/4/3 recency weighting as batters, but instead of plate appearances, each season is weighted by innings pitched. This ensures that injury-shortened seasons don't dominate rate projections — a pitcher who threw 40 IP in a lost season won't have that small sample overwhelm a 190 IP workhorse year.
A season needs at least 20 IP to qualify for inclusion.
Input Stats
The system tracks 12 statistics for each historical pitching season:
Stat | What It Measures |
|---|---|
ERA (Earned Run Average) | Earned runs per nine innings. The most traditional measure of pitching performance. |
FIP (Fielding Independent Pitching) | An ERA-like metric based only on strikeouts, walks, HBP, and home runs — the outcomes a pitcher controls directly, stripped of defense and sequencing luck. |
WHIP (Walks + Hits per IP) | Baserunners allowed per inning. A measure of how much traffic a pitcher creates. |
K/9 (Strikeouts per 9 IP) | Strikeout rate scaled to nine innings. |
BB/9 (Walks per 9 IP) | Walk rate scaled to nine innings. |
K% (Strikeout Percentage) | Strikeouts per batter faced. More sample-size stable than K/9 because it's denominated by batters, not innings. |
BB% (Walk Percentage) | Walks per batter faced. |
HR/9 (Home Runs per 9 IP) | Home run rate. Heavily influenced by fly ball tendency and ballpark. |
BABIP (Batting Average on Balls in Play Against) | The batting average opponents post on balls in play. Extreme values (very high or very low) tend to regress to around .295. |
LOB% (Left on Base Percentage) | Percentage of baserunners stranded. High LOB% often regresses — it's partly a sequencing/luck indicator. |
WAR (Wins Above Replacement) | Total value in wins. Treated as a counting stat and scaled to projected innings. |
SV (Saves) | Saves. Only projected for relievers, scaled by a historical saves-per-IP rate. |
Differences from Batters
IP thresholds vs. PA thresholds. Batter regression is keyed to plate appearances (500 PA for full reliability on most stats). Pitcher regression uses innings pitched, with most stats requiring 150 IP for full reliability and strikeout/walk rates needing 120 IP.
Inverted age curves. For pitchers, a declining age factor makes "lower is better" stats worse. ERA, FIP, WHIP, BB/9, and HR/9 are divided by the age factor rather than multiplied. A 34-year-old pitcher with a 0.95 age factor gets his ERA divided by 0.95 (pushed higher), while his K/9 is multiplied by 0.95 (pushed lower). This correctly models the dual reality that aging pitchers lose strikeout stuff and give up more runs.
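As a rough sketch of the inverted handling, the snippet below divides "lower is better" stats by the age factor and multiplies K/9 by it. The function name and dictionary layout are hypothetical. Stats that the paragraph above does not mention are passed through unchanged, which is an assumption made for the example.

```python
# Stats where a smaller number is better: divided by the age factor.
LOWER_IS_BETTER = {"ERA", "FIP", "WHIP", "BB/9", "HR/9"}
# Stats where a larger number is better: multiplied by the age factor.
HIGHER_IS_BETTER = {"K/9"}


def apply_pitcher_age(stats, age_factor):
    """Return a new dict of pitcher stats with the age factor applied."""
    adjusted = {}
    for name, value in stats.items():
        if name in LOWER_IS_BETTER:
            adjusted[name] = value / age_factor
        elif name in HIGHER_IS_BETTER:
            adjusted[name] = value * age_factor
        else:
            adjusted[name] = value  # assumption: not covered by the text, left as-is
    return adjusted


# The 34-year-old from the text, with hypothetical baseline stats.
aged = apply_pitcher_age({"ERA": 3.80, "K/9": 9.0}, age_factor=0.95)
print(round(aged["ERA"], 2), round(aged["K/9"], 2))  # 4.0 8.55
```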
Regression to the Mean
Raw weighted averages are overconfident. A player who walked 15% of the time in 200 PA isn't a true 15% walk-rate talent — some of that is noise. Marcel handles this by blending each stat toward the league average based on sample size.
The Formula
reliability = min(1.0, weighted_PA / reliability_threshold)
projected_stat = (reliability * weighted_average) + ((1 - reliability) * league_average)
A player with a weighted PA equal to or above the threshold gets a reliability of 1.0 — no regression. A player at half the threshold gets blended 50/50 with the league average.
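Here is a minimal sketch of that formula, using the 15% walk-rate example from above. The function name is hypothetical; the 400 PA threshold and 8.2% league average are the BB% values listed in the tables that follow.

```python
def regress_to_mean(weighted_average, weighted_sample, threshold, league_average):
    """Blend a weighted average toward the league average based on sample size."""
    reliability = min(1.0, weighted_sample / threshold)
    return reliability * weighted_average + (1.0 - reliability) * league_average


# 15% walk rate over 200 weighted PA: half the 400 PA threshold, so a 50/50 blend.
print(round(regress_to_mean(0.150, 200, 400, 0.082), 3))  # 0.116

# At or above the threshold, nothing is regressed.
print(round(regress_to_mean(0.150, 450, 400, 0.082), 3))  # 0.15
```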
Reliability Thresholds (Batters)
Stat | PA Needed for Full Reliability |
|---|---|
AVG, OBP, SLG, ISO, BABIP, wOBA | 500 PA |
K%, BB% | 400 PA |
HR Rate | 400 PA |
wRC+ | 500 PA |
Reliability Thresholds (Pitchers)
Stat | IP Needed for Full Reliability |
|---|---|
ERA, FIP, WHIP, BABIP, LOB%, HR/9 | 150 IP |
K/9, BB/9, K% | 120 IP |
League Average Baselines
The system regresses toward these 2025 MLB league averages:
Batters: .248 AVG, .312 OBP, .400 SLG, .310 wOBA, 22.5% K%, 8.2% BB%, .152 ISO, .292 BABIP, 100 wRC+
Pitchers: 4.15 ERA, 4.10 FIP, 1.28 WHIP, 8.8 K/9, 3.2 BB/9, 22.5% K%, 8.2% BB%, 1.25 HR/9, .295 BABIP, 72% LOB%
The Statcast Layer
After the Marcel baseline is computed, batter projections get an additional Statcast enhancement. This layer adjusts the projection based on quality-of-contact metrics that are more predictive of future performance than traditional stats alone. Statcast adjustments are not applied to pitchers — pitcher batted-ball data is noisier and less predictive at the individual level.
Exit Velocity
Baseline: 88 mph (league average)
For every mph above the 88 mph baseline, the system adds 0.003 points of wOBA. A hitter averaging 93 mph exit velocity gets a +0.015 wOBA bump. The adjustment is capped at +/- 0.030 wOBA to prevent outliers from distorting the projection.
Barrel Rate
Baseline: 7.5% (league average)
Barrels — batted balls with the ideal combination of exit velocity (98+ mph) and launch angle — convert to home runs at a 52% rate. The system calculates the expected HR difference between a player's barrel rate and the 7.5% baseline, applied over their projected PA. The HR boost is capped at +/- 8 home runs.
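A hedged sketch of the barrel adjustment follows. The text applies the difference over projected PA. This example instead takes a projected count of batted balls as its input, because barrel rate is measured per batted ball. How PA is converted to batted balls is an assumption left to the caller, and the function name is hypothetical.

```python
BARREL_BASELINE = 0.075  # league-average barrel rate
HR_PER_BARREL = 0.52     # share of barrels that become home runs
HR_CAP = 8.0             # maximum boost or penalty, in home runs


def barrel_hr_adjustment(barrel_rate, projected_batted_balls):
    """Expected HR gained or lost versus a league-average barrel rate, capped."""
    extra_barrels = (barrel_rate - BARREL_BASELINE) * projected_batted_balls
    raw = extra_barrels * HR_PER_BARREL
    return max(-HR_CAP, min(HR_CAP, raw))


# Hypothetical hitters, each projected for 400 batted balls.
print(round(barrel_hr_adjustment(0.095, 400), 2))  # 4.16
print(barrel_hr_adjustment(0.125, 400))            # 8.0 (raw 10.4, capped)
```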
Hard-Hit Rate
Baseline: 38% (league average)
Hard-hit balls (95+ mph exit velocity) correlate with better outcomes across the board. For every 1% above the 38% baseline, the system adds 0.002 points of wOBA. Capped at +/- 0.025 wOBA.
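The exit velocity and hard-hit adjustments share the same shape: a linear bump per unit above the baseline, clipped at a cap. The sketch below shows that shape with hypothetical function names. It assumes the two bumps are simply added together and that values below the baseline produce a symmetric penalty.

```python
def capped_linear(value, baseline, per_unit, cap):
    """Linear adjustment around a baseline, clipped to +/- cap."""
    raw = (value - baseline) * per_unit
    return max(-cap, min(cap, raw))


def statcast_woba_delta(exit_velocity_mph, hard_hit_pct):
    """Combined wOBA adjustment from exit velocity and hard-hit rate."""
    ev_bump = capped_linear(exit_velocity_mph, 88.0, 0.003, 0.030)
    hard_hit_bump = capped_linear(hard_hit_pct, 38.0, 0.002, 0.025)
    return ev_bump + hard_hit_bump  # assumption: the two bumps are additive


# The 93 mph hitter from the text: 5 mph over baseline.
print(round(capped_linear(93.0, 88.0, 0.003, 0.030), 3))  # 0.015

# Same hitter with a hypothetical 45% hard-hit rate.
print(round(statcast_woba_delta(93.0, 45.0), 3))          # 0.029
```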
Derived Stat Recalculation
After the Statcast wOBA adjustments are applied, the system recalculates derived stats proportionally based on the wOBA delta (the change from Statcast, not the total distance from league average). SLG is adjusted at 80% of the wOBA delta, ISO is recalculated as SLG minus AVG, OPS is recalculated from OBP + SLG, and wRC+ is adjusted at roughly 300 points per 1.000 of wOBA (about 3 wRC+ points for every .010 of wOBA).
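To make the recalculation concrete, here is a small sketch with a hypothetical function name and stat line. It reads "80% of the wOBA delta" as an additive change to SLG and leaves AVG and OBP untouched, both of which are assumptions.

```python
def recalc_derived(avg, obp, slg, wrc_plus, woba_delta):
    """Propagate a Statcast wOBA delta into SLG, ISO, OPS, and wRC+."""
    new_slg = slg + 0.80 * woba_delta  # SLG moves at 80% of the wOBA delta
    return {
        "SLG": new_slg,
        "ISO": new_slg - avg,                   # ISO = SLG - AVG
        "OPS": obp + new_slg,                   # OPS = OBP + SLG
        "wRC+": wrc_plus + 300.0 * woba_delta,  # ~3 points per .010 of wOBA
    }


# Hypothetical .260/.330/.430 hitter with a 110 wRC+ and a +.015 Statcast bump.
line = recalc_derived(avg=0.260, obp=0.330, slg=0.430, wrc_plus=110.0, woba_delta=0.015)
print({name: round(value, 3) for name, value in line.items()})
# {'SLG': 0.442, 'ISO': 0.182, 'OPS': 0.772, 'wRC+': 114.5}
```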
Data Validation
The system flags implausibly low values as data quality failures and falls back to baselines: EV below 83 mph, Barrel% below 2%, or Hard% below 20% are treated as bad data rather than real performance.
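A check like this can be sketched as a simple floor test per metric. The names below are hypothetical. Treating a missing value the same as an implausible one is an assumption.

```python
# League-average baselines and plausibility floors from the sections above.
BASELINES = {"EV": 88.0, "Barrel%": 7.5, "Hard%": 38.0}
FLOORS = {"EV": 83.0, "Barrel%": 2.0, "Hard%": 20.0}


def sanitize_statcast(metrics):
    """Replace missing or implausibly low Statcast values with the baseline."""
    cleaned = {}
    for name, baseline in BASELINES.items():
        value = metrics.get(name)
        if value is None or value < FLOORS[name]:
            cleaned[name] = baseline  # treated as bad data, not real performance
        else:
            cleaned[name] = value
    return cleaned


# A 61.2 mph average exit velocity is almost certainly a data problem.
print(sanitize_statcast({"EV": 61.2, "Barrel%": 9.1, "Hard%": 41.0}))
# {'EV': 88.0, 'Barrel%': 9.1, 'Hard%': 41.0}
```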
Age Curves
The final adjustment layer models the expected year-over-year change in performance based on a player's age.
Hitter Age Curve
Hitters peak between ages 25 and 27, with rapid development before the peak and gradual decline after. The year-over-year factors:
Age | Factor | Phase |
|---|---|---|
20 | 1.08 | Rapid development |
21 | 1.06 | Strong development |
22 | 1.04 | Continued growth |
23 | 1.02 | Approaching peak |
24 | 1.01 | Near peak |
25–27 | 1.00 | Peak years |
28 | 0.995 | Very slight decline begins |
29 | 0.99 | Slight decline |
30 | 0.985 | Decline continues |
31 | 0.98 | Moderate decline |
32 | 0.975 | Moderate decline |
33 | 0.97 | Steeper decline |
34 | 0.96 | Significant decline |
35 | 0.95 | Significant decline |
36 | 0.94 | Sharp decline |
37 | 0.93 | Sharp decline |
38 | 0.91 | Very sharp decline |
39 | 0.89 | Very sharp decline |
40 | 0.87 | Steep decline |
Pitcher Age Curve
Pitchers peak earlier (ages 25–26) and decline faster. A 36-year-old pitcher has a 0.92 factor compared to a hitter's 0.94 at the same age. By 40, pitchers are at 0.82 versus hitters at 0.87.
How the Adjustments Are Applied
Rate stats (AVG, OBP, SLG, wOBA, ISO) receive a dampened adjustment. The raw factor is halved before application: dampened_factor = 1.0 + (factor - 1.0) * 0.5. This reflects the reality that rate stats don't swing as wildly as counting stats — a 21-year-old with a 1.06 factor gets a dampened 1.03 multiplier on his slash line.
Counting stats (WAR) receive the full age factor.
wRC+ uses an additive method rather than a multiplicative one. The age factor is converted to a wRC+ point adjustment: wrc_adjustment = (age_factor - 1.0) * 100. A 21-year-old with a 1.06 factor gets +6 wRC+ points of expected development. A 31-year-old with a 0.98 factor gets -2 points. This avoids the problem where multiplying wRC+ by a factor disproportionately punishes or rewards players who are already far from 100.
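The three application rules can be sketched together. The factor table is the hitter curve from above. The function name is hypothetical, and clamping ages outside 20 to 40 to the nearest table entry is an assumption.

```python
HITTER_AGE_FACTORS = {
    20: 1.08, 21: 1.06, 22: 1.04, 23: 1.02, 24: 1.01,
    25: 1.00, 26: 1.00, 27: 1.00,
    28: 0.995, 29: 0.99, 30: 0.985, 31: 0.98, 32: 0.975, 33: 0.97,
    34: 0.96, 35: 0.95, 36: 0.94, 37: 0.93, 38: 0.91, 39: 0.89, 40: 0.87,
}


def age_adjust_hitter(age, rate_stats, war, wrc_plus):
    """Apply the dampened, full, and additive age adjustments."""
    factor = HITTER_AGE_FACTORS[max(20, min(40, age))]  # assumption: clamp to table
    dampened = 1.0 + (factor - 1.0) * 0.5               # half-strength for rate stats
    adjusted = {name: value * dampened for name, value in rate_stats.items()}
    adjusted["WAR"] = war * factor                      # counting stat: full factor
    adjusted["wRC+"] = wrc_plus + (factor - 1.0) * 100  # additive, in wRC+ points
    return adjusted


# A hypothetical 21-year-old: 1.06 raw factor, 1.03 dampened.
young = age_adjust_hitter(21, {"AVG": 0.250, "SLG": 0.400}, war=2.0, wrc_plus=100.0)
print({name: round(value, 4) for name, value in young.items()})
# {'AVG': 0.2575, 'SLG': 0.412, 'WAR': 2.12, 'wRC+': 106.0}
```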
In-Season Blending
Once the season begins, the system switches from pure pre-season projections to a blended rest-of-season (ROS) mode. The core idea: early in the season, the pre-season projection dominates because the small actual sample is unreliable. As the season progresses and the sample grows, actual stats gradually take over.
The Blending Formula
reliability = min(1.0, actual_sample / threshold)
blended_stat = (reliability * actual_stat) + ((1 - reliability) * preseason_projection)
This is the same formula used for regression to the mean, but now it's blending actual current-season performance against the pre-season projection instead of blending a weighted average against the league average.
Batter Thresholds
Most rate stats (AVG, OBP, SLG, wOBA, ISO, K%, BB%) reach full reliability at 150 PA — roughly the first quarter of the season. BABIP, HR rate, and SB rate require 200 PA because they're noisier stats.
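At 75 PA (roughly May 1), the system gives about 50% weight to actual stats and 50% to the pre-season projection for the stats with a 150 PA threshold. By the All-Star break (~300 PA), every batter threshold has long been passed, so actual performance fully replaces the pre-season projection.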
Pitcher Thresholds
Pitcher rates (ERA, FIP, WHIP) reach full reliability at 50 IP — roughly the first month-plus for a starter. Strikeout and walk rates stabilize faster at 40 IP. Save rates need only 30 IP.
Because pitchers face more variance per inning than batters face per plate appearance, the thresholds are lower in absolute terms — the system is designed to trust pitcher rates once there's enough of a sample to be meaningful, while still leaning on the pre-season projection during April.
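A compact sketch of the rest-of-season blend, using the thresholds listed above. The dictionary keys and function name are hypothetical labels chosen for the example.

```python
# Sample sizes at which actual stats fully replace the pre-season projection.
BATTER_THRESHOLDS_PA = {
    "AVG": 150, "OBP": 150, "SLG": 150, "wOBA": 150, "ISO": 150,
    "K%": 150, "BB%": 150, "BABIP": 200, "HR rate": 200, "SB rate": 200,
}
PITCHER_THRESHOLDS_IP = {
    "ERA": 50, "FIP": 50, "WHIP": 50,
    "K rate": 40, "BB rate": 40, "save rate": 30,
}


def blend_ros(actual_stat, actual_sample, threshold, preseason_projection):
    """Blend current-season performance with the pre-season projection."""
    reliability = min(1.0, actual_sample / threshold)
    return reliability * actual_stat + (1.0 - reliability) * preseason_projection


# Hypothetical hot start: .400 wOBA through 75 PA against a .320 projection.
woba_threshold = BATTER_THRESHOLDS_PA["wOBA"]
print(round(blend_ros(0.400, 75, woba_threshold, 0.320), 3))   # 0.36
print(round(blend_ros(0.400, 300, woba_threshold, 0.320), 3))  # 0.4

# Hypothetical starter: 2.50 ERA through 25 IP against a 4.00 projection.
print(round(blend_ros(2.50, 25, PITCHER_THRESHOLDS_IP["ERA"], 4.00), 2))  # 3.25
```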
Backtest Results
To verify that the system actually works, we backtested it against every qualifying MLB batter and pitcher — not just Orioles players — for the 2024 and 2025 seasons. For each target year, the system used the three prior seasons to generate projections, then compared those projections to what actually happened. Three modes were tested: Marcel-only, Marcel + Statcast (batters only), and a naive "repeat last year" baseline that simply assumes the player's most recent season will repeat.
Sample Size
Target Year | Batters | Pitchers |
|---|---|---|
2024 | 207 | 126 |
2025 | 215 | 127 |
Batters were included if they accumulated at least 400 PA in the target year; pitchers needed 100 IP. About 65–75% of players in each sample had a full three years of prior history. The remainder had one or two prior seasons — Marcel handles these via heavier regression toward the league average.
Marcel Beats the Naive Baseline on Every Stat
Across both target years, Marcel produced a lower RMSE (root mean squared error) than the naive baseline on all 14 evaluated stats — 8 batting and 6 pitching. The improvements ranged from 7% on WAR to 43% on ERA.
Batting — average RMSE across 2024–2025:
Stat | Marcel RMSE | Naive RMSE | Improvement | Marcel r |
|---|---|---|---|---|
wOBA | .029 | .043 | 33% | .53 |
wRC+ | 19.3 | 29.3 | 34% | .56 |
AVG | .024 | .041 | 41% | .44 |
OBP | .026 | .042 | 38% | .57 |
SLG | .054 | .075 | 28% | .53 |
K% | .036 | .041 | 13% | .79 |
BB% | .020 | .025 | 19% | .72 |
WAR | 1.8 | 1.9 | 7% | .53 |
Pitching — average RMSE across 2024–2025:
Stat | Marcel RMSE | Naive RMSE | Improvement | Marcel r |
|---|---|---|---|---|
ERA | 0.90 | 1.58 | 43% | .35 |
FIP | 0.64 | 0.98 | 35% | .51 |
WHIP | 0.16 | 0.27 | 42% | .44 |
K/9 | 1.20 | 1.45 | 17% | .68 |
BB/9 | 0.72 | 1.11 | 35% | .49 |
WAR | 1.3 | 1.5 | 14% | .53 |
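For readers who want to reproduce the table columns, the metrics can be sketched as below. The helper names are hypothetical, and the sample inputs are toy values rather than backtest data. Because the published RMSE values are rounded, recomputing an improvement from them can differ from the table by a point or two.

```python
import math


def rmse(projected, actual):
    """Root mean squared error between projections and outcomes."""
    squared = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def improvement(model_rmse, naive_rmse):
    """Fractional RMSE reduction of the model relative to the naive baseline."""
    return 1.0 - model_rmse / naive_rmse


# ERA row of the pitching table: 0.90 vs 1.58.
print(round(improvement(0.90, 1.58) * 100))  # 43

# Toy example: two projected wOBA values against two outcomes.
print(round(rmse([0.300, 0.320], [0.310, 0.300]), 4))  # 0.0158
```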
Statcast Enhancement: Better Rankings, More Noise
The Marcel + Statcast mode improved correlations on power-related stats — wOBA correlation rose from .53 to .56, SLG from .53 to .56, wRC+ from .56 to .58 — meaning the Statcast layer does a better job of ranking hitters by true talent. However, it slightly increased RMSE on those same stats, meaning the adjustments sometimes overshoot. This is a common pattern with enhancement layers: they capture real signal but add variance. The Statcast layer does not affect plate discipline stats (K%, BB%) or counting stats (WAR), which are unchanged between the two modes.
We still chose to keep the Statcast layer, on the assumption that the projections will become less noisy as 2026 regular-season data accrues. We expect the pre-season projections to carry a lot of variance and to become more grounded in actual in-season performance over time.
Limitations
What This System Cannot Do
It can't predict breakouts or collapses. Marcel is a regression-based system: it pulls every projection toward the league average. This is the correct strategy in aggregate — most extreme seasons are followed by less extreme ones — but it means the system will consistently underproject breakout years and overproject decline years. A 22-year-old who just hit .230 in his first full season might be about to figure it out, but Marcel doesn't know that; it will project something closer to .240 with heavy regression toward .248 (...cough cough, Jackson Holliday). Conversely, a 35-year-old coming off a career year will get a rosy projection that doesn't account for the possibility of sudden decline.
It can't project rookies. Players with no MLB history receive a league-average projection. The system has no minor league translation layer, so a top prospect and a replacement-level call-up look identical until they accumulate major league plate appearances. For this reason, we do not have players like Samuel Basallo, Dylan Beavers, and Coby Mayo in this core group.
Playing time is the hardest prediction. The system uses default playing time assumptions (550 PA for starters, 170 IP for healthy starting pitchers) unless it has injury information. It cannot anticipate mid-season trades, benchings, platoon changes, or role shifts. The backtest showed that using a player's actual playing time improved pitcher WAR correlation from .59 to .63 — the rate projection is solid, but playing time prediction adds noise.
Pitching is inherently noisier than hitting. ERA correlations (.35) are much lower than wOBA correlations (.53) because pitcher results depend on factors outside individual control: defense, ballpark, sequencing, and opponent quality. FIP-based evaluation (r ≈ .51) is more stable, but the system still projects ERA because that's what determines wins and losses.
Known Simplifications
Fixed league-average constants. The system regresses toward 2025 league averages. Year-to-year league-average wOBA varies by about .005, so this introduces minor error when backtesting against 2024 or earlier seasons. In practice, the effect is small — it shifts all projections by a roughly constant amount, which affects RMSE slightly but barely affects correlations.
Population-level age curves. The aging factors are averages across thousands of player-seasons. Individual players age differently — some decline sharply at 30 (let's hope this isn't Pete Alonso), others remain productive into their late 30s. The dampened application (rate stats use half the raw factor) mitigates this, but the system will systematically underestimate late-career stars and overestimate early decliners.
No park factor adjustments. The Marcel layer does not adjust for home ballpark. A player moving from Coors Field to Oracle Park will have an inflated projection. wRC+ partially accounts for this since it's park-adjusted by definition, but raw stats like AVG, OBP, and SLG are not corrected.
No pitch-level or batted-ball profile data. More sophisticated systems (ZiPS, Steamer) incorporate ground ball rates, pitch mix changes, spin rates, and contact quality distributions. Our Statcast layer captures some of this through Barrel% and Hard%, but it doesn't model the underlying pitch-level mechanics that drive those outcomes.
Statcast enhancement adds variance. As the backtest showed, the Statcast layer improves player rankings but slightly increases prediction error on power stats. The adjustments are calibrated to league-average relationships between exit velocity and outcomes, but individual players can deviate from those averages based on spray angle, speed, and other factors not captured in the model.
...And that pretty much sums it up. We look forward to following these 10 player profiles with you throughout the 2026 season, and hope that the projections you see right now will course-correct in a meaningful way as the regular season gets underway in a few weeks.
Data Sources & Attribution
This system relies on publicly available data from several sources:
FanGraphs — Historical batting and pitching statistics (AVG, OBP, SLG, wOBA, wRC+, FIP, WAR, and all other traditional and advanced stats). Accessed via the pybaseball Python library, which wraps the FanGraphs leaderboard data.
Baseball Savant / Statcast — Pitch-level data including exit velocity, launch angle, barrel rate, and hard-hit rate. Statcast is MLB's ball-tracking system, capturing every pitch and batted ball in real time. Accessed via pybaseball's statcast_batter function, which pulls from the Baseball Savant public API.
MLB Stats API — Roster data and player identification numbers (MLBAM IDs) used to link players across systems.
Tom Tango's Marcel Research — The foundational projection methodology. Tango's original Marcel documentation describes the 5/4/3 weighting, regression to the mean, and age curve adjustment that form the core of this system.
Age Curve Research — Based on Tom Tango's aging curves and the FanGraphs delta method analysis of year-over-year performance changes across thousands of player-seasons.

