Gold Price Forecasting: Walk-Forward Validation, No Leakage, 4 Models

Why Most Gold Price ML Models Are Lying to You — And How We Fixed It
Machine Learning · Finance · Data Science

And how we built a genuinely honest next-day forecasting system — walk-forward validated, zero data leakage, real results across 121 folds.

Shehan Makani
May 17, 2026 · 12 min read
Gold ATH (Jan 2026): $5,589 per troy oz · COMEX
2025 YTD Gain: +55% · strongest since 1979
J.P. Morgan 2026 Target: $5,000 · Q4 2026 forecast
CB Purchases (2026E): 585t per quarter · J.P. Morgan
Our Model RMSE: 421 · Ridge · 121 folds · INR/10g

Gold just had its most extraordinary run in modern history

In 2025 alone it climbed over 50%, broke through $4,000/oz for the first time in October, and hit an all-time record of $5,589/oz in January 2026. J.P. Morgan is calling for $5,000/oz by Q4 2026. Central banks bought over 1,000 tonnes a year for three consecutive years.

And yet — most machine learning models built to "predict" gold prices are fundamentally broken. Not because the math is wrong. Because the methodology is dishonest.

"Remove the leakage, set honest baselines, use time-aware validation — and the numbers look less impressive but mean something."

This post breaks down exactly what's wrong with most gold price ML notebooks on Kaggle, what we did differently, and what the results actually mean when you stop fooling yourself.

Same-day leakage: the problem nobody talks about

Open any popular gold price prediction notebook on Kaggle. You will almost certainly find this pattern:

    # What most Kaggle notebooks do
    features = ["Open", "High", "Low", "Volume"]
    target = "Close"  # same-day close
⚠ Data Leakage: Today's High and Low values don't exist until the market closes — at the exact same moment as the Close you're predicting. Using them gives your model information from the future disguised as features from the present.

This produces spectacular-looking results that are completely useless in the real world. A model "achieves" 98% accuracy predicting gold prices — because it's essentially told the answer before being asked the question.

What we did instead

    # Our approach: strict temporal integrity
    # ALL features use .shift(1) or greater
    df["HL_Spread"] = (df["High"] - df["Low"]).shift(1)   # yesterday's range
    df["Lag_1"] = df["Price"].shift(1)                    # yesterday's close
    df["MA_7"] = df["Price"].rolling(7).mean().shift(1)   # 7-day MA, lagged

    # Target: next-day close (t+1)
    df["Target"] = df["Price"].shift(-1)
✓ Zero leakage: No same-day Open, High, Low, or Volume used anywhere. Every feature is computed from data that existed before the prediction moment.
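One way to test a no-leakage claim like this is to tamper with every row from "today" onward and confirm that today's feature row does not move. The sketch below is only an illustration under assumptions: the helper names (`build_features`, `leaky_features`, `leaks_future`) and the synthetic data are hypothetical, and only the three features from the snippet above are rebuilt.

```python
import numpy as np
import pandas as pd

def build_features(df):
    """Lagged features in the style of the snippet above."""
    feats = pd.DataFrame(index=df.index)
    feats["HL_Spread"] = (df["High"] - df["Low"]).shift(1)
    feats["Lag_1"] = df["Price"].shift(1)
    feats["MA_7"] = df["Price"].rolling(7).mean().shift(1)
    return feats

def leaky_features(df):
    """Counter-example: same-day High and Low used directly."""
    return df[["High", "Low"]].copy()

def leaks_future(df, t, builder):
    """True if the feature row at position t changes when rows t.. are altered."""
    before = builder(df).iloc[t].to_numpy(dtype=float)
    tampered = df.copy()
    # Overwrite today's row and every later row with different values.
    tampered.iloc[t:] = tampered.iloc[t:].to_numpy() * 1.5 + 7.0
    after = builder(tampered).iloc[t].to_numpy(dtype=float)
    return not np.allclose(before, after, equal_nan=True)

# Synthetic prices, purely for demonstration (not MCX data).
rng = np.random.default_rng(0)
price = 50_000 + rng.normal(0, 300, size=60).cumsum()
half_range = np.abs(rng.normal(0, 150, size=60)) + 1.0
demo = pd.DataFrame({"Price": price,
                     "High": price + half_range,
                     "Low": price - half_range})

lagged_leaks = leaks_future(demo, 30, build_features)
same_day_leaks = leaks_future(demo, 30, leaky_features)
print("lagged features leak:", lagged_leaks)
print("same-day features leak:", same_day_leaks)
```

A check like this only catches features that read today's or later rows; it says nothing about leakage through scaling or tuning, which has to be handled in the validation loop.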

3,104 trading days of MCX India gold

We used the MCX India gold spot price dataset — January 2014 through January 2026. Prices are in Indian Rupees per 10 grams (the standard MCX format), ranging from ₹24,545 to ₹1,37,789.

This 12-year window captures everything: the 2016 demonetization shock, the 2018–19 trade war volatility, the 2020 COVID-era surge (gold hit ₹56,000/10g), post-pandemic normalization, and the 2024–25 global bull run.

19 strictly lagged features

Lag_1, 5, 10, 21, 63: Price lags at key lookback periods
MA_7, MA_21, MA_63: Rolling means, shift(1) applied
Std_7, Std_21: Rolling volatility, shift(1) applied
ROC_5, ROC_21: Rate-of-change momentum
HL_Spread: Previous day's High−Low range
Log_Return: Previous day's log return
MA7_vs_MA21: Short vs medium trend ratio
ZScore_21: 21-day z-score of price
DayOfWeek, Month: Calendar seasonality signals
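For readers who want to see how features like these can be built, here is a rough pandas sketch. The column names follow the list above, but the exact formulas are assumptions (for example, ROC as a simple percentage change and the z-score taken against the same 21-day window); the original notebook may define them differently. The function name `add_lagged_features` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd

def add_lagged_features(df):
    """Build the named features so that row t only uses data up to t-1."""
    p = df["Price"]
    f = pd.DataFrame(index=df.index)
    for k in (1, 5, 10, 21, 63):
        f[f"Lag_{k}"] = p.shift(k)
    for w in (7, 21, 63):
        f[f"MA_{w}"] = p.rolling(w).mean().shift(1)
    for w in (7, 21):
        f[f"Std_{w}"] = p.rolling(w).std().shift(1)
    for k in (5, 21):
        # Assumed definition: k-day percentage change, lagged one day.
        f[f"ROC_{k}"] = (p / p.shift(k) - 1.0).shift(1)
    f["HL_Spread"] = (df["High"] - df["Low"]).shift(1)
    f["Log_Return"] = np.log(p / p.shift(1)).shift(1)
    f["MA7_vs_MA21"] = f["MA_7"] / f["MA_21"]
    # Assumed definition: (price - 21-day mean) / 21-day std, lagged one day.
    f["ZScore_21"] = ((p - p.rolling(21).mean()) / p.rolling(21).std()).shift(1)
    # Calendar values of the row's own date, which are known in advance.
    f["DayOfWeek"] = df.index.dayofweek
    f["Month"] = df.index.month
    return f

# Synthetic business-day series, for demonstration only.
dates = pd.bdate_range("2024-01-01", periods=120)
rng = np.random.default_rng(1)
price = 60_000 + rng.normal(0, 250, size=len(dates)).cumsum()
spread = np.abs(rng.normal(0, 120, size=len(dates))) + 1.0
demo_prices = pd.DataFrame({"Price": price,
                            "High": price + spread,
                            "Low": price - spread}, index=dates)

features = add_lagged_features(demo_prices)
print(features.dropna().head())
```

The longest lookback is 63 days, so the first 63 rows contain NaNs and would normally be dropped before training.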

Walk-forward validation: the only honest approach

Most Kaggle notebooks split data randomly: 80% train, 20% test. For time series, this is wrong in a way that matters. A random split can put rows from 2022 in the training set while rows from 2018 sit in the test set, so the model learns from the future and is then evaluated on the past.

We used expanding-window walk-forward cross-validation:

Walk-Forward Scheme · 121 Folds · Step = 21 Days
Fold 1 · Train: Day 1 → Day 500 · Predict: Day 501
Fold 2 · Train: Day 1 → Day 521 · Predict: Day 522
Fold 3 · Train: Day 1 → Day 542 · Predict: Day 543
· · · 121 folds total · · ·
Nested GridSearchCV + TimeSeriesSplit inside each training window · No future data bleeds into tuning
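The fold schedule above can be sketched as a small generator. This is a minimal illustration, not the notebook's code: the function name `walk_forward_folds` is hypothetical, and the row count of 3,040 is an assumption (3,104 trading days minus the 63-day lookback and the final row, which has no next-day target).

```python
def walk_forward_folds(n_rows, initial_train=500, step=21):
    """Yield (train_end, test_index) pairs for an expanding window.

    Rows are 0-based: train on rows [0, train_end) and predict the single
    row at test_index, so Fold 1 trains on Day 1..500 and predicts Day 501.
    """
    train_end = initial_train
    while train_end < n_rows:
        yield train_end, train_end
        train_end += step

# Assumed usable row count; see the note above.
folds = list(walk_forward_folds(n_rows=3040))
print(len(folds), "folds")
for number, (train_end, test_index) in enumerate(folds[:3], start=1):
    print(f"Fold {number}: train Day 1 -> Day {train_end}, predict Day {test_index + 1}")
```

Inside each fold, any hyperparameter search would see only the rows before `train_end`, which is what keeps tuning free of future data.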

The numbers — honest ones

Baselines first

Before any ML model, we set two simple baselines. Any model that doesn't beat both is not useful.

Baseline | MAPE | RMSE (INR/10g) | Dir. Acc.
Persistence (tomorrow = today) | 0.60% | 490 | 75.4%
7-day Moving Average | 1.26% | 991 | 48.2%
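Both baselines and both error metrics fit in a few lines of plain Python. The sketch below uses a made-up price list, and the function names are illustrative rather than taken from the notebook.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def persistence_forecast(prices, t):
    """Forecast for day t+1 is simply the close on day t."""
    return prices[t]

def moving_average_forecast(prices, t, window=7):
    """Forecast for day t+1 is the mean of the last `window` closes up to day t."""
    recent = prices[t - window + 1 : t + 1]
    return sum(recent) / len(recent)

# Hypothetical closes in INR/10g, for demonstration only.
prices = [72000, 72150, 71980, 72300, 72410, 72280, 72550, 72600, 72490, 72700]
origins = range(6, len(prices) - 1)          # days with 7 closes behind them
actual = [prices[t + 1] for t in origins]
naive = [persistence_forecast(prices, t) for t in origins]
ma7 = [moving_average_forecast(prices, t) for t in origins]

print("persistence:", round(mape(actual, naive), 3), round(rmse(actual, naive), 1))
print("7-day MA:   ", round(mape(actual, ma7), 3), round(rmse(actual, ma7), 1))
```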

Walk-forward results · 121 folds · 4 models

Model | MAPE | RMSE (INR/10g) | Dir. Acc. | vs Persistence RMSE
Ridge Regression | 0.77% | 421 | 51.2% | −14% ✓
XGBoost | 0.94% | 535 | 52.9% | +9%
Random Forest | 0.94% | 564 | 56.2% | +15%
SVR | 2.00% | 1,290 | 55.4% | +163%
[Persistence baseline] | 0.60% | 490 | 75.4% |
[7-day MA baseline] | 1.26% | 991 | 48.2% |

Ridge is the only model that beats persistence on RMSE, the metric that penalizes large errors most heavily: 421 vs 490. Persistence still wins on MAPE (0.60% vs 0.77%) because gold is highly autocorrelated day to day. In practice RMSE matters more here, because large errors are the ones that hurt you.
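The "vs Persistence RMSE" column is plain arithmetic on the RMSE values in the table, and a few lines are enough to reproduce it. The helper name `pct_vs_baseline` is just for illustration.

```python
PERSISTENCE_RMSE = 490  # from the baseline table, INR/10g

MODEL_RMSE = {
    "Ridge Regression": 421,
    "XGBoost": 535,
    "Random Forest": 564,
    "SVR": 1290,
}

def pct_vs_baseline(model_rmse, baseline_rmse):
    """Relative RMSE change against the baseline, in percent (negative is better)."""
    return 100.0 * (model_rmse - baseline_rmse) / baseline_rmse

relative = {name: pct_vs_baseline(value, PERSISTENCE_RMSE)
            for name, value in MODEL_RMSE.items()}

for name, change in relative.items():
    print(f"{name}: {change:+.0f}%")
```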

Directional accuracy definition: fraction of folds where sign(predicted − prev_close) == sign(actual − prev_close).
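Read literally, that definition translates to the short function below. It is a sketch with hypothetical names and toy numbers; how the notebook treats a zero move is not stated, so the tie handling here is an assumption.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def directional_accuracy(predicted, actual, prev_close):
    """Fraction of forecasts whose direction relative to prev_close matches reality.

    Assumption: a zero predicted move only counts as a hit when the
    actual move is also zero, because the signs are compared directly.
    """
    hits = sum(
        sign(p - c) == sign(a - c)
        for p, a, c in zip(predicted, actual, prev_close)
    )
    return hits / len(predicted)

# Toy example: up/up (hit), down/up (miss), flat/flat (hit).
example_da = directional_accuracy(
    predicted=[101.0, 99.0, 100.0],
    actual=[102.0, 100.5, 100.0],
    prev_close=[100.0, 100.0, 100.0],
)
print(example_da)
```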

RandomForest gets better in a crisis

The aggregate metrics tell one story. The regime breakdown tells a more interesting one.

Model | Calm MAPE | 2020 MAPE | Calm DA | 2020 DA
Ridge | 0.74% | 1.04% ↑ | 52.3% | 41.7% ↓
XGBoost | 0.90% | 1.29% ↑ | 54.1% | 41.7% ↓
Random Forest | 0.95% | 0.87% ↓ | 54.1% | 75.0% ↑
SVR | 1.72% | 4.53% ↑ | 54.1% | 66.7%

RandomForest improves during the COVID gold surge

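While Ridge and XGBoost degrade when the 2020 regime shift breaks autocorrelation patterns, RandomForest's MAPE drops from 0.95% to 0.87% and its directional accuracy jumps from 54.1% to 75.0%. One plausible explanation is that tree-based models capture non-linear price dynamics that surface when the market is most stressed. With a 21-day step, 2020 covers only about 12 folds, so these regime numbers are suggestive rather than conclusive.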

Practical implication: A production system might use Ridge during calm conditions and switch to RandomForest when a volatility indicator crosses a threshold. This data directly motivates a regime-switching ensemble.
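A switching rule of this kind could look like the sketch below. Everything specific in it is an assumption: the volatility measure (standard deviation of recent daily returns), the threshold value, and the model labels are placeholders, and nothing here was tuned or validated on the MCX data.

```python
import statistics

def pick_model(recent_returns, vol_threshold):
    """Choose a model label from the volatility of recent daily returns.

    recent_returns must end at yesterday's return so the choice itself
    uses no same-day information.
    """
    volatility = statistics.pstdev(recent_returns)
    return "random_forest" if volatility > vol_threshold else "ridge"

# Hypothetical daily returns and threshold, for illustration only.
calm_window = [0.001, -0.001, 0.002, -0.002]
stressed_window = [0.03, -0.04, 0.05, -0.02]
THRESHOLD = 0.01

calm_choice = pick_model(calm_window, THRESHOLD)
stressed_choice = pick_model(stressed_window, THRESHOLD)
print(calm_choice, stressed_choice)
```

In a walk-forward setting the threshold would itself need to be chosen inside each training window, otherwise the switch reintroduces the leakage the rest of the pipeline avoids.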

Gold is in a new pricing era

Gold hit $4,000/oz in October 2025, climbing from $3,500 to $4,000 in just 36 days: a gain of about 14%, or roughly 40 basis points per day. It then broke $5,000/oz and reached a record $5,589/oz in January 2026. J.P. Morgan forecasts $5,000/oz by Q4 2026 based on projected central bank and investor demand averaging 585 tonnes per quarter.

In this environment, the difference between a model with genuine predictive validity and one inflated by leakage is the difference between actionable intelligence and expensive noise. The methodology here — walk-forward validation, strictly lagged features, honest baselines, regime analysis — is the foundation any serious gold forecasting system needs.

What would make this model materially better

  1. Macro feature layer: Add DXY (US dollar index), VIX, crude oil, and US 10-year yield as lagged features. Academic research consistently shows VIX and DXY significantly improve gold price prediction accuracy.
  2. Regime-switching ensemble: Use a volatility indicator to dynamically select between Ridge (calm) and RandomForest (volatile). The regime analysis here directly motivates this architecture.
  3. Multi-step forecasting (t+5, t+21): Extend from next-day to weekly and monthly horizons, which are what investors actually care about.
  4. Directional classification layer: Convert regression outputs into buy/hold/sell signals. Evaluate on risk-adjusted returns rather than MAPE.
  5. USD/oz normalization: Convert INR/10g to USD/oz using historical exchange rates for direct comparison with COMEX benchmarks and international research.
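The unit conversion in item 5 is straightforward, since one troy ounce is 31.1034768 grams. The sketch below is illustrative only: the function name is hypothetical, and the exchange rate in the example is a made-up placeholder, not a historical quote.

```python
GRAMS_PER_TROY_OUNCE = 31.1034768

def inr_per_10g_to_usd_per_oz(price_inr_per_10g, usd_inr_rate):
    """Convert a price in INR per 10 grams to USD per troy ounce.

    usd_inr_rate is the number of rupees per US dollar on the same date.
    """
    inr_per_gram = price_inr_per_10g / 10.0
    inr_per_ounce = inr_per_gram * GRAMS_PER_TROY_OUNCE
    return inr_per_ounce / usd_inr_rate

# Hypothetical inputs: an INR/10g quote and an assumed USD/INR rate of 88.0.
usd_per_oz = inr_per_10g_to_usd_per_oz(128_450, usd_inr_rate=88.0)
print(round(usd_per_oz, 2))
```

For a historical series, each day's price would be divided by that day's exchange rate, lagged in the same way as every other feature.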
Full open-source code: the notebook runs on Kaggle, Colab, or locally. It auto-detects the environment and needs no manual uploads.
Shehan Makani
Co-Founder & CEO · ChemeNova LLC & ChemRich Global
Tech MBA Candidate · NJIT · Entrepreneurship & AI
