Gold Price Forecasting: Walk-Forward Validation, No Leakage, 4 Models
Why Most Gold Price ML Models Are Lying to You
And how we built a genuinely honest next-day forecasting system — walk-forward validated, zero data leakage, real results across 121 folds.
Gold just had its most extraordinary run in modern history
In 2025 alone it climbed over 50%, broke through $4,000/oz for the first time in October, and hit an all-time record of $5,589/oz in January 2026. J.P. Morgan is calling for $5,000/oz by Q4 2026. Central banks bought over 1,000 tonnes a year for three consecutive years.
And yet — most machine learning models built to "predict" gold prices are fundamentally broken. Not because the math is wrong. Because the methodology is dishonest.
"Remove the leakage, set honest baselines, use time-aware validation — and the numbers look less impressive but mean something."
This post breaks down exactly what's wrong with 90% of gold price ML notebooks on Kaggle, what we did differently, and what the results actually mean when you stop fooling yourself.
Same-day leakage: the problem nobody talks about
Open any popular gold price prediction notebook on Kaggle. You will almost certainly find the same pattern: features taken from the same trading day as the price being predicted.
This produces spectacular-looking results that are completely useless in the real world. A model "achieves" 98% accuracy predicting gold prices — because it's essentially told the answer before being asked the question.
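As a minimal sketch of what same-day leakage can look like, assume a frame with hypothetical `open`, `high`, `low`, and `close` columns (the values below are made up, and the column names are not taken from the MCX dataset). The leaky setup pairs day t's own open/high/low with day t's close; the lagged setup only uses values from day t-1.

```python
import pandas as pd

# Hypothetical OHLC rows; values are invented for illustration only.
df = pd.DataFrame(
    {
        "open": [100.0, 102.0, 101.0, 105.0, 107.0],
        "high": [103.0, 104.0, 106.0, 108.0, 109.0],
        "low": [99.0, 100.0, 100.0, 104.0, 105.0],
        "close": [102.0, 101.0, 105.0, 107.0, 106.0],
    }
)
feature_cols = ["open", "high", "low"]

# Leaky setup: row t's features are day t's own open/high/low, which are
# only fully known once day t (and therefore its close) is already over.
X_leaky = df[feature_cols]
y = df["close"]

# Lagged setup: row t's features are day t-1's values, all known before
# day t begins. The first row has no history, so it is dropped.
X_lagged = df[feature_cols + ["close"]].shift(1).add_suffix("_lag1")
valid = X_lagged.notna().all(axis=1)
X_lagged, y_lagged = X_lagged[valid], y[valid]
```

With the lagged frame, the first usable row predicts the second day's close (101.0) from the first day's values, which is the situation a real forecaster is in.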
What we did instead
3,104 trading days of MCX India gold
We used the MCX India gold spot price dataset — January 2014 through January 2026. Prices are in Indian Rupees per 10 grams (the standard MCX format), ranging from ₹24,545 to ₹1,37,789.
This 12-year window captures everything: the 2016 demonetization shock, the 2018–19 trade war volatility, the 2020 COVID-era surge (gold hit ₹56,000/10g), post-pandemic normalization, and the 2024–25 global bull run.
19 strictly lagged features
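The sketch below shows one way to build features that are strictly lagged. It is not the post's feature set: the five features, their names, and the window lengths are assumptions chosen for illustration. The point is the pattern, in which every feature for row t is computed from closes up to and including day t-1.

```python
import numpy as np
import pandas as pd

def build_lagged_features(close: pd.Series) -> pd.DataFrame:
    """Return features where row t depends only on closes through t-1."""
    prev = close.shift(1)            # yesterday's close, as seen on day t
    prev2 = close.shift(2)           # close from two days ago
    ret = prev / prev2 - 1.0         # return from day t-2 to day t-1

    feats = pd.DataFrame(index=close.index)
    feats["lag_1"] = prev
    feats["lag_2"] = prev2
    feats["ret_1"] = ret
    feats["ma_5"] = prev.rolling(5).mean()    # mean of closes t-5 .. t-1
    feats["vol_5"] = ret.rolling(5).std()     # std of the last 5 known returns
    return feats

# Quick leakage check: changing today's close must not change today's features.
close = pd.Series(np.linspace(100.0, 120.0, 21))
bumped = close.copy()
bumped.iloc[-1] = 999.0
same = build_lagged_features(close).iloc[-1].equals(
    build_lagged_features(bumped).iloc[-1]
)
```

A check like the one at the end is cheap to keep in a test suite: if `same` is ever `False`, some feature is reading the value it is supposed to predict.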
Walk-forward validation: the only honest approach
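Most Kaggle notebooks split data randomly: 80% train, 20% test. For time series, this is wrong in a way that matters. A random split can put rows from 2022 in the training set and rows from 2018 in the test set, so the model learns from the future and is then evaluated on the past.
We used expanding-window walk-forward cross-validation: each fold trains on all data up to a cutoff and tests on the period immediately after it, then the cutoff moves forward and the training window grows.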
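A minimal sketch of an expanding-window splitter follows. The function name and the `initial_train` and `test_size` parameters are illustrative assumptions; the post does not state the fold sizes behind its 121 folds.

```python
from typing import Iterator, Tuple

def expanding_window_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs for walk-forward validation.

    The training window always starts at row 0 and grows by `test_size`
    rows per fold; the test window is the block right after it.
    """
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size

# Toy example: 10 rows, first fold trains on 4 rows, each fold tests on 2.
splits = list(expanding_window_splits(n_samples=10, initial_train=4, test_size=2))
for train_idx, test_idx in splits:
    # Every training row precedes every test row, so nothing leaks backward.
    assert train_idx[-1] < test_idx[0]
```

scikit-learn's `TimeSeriesSplit` provides a similar expanding-window scheme and may be preferable when the rest of the pipeline already uses scikit-learn.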
The numbers — honest ones
Baselines first
Before any ML model, we set two simple baselines. Any model that doesn't beat both is not useful.
| Baseline | MAPE | RMSE | Dir. Acc. |
|---|---|---|---|
| Persistence (tomorrow = today) | 0.60% | 490 | 75.4% |
| 7-day Moving Average | 1.26% | 991 | 48.2% |
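The two baselines can be sketched as follows, assuming `close` is a one-dimensional array of daily closing prices in date order. The function names are hypothetical, and the alignment convention (each forecast uses only closes from earlier days) is an assumption consistent with the no-leakage rule above.

```python
import numpy as np

def persistence_forecast(close):
    """Forecast for day t (t >= 1) is the close of day t-1."""
    close = np.asarray(close, dtype=float)
    return close[:-1]            # aligns with actuals close[1:]

def moving_average_forecast(close, window=7):
    """Forecast for day t (t >= window) is the mean of the prior `window` closes."""
    close = np.asarray(close, dtype=float)
    csum = np.cumsum(np.insert(close, 0, 0.0))
    # means[i] is the mean of close[i : i + window]
    means = (csum[window:] - csum[:-window]) / window
    # The last mean has no following day to predict, so drop it.
    return means[:-1]            # aligns with actuals close[window:]

close = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
persist = persistence_forecast(close)          # compare against close[1:]
ma7 = moving_average_forecast(close, 7)        # compare against close[7:]
```

Note that the two forecasts cover different date ranges unless the first `window - 1` persistence forecasts are dropped before comparing error metrics.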
Walk-forward results · 121 folds · 4 models
| Model | MAPE | RMSE | Dir. Acc. | vs Persistence RMSE |
|---|---|---|---|---|
| Ridge Regression | 0.77% | 421 | 51.2% | −14% ✓ |
| XGBoost | 0.94% | 535 | 52.9% | +9% |
| Random Forest | 0.94% | 564 | 56.2% | +15% |
| SVR | 2.00% | 1,290 | 55.4% | +163% |
| [Persistence baseline] | 0.60% | 490 | 75.4% | — |
| [7-day MA baseline] | 1.26% | 991 | 48.2% | — |
Ridge wins on RMSE — the metric that penalizes large errors. Even though persistence beats Ridge on MAPE (because gold is highly autocorrelated day-to-day), Ridge achieves RMSE 421 vs persistence 490. RMSE is what matters in practice. Large errors are what hurt you.
Directional accuracy definition: fraction of folds where sign(predicted − prev_close) == sign(actual − prev_close).
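For reference, the metrics in the tables can be computed along these lines. This is a sketch with hypothetical function names, not the post's evaluation code; `directional_accuracy` follows the sign-based definition given above, and `pct_vs_baseline` reproduces the "vs Persistence RMSE" column.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

def rmse(actual, predicted):
    """Root mean squared error, in the units of the price series."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def directional_accuracy(actual, predicted, prev_close):
    """Percent of cases where predicted and actual moves share a sign."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    prev_close = np.asarray(prev_close, float)
    hits = np.sign(predicted - prev_close) == np.sign(actual - prev_close)
    return float(np.mean(hits) * 100)

def pct_vs_baseline(model_rmse, baseline_rmse):
    """Relative RMSE change versus a baseline, in percent (negative is better)."""
    return (model_rmse - baseline_rmse) / baseline_rmse * 100

ridge_vs_persistence = pct_vs_baseline(421, 490)   # about -14.1
svr_vs_persistence = pct_vs_baseline(1290, 490)    # about +163.3
```

One edge case is worth handling explicitly: a forecast exactly equal to `prev_close` has sign 0, so under this definition it only counts as correct when the price is also unchanged.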
RandomForest gets better in a crisis
The aggregate metrics tell one story. The regime breakdown tells a more interesting one.
| Model | Calm MAPE | 2020 MAPE | Calm DA | 2020 DA |
|---|---|---|---|---|
| Ridge | 0.74% | 1.04% ↑ | 52.3% | 41.7% ↓ |
| XGBoost | 0.90% | 1.29% ↑ | 54.1% | 41.7% ↓ |
| Random Forest | 0.95% | 0.87% ↓ | 54.1% | 75.0% ↑ |
| SVR | 1.72% | 4.53% ↑ | 54.1% | 66.7% |
RandomForest improves during the COVID gold surge
While Ridge and XGBoost degrade when the 2020 regime shift breaks autocorrelation patterns, Random Forest's MAPE drops from 0.95% to 0.87% and its directional accuracy jumps from 54.1% to 75.0%. One plausible reading is that the bagged trees pick up non-linear price dynamics that become visible when the market is most stressed, although XGBoost, also tree-based, does not show the same gain.
Practical implication: A production system might use Ridge during calm conditions and switch to RandomForest when a volatility indicator crosses a threshold. This data directly motivates a regime-switching ensemble.
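A regime switch of this kind could be sketched as below. Everything specific here is an assumption: the post does not name a volatility indicator or a threshold, so the 20-day rolling standard deviation of returns and the `threshold` value are placeholders, and `ridge_pred` and `forest_pred` stand in for forecasts produced elsewhere.

```python
import numpy as np

def rolling_volatility(close, window=20):
    """Std of the last `window` daily returns known before day t.

    vol[t] uses closes up to day t-1 only, so it can be used as a
    switching signal for the day-t forecast without leakage.
    """
    close = np.asarray(close, dtype=float)
    returns = np.diff(close) / close[:-1]   # returns[i]: day i -> day i+1
    vol = np.full(close.shape, np.nan)
    for t in range(window + 1, len(close)):
        vol[t] = returns[t - 1 - window : t - 1].std()
    return vol

def switch_forecasts(ridge_pred, forest_pred, vol, threshold):
    """Use the forest forecast when volatility exceeds `threshold`, else Ridge."""
    ridge_pred = np.asarray(ridge_pred, dtype=float)
    forest_pred = np.asarray(forest_pred, dtype=float)
    vol = np.asarray(vol, dtype=float)
    # NaN compares as False, so rows without enough history fall back to Ridge.
    use_forest = vol > threshold
    return np.where(use_forest, forest_pred, ridge_pred)

combined = switch_forecasts(
    ridge_pred=[1.0, 2.0, 3.0],
    forest_pred=[10.0, 20.0, 30.0],
    vol=[np.nan, 0.005, 0.03],
    threshold=0.01,                         # placeholder value
)
```

Any threshold would itself need to be chosen inside the walk-forward loop, using training data only, or the switch reintroduces the leakage the rest of the pipeline avoids.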
Gold is in a new pricing era
Gold hit $4,000/oz in October 2025, climbing from $3,500 to $4,000 in just 36 days. That is a gain of about 14%, or roughly 40 basis points per day. It then broke $5,000/oz and reached a record $5,589/oz in January 2026. J.P. Morgan forecasts $5,000/oz by Q4 2026 based on projected central bank and investor demand averaging 585 tonnes per quarter.
In this environment, the difference between a model with genuine predictive validity and one inflated by leakage is the difference between actionable intelligence and expensive noise. The methodology here — walk-forward validation, strictly lagged features, honest baselines, regime analysis — is the foundation any serious gold forecasting system needs.
What would make this model materially better
- Macro feature layer: Add DXY (US dollar index), VIX, crude oil, and US 10-year yield as lagged features. Academic research consistently shows VIX and DXY significantly improve gold price prediction accuracy.
- Regime-switching ensemble: Use a volatility indicator to dynamically select between Ridge (calm) and RandomForest (volatile). The regime analysis here directly motivates this architecture.
- Multi-step forecasting (t+5, t+21): Extend from next-day to weekly and monthly horizons, which is what investors actually care about.
- Directional classification layer: Convert regression outputs into buy/hold/sell signals. Evaluate on risk-adjusted returns rather than MAPE.
- USD/oz normalization: Convert INR/10g to USD/oz using historical exchange rates for direct comparison with COMEX benchmarks and international research.