Why Most Gold Price ML Models Are Lying to You — And How We Fixed It


Machine Learning · Finance · Data Science


And how we built a genuinely honest next-day forecasting system — walk-forward validated, zero data leakage, real results across 121 folds.

Shehan Makani

May 17, 2026 · 12 min read


Metric	Value	Note
Gold ATH (Jan 2026)	$5,589	per troy oz · COMEX
2025 YTD Gain	+55%	strongest since 1979
J.P. Morgan 2026 Target	$5,000	Q4 2026 forecast
CB Purchases (2026E)	585t	per quarter · J.P. Morgan
Our Model RMSE	421	Ridge · 121 folds · INR/10g

Context

Gold just had its most extraordinary run in modern history

In 2025 alone it climbed over 50%, broke through $4,000/oz for the first time in October, and hit an all-time record of $5,589/oz in January 2026. J.P. Morgan is calling for $5,000/oz by Q4 2026. Central banks bought over 1,000 tonnes a year for three consecutive years.

And yet — most machine learning models built to "predict" gold prices are fundamentally broken. Not because the math is wrong. Because the methodology is dishonest.

"Remove the leakage, set honest baselines, use time-aware validation — and the numbers look less impressive but mean something."

This post breaks down exactly what's wrong with most gold price ML notebooks on Kaggle, what we did differently, and what the results actually mean when you stop fooling yourself.

The Core Issue

Same-day leakage: the problem nobody talks about

Open any popular gold price prediction notebook on Kaggle. You will almost certainly find this pattern:

# What most Kaggle notebooks do
features = ["Open", "High", "Low", "Volume"]
target   = "Close"  # same-day close
    

⚠ Data Leakage: Today's High and Low values don't exist until the market closes — at the exact same moment as the Close you're predicting. Using them gives your model information from the future disguised as features from the present.

This produces spectacular-looking results that are completely useless in the real world. A model "achieves" 98% accuracy predicting gold prices — because it's essentially told the answer before being asked the question.
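To make the mechanism concrete, here is a minimal sketch on synthetic data. The random-walk prices, the seed, and the intraday ranges are all made up for illustration and are not taken from the MCX dataset. Because Low ≤ Close ≤ High always holds, even the midpoint of today's range, with no model at all, can never miss the close by more than half the range.

```python
import numpy as np
import pandas as pd

# Synthetic OHLC data (an assumption for illustration, not real gold prices).
rng = np.random.default_rng(0)
n = 1000
close = 50_000 + np.cumsum(rng.normal(0, 300, n))   # random-walk closes
open_ = np.r_[close[0], close[:-1]]                 # each day opens at the prior close
high = np.maximum(open_, close) + rng.uniform(0, 50, n)
low = np.minimum(open_, close) - rng.uniform(0, 50, n)
ohlc = pd.DataFrame({"Open": open_, "High": high, "Low": low, "Close": close})

# "Leaky" guess: midpoint of TODAY's range, which is only known at the close.
leaky_guess = (ohlc["High"] + ohlc["Low"]) / 2
# Honest guess: yesterday's close, which is known before today starts.
honest_guess = ohlc["Close"].shift(1)

leaky_mae = (ohlc["Close"] - leaky_guess).abs().mean()
honest_mae = (ohlc["Close"] - honest_guess).abs().mean()
half_range = (ohlc["High"] - ohlc["Low"]) / 2       # hard cap on the leaky error

print(f"leaky MAE:  {leaky_mae:.1f}")
print(f"honest MAE: {honest_mae:.1f}")
```

Any regression given same-day High and Low can only do better than this midpoint guess, which is why leaky notebooks report such flattering numbers.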

What we did instead

# Our approach: strict temporal integrity
# ALL features use .shift(1) or greater

df["HL_Spread"] = (df["High"] - df["Low"]).shift(1)  # yesterday's range
df["Lag_1"]    = df["Price"].shift(1)              # yesterday's close
df["MA_7"]     = df["Price"].rolling(7).mean().shift(1) # 7-day MA, lagged

# Target: next-day close (t+1)
df["Target"]  = df["Price"].shift(-1)
    

✓ Zero leakage: No same-day Open, High, Low, or Volume used anywhere. Every feature is computed from data that existed before the prediction moment.
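One way to check a claim like this mechanically is a tamper test: change every row from some cutoff onward and confirm that the features up to and including the cutoff row do not move. The sketch below applies it to the three features from the snippet above. The helper names (`build_features`, `build_leaky_features`, `uses_only_past`) and the synthetic data are hypothetical, written for this illustration.

```python
import numpy as np
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """The three lagged features from the snippet above."""
    out = pd.DataFrame(index=df.index)
    out["HL_Spread"] = (df["High"] - df["Low"]).shift(1)
    out["Lag_1"] = df["Price"].shift(1)
    out["MA_7"] = df["Price"].rolling(7).mean().shift(1)
    return out

def build_leaky_features(df: pd.DataFrame) -> pd.DataFrame:
    """Same features, but MA_7 forgets the shift and so sees today's price."""
    out = build_features(df)
    out["MA_7"] = df["Price"].rolling(7).mean()
    return out

def uses_only_past(df: pd.DataFrame, build, cutoff: int) -> bool:
    """True if features for rows <= cutoff ignore all data from row cutoff onward."""
    tampered = df.copy()
    tampered.iloc[cutoff:] = tampered.iloc[cutoff:] * 2.0   # corrupt "today" and the future
    original = build(df).iloc[: cutoff + 1]
    changed = build(tampered).iloc[: cutoff + 1]
    return original.equals(changed)

# Synthetic prices, assumed for the demo only.
rng = np.random.default_rng(1)
price = 50_000 + np.cumsum(rng.normal(0, 300, 200))
demo = pd.DataFrame({"Price": price, "High": price + 100.0, "Low": price - 100.0})

print(uses_only_past(demo, build_features, cutoff=100))
print(uses_only_past(demo, build_leaky_features, cutoff=100))
```

The lagged builder passes and the unshifted one fails, so a test of this shape can guard a feature pipeline against accidental same-day inputs.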

Dataset

3,104 trading days of MCX India gold

We used the MCX India gold spot price dataset — January 2014 through January 2026. Prices are in Indian Rupees per 10 grams (the standard MCX format), ranging from ₹24,545 to ₹1,37,789.

This 12-year window captures everything: the 2016 demonetization shock, the 2018–19 trade war volatility, the 2020 COVID-era surge (gold hit ₹56,000/10g), post-pandemic normalization, and the 2024–25 global bull run.

19 strictly lagged features

Feature	Description
Lag_1, Lag_5, Lag_10, Lag_21, Lag_63	Price lags at key lookback periods
MA_7, MA_21, MA_63	Rolling means, shift(1) applied
Std_7, Std_21	Rolling volatility, shift(1) applied
ROC_5, ROC_21	Rate-of-change momentum
HL_Spread	Previous day's High−Low range
Log_Return	Previous day's log return
MA7_vs_MA21	Short vs medium trend ratio
ZScore_21	21-day z-score of price
DayOfWeek, Month	Calendar seasonality signals
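The features named above could be assembled along these lines. This is a sketch under assumptions: the exact formulas (for example, ROC as a simple percentage change, or the z-score built from MA_21 and Std_21) are guesses that match the descriptions, and `make_lagged_features` is a hypothetical helper rather than the notebook's code. The input is assumed to have `Price`, `High`, and `Low` columns and a DatetimeIndex.

```python
import numpy as np
import pandas as pd

def make_lagged_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build the listed features using data up to t-1 only.

    Calendar fields are the exception: they are known in advance.
    """
    past = df["Price"].shift(1)            # yesterday's close, the newest value allowed
    f = pd.DataFrame(index=df.index)
    for k in (1, 5, 10, 21, 63):
        f[f"Lag_{k}"] = df["Price"].shift(k)
    for w in (7, 21, 63):
        f[f"MA_{w}"] = past.rolling(w).mean()      # same as rolling(w).mean().shift(1)
    for w in (7, 21):
        f[f"Std_{w}"] = past.rolling(w).std()
    for k in (5, 21):
        f[f"ROC_{k}"] = past / past.shift(k) - 1.0
    f["HL_Spread"] = (df["High"] - df["Low"]).shift(1)
    f["Log_Return"] = np.log(past / past.shift(1))
    f["MA7_vs_MA21"] = f["MA_7"] / f["MA_21"]
    f["ZScore_21"] = (past - f["MA_21"]) / f["Std_21"]
    f["DayOfWeek"] = np.asarray(df.index.dayofweek)
    f["Month"] = np.asarray(df.index.month)
    return f

# Synthetic business-day prices, assumed for the demo only.
rng = np.random.default_rng(2)
idx = pd.bdate_range("2014-01-01", periods=200)
px = 50_000 + np.cumsum(rng.normal(0, 300, len(idx)))
demo_df = pd.DataFrame({"Price": px, "High": px + 100.0, "Low": px - 100.0}, index=idx)
features = make_lagged_features(demo_df)
print(features.dropna().head())
```

Shifting the price once up front and deriving everything from `past` makes the no-same-day rule structural, so a forgotten `.shift(1)` on one column is harder to introduce.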

Methodology

Walk-forward validation: the only honest approach

Most Kaggle notebooks split data randomly: 80% train, 20% test. For time series, this is wrong in a way that matters. A random split can put 2022 rows in the training set and 2018 rows in the test set, so the model learns from the future and is then evaluated on the past.
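The problem is easy to measure. The sketch below uses bare day indices in place of real rows (an assumption for illustration) and asks what share of test days have at least one later day in the training set. It then runs the same check on scikit-learn's `TimeSeriesSplit`.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

days = np.arange(3104)   # one index per trading day, in time order

# Random 80/20 split: test days are scattered through the whole history.
train_rand, test_rand = train_test_split(days, test_size=0.2, random_state=0)
share_with_future_training = float((test_rand < train_rand.max()).mean())
print(f"test days with later training data: {share_with_future_training:.1%}")

# Time-ordered splits: every training day precedes every test day.
ordered_ok = all(
    train_idx.max() < test_idx.min()
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(days)
)
print(f"time-ordered splits respect chronology: {ordered_ok}")
```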

We used expanding-window walk-forward cross-validation:

Walk-Forward Scheme · 121 Folds · Step = 21 Days

Fold	Train	Predict
1	Day 1 → Day 500	Day 501
2	Day 1 → Day 521	Day 522
3	Day 1 → Day 542	Day 543

· · · 121 folds total · · ·

Nested GridSearchCV + TimeSeriesSplit inside each training window · No future data bleeds into tuning
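A minimal sketch of that scheme follows, assuming a 500-row initial window, a 21-row step, a one-row test set per fold, and a small hypothetical alpha grid for Ridge. The function names and the grid are illustrative, not the notebook's exact code. The inner `TimeSeriesSplit` only ever sees the current training window, so tuning cannot look past the fold boundary.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def walk_forward_folds(n_rows, initial_train=500, step=21):
    """Yield (train_idx, test_row) pairs with an expanding training window."""
    train_end = initial_train
    while train_end < n_rows:
        yield np.arange(train_end), train_end   # rows 0..train_end-1, then the next row
        train_end += step

def walk_forward_ridge(X, y, alphas=(0.1, 1.0, 10.0)):
    """Tune and refit Ridge inside each fold, then predict the single held-out row."""
    preds, actuals = [], []
    for train_idx, test_row in walk_forward_folds(len(X)):
        search = GridSearchCV(
            make_pipeline(StandardScaler(), Ridge()),   # scaler is fit on training rows only
            param_grid={"ridge__alpha": list(alphas)},
            cv=TimeSeriesSplit(n_splits=3),             # inner splits stay time-ordered
            scoring="neg_root_mean_squared_error",
        )
        search.fit(X[train_idx], y[train_idx])
        preds.append(search.predict(X[[test_row]])[0])
        actuals.append(y[test_row])
    return np.array(preds), np.array(actuals)

# Synthetic regression data, assumed for the demo only.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(600, 3))
y_demo = X_demo @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.01, 600)
demo_preds, demo_actuals = walk_forward_ridge(X_demo, y_demo)
print(len(demo_preds), "folds evaluated")
```

Under these settings any usable row count from 3,021 to 3,041 produces exactly 121 folds, which is consistent with 3,104 trading days less the rows lost to the longest lag and the shifted target.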

Results

The numbers — honest ones

Baselines first

Before any ML model, we set two simple baselines. Any model that doesn't beat both is not useful.

Baseline	MAPE	RMSE (INR/10g)	Dir. Acc.
Persistence (tomorrow = today)	0.60%	490	75.4%
7-day Moving Average	1.26%	991	48.2%
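For reference, both baselines take only a few lines to score. The sketch below assumes the 7-day average includes today's close and scores every available day rather than only the 121 fold days, so it will not reproduce the table exactly. `score_baselines` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((actual - pred) / actual)) * 100)

def rmse(actual, pred):
    """Root mean squared error, in the price's own units."""
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

def score_baselines(price: pd.Series, window: int = 7) -> pd.DataFrame:
    """Score naive next-day forecasts against the next-day close."""
    target = price.shift(-1)                          # next-day close
    forecasts = {
        "Persistence": price,                         # tomorrow = today
        f"{window}-day MA": price.rolling(window).mean(),
    }
    rows = {}
    for name, forecast in forecasts.items():
        pair = pd.concat([target, forecast], axis=1, keys=["actual", "pred"]).dropna()
        rows[name] = {
            "MAPE": mape(pair["actual"], pair["pred"]),
            "RMSE": rmse(pair["actual"], pair["pred"]),
        }
    return pd.DataFrame(rows).T

toy = pd.Series([100.0, 102.0, 101.0, 105.0])
toy_scores = score_baselines(toy, window=2)
print(toy_scores)
```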

Walk-forward results · 121 folds · 4 models

Model	MAPE	RMSE (INR/10g)	Dir. Acc.	RMSE vs Persistence
Ridge Regression	0.77%	421	51.2%	−14% ✓
XGBoost	0.94%	535	52.9%	+9%
Random Forest	0.94%	564	56.2%	+15%
SVR	2.00%	1,290	55.4%	+163%
[Persistence baseline]	0.60%	490	75.4%	—
[7-day MA baseline]	1.26%	991	48.2%	—

Ridge is the only model that beats persistence on RMSE: 421 vs 490, a 14% reduction. Persistence still wins on MAPE (0.60% vs 0.77%) because gold is highly autocorrelated day-to-day. We give RMSE more weight because it squares errors, so it penalizes the large misses that hurt most in practice.

Directional accuracy definition: fraction of folds where sign(predicted − prev_close) == sign(actual − prev_close).
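That definition, together with the percentage change against the persistence RMSE used in the results table, can be written as the short sketch below. The function names and the toy arrays are illustrative.

```python
import numpy as np

def directional_accuracy(pred, actual, prev_close):
    """Share of forecasts whose direction versus prev_close matches the actual move."""
    pred, actual, prev_close = map(np.asarray, (pred, actual, prev_close))
    return float(np.mean(np.sign(pred - prev_close) == np.sign(actual - prev_close)))

def rmse_change_vs_baseline(model_rmse, baseline_rmse):
    """Relative RMSE change; negative means the model beats the baseline."""
    return (model_rmse - baseline_rmse) / baseline_rmse

# Toy example: predicted directions are up, down, up, flat; actual are up, up, up, down.
toy_da = directional_accuracy(
    pred=[101, 99, 102, 100],
    actual=[102, 101, 103, 99],
    prev_close=[100, 100, 100, 100],
)
print(toy_da)                                          # 0.5

# RMSE figures from the results table, persistence RMSE = 490.
for name, model_rmse in [("Ridge", 421), ("XGBoost", 535), ("Random Forest", 564), ("SVR", 1290)]:
    print(f"{name}: {rmse_change_vs_baseline(model_rmse, 490):+.0%}")
```

Note that a forecast exactly equal to prev_close has sign 0 under this formula and only counts as correct when the price is also unchanged, so the tie-handling rule needs to be stated explicitly whenever the metric is reported for a persistence-style forecast.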

Key Finding

RandomForest gets better in a crisis

The aggregate metrics tell one story. The regime breakdown tells a more interesting one.

Model	Calm MAPE	2020 MAPE	Calm DA	2020 DA
Ridge	0.74%	1.04% ↑	52.3%	41.7% ↓
XGBoost	0.90%	1.29% ↑	54.1%	41.7% ↓
Random Forest	0.95%	0.87% ↓	54.1%	75.0% ↑
SVR	1.72%	4.53% ↑	54.1%	66.7%

RandomForest improves during the COVID gold surge

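While Ridge and XGBoost degrade when the 2020 regime shift breaks autocorrelation patterns, RandomForest's MAPE drops from 0.95% to 0.87% and its directional accuracy jumps from 54.1% to 75.0%. One possible explanation is that the bagged trees pick up non-linear price dynamics that only show up when the market is stressed, although XGBoost, which is also tree-based, does not share the gain. With a 21-day step, 2020 contributes only about 12 folds, so treat this as a lead worth testing rather than a settled result.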

Practical implication: A production system might use Ridge during calm conditions and switch to RandomForest when a volatility indicator crosses a threshold. This data directly motivates a regime-switching ensemble.
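A minimal sketch of such a switch is shown below. The 21-day window, the 1.5% daily-volatility threshold, and the function name are placeholder assumptions, not tuned values. The volatility signal is lagged by one day to stay consistent with the no-same-day rule used for the features.

```python
import numpy as np
import pandas as pd

def regime_switch_forecast(price, pred_calm, pred_volatile, window=21, threshold=0.015):
    """Use pred_volatile when lagged rolling volatility exceeds threshold, else pred_calm."""
    log_ret = np.log(price / price.shift(1))
    vol = log_ret.rolling(window).std().shift(1)    # lagged, so no same-day information
    use_volatile = vol > threshold                  # NaN compares False, so warm-up is "calm"
    blended = pred_calm.where(~use_volatile, pred_volatile)
    return blended, use_volatile

# Toy data: 40 flat days, then 40 days alternating between 120 and 100.
toy_price = pd.Series([100.0] * 40 + [120.0, 100.0] * 20)
calm_pred = pd.Series(1.0, index=toy_price.index)        # stand-in for Ridge output
volatile_pred = pd.Series(2.0, index=toy_price.index)    # stand-in for RandomForest output
blended, flags = regime_switch_forecast(toy_price, calm_pred, volatile_pred)
print(flags.sum(), "days routed to the volatile-regime model")
```

In a real system the threshold would itself have to be chosen inside each training window, otherwise the switch becomes a new source of leakage.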

Why It Matters Now

Gold is in a new pricing era

Gold hit $4,000/oz in October 2025, climbing from $3,500 to $4,000 in just 36 days. That is a 14% gain, or roughly 40 basis points per day on average. It then broke $5,000/oz and reached a record $5,589/oz in January 2026. J.P. Morgan forecasts $5,000/oz by Q4 2026 based on projected central bank and investor demand averaging 585 tonnes per quarter.

In this environment, the difference between a model with genuine predictive validity and one inflated by leakage is the difference between actionable intelligence and expensive noise. The methodology here — walk-forward validation, strictly lagged features, honest baselines, regime analysis — is the foundation any serious gold forecasting system needs.

Next Steps

What would make this model materially better

Macro feature layer: Add DXY (US dollar index), VIX, crude oil, and US 10-year yield as lagged features. Published studies often report that VIX and DXY improve gold price prediction accuracy, so both are worth testing here.
Regime-switching ensemble: Use a volatility indicator to dynamically select between Ridge (calm) and RandomForest (volatile). The regime analysis here directly motivates this architecture.
Multi-step forecasting (t+5, t+21): Extend from next-day to weekly and monthly horizons, which are what investors actually care about.
Directional classification layer: Convert regression outputs into buy/hold/sell signals. Evaluate on risk-adjusted returns rather than MAPE.
USD/oz normalization: Convert INR/10g to USD/oz using historical exchange rates for direct comparison with COMEX benchmarks and international research.
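The USD/oz normalization is simple unit arithmetic: divide by 10 to get rupees per gram, multiply by 31.1034768 grams per troy ounce, then divide by the USD/INR rate for that date. A sketch follows, with a hypothetical function name and a made-up exchange rate in the usage line.

```python
GRAMS_PER_TROY_OUNCE = 31.1034768   # exact definition of the troy ounce in grams

def inr_per_10g_to_usd_per_oz(price_inr_per_10g, usd_inr_rate):
    """Convert an INR-per-10-gram quote to USD per troy ounce.

    Works on plain floats or on aligned pandas Series (price and same-date FX rate).
    """
    inr_per_gram = price_inr_per_10g / 10.0
    inr_per_oz = inr_per_gram * GRAMS_PER_TROY_OUNCE
    return inr_per_oz / usd_inr_rate

# Illustrative only: the exchange rate of 90.0 is a made-up placeholder.
print(round(inr_per_10g_to_usd_per_oz(100_000.0, 90.0), 2))
```

Local quotes can also embed duties and premiums that this conversion does not remove, so the result may still sit above or below the COMEX price.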

Full open-source code: The notebook runs on Kaggle, Colab, or locally. It auto-detects the environment and needs no manual uploads.


Shehan Makani

Co-Founder & CEO · ChemeNova LLC & ChemRich Global
Tech MBA Candidate · NJIT · Entrepreneurship & AI

