Model Benchmarks
12 models evaluated on Vietnamese real estate price prediction using repeated stratified cross-validation. Current runs use 81,790 listings; legacy runs used the earlier 90,566-row dataset (see the Rows column).
| # | Model | Config | RMSE (log target) | MAE | MAPE | Time/fold | Rows |
|---|---|---|---|---|---|---|---|
| 1 | CatBoost (deployed) | default | 0.3360 ± 0.0039 | 0.2246 | 25.2% | 24s | 81,790 |
| 2 | LightGBM | lr=0.030 | 0.3816 ± 0.0026 | 0.2574 | 29.4% | 35s | 81,790 |
| 3 | XGBoost | default | 0.3836 ± 0.0010 | 0.2630 | 29.9% | 20s | 81,790 |
| 4 | AutoGluon | best quality, 600s limit | 0.3993 ± 0.0055 | 0.2717 | 30.3% | 12m 20s | 90,566 |
| 5 | TabICL (foundation) | n=32 | 0.4995 ± 0.0028 | 0.3256 | 38.9% | 1m 59s | 90,566 |
| 6 | KNN | default | 0.5180 ± 0.0033 | 0.3450 | 42.1% | 29s | 90,566 |
| 7 | Ridge | default | 0.5516 ± 0.0253 | 0.3644 | 43.5% | <1s | 12,908 |
| 8 | RandomForest | default | 0.5640 ± 0.0042 | 0.3920 | 46.9% | 18m 4s | 90,566 |
| 9 | LinearRegression | default | 0.5779 ± 0.0067 | 0.3781 | 46.8% | 37s | 90,566 |
| 10 | LinearSVR | default | 0.5922 ± 0.0070 | 0.3638 | 43.9% | 23s | 90,566 |
| 11 | ElasticNet | default | 0.6102 ± 0.0047 | 0.4144 | 49.4% | 15s | 90,566 |
| 12 | Lasso | default | 0.6196 ± 0.0044 | 0.4224 | 51.0% | 8s | 90,566 |
What is a foundation model?
A neural network pre-trained on massive data to serve as a reusable base for many downstream tasks. Unlike traditional models that train from scratch on your specific dataset, a foundation model arrives already knowing general patterns — you just show it a handful of examples and it generalizes. This is the paradigm behind GPT (text), CLIP (images), and now TabICL (tables). Pre-training on 130M synthetic tabular datasets lets TabICL do in-context learning: predicting on new data without any fine-tuning.
Foundation & AutoML
TabICL · foundation model · 130M synthetic datasets
A transformer pre-trained on 130 million synthetic tabular datasets, learning the statistical patterns common across all tabular data. At inference it performs in-context learning: you show it a handful of labeled examples plus a test row, and it predicts without any training on your specific dataset. The tabular equivalent of GPT's few-shot prompting. It wins, with statistical significance, at small scale (<1K rows); gradient boosters catch up as data grows. Runs on an A10G GPU, ~50s latency.
AutoGluon · Amazon · stacked ensemble AutoML
Not a single model but a system. Trains LightGBM + CatBoost + XGBoost + Random Forest + neural nets + FT-Transformer in parallel, then stacks them with a meta-learner. The tradeoff for its accuracy is ~12-minute fold times and a harder-to-deploy artifact.
Gradient Boosting
CatBoost · Yandex · native categorical handling
Gradient boosting that builds decision trees sequentially — each new tree corrects the previous tree's errors. CatBoost's edge is native support for categorical variables via ordered target statistics, which matters enormously here: we have 257 districts, 235 wards, and 974 streets that other frameworks either one-hot-encode (explosion) or ordinal-encode (lossy). Deployed in production at depth=10, autoresearch-tuned.
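The ordered-target-statistics idea fits in a few lines. This is a simplified sketch under our own naming, not CatBoost's exact implementation (which averages over several random permutations and tunes the prior); it shows the key property that each row is encoded using only the targets of rows before it, so the encoding never leaks a row's own label.

```python
from collections import defaultdict

def ordered_target_stats(categories, targets, prior, smoothing=1.0):
    """Ordered target statistics for one categorical column (sketch).

    Rows are assumed pre-shuffled. Each row is encoded from the running
    mean of targets seen so far for its category, smoothed toward a prior.
    """
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running count per category
    encoded = []
    for cat, y in zip(categories, targets):
        # use only *past* rows, so the row's own target cannot leak in
        encoded.append((sums[cat] + smoothing * prior) / (counts[cat] + smoothing))
        sums[cat] += y
        counts[cat] += 1
    return encoded
```

With 974 street values, this yields one informative numeric column instead of 974 one-hot columns.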
LightGBM · Microsoft · leaf-wise, histogram-based
Microsoft's speed-optimized gradient booster. Uses histogram-based feature binning and leaf-wise tree growth (vs XGBoost's level-wise), which converges faster on most tabular problems. Trained here with Huber loss to resist the long tail of outlier listings, and handles NaN values natively.
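The Huber loss itself is a simple hybrid of squared and absolute error. A minimal sketch (the `delta` threshold here is our illustrative name; LightGBM exposes it via the `alpha` parameter of its `huber` objective):

```python
def huber(residual, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails, so
    outlier listings contribute bounded gradients instead of dominating."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r          # behaves like squared error
    return delta * (r - 0.5 * delta)  # grows linearly past the threshold
```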
XGBoost · the framework that popularized gradient boosting
Popularized gradient boosting on tabular data and still a strong baseline. Slightly behind CatBoost and LightGBM on this dataset because it needs ordinal encoding for high-cardinality categoricals (its native categorical mode fails on district-level granularity).
Tree & Instance-based
RandomForest · bagged decision trees
An ensemble of decision trees trained in parallel on bootstrapped samples, with predictions averaged across trees. Interpretable and robust, but lacks the sequential error-correction mechanism of boosting — which is why it plateaus around RMSE 0.56 here while boosters reach 0.34.
KNN · k-nearest neighbors · non-parametric
Predicts by averaging the 10 nearest listings in feature space, distance-weighted. No training phase; pays the cost at inference time. Struggles as categorical cardinality and feature dimensionality grow.
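The prediction rule can be sketched in plain Python (function and parameter names are ours; a production implementation would use a KD-tree or similar index rather than a full sort):

```python
import math

def knn_predict(train_X, train_y, query, k=10, eps=1e-9):
    """Distance-weighted k-nearest-neighbour regression (sketch).

    Averages the targets of the k closest training rows, weighting each
    by inverse Euclidean distance so nearer listings dominate.
    """
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    dists.sort(key=lambda t: t[0])          # all work happens at inference
    nearest = dists[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)
```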
Linear Baselines
Ridge · L2-regularized linear regression
Linear regression with L2 (squared coefficient) penalty — shrinks all coefficients toward zero. Useful as a sanity-check baseline: quantifies how much of the variance is linearly explained before we invoke non-linear models.
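For a single centred feature with no intercept, the ridge solution has a closed form that makes the shrinkage explicit, w = Σxy / (Σx² + λ). A sketch under those simplifying assumptions:

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge coefficient for one centred feature, no intercept:
    w = sum(x*y) / (sum(x*x) + lam). lam=0 recovers OLS; larger lam
    shrinks the coefficient toward zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)
```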
Lasso · L1-regularized · sparse
Linear regression with L1 (absolute coefficient) penalty. Drives unimportant coefficients to exactly zero, giving implicit feature selection.
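The mechanism behind that exact-zero behaviour is the soft-thresholding operator applied during coordinate descent; a minimal sketch:

```python
def soft_threshold(w, lam):
    """Lasso proximal operator: shift w toward zero by lam, and snap
    anything inside [-lam, lam] to exactly 0 (implicit feature selection)."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0
```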
ElasticNet · L1 + L2 blend
Blends Lasso (L1) and Ridge (L2) penalties. Handles correlated features more gracefully than pure Lasso (which arbitrarily picks one of a correlated pair).
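One way to see the blend is the elastic-net proximal step: an L1 soft-threshold followed by an L2 shrink, so coefficients can still hit exactly zero but correlated features share weight instead of one being dropped. A sketch with `lam1`/`lam2` as illustrative penalty weights:

```python
def elastic_net_prox(w, lam1, lam2):
    """Elastic-net proximal step: soft-threshold by lam1 (L1 part),
    then shrink the survivor by 1/(1 + lam2) (L2 part)."""
    if w > lam1:
        w = w - lam1
    elif w < -lam1:
        w = w + lam1
    else:
        return 0.0
    return w / (1.0 + lam2)
```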
LinearSVR · ε-insensitive loss
Support vector regression with a linear kernel. Optimizes an ε-tube (ignores errors below ε, penalizes errors outside) rather than squared error — different loss geometry, comparable expressiveness to Ridge.
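The ε-tube idea reduces to a one-line loss; a sketch (the default `eps` value here is illustrative, not the benchmark's setting):

```python
def eps_insensitive(residual, eps=0.1):
    """SVR's epsilon-insensitive loss: errors inside the tube cost
    nothing; errors outside grow linearly, not quadratically."""
    return max(0.0, abs(residual) - eps)
```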
LinearRegression · plain OLS · no regularization
Classical ordinary least squares. The null hypothesis against which every more complex model must justify its complexity.
- Target: log_unit_price = ln(price_vnd / area_m2)
- Evaluation: 5-fold stratified cross-validation (quantile-binned target)
- Dataset: 81K+ Vietnamese real estate listings, 44 features (21 numeric + 7 categorical + 16 binary) after LLM price-quality filter
- Source: batdongsan.com.vn, nationwide coverage
- Tuning: Optuna TPE sampler, 50 trials with inner 3-fold CV for LightGBM and CatBoost
- Statistical tests: Nadeau-Bengio corrected t-test with Cohen's d effect size
- Refresh: model and dataset retrained monthly by a scheduled job on our infrastructure (refresh the feature view → export → retrain with the winning config → RMSE/row-count gate → deploy, bouncing the warm inference container onto the fresh model)
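The target transform and the quantile binning used for stratification can be sketched as follows (function names are ours; the binning shown is a rank-based equivalent of what a quantile discretizer plus a stratified K-fold splitter would do):

```python
import math

def log_unit_price(price_vnd, area_m2):
    """The regression target: natural log of price per square metre."""
    return math.log(price_vnd / area_m2)

def quantile_bins(targets, n_bins=5):
    """Bin a continuous target into equal-count quantile bins so a
    stratified splitter can balance cheap and expensive listings
    across folds."""
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    bins = [0] * len(targets)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(targets), n_bins - 1)
    return bins
```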
Cross-validation MAPE (mean) differs from production MdAPE (median): MdAPE = 14.8% on 81K listings with CatBoost depth=10 (verified 2026-04-15). Legacy runs on the 90K / 19-feature dataset are retained for family comparison.
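The gap between the two numbers is just mean versus median over the same per-listing percentage errors; the median ignores the long tail that inflates the mean. A minimal sketch:

```python
def mape(actual, pred):
    """Mean absolute percentage error (actual values must be nonzero).
    Sensitive to a few badly mispriced listings."""
    errs = [abs(a - p) / a for a, p in zip(actual, pred)]
    return sum(errs) / len(errs)

def mdape(actual, pred):
    """Median absolute percentage error. Robust to outliers, which is
    why a production median can sit well below the CV mean."""
    errs = sorted(abs(a - p) / a for a, p in zip(actual, pred))
    n = len(errs)
    mid = n // 2
    return errs[mid] if n % 2 else (errs[mid - 1] + errs[mid]) / 2
```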
Currently deployed: CatBoost (depth=10, autoresearch-tuned) for fast inference. TabICL (foundation model) available as an alternative.
Last updated: April 16, 2026