VnAppraise

Model Benchmarks

12 models evaluated on Vietnamese real estate price prediction using repeated stratified cross-validation on 81,790+ listings.

Best Overall
CatBoost (default): 0.336 RMSE, 25.2% MAPE

Best Speed / Accuracy
CatBoost (default): 0.336 RMSE, 24s/fold

Foundation Model
TabICL v2: 0.500 RMSE, zero-shot ICL (pre-trained on 130M synthetic datasets, no fine-tuning)

Leaderboard
Best configuration per model, ranked by cross-validation RMSE
| # | Model | Config | RMSE | MAE | MAPE | Time/fold | Rows |
|---|-------|--------|------|-----|------|-----------|------|
| 1 | CatBoost (deployed) | default | 0.3360 +/- 0.0039 | 0.2246 | 25.2% | 24s | 81,790 |
| 2 | LightGBM | lr=0.030 | 0.3816 +/- 0.0026 | 0.2574 | 29.4% | 35s | 81,790 |
| 3 | XGBoost | default | 0.3836 +/- 0.0010 | 0.2630 | 29.9% | 20s | 81,790 |
| 4 | AutoGluon | best quality, 600s limit | 0.3993 +/- 0.0055 | 0.2717 | 30.3% | 12m 20s | 90,566 |
| 5 | TabICL (foundation) | n=32 | 0.4995 +/- 0.0028 | 0.3256 | 38.9% | 1m 59s | 90,566 |
| 6 | KNN | default | 0.5180 +/- 0.0033 | 0.3450 | 42.1% | 29s | 90,566 |
| 7 | Ridge | default | 0.5516 +/- 0.0253 | 0.3644 | 43.5% | <1s | 12,908 |
| 8 | RandomForest | default | 0.5640 +/- 0.0042 | 0.3920 | 46.9% | 18m 4s | 90,566 |
| 9 | LinearRegression | default | 0.5779 +/- 0.0067 | 0.3781 | 46.8% | 37s | 90,566 |
| 10 | LinearSVR | default | 0.5922 +/- 0.0070 | 0.3638 | 43.9% | 23s | 90,566 |
| 11 | ElasticNet | default | 0.6102 +/- 0.0047 | 0.4144 | 49.4% | 15s | 90,566 |
| 12 | Lasso | default | 0.6196 +/- 0.0044 | 0.4224 | 51.0% | 8s | 90,566 |
Model Primer

What is a foundation model?

A neural network pre-trained on massive data to serve as a reusable base for many downstream tasks. Unlike traditional models that train from scratch on your specific dataset, a foundation model arrives already knowing general patterns — you just show it a handful of examples and it generalizes. This is the paradigm behind GPT (text), CLIP (images), and now TabICL (tables). Pre-training on 130M synthetic tabular datasets lets TabICL do in-context learning: predicting on new data without any fine-tuning.

Foundation & AutoML

TabICL (foundation)

Foundation model · 130M synthetic datasets

A transformer pre-trained on 130 million synthetic tabular datasets, learning the statistical patterns common across all tabular data. At inference it performs in-context learning: you show it a handful of labeled examples plus a test row, and it predicts with no training on your specific dataset. The tabular equivalent of GPT's few-shot prompting. Wins by a statistically significant margin at small scale (<1K rows); gradient boosters catch up as data grows. Runs on an A10G GPU with ~50s latency.
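
To make the call shape concrete, here is a runnable stand-in, not the real TabICL package or the benchmark code: fit() only stores the labeled context rows and predict() makes one pass conditioned on them. A kernel-weighted average takes the place of the pre-trained transformer's forward pass, and the class and argument names are invented for illustration.

```python
import numpy as np

# Illustrative stand-in for the in-context-learning call shape, NOT TabICL itself.
# fit() just remembers labeled context rows; predict() conditions on them in one pass.
class ICLRegressorSketch:
    def __init__(self, n_context=32):
        self.n_context = n_context                       # context size; n=32 as in the leaderboard row

    def fit(self, X, y):                                 # no gradient updates, only storage
        self.X_ctx = np.asarray(X, dtype=float)[: self.n_context]
        self.y_ctx = np.asarray(y, dtype=float)[: self.n_context]
        return self

    def predict(self, X):                                # one conditioned pass per query row
        X = np.asarray(X, dtype=float)
        dist = np.linalg.norm(X[:, None, :] - self.X_ctx[None, :, :], axis=-1)
        w = np.exp(-dist)                                # placeholder for transformer attention
        return (w * self.y_ctx).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
X_ctx, y_ctx = rng.normal(size=(32, 5)), rng.normal(size=32)
print(ICLRegressorSketch().fit(X_ctx, y_ctx).predict(rng.normal(size=(3, 5))))
```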

AutoGluon

Amazon · stacked ensemble AutoML

Not a single model but a system. Trains LightGBM + CatBoost + XGBoost + Random Forest + neural nets + FT-Transformer in parallel, then stacks them with a meta-learner. The tradeoff for its accuracy is 10-minute fold times and a harder-to-deploy artifact.
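
A minimal sketch of how such a run is configured with AutoGluon's TabularPredictor. The presets and time limit mirror the "best quality, 600s limit" leaderboard config; the label column name and the train_df/test_df frames are assumptions rather than the benchmark's actual code.

```python
from autogluon.tabular import TabularPredictor

# train_df / test_df: pandas DataFrames holding the listing features plus a
# log_unit_price target column (names assumed for illustration).
predictor = TabularPredictor(
    label="log_unit_price",
    problem_type="regression",
    eval_metric="root_mean_squared_error",
)
predictor.fit(train_df, presets="best_quality", time_limit=600)  # 600s budget per fold
preds = predictor.predict(test_df)
print(predictor.leaderboard())  # per-model scores inside the stacked ensemble
```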

Gradient Boosting

CatBoostdeployed

Yandex · native categorical handling

Gradient boosting that builds decision trees sequentially — each new tree corrects the previous tree's errors. CatBoost's edge is native support for categorical variables via ordered target statistics, which matters enormously here: we have 257 districts, 235 wards, and 974 streets that other frameworks either one-hot-encode (explosion) or ordinal-encode (lossy). Deployed in production at depth=10, autoresearch-tuned.
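
A minimal sketch of the native categorical handling on a toy frame; depth=10 matches the deployed config, while the column names, iteration count, and data are illustrative.

```python
import pandas as pd
from catboost import CatBoostRegressor

# Toy stand-in for the listings table; the real feature set and names differ.
df = pd.DataFrame({
    "district": ["Cau Giay", "Dong Da", "Cau Giay", "Ba Dinh"] * 25,
    "area_m2": [45.0, 60.0, 38.0, 72.0] * 25,
    "log_unit_price": [3.9, 4.1, 4.0, 4.3] * 25,
})
X, y = df[["district", "area_m2"]], df["log_unit_price"]

# cat_features tells CatBoost to encode these columns with ordered target
# statistics rather than one-hot or ordinal codes.
model = CatBoostRegressor(depth=10, loss_function="RMSE", iterations=200, verbose=0)
model.fit(X, y, cat_features=["district"])
print(model.predict(X[:3]))
```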

LightGBM

Microsoft · leaf-wise, histogram-based

Microsoft's speed-optimized gradient booster. Uses histogram-based feature binning and leaf-wise tree growth (vs XGBoost's level-wise), which converges faster on most tabular problems. Trained here with Huber loss to resist the long tail of outlier listings; it also handles NaN values natively.
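
A sketch of that setup on a toy frame: Huber objective, the leaderboard's lr=0.030, a pandas category column picked up natively, and NaNs left in place. Everything else is an illustrative assumption.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "area_m2": rng.uniform(20, 200, 500),
    "district": pd.Categorical(rng.choice(["CG", "DD", "BD"], 500)),  # category dtype -> native handling
})
df.loc[::17, "area_m2"] = np.nan            # LightGBM routes NaNs natively at each split
y = rng.normal(4.0, 0.4, 500)

model = lgb.LGBMRegressor(objective="huber", learning_rate=0.03, n_estimators=300)
model.fit(df, y)                            # categorical + NaN columns, no manual encoding
print(model.predict(df.head(3)))
```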

XGBoost

The original boosting framework

Popularized gradient boosting on tabular data and still a strong baseline. Slightly behind CatBoost and LightGBM on this dataset because it needs ordinal encoding for high-cardinality categoricals (its native categorical mode fails on district-level granularity).
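
A sketch of the ordinal-encoding route as a scikit-learn pipeline; this is one common workaround with toy column names, not necessarily the benchmark's exact preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "district": rng.choice(["Cau Giay", "Dong Da", "Ba Dinh"], 400),
    "area_m2": rng.uniform(20, 200, 400),
})
y = rng.normal(4.0, 0.4, 400)

# Integer-code the high-cardinality categoricals; categories unseen at predict
# time map to -1 instead of raising.
pre = ColumnTransformer(
    [("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), ["district"])],
    remainder="passthrough",
)
model = make_pipeline(pre, XGBRegressor(n_estimators=300, learning_rate=0.05))
model.fit(df, y)
print(model.predict(df.head(3)))
```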

Tree & Instance-based

Random Forest

Bagged decision trees

An ensemble of decision trees trained in parallel on bootstrapped samples, with predictions averaged across trees. Interpretable and robust, but lacks the sequential error-correction mechanism of boosting — which is why it plateaus around RMSE 0.56 here while boosters reach 0.34.
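
For reference, the default scikit-learn configuration on toy data (the benchmarked run uses the full feature set):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 8)), rng.normal(size=500)

# Each tree sees a bootstrap sample; predictions are averaged across trees.
model = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))
```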

KNN

k-nearest neighbors · non-parametric

Predicts by averaging the 10 nearest listings in feature space, distance-weighted. No training phase; pays the cost at inference time. Struggles as categorical cardinality and feature dimensionality grow.
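
The same idea as a scikit-learn sketch on toy data; the 10 neighbors and distance weighting match the description, while the standard scaler is an added assumption to keep distances comparable across features.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 8)), rng.normal(size=500)

# 10 nearest listings, inverse-distance weighted; no training phase beyond
# storing the data, all work happens at query time.
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=10, weights="distance"))
model.fit(X, y)
print(model.predict(X[:3]))
```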

Linear Baselines

Ridge

L2-regularized linear regression

Linear regression with L2 (squared coefficient) penalty — shrinks all coefficients toward zero. Useful as a sanity-check baseline: quantifies how much of the variance is linearly explained before we invoke non-linear models.

Lasso

L1-regularized · sparse

Linear regression with L1 (absolute coefficient) penalty. Drives unimportant coefficients to exactly zero, giving implicit feature selection.

ElasticNet

L1 + L2 blend

Blends Lasso (L1) and Ridge (L2) penalties. Handles correlated features more gracefully than pure Lasso (which arbitrarily picks one of a correlated pair).

Linear SVR

ε-insensitive loss

Support vector regression with a linear kernel. Optimizes an ε-tube (ignores errors below ε, penalizes errors outside) rather than squared error — different loss geometry, comparable expressiveness to Ridge.

Linear Regression

Plain OLS · no regularization

Classical ordinary least squares. The null hypothesis against which every more complex model must justify its complexity.
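
One combined sketch of the five linear baselines above on toy data, with standardized features as a shared illustrative assumption; the alpha and epsilon values are placeholders, not the benchmark's settings.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 12)), rng.normal(size=500)

baselines = {
    "OLS": LinearRegression(),                              # no penalty
    "Ridge": Ridge(alpha=1.0),                              # L2: shrink all coefficients
    "Lasso": Lasso(alpha=0.01),                             # L1: drive some coefficients to zero
    "ElasticNet": ElasticNet(alpha=0.01, l1_ratio=0.5),     # L1 + L2 blend
    "LinearSVR": LinearSVR(epsilon=0.1, max_iter=10_000),   # eps-insensitive loss
}
for name, est in baselines.items():
    pipe = make_pipeline(StandardScaler(), est)
    rmse = -cross_val_score(pipe, X, y, cv=5, scoring="neg_root_mean_squared_error").mean()
    print(f"{name:>10s}  RMSE={rmse:.3f}")
```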

Methodology
Target
log_unit_price = ln(price_vnd / area_m2)
Evaluation
5-fold stratified cross-validation (quantile-binned target)
Dataset
81K+ Vietnamese real estate listings, 44 features (21 numeric + 7 categorical + 16 binary) after LLM price-quality filter
Source
batdongsan.com.vn, nationwide coverage
Tuning
Optuna TPE sampler, 50 trials with inner 3-fold CV for LightGBM and CatBoost
Statistical tests
Nadeau-Bengio corrected t-test with Cohen's d effect size
Refresh
Model and dataset retrained monthly by a scheduled job on our infrastructure: refresh the feature view → export → retrain with the winning config → RMSE/row-count gate → deploy (bounces the warm inference container onto the fresh model).
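
A sketch of the protocol in the rows above: the log unit-price target, quantile-binned stratification for 5-fold CV, and the Nadeau-Bengio corrected t-test with Cohen's d. The toy frame, the LightGBM placeholder model, and the bin count are illustrative assumptions, not the benchmark's exact values.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.model_selection import StratifiedKFold
from lightgbm import LGBMRegressor

# Toy listings frame; the real columns and feature set differ.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price_vnd": rng.lognormal(21, 0.5, 1000),
    "area_m2": rng.uniform(20, 200, 1000),
    "floors": rng.integers(1, 8, 1000),
})
df["log_unit_price"] = np.log(df["price_vnd"] / df["area_m2"])   # target: ln(price_vnd / area_m2)

# Stratify the continuous target by quantile-binning it (bin count assumed),
# then run 5-fold CV on those bins.
bins = pd.qcut(df["log_unit_price"], q=10, labels=False)
X, y = df[["area_m2", "floors"]], df["log_unit_price"]

fold_rmse = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, bins):
    model = LGBMRegressor().fit(X.iloc[tr], y.iloc[tr])     # any regressor works; LightGBM as a stand-in
    pred = model.predict(X.iloc[te])
    fold_rmse.append(np.sqrt(np.mean((y.iloc[te] - pred) ** 2)))
print("RMSE per fold:", np.round(fold_rmse, 4))

# Nadeau-Bengio corrected resampled t-test: inflate the variance by the
# test/train size ratio because CV folds share training data.
def nb_corrected_ttest(scores_a, scores_b, n_train, n_test):
    d = np.asarray(scores_a) - np.asarray(scores_b)
    k = len(d)
    var = d.var(ddof=1) * (1.0 / k + n_test / n_train)
    t = d.mean() / np.sqrt(var)
    cohens_d = d.mean() / d.std(ddof=1)                      # effect size reported alongside the test
    return t, 2 * stats.t.sf(abs(t), df=k - 1), cohens_d

# Example comparison against a hypothetical second model's per-fold RMSEs.
other = [r * 1.05 for r in fold_rmse]
print(nb_corrected_ttest(fold_rmse, other, n_train=800, n_test=200))
```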

Cross-validation MAPE (mean) differs from production MdAPE (median): MdAPE = 14.8% on 81K listings with CatBoost depth=10 (verified 2026-04-15). Legacy runs on the 90K / 19-feature dataset are retained for family comparison.
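
To make the mean-vs-median distinction concrete, a toy computation (the numbers are invented, not from the dataset):

```python
import numpy as np

y_true = np.array([2.0e9, 3.5e9, 5.0e9, 1.2e9])   # toy listing prices, VND
y_pred = np.array([2.3e9, 3.1e9, 4.6e9, 2.4e9])
ape = np.abs(y_pred - y_true) / y_true

print("MAPE (mean):   ", round(100 * ape.mean(), 1), "%")      # pulled up by the one badly-missed listing
print("MdAPE (median):", round(100 * np.median(ape), 1), "%")  # robust to that outlier
```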

Currently deployed: CatBoost (depth=10, autoresearch-tuned) for fast inference. TabICL (foundation model) available as an alternative.

Last updated: April 16, 2026