PKBoost Drift Benchmark Report

Objective

Evaluate PKBoost, LightGBM, and XGBoost across 16 realistic drift scenarios using PR-AUC as the primary metric. All models trained on the same data split with no hyperparameter tuning beyond defaults and early stopping.

Dataset Summary

Split	Samples	Features	Fraud Rate
Train	170,884	30	~0.2%
Val	56,961	30	—
Test	56,962	30	0.17% (99 frauds)

Baseline Performance (No Drift)

Model	PR-AUC	ROC-AUC	F1
LightGBM	0.7931	0.9205	0.8427
XGBoost	0.7625	0.9287	0.8090
PKBoost	0.8740	0.9734	0.8715

PKBoost starts with the highest PR-AUC (+0.0809 over LightGBM, +0.1115 over XGBoost).

Average PR-AUC Across All 16 Scenarios

Model	Avg PR-AUC	Avg Degradation
PKBoost	0.8509	2.82%
LightGBM	0.7031	12.10%
XGBoost	0.6720	12.66%

PKBoost maintains performance closest to its baseline.

🔬 Scenario-by-Scenario Results (PR-AUC)

Scenario	LightGBM	XGBoost	PKBoost	Winner
No Drift (Baseline)	0.7931	0.7625	0.8740	PKBoost (+0.0809)
Mild Covariate (0.2× std)	0.7836	0.7688	0.8705	PKBoost (+0.0869)
Moderate Covariate (0.5× std)	0.7700	0.7852	0.8669	PKBoost (+0.0817)
Severe Covariate (1.0× std)	0.7556	0.7645	0.8520	PKBoost (+0.0875)
Extreme Covariate (2.0× std)	0.6998	0.7152	0.8337	PKBoost (+0.1185)
Sign Flip (Adversarial)	0.4814	0.5146	0.8344	PKBoost (+0.3198)
Gradual Drift	0.7790	0.7715	0.8674	PKBoost (+0.0884)
Sudden Drift (Half-way)	0.7888	0.7666	0.8639	PKBoost (+0.0751)
Light Noise Injection	0.6497	0.6687	0.8287	PKBoost (+0.1600)
Heavy Noise Injection	0.2270	0.0717	0.7462	PKBoost (+0.5192)
Feature Scaling Drift	0.7566	0.6665	0.8628	PKBoost (+0.1062)
Rotation Drift	0.7864	0.7467	0.8716	PKBoost (+0.0852)
Outlier Injection (10%)	0.7631	0.5123	0.8687	PKBoost (+0.1056)
Combined Multi-Drift	0.7743	0.7497	0.8503	PKBoost (+0.0760)
Temporal Decay	0.6696	0.7085	0.8530	PKBoost (+0.1445)
Cyclic/Seasonal Drift	0.7721	0.7797	0.8707	PKBoost (+0.0910)

PKBoost had the highest PR-AUC in all 16 scenarios. Margin of victory ranged from +0.0751 (Sudden Drift) to +0.5192 (Heavy Noise).

Performance by Drift Type (Average PR-AUC)

Category	LightGBM	XGBoost	PKBoost	PKBoost Margin
Covariate Drift	0.7522	0.7584	0.8558	+0.0974
Adversarial	0.6223	0.5134	0.8515	+0.2292
Temporal	0.7524	0.7566	0.8638	+0.1072
Noise-Based	0.4384	0.3702	0.7875	+0.3491
Complex Drifts	0.7724	0.7210	0.8615	+0.0891

Most Challenging Scenarios

Rank	Scenario	Avg PR-AUC	PKBoost	Best Other	PKBoost Lead
1	Heavy Noise Injection	0.3483	0.7462	0.2270	+0.5192
2	Sign Flip (Adversarial)	0.6101	0.8344	0.5146	+0.3198
3	Light Noise Injection	0.7157	0.8287	0.6687	+0.1600
4	Temporal Decay	0.7437	0.8530	0.7085	+0.1445
5	Extreme Covariate	0.7401	0.8337	0.7152	+0.1185

Worst-Case Resilience

Model	Worst PR-AUC	Scenario
PKBoost	0.7462	Heavy Noise Injection
LightGBM	0.2270	Heavy Noise Injection
XGBoost	0.0717	Heavy Noise Injection

Even in the most disruptive scenario, PKBoost retains PR-AUC > 0.74, while others drop below 0.23.

Key Observations

Perfect Record

PKBoost never lost a scenario — highest PR-AUC in all 16 tests.

Average Margin

vs LightGBM: +0.1478
vs XGBoost: +0.1789

Minimal Degradation

PKBoost: 2.82% drop
LightGBM: 12.10%
XGBoost: 12.66%

Noise & Adversarial

Heavy Noise: PKBoost 3.3× better
Sign Flip: PKBoost 1.6× better

LightGBM and XGBoost are strong models — especially on clean, stable data. But when distribution shifts occur, PKBoost maintains significantly higher predictive quality.

Limitations & Fair Notes

PKBoost uses adaptive internal mechanisms (buffer, metamorphosis triggers) not present in standard GBMs.
Training time is longer than LightGBM/XGBoost (not measured here).
All models used default-like settings — no exhaustive tuning.
Results are one dataset only — generalization to other domains untested.

Conclusion

PKBoost achieved the highest PR-AUC in every tested drift scenario, with an average lead of 0.16 and minimal degradation (2.82%).

LightGBM and XGBoost performed well under mild conditions but degraded sharply under noise, covariate shifts, and adversarial changes. This is not a claim of universal superiority — only a factual report of performance on this benchmark, under these conditions.

Files Generated

• drift_detailed_results.csv – Full per-scenario scores
• comprehensive_drift_analysis.png – Visual summary
• baseline_vs_worstcase.png – Resilience comparison

Script: drift_comparison_all.py

← BACK TO HOME RUST DOCS → PYTHON PACKAGE → STANDARD BENCHMARKS →

PKBOOST