 
PKBOOST
Benchmark Reproduction Guide
Complete guide for reproducing all benchmarks from the PKBoost paper, including standard performance comparisons and drift resilience tests.
📋 Table of Contents
- Quick Start (5 minutes)
- Full Reproduction (1-2 hours)
- Dataset Preparation
- Running Benchmarks
- Expected Results
- Troubleshooting
🚀 Quick Start (5 minutes)
Use the included sample data to verify that PKBoost builds and runs.
Step 1: Clone Repository
git clone https://github.com/Pushp-Kharat1/pkboost.git
cd pkboost
Step 2: Verify Sample Data
ls data/
# Should show: creditcard_train.csv, creditcard_val.csv, creditcard_test.csv
Step 3: Run Rust Benchmark
cargo run --release --bin benchmark
Expected output:
=== PKBoost Benchmarking ===
Train: 102,530 samples (0.17% fraud)
Val:   34,177 samples
Test:  34,177 samples
Training PKBoost... 45.2s
  PR-AUC: 0.8782
Training XGBoost... 12.1s
  PR-AUC: 0.7458
Training LightGBM... 9.8s
  PR-AUC: 0.7931
PKBoost improves PR-AUC by +17.9% over XGBoost
🔬 Full Reproduction (1-2 hours)
For complete results matching the paper, download the full datasets and run the extended benchmarks.
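Before committing to a run of an hour or more, it may be worth confirming that the three split files under `data/` are present and that the positive-class rate is in the expected range (about 0.17% in the Quick Start output). The following is a minimal sketch, not part of PKBoost: it assumes each CSV has a header row and a binary label column named `Class` (the name passed to `prepare_data.py` in this guide), and `summarize_split` and `check_data_dir` are hypothetical helper names.

```python
import csv
from pathlib import Path


def summarize_split(path, label_column="Class"):
    """Return (row_count, positive_count) for one CSV split.

    Assumes a header row and a 0/1 label column (an assumption about the
    file layout, not something the PKBoost scripts guarantee).
    """
    rows = positives = 0
    with open(path, newline="") as handle:
        for record in csv.DictReader(handle):
            rows += 1
            # float() accepts both "1" and "1.0" style labels.
            if float(record[label_column]) == 1.0:
                positives += 1
    return rows, positives


def check_data_dir(data_dir="data", prefix="creditcard"):
    """Print one summary line per expected split file."""
    for split in ("train", "val", "test"):
        path = Path(data_dir) / f"{prefix}_{split}.csv"
        if not path.exists():
            print(f"{path}: MISSING")
            continue
        rows, positives = summarize_split(path)
        rate = 100.0 * positives / rows if rows else 0.0
        print(f"{path}: {rows:,} rows, {positives} positive ({rate:.2f}%)")


if __name__ == "__main__":
    check_data_dir()
```

A `MISSING` line here corresponds to the "File not found" case covered under Troubleshooting.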
Dataset Preparation
# Download from Kaggle
python prepare_data.py mlg-ulb/creditcardfraud Class 1
⚡ Running Benchmarks
Standard Performance Comparison (Rust)
cargo run --release --bin benchmark
Standard Performance Comparison (Python)
python run_single_benchmark.py
Drift Resilience Test
python drift_comparison_all.py
📊 Expected Results
Credit Card Fraud (0.2% fraud rate)
| Model | PR-AUC | F1-Score | ROC-AUC | Training Time | 
|---|---|---|---|---|
| PKBoost | 87.8% | 87.4% | 97.5% | 45s | 
| XGBoost | 74.5% | 79.8% | 91.7% | 12s | 
| LightGBM | 79.3% | 71.3% | 92.1% | 10s | 
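The "+17.9%" reported in the benchmark output is a relative gain, not a difference in percentage points (the absolute gap over XGBoost is 13.3 points). The sketch below recomputes the relative gains from the PR-AUC column of the table and shows one way to compare a reproduced score against the reported one. The 1.0-point tolerance is an illustrative assumption, since this guide does not state the expected run-to-run variance, and both function names are hypothetical.

```python
# PR-AUC values (in percent) copied from the results table above.
PR_AUC = {"PKBoost": 87.8, "XGBoost": 74.5, "LightGBM": 79.3}


def relative_gain(candidate, baseline):
    """Relative improvement of candidate over baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline


def within_tolerance(measured, reported, tolerance=1.0):
    """True if a reproduced score is within `tolerance` points of the
    reported one. The default tolerance is an assumption, not a figure
    taken from the paper."""
    return abs(measured - reported) <= tolerance


for baseline in ("XGBoost", "LightGBM"):
    gain = relative_gain(PR_AUC["PKBoost"], PR_AUC[baseline])
    print(f"PKBoost vs {baseline}: +{gain:.1f}% relative PR-AUC")
# PKBoost vs XGBoost: +17.9% relative PR-AUC
# PKBoost vs LightGBM: +10.7% relative PR-AUC
```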
🔧 Troubleshooting
Issue 1: "File not found: data/creditcard_train.csv"
# Download dataset
python prepare_data.py mlg-ulb/creditcardfraud Class 1
Issue 2: Kaggle API Authentication Error
# Set up Kaggle API credentials
mkdir -p ~/.kaggle
# Copy your kaggle.json there
chmod 600 ~/.kaggle/kaggle.json
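If the authentication error persists after these steps, a small check of the credentials file can narrow down the cause. This is a sketch under stated assumptions: it looks at the default `~/.kaggle/kaggle.json` location used above, expects the `username` and `key` fields that a Kaggle API token normally contains, and `check_kaggle_credentials` is a hypothetical helper rather than part of PKBoost or the Kaggle client.

```python
import json
import stat
from pathlib import Path


def check_kaggle_credentials(path=None):
    """Return a list of problems found with the Kaggle credentials file.

    An empty list means the file exists, is private to the owner, and
    contains non-empty 'username' and 'key' fields.
    """
    path = Path(path) if path else Path.home() / ".kaggle" / "kaggle.json"
    if not path.is_file():
        return [f"{path} does not exist"]

    problems = []
    # Group/other permission bits should be clear, matching `chmod 600`.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} has mode {mode:o}; expected 600")

    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        return problems + [f"{path} is not valid JSON"]
    if not isinstance(data, dict):
        return problems + [f"{path} does not contain a JSON object"]

    for field in ("username", "key"):
        if not data.get(field):
            problems.append(f"{path} is missing the '{field}' field")
    return problems


if __name__ == "__main__":
    for problem in check_kaggle_credentials():
        print("Problem:", problem)
```

The permission check is only meaningful on Unix-like systems; on Windows the mode bits do not reflect access control in the same way.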