PKBOOST

Benchmark Reproduction Guide

Complete guide for reproducing all benchmarks from the PKBoost paper, including standard performance comparisons and drift resilience tests.

🚀 Quick Start (5 minutes)

Use the included sample data to verify that PKBoost builds and runs correctly.

Step 1: Clone Repository

git clone https://github.com/Pushp-Kharat1/pkboost.git
cd pkboost

Step 2: Verify Sample Data

ls data/
# Should show: creditcard_train.csv, creditcard_val.csv, creditcard_test.csv
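
If you would rather check this from a script than by eye, the following is a minimal sketch. The helper name `missing_sample_files` is hypothetical and not part of the repository; the file names come from the comment above.

```python
from pathlib import Path

# File names taken from the "Should show" comment in Step 2.
EXPECTED_FILES = ("creditcard_train.csv", "creditcard_val.csv", "creditcard_test.csv")


def missing_sample_files(data_dir="data"):
    """Return the expected CSV files that are absent from data_dir.

    Hypothetical helper for illustration; not shipped with PKBoost.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_sample_files()
    if missing:
        print("Missing sample files:", ", ".join(missing))
    else:
        print("All sample files are present.")
```

Run it from the repository root so that the relative `data` path resolves.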

Step 3: Run Rust Benchmark

cargo run --release --bin benchmark

Expected output:

=== PKBoost Benchmarking ===
Train: 102,530 samples (0.17% fraud)
Val:   34,177 samples
Test:  34,177 samples

Training PKBoost... 45.2s
  PR-AUC: 0.8782

Training XGBoost... 12.1s
  PR-AUC: 0.7458

Training LightGBM... 9.8s
  PR-AUC: 0.7931

PKBoost improves PR-AUC by +17.9% over XGBoost
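
The summary line appears to report a relative gain, not a difference in percentage points (the raw gap between 0.8782 and 0.7458 is about 13.2 points). Assuming that reading, the arithmetic can be sketched as below. The function name is illustrative. With the four-digit scores shown above, the result comes out near 17.8%, so the last digit of the printed figure presumably comes from unrounded scores.

```python
def relative_improvement(new_score, baseline_score):
    """Relative gain of new_score over baseline_score, in percent."""
    return (new_score - baseline_score) / baseline_score * 100.0


if __name__ == "__main__":
    # PR-AUC values copied from the expected output above.
    pkboost, xgboost, lightgbm = 0.8782, 0.7458, 0.7931
    print(f"vs XGBoost:  {relative_improvement(pkboost, xgboost):+.1f}%")
    print(f"vs LightGBM: {relative_improvement(pkboost, lightgbm):+.1f}%")
```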

🔬 Full Reproduction (1-2 hours)

For complete results matching the paper, download full datasets and run extended benchmarks.

Dataset Preparation

# Download from Kaggle
python prepare_data.py mlg-ulb/creditcardfraud Class 1

⚡ Running Benchmarks

Standard Performance Comparison (Rust)

cargo run --release --bin benchmark

Standard Performance Comparison (Python)

python run_single_benchmark.py

Drift Resilience Test

python drift_comparison_all.py

📊 Expected Results

Credit Card Fraud (0.2% fraud rate)

Model      PR-AUC   F1-Score   ROC-AUC   Training Time
PKBoost    87.8%    87.4%      97.5%     45s
XGBoost    74.5%    79.8%      91.7%     12s
LightGBM   79.3%    71.3%      92.1%     10s
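
PR-AUC is the headline metric here because a classifier that scores at random gets a PR-AUC roughly equal to the positive rate (about 0.002 on this dataset), whereas its ROC-AUC would still sit near 0.5. The guide does not say how the benchmark computes PR-AUC. The sketch below uses average precision, one common estimator, purely to illustrate the idea. It is not the benchmark's own code, and the function name is an assumption.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at each rank where a positive appears.

    labels: sequence of 0/1 ground-truth values (1 = fraud).
    scores: sequence of predicted scores, higher = more likely fraud.
    Tied scores keep their input order, which is an arbitrary choice.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(label for _, label in ranked)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positive labels")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` items
    return precision_sum / total_positives


if __name__ == "__main__":
    # Toy data: positives land at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
    print(average_precision([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.2, 0.1]))
```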

🔧 Troubleshooting

Issue 1: "File not found: data/creditcard_train.csv"

# Download dataset
python prepare_data.py mlg-ulb/creditcardfraud Class 1

Issue 2: Kaggle API Authentication Error

# Set up Kaggle API credentials
mkdir -p ~/.kaggle
# Copy your kaggle.json there
chmod 600 ~/.kaggle/kaggle.json
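
To confirm the credentials file is in place with the permissions set above, a small check such as the following can help. This is a sketch for POSIX systems. The helper name is hypothetical, and it only inspects the file; it does not contact Kaggle.

```python
import stat
from pathlib import Path


def kaggle_credentials_problem(path=None):
    """Return a description of what is wrong with kaggle.json, or None if it looks fine.

    Hypothetical helper: checks only that the file exists and is not
    accessible by group or others (the effect of `chmod 600`).
    """
    cred = Path(path) if path is not None else Path.home() / ".kaggle" / "kaggle.json"
    if not cred.is_file():
        return f"{cred} does not exist"
    mode = stat.S_IMODE(cred.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        return f"{cred} is accessible by group/others (mode {mode:o}); run chmod 600"
    return None


if __name__ == "__main__":
    problem = kaggle_credentials_problem()
    print(problem or "Kaggle credentials file looks OK.")
```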