PKBoost Benchmark Reproduction Guide

Complete guide for reproducing all benchmarks from the PKBoost paper, including standard performance comparisons and drift resilience tests.

📋 Table of Contents

Quick Start (5 minutes)
Full Reproduction (1-2 hours)
Dataset Preparation
Running Benchmarks
Expected Results
Troubleshooting

🚀 Quick Start (5 minutes)

Use the included sample data to verify that PKBoost builds and runs.

Step 1: Clone Repository

git clone https://github.com/Pushp-Kharat1/pkboost.git
cd pkboost

Step 2: Verify Sample Data

ls data/
# Should show: creditcard_train.csv, creditcard_val.csv, creditcard_test.csv
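
The same check can be scripted so that it fails loudly before a long benchmark run. This is a minimal sketch in Python, assuming the three file names listed above and a `data/` directory relative to the repository root; `missing_files` is a hypothetical helper, not part of PKBoost.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = (
    "creditcard_train.csv",
    "creditcard_val.csv",
    "creditcard_test.csv",
)


def missing_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing sample files:", ", ".join(missing))
    else:
        print("All sample files found.")
```

Run it from the repository root; an empty result means all three splits are in place.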

Step 3: Run Rust Benchmark

cargo run --release --bin benchmark

Expected output:

=== PKBoost Benchmarking ===
Train: 102,530 samples (0.17% fraud)
Val:   34,177 samples
Test:  34,177 samples

Training PKBoost... 45.2s
  PR-AUC: 0.8782

Training XGBoost... 12.1s
  PR-AUC: 0.7458

Training LightGBM... 9.8s
  PR-AUC: 0.7931

PKBoost improves PR-AUC by +17.9% over XGBoost
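
The last line of the output reports a relative improvement, not an absolute difference in PR-AUC. The sketch below shows that arithmetic under the assumption that the figure is computed as (PKBoost - baseline) / baseline; the function name `relative_improvement` is illustrative. With the rounded four-decimal scores printed above, the XGBoost comparison comes out at roughly 17.75%. The benchmark may work from unrounded scores, so its printed figure need not match to the last digit.

```python
def relative_improvement(score, baseline):
    """Relative gain of score over baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (score - baseline) / baseline * 100.0


# PR-AUC values copied from the expected output above (rounded to 4 decimals).
pkboost, xgboost, lightgbm = 0.8782, 0.7458, 0.7931

print(f"vs XGBoost:  {relative_improvement(pkboost, xgboost):+.2f}%")
print(f"vs LightGBM: {relative_improvement(pkboost, lightgbm):+.2f}%")
```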

🔬 Full Reproduction (1-2 hours)

To obtain complete results that match the paper, download the full datasets and run the extended benchmarks.

Dataset Preparation

# Download from Kaggle
python prepare_data.py mlg-ulb/creditcardfraud Class 1

⚡ Running Benchmarks

Standard Performance Comparison (Rust)

cargo run --release --bin benchmark

Standard Performance Comparison (Python)

python run_single_benchmark.py

Drift Resilience Test

python drift_comparison_all.py

📊 Expected Results

Credit Card Fraud (0.2% fraud rate)

Model	PR-AUC	F1-Score	ROC-AUC	Training Time
PKBoost	87.8%	87.4%	97.5%	45s
XGBoost	74.5%	79.8%	91.7%	12s
LightGBM	79.3%	71.3%	92.1%	10s
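
Runs on different hardware or library versions may not reproduce these figures to the last digit. One way to compare a local run against the table is to check each metric within a tolerance. The sketch below is an assumption, not part of the PKBoost tooling: `EXPECTED_PR_AUC` copies the PR-AUC column above as fractions, while the 0.02 tolerance and the `check_results` helper are hypothetical choices.

```python
# PR-AUC values copied from the table above, expressed as fractions.
EXPECTED_PR_AUC = {"PKBoost": 0.878, "XGBoost": 0.745, "LightGBM": 0.793}


def check_results(observed, expected=EXPECTED_PR_AUC, tol=0.02):
    """Return {model: (observed, expected)} for every model outside tol.

    Models absent from `observed` are reported with an observed value of None.
    """
    failures = {}
    for model, target in expected.items():
        value = observed.get(model)
        if value is None or abs(value - target) > tol:
            failures[model] = (value, target)
    return failures


if __name__ == "__main__":
    # Hypothetical numbers standing in for a local run; replace with your own.
    my_run = {"PKBoost": 0.871, "XGBoost": 0.752, "LightGBM": 0.760}
    for model, (got, want) in check_results(my_run).items():
        print(f"{model}: got {got}, expected about {want}")
```

An empty result means every model landed within the chosen tolerance. Tighten or loosen `tol` to suit how strict the comparison needs to be.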

🔧 Troubleshooting

Issue 1: "File not found: data/creditcard_train.csv"

# Download dataset
python prepare_data.py mlg-ulb/creditcardfraud Class 1

Issue 2: Kaggle API Authentication Error

# Set up Kaggle API credentials
mkdir -p ~/.kaggle
# Copy your kaggle.json there
chmod 600 ~/.kaggle/kaggle.json
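
If authentication still fails after these steps, it can help to inspect the credentials file directly. This is a minimal sketch for POSIX systems, assuming the standard `kaggle.json` layout with `username` and `key` fields; `check_kaggle_credentials` is a hypothetical helper and does not contact the Kaggle API.

```python
import json
import stat
from pathlib import Path


def check_kaggle_credentials(path=None):
    """Return a list of problems found with the credentials file (empty if none)."""
    path = Path(path) if path else Path.home() / ".kaggle" / "kaggle.json"
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    # Group/other permission bits should be cleared, matching chmod 600.
    if stat.S_IMODE(path.stat().st_mode) & 0o077:
        problems.append("file is accessible by group or others; run chmod 600")
    try:
        creds = json.loads(path.read_text())
    except json.JSONDecodeError:
        return problems + ["file is not valid JSON"]
    if not isinstance(creds, dict):
        return problems + ["file does not contain a JSON object"]
    for field in ("username", "key"):
        if field not in creds:
            problems.append(f"missing field: {field}")
    return problems


if __name__ == "__main__":
    issues = check_kaggle_credentials()
    print("\n".join(issues) if issues else "kaggle.json looks OK")
```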
