
PKBoost

An Adaptive Gradient Boosting Library

Gradient boosting that adjusts to concept drift in imbalanced data.

Built from scratch in Rust, PKBoost handles shifting data distributions under extreme class imbalance. On fraud detection with a 0.2% positive rate, it degrades by less than 2% under concept drift, compared with a 31.8% drop for XGBoost and a 42.5% drop for LightGBM. On the same benchmarks with no drift applied, it outperforms XGBoost by 10-18%. Internally, it combines information theory (Shannon entropy) with Newton-Raphson optimization to detect shifts in rare-event distributions and trigger an adaptive "metamorphosis" for real-time recovery.

"Most boosting libraries overlook concept drift. PKBoost identifies it and evolves to persist."

Perfect for: Streaming fraud detection, real-time medical monitoring, anomaly detection in changing environments, or any scenario where data evolves over time and positive instances are rare.
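The entropy-based drift signal can be illustrated with a self-contained sketch (a hypothetical simplification for illustration, not PKBoost's actual detector): compare the Shannon entropy of the positive-class rate in a reference window against a recent window, and flag drift when the gap exceeds a threshold. With rare positives, even a small shift in the positive rate moves the entropy noticeably.

```rust
/// Shannon entropy (in bits) of a Bernoulli variable with success rate p.
fn binary_entropy(p: f64) -> f64 {
    if p <= 0.0 || p >= 1.0 {
        return 0.0;
    }
    -(p * p.log2() + (1.0 - p) * (1.0 - p).log2())
}

/// Flag drift when the entropy of the recent positive rate diverges
/// from the reference window by more than `threshold` bits.
/// (Illustrative only; the names and threshold are assumptions.)
fn drift_detected(reference: &[u8], recent: &[u8], threshold: f64) -> bool {
    let rate = |w: &[u8]| w.iter().map(|&y| y as f64).sum::<f64>() / w.len() as f64;
    (binary_entropy(rate(reference)) - binary_entropy(rate(recent))).abs() > threshold
}

fn main() {
    // Reference window: 0.2% positives; recent window: 5% positives.
    let reference: Vec<u8> = (0..1000).map(|i| (i % 500 == 0) as u8).collect();
    let recent: Vec<u8> = (0..1000).map(|i| (i % 20 == 0) as u8).collect();
    assert!(drift_detected(&reference, &recent, 0.1));
    println!("drift detected");
}
```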


🚀 Quick Start

To use PKBoost from Python, see PKBoost Python or install via pip install pkboost

Clone the repository and build:

git clone https://github.com/Pushp-Kharat1/pkboost.git
cd pkboost
cargo build --release

Run the benchmark:

1. Use included sample data (already in data/)

ls data/  # Should show creditcard_train.csv, creditcard_val.csv, etc.

2. Run benchmark

cargo run --release --bin benchmark

💻 Basic Usage

use pkboost::*;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Load CSV with headers: feature1,feature2,...,Class
    let (x_train, y_train) = load_csv("train.csv")?;
    let (x_val, y_val) = load_csv("val.csv")?;
    let (x_test, y_test) = load_csv("test.csv")?;

    // Auto-configure based on data characteristics
    let mut model = OptimizedPKBoostShannon::auto(&x_train, &y_train);

    // Train with early stopping on validation set
    model.fit(
        &x_train,
        &y_train,
        Some((&x_val, &y_val)),  // Optional validation
        true  // Verbose output
    )?;

    // Predict probabilities (not classes)
    let test_probs = model.predict_proba(&x_test)?;

    // Evaluate
    let pr_auc = calculate_pr_auc(&y_test, &test_probs);
    println!("PR-AUC: {:.4}", pr_auc);

    Ok(())
}
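The calculate_pr_auc helper above ships with the library; for intuition, the general idea can be sketched in plain Rust as average precision, a common approximation of PR-AUC (this is a simplified illustration, not PKBoost's exact implementation):

```rust
/// Average precision: rank examples by descending score and, at each
/// positive, record the precision so far; average over all positives.
fn average_precision(labels: &[u8], scores: &[f64]) -> f64 {
    let mut idx: Vec<usize> = (0..labels.len()).collect();
    idx.sort_by(|&a, &b| scores[b].partial_cmp(&scores[a]).unwrap());

    let total_pos = labels.iter().filter(|&&y| y == 1).count() as f64;
    let (mut tp, mut sum) = (0.0, 0.0);
    for (rank, &i) in idx.iter().enumerate() {
        if labels[i] == 1 {
            tp += 1.0;
            sum += tp / (rank as f64 + 1.0); // precision at this recall point
        }
    }
    sum / total_pos
}

fn main() {
    let labels = [1u8, 0, 1, 0, 0];
    let scores = [0.9, 0.8, 0.7, 0.3, 0.1];
    // Positives ranked 1st and 3rd: AP = (1/1 + 2/3) / 2 ≈ 0.833
    println!("AP = {:.3}", average_precision(&labels, &scores));
}
```

PR-AUC is preferred over ROC-AUC here because, at a 0.2% positive rate, a model can score a high ROC-AUC while still ranking most positives poorly; precision-recall exposes that directly.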

✨ Key Features