PKBOOST
Python Package Documentation
The official Python wrapper for PKBoost, providing seamless integration with Python's machine learning ecosystem.
📦 Installation
pip install pkboost
🚀 Quick Start
import pkboost
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve, auc
# Load your data
data = pd.read_csv('your_data.csv')
X = data.drop('target', axis=1)
y = data['target']
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
# Create and train PKBoost classifier
model = pkboost.PKBoostClassifier()
model.fit(X_train, y_train)
# Make predictions
y_pred_proba = model.predict_proba(X_test)[:, 1]
# Evaluate
precision, recall, _ = precision_recall_curve(y_test, y_pred_proba)
pr_auc = auc(recall, precision)
print(f"PR-AUC: {pr_auc:.4f}")
🔧 PKBoostClassifier
PKBoostClassifier is the main classifier class. Its constructor takes the following parameters:
PKBoostClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=6,
    min_samples_split=2,
    min_samples_leaf=1,
    subsample=1.0,
    colsample_bytree=1.0,
    reg_lambda=1.0,
    reg_alpha=0.0,
    random_state=None,
    n_jobs=-1,
    verbose=0
)
✨ Key Features
- Automatic Hyperparameter Tuning: Use auto_tune=True for automatic configuration
- Early Stopping: Monitor validation performance with eval_set
- Feature Importance: Access via the feature_importances_ attribute
- Handles Imbalance: Built-in class weighting for imbalanced datasets
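To make the class-weighting idea concrete, here is a minimal sketch of the common "balanced" heuristic, where each class is weighted by n_samples / (n_classes * class_count). This is the formula scikit-learn uses for class_weight='balanced'. It is shown only as an illustration: how PKBoost computes its weights internally is not documented here, and the helper name balanced_class_weights is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, with weight = n_samples / (n_classes * class_count).

    Hypothetical helper for illustration only; not part of the pkboost API.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy imbalanced target: 90 negatives, 10 positives
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)
```

With a 90/10 split, each positive sample counts nine times as much as a negative one, so both classes contribute equally to the total weighted loss.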
⚡ Advanced Usage
With Early Stopping
from sklearn.model_selection import train_test_split
# Split into train, validation, test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, stratify=y_temp)
model = pkboost.PKBoostClassifier(
    n_estimators=1000,  # Set high; early stopping determines the actual number of trees
    early_stopping_rounds=50,
    verbose=10
)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    verbose=True
)
Automatic Hyperparameter Tuning
model = pkboost.PKBoostClassifier(auto_tune=True)
model.fit(X_train, y_train)
🔄 PKBoostAdaptive
For streaming data and concept drift scenarios:
from pkboost import PKBoostAdaptive
# Initialize adaptive model
adaptive_model = PKBoostAdaptive(
    drift_detection_sensitivity=0.01,
    adaptation_rate=0.1,
    max_retraining_interval=1000
)
# For streaming data: data_stream is assumed to be any iterable of (batch_X, batch_y) pairs
for batch_X, batch_y in data_stream:
    adaptive_model.partial_fit(batch_X, batch_y)
    
    # Check if drift detected
    if adaptive_model.drift_detected:
        print("Concept drift detected! Model is adapting...")
    
    # Get current predictions
    predictions = adaptive_model.predict_proba(batch_X)
🔗 Integration with Scikit-Learn
Pipeline Integration
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
    ('classifier', pkboost.PKBoostClassifier())
])
pipeline.fit(X_train, y_train)
💾 Model Persistence
Save and Load Models
import joblib
# Save model
joblib.dump(model, 'pkboost_model.pkl')
# Load model
loaded_model = joblib.load('pkboost_model.pkl')
🚀 Performance Tips
- Data Preprocessing: Ensure numerical features are scaled and categorical features are encoded
- Early Stopping: Always use early stopping to prevent overfitting
- Subsampling: For large datasets, use subsample < 1.0 for faster training
- Parallelism: Set n_jobs=-1 to use all available cores
- Memory: Use float32 data types for large datasets
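The early stopping tip above can be illustrated with a small patience-based sketch. It assumes a higher-is-better validation score recorded once per boosting round, and treats patience as playing the role of early_stopping_rounds. PKBoost's exact stopping rule is not specified here, and the function name best_round_with_patience is hypothetical.

```python
def best_round_with_patience(val_scores, patience):
    """Return (best_round, rounds_run) for higher-is-better validation scores.

    Training stops once `patience` consecutive rounds pass without a new best.
    Hypothetical helper for illustration only; not part of the pkboost API.
    """
    best_score = float("-inf")
    best_round = -1
    for i, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_round = score, i
        elif i - best_round >= patience:
            # No improvement for `patience` rounds: stop here
            return best_round, i + 1
    return best_round, len(val_scores)


# Toy validation scores, one per boosting round
scores = [0.60, 0.65, 0.70, 0.69, 0.68, 0.71, 0.70, 0.70, 0.69, 0.72]
best_round, rounds_run = best_round_with_patience(scores, patience=3)
print(best_round, rounds_run)
```

In this toy run, training stops after round index 8, so the later score of 0.72 is never reached. That is the trade-off of a small patience value: less wasted training, at the risk of stopping before a late improvement.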