If you like scorecards and gradient boosting, this repo is for you!
xbooster is designed for productionizing credit scoring models built with gradient boosting. It serves as an extension of XGBoost logistic regression, tailored to the needs of model developers and validators.
What problems does xbooster help you solve?
Boost model accuracy ratio: Achieve stronger predictive performance during model development.
Open up the "black box": Leverage tree visualization and global and local feature importance.
Streamline deployment and validation: Deploy models using simple SQL and make them easier to analyze.
Install with `pip install xbooster`.
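For a first look at the workflow, here is a minimal sketch that trains an XGBoost model and turns it into a scorecard with SQL output. The constructor and method names follow the xbooster documentation; verify them against your installed version, and note that the synthetic dataset is only a stand-in for real credit data.

```python
import pandas as pd
import xgboost as xgb
from sklearn.datasets import make_classification
from xbooster.constructor import XGBScorecardConstructor

# Synthetic binary data standing in for a credit dataset.
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)])
y = pd.Series(y, name="default_flag")

# A small gradient-boosted logistic model.
model = xgb.XGBClassifier(objective="binary:logistic", n_estimators=50, max_depth=2)
model.fit(X, y)

# Turn the booster into a scorecard and export it as SQL
# (names per the xbooster docs; check your installed version).
constructor = XGBScorecardConstructor(model, X, y)
scorecard = constructor.construct_scorecard()
sql_query = constructor.generate_sql_query(table_name="credit_features")
```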
## Scorecard Boosting
Scorecard boosting is a methodology for constructing credit scorecards that leverages advanced machine learning (ML), specifically gradient boosting; it emerged in the credit risk domain.
The benefits of boosted scorecards include:
Predictive Power: Boosted scorecards can outperform traditional scorecards in terms of predictive power.
Flexibility: Because boosted models are non-linear, they can capture complex relationships between features.
Given that boosted logistic models can deliver a 1% to 5% improvement in accuracy over traditional scorecards, the benefits can amount to millions of dollars in savings for financial institutions.
The following sections provide an overview of gradient boosting, the core algorithm behind scorecard boosting, and how it can be used to build boosted scorecards.
## Gradient Boosting
Gradient boosting, which lies at the heart of scorecard boosting, is a machine learning technique that builds a predictive model by combining the outputs of multiple "weak" models, typically decision trees, into a single strong learner.
The algorithm works sequentially, with each new model focusing on correcting errors made by the previous ones. It minimizes a loss function by adding new models to the ensemble, and the final prediction is the sum of the predictions from all models.
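To make the sequential error-correction idea concrete, here is a from-scratch sketch for a squared-error loss, where each new tree is fit to the residuals (the negative gradient) of the current ensemble. This illustrates the general algorithm, not how any particular library implements it.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

n_rounds, learning_rate = 100, 0.1
prediction = np.full(y.shape, y.mean())  # F_0: a constant base model
trees = []

for _ in range(n_rounds):
    residuals = y - prediction  # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # F_m = F_{m-1} + eta * h_m
    trees.append(tree)

# The final prediction is the base value plus the shrunken sum of all trees.
```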
One of the best-known frameworks for gradient boosting with decision trees is XGBoost. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable.
For binary classification tasks like credit scoring, XGBoost performs a form of Logistic Regression. The algorithm is trained to minimize the log loss function, which is the negative log-likelihood of the true labels given a probabilistic model.
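As a minimal illustration (with synthetic data standing in for a credit portfolio), this is what training XGBoost with the log loss objective looks like:

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = xgb.XGBClassifier(
    objective="binary:logistic",  # minimize log loss on a binary target
    n_estimators=100,
    max_depth=3,
    learning_rate=0.1,
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
print(f"Test log loss: {log_loss(y_test, proba):.4f}")
```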
The algorithm used in XGBoost Logistic Regression follows the Newton-Raphson update method, which was initially described by J. Friedman (2001). XGBoost Logistic Regression also has ties to LogitBoost, which was described by J. Friedman et al. (2000).
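In this formulation, each boosting round fits a tree to the first- and second-order derivatives of the log loss, and the optimal weight of a leaf $j$ is a regularized Newton step:

$$
g_i = p_i - y_i, \qquad h_i = p_i(1 - p_i), \qquad
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
$$

where $p_i$ is the current predicted probability for observation $i$, $I_j$ is the set of observations in leaf $j$, and $\lambda$ is the L2 regularization parameter.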
To familiarize yourself further with gradient boosting and XGBoost, follow the links below:
## Boosted Scorecards
Boosted scorecards built on top of gradient-boosted trees can improve performance metrics such as the Gini coefficient and the Kolmogorov-Smirnov (KS) statistic relative to standard tools, while retaining the interpretability of traditional scorecards. This is achieved by combining the best of both worlds: the interpretability of scorecards and the predictive power of gradient boosting.
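Both metrics can be computed directly from model scores; a quick sketch with scikit-learn, using the identities Gini = 2 * AUC - 1 and KS = max(TPR - FPR):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def gini_and_ks(y_true, y_score):
    """Gini coefficient and KS statistic from binary labels and model scores."""
    auc = roc_auc_score(y_true, y_score)
    gini = 2 * auc - 1  # Gini coefficient via the AUC identity
    fpr, tpr, _ = roc_curve(y_true, y_score)
    ks = float(np.max(tpr - fpr))  # largest gap between the two rate curves
    return gini, ks
```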
A boosted scorecard can be seen as a collection of sequential decision trees transformed into a traditional scorecard format. The scorecard comprises the rules needed to compute a credit score, an evaluative measure of the creditworthiness of new or existing customers. Typically ranging from 300 to 850, the score can be further calibrated using the Points to Double the Odds (PDO) technique, a concept that extends naturally to gradient-boosted decision trees.
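The PDO scaling itself is a log-odds transformation: score = offset + factor * ln(odds), with factor = PDO / ln(2). The worked sketch below uses an illustrative anchor of 600 points at 19:1 good:bad odds; these anchor values are assumptions, not library defaults.

```python
import math

pdo, target_points, target_odds = 50, 600, 19  # illustrative anchors
factor = pdo / math.log(2)                     # points per doubling of the odds
offset = target_points - factor * math.log(target_odds)

def log_odds_to_score(log_odds_good: float) -> float:
    """Map model log-odds (of being a good customer) to scorecard points."""
    return offset + factor * log_odds_good

# The anchor is reproduced exactly: odds of 19:1 -> 600 points.
print(round(log_odds_to_score(math.log(19))))  # 600
```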
Below we can see how the number of boosting iterations affects the distribution of boosted credit scores among good and bad customers: