LightGBM vs XGBoost vs CatBoost
Quick summary
Hello 👋 In this article, I will compare LightGBM, XGBoost and CatBoost in the following areas:
- Boosting algorithm
- Node splitting
- Missing data handling
- Feature handling
- Data sampling
- LightGBM-specific features
- XGBoost-specific features
- CatBoost-specific features
- Tips for choosing between LightGBM, XGBoost and CatBoost
- Resources
🚀 Subscribe to us @ newsletter.datascienceletter.com
Boosting algorithm
Conventional boosting (LightGBM, XGBoost) vs ordered boosting (CatBoost)
One of the major differences in tree building between LightGBM/XGBoost and CatBoost is CatBoost's use of ‘ordered boosting’.
In conventional boosting algorithms (used by LightGBM and XGBoost), the tree at each boosting iteration is built on the same data points whose gradients were computed with models that have already seen those points. It is argued that this repeated use of a single set of data points can increase the chance of overfitting.
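To make this concrete, here is a minimal sketch of one conventional boosting iteration with squared-error loss. It is not any library's actual code: the arrays X, y and F, the learning rate, and the use of scikit-learn's DecisionTreeRegressor are all illustrative choices. The point to notice is that the gradients are computed and the tree is fitted on the very same training points.

```python
# Minimal sketch of conventional gradient boosting (squared-error loss).
# X, y, F and learning_rate are illustrative names, not any library's API.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=500)

F = np.zeros(len(y))          # current ensemble prediction
learning_rate = 0.1

for _ in range(10):           # boosting iterations
    # Negative gradients of squared error w.r.t. F, computed on ALL training points
    gradients = y - F
    # The SAME points (and their gradients) are then used to fit the next tree,
    # which is the repeated reuse that ordered boosting tries to avoid.
    tree = DecisionTreeRegressor(max_depth=3).fit(X, gradients)
    F += learning_rate * tree.predict(X)
```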
To mitigate this effect, CatBoost supports a different boosting algorithm known as ordered boosting. The whole idea of this algorithm is to avoid repeatedly using the same data points for both tree building and gradient or Hessian computation. The method is briefly explained as follows (a toy sketch appears after the list):
- First, the original training dataset of size N is shuffled S times to produce S random permutations.
- At each boosting iteration, for each shuffled dataset, a separate tree is built for each data position i (where i = 1, 2, …, N), using only the data points before i (j < i).
- The gradients and Hessians for a particular data point k are then computed using the trees built only on data points before k, so k never influences the model used to estimate its own gradient.
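Below is a toy, brute-force sketch of that idea for a single boosting step with squared-error loss. The variable names, the tiny synthetic data and the use of scikit-learn trees are all illustrative assumptions; CatBoost's real implementation is far more efficient than this loop.

```python
# Toy sketch of the naive ordered-boosting idea described in the list above
# (one boosting step, squared-error loss). Illustrative only; the real
# CatBoost implementation avoids this O(S * N^2) brute-force loop.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=200)
N, S = len(y), 2

grad_estimates = np.zeros(N)
for _ in range(S):                       # one pass per shuffled dataset
    perm = rng.permutation(N)
    Xp, yp = X[perm], y[perm]
    for i in range(1, N):                # one model per data position i
        # Fit only on points that come BEFORE position i in this permutation,
        model = DecisionTreeRegressor(max_depth=3).fit(Xp[:i], yp[:i])
        # so the gradient (here: the residual) for point i is estimated by a
        # model that has never seen point i.
        grad_estimates[perm[i]] += yp[i] - model.predict(Xp[i:i + 1])[0]
grad_estimates /= S                      # average over the S permutations
```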
In reality, it is not practical to train a tree for each data position of each shuffled dataset, as the computational complexity would scale as SN². The actual algorithm builds trees for a fixed number of…