As I discussed in an earlier post, one common mistake in classification is to treat an uncalibrated score as a probability. In the latest version of ML-Insights, we provide some simple functionality to use cross-validation and splines to calibrate models. After the calibration, the output of the model can more properly be used as a probability.
Interestingly, even quite accurate models like Gradient Boosting (including XGBoost) benefit substantially from a calibration approach.
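To make "calibrated" concrete: among the cases a model scores near 0.8, roughly 80% should turn out to be positive. Here is a minimal sketch of that check in plain Python. The helper name `reliability_table` and the toy data are made up for illustration; they are not part of ML-Insights.

```python
def reliability_table(y_true, scores, n_bins=5):
    """Group scores into equal-width bins and compare the mean score
    in each bin with the observed fraction of positives."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((y, s))
    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_score = sum(s for _, s in members) / len(members)
        frac_pos = sum(y for y, _ in members) / len(members)
        rows.append((len(members), mean_score, frac_pos))
    return rows

# Toy data (hypothetical): the model says 0.9 for five cases, but only three are positive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]

for count, mean_score, frac_pos in reliability_table(y_true, scores):
    print(f"n={count}  mean score={mean_score:.2f}  observed rate={frac_pos:.2f}")
```

A well-calibrated model would show the mean score and the observed rate close to each other in every bin. In this toy case they are far apart, which is exactly the situation where reading the score as a probability goes wrong.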
Using ML-Insights, just a few short lines of code can noticeably improve your results on probability-based metrics such as log loss. It’s as easy as:
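import ml_insights as mli
from sklearn.ensemble import RandomForestClassifier
rfm = RandomForestClassifier(n_estimators=500, class_weight='balanced_subsample', n_jobs=-1)
rfm_calib = mli.SplineCalibratedClassifierCV(rfm)
rfm_calib.fit(X_train, y_train)  # fit before predicting; X_train and y_train are your training data
test_res_calib = rfm_calib.predict_proba(X_test)[:,1]  # calibrated probability of the positive class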
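Probability-based metrics are where miscalibration shows up. The sketch below uses hand-made numbers (not results from the notebooks) to show how log loss and the Brier score punish overconfident scores. The function names and the two toy prediction lists are assumptions for illustration only.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def brier_score(y_true, probs):
    """Mean squared difference between the predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

# Toy data (hypothetical): three of five cases are positive.
y_true = [1, 1, 1, 0, 0]
overconfident = [0.99] * 5  # the score claims near certainty
calibrated = [0.6] * 5      # the score matches the observed 60% rate

for name, probs in [("overconfident", overconfident), ("calibrated", calibrated)]:
    print(f"{name}: log loss={log_loss(y_true, probs):.3f}  Brier={brier_score(y_true, probs):.3f}")
```

Both lists rank the cases identically, so a ranking metric could not tell them apart. Only the probability-based metrics separate them, which is why calibration matters most when you report log loss or Brier score.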
I’ve written a couple of Jupyter notebooks that walk through this issue carefully. Check them out if you are interested! The Calibration_Example_ICU_MIMIC_Short notebook is best if you want to get right to the point. For a more detailed explanation, look at Calibration_Example_ICU_MIMIC.
Would love to hear any feedback or suggestions on this!