## Data Science, Probability, Life

### Life is best understood through a probabilistic lens

#### Tag: calibration

As I discussed in an earlier post, one common mistake in classification is to treat an uncalibrated score as a probability.  In the latest version of ML-Insights, we provide some simple functionality to use cross-validation and splines to calibrate models.  After the calibration, the output of the model can more properly be used as a probability.

Interestingly, even quite accurate models like Gradient Boosting (including XGBoost) benefit substantially from a calibration approach.
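One quick way to see whether a score behaves like a probability is to bin the scores and compare the average score in each bin with the observed fraction of positives. The sketch below does this in plain Python. The `reliability_table` helper and the data are hypothetical, made up for illustration; they are not part of ML-Insights.

```python
def reliability_table(y_true, scores, n_bins=5):
    """Bin cases by score (equal-width bins on [0, 1]) and return, for each
    non-empty bin, a tuple (count, mean score, observed positive rate)."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, scores):
        # Clamp the index so that a score of exactly 1.0 lands in the last bin.
        idx = min(int(s * n_bins), n_bins - 1)
        bins[idx].append((y, s))

    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        n = len(members)
        mean_score = sum(s for _, s in members) / n
        observed_rate = sum(y for y, _ in members) / n
        rows.append((n, mean_score, observed_rate))
    return rows


# Hypothetical data: 10 cases scored 0.3 (1 positive), 10 cases scored 0.7 (9 positives).
y_true = [1] + [0] * 9 + [1] * 9 + [0]
scores = [0.3] * 10 + [0.7] * 10

for n, mean_score, observed_rate in reliability_table(y_true, scores):
    print(f"n={n:2d}  mean score={mean_score:.2f}  observed rate={observed_rate:.2f}")
```

For a well-calibrated model the two columns would roughly agree. In this toy data the scores are squeezed toward 0.5 relative to the observed rates, which is the kind of gap calibration is meant to close.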

With ML-Insights, a few short lines of code can greatly improve your performance on probability-based metrics.  It’s as easy as:

`from sklearn.ensemble import RandomForestClassifier`
`import ml_insights as mli`

`rfm = RandomForestClassifier(n_estimators=500, class_weight='balanced_subsample', n_jobs=-1)`
`rfm_calib = mli.SplineCalibratedClassifierCV(rfm)`
`rfm_calib.fit(X_train, y_train)`

`test_res_calib = rfm_calib.predict_proba(X_test)[:, 1]`
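To see why calibration matters for a probability-based metric, here is a minimal sketch of log loss in plain Python. It does not use ML-Insights. The `log_loss` helper and the two score vectors are hypothetical: both rank the cases identically, but only one matches the observed positive rates.

```python
import math


def log_loss(y_true, probs):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return -total / len(y_true)


# Hypothetical data: two groups of 10 cases with observed positive rates 0.2 and 0.8.
y_true = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2

# Scores squeezed toward 0.5 versus scores that match the observed rates.
uncalibrated = [0.4] * 10 + [0.6] * 10
calibrated = [0.2] * 10 + [0.8] * 10

print("log loss, uncalibrated:", round(log_loss(y_true, uncalibrated), 4))
print("log loss, calibrated:  ", round(log_loss(y_true, calibrated), 4))
```

Because both vectors order the cases the same way, any ranking-based metric would treat them as equivalent. Only the probability-based metric rewards the scores that agree with the observed rates.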

I’ve written a couple of Jupyter notebooks that walk through this issue carefully.  Check them out if you are interested!  The Calibration_Example_ICU_MIMIC_Short notebook is best if you want to get right to the point.  For a more detailed explanation, look at Calibration_Example_ICU_MIMIC.

Would love to hear any feedback or suggestions on this!

In most ML methods (including random forests, gradient boosting, logistic regression, and neural networks), the model outputs a score, which yields a “ranking classification”.  However, there are two very common mistakes that occur in dealing with this score:

1. Using a “default” threshold of 0.5 automatically to convert to a hard classification, rather than examining the performance across a range of thresholds.  (This is encouraged by sklearn’s convention that “model.predict” does precisely the former, while the latter requires the clunkier “model.predict_proba”.)
2. Treating the score directly as a probability, without calibrating it.  This is patently wrong when using models like random forests (where the vote proportion certainly does not indicate the probability of being a ‘1’), and inaccurate even in logistic regression (where the output purports to be a probability, but often is not well calibrated).
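The first mistake is easy to make concrete. The sketch below, in plain Python, sweeps a few thresholds over the same scores and reports precision and recall at each. The `confusion_at` helper and the data are hypothetical, chosen to mimic an imbalanced problem where few scores reach 0.5.

```python
def confusion_at(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when predicting 1 for every score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical scores for an imbalanced problem (3 positives out of 10).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60]

for threshold in (0.2, 0.35, 0.5):
    tp, fp, fn, tn = confusion_at(y_true, scores, threshold)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

In this toy data, the default threshold of 0.5 catches only one of the three positives, while a lower threshold recovers all three at the cost of one false positive. Which trade-off is right depends on the application, which is why the whole range is worth examining.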

We’ll dive into these mistakes in more detail in future posts.
