Tune Threshold explained with code in Python

Danil Zherebtsov
5 min read · Sep 24, 2021

This article will include three parts:

  • Theory
  • Manual approach with code
  • Automatic tool to ease your pain

What is a threshold?

A binary classification model can return both the predicted class (0 or 1) and the predicted probabilities of both classes. The predicted class is inferred from the probabilities. If a model predicts [0.35, 0.65] = [predicted_probability_of_0, predicted_probability_of_1], the default threshold (0.5) is applied: every observation whose predicted_probability_of_1 is greater than 0.5 is classified as 1, and every other observation as 0. In this example 0.65 > 0.5, so the predicted class is 1.
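A minimal sketch of the default rule with NumPy, assuming probabilities arrive as an (n_samples, 2) array of [predicted_probability_of_0, predicted_probability_of_1] rows (the layout scikit-learn's predict_proba uses when the classes are [0, 1]). The helper name apply_threshold is hypothetical, introduced only for illustration.

```python
import numpy as np

def apply_threshold(proba: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical helper: turn [P(0), P(1)] rows into 0/1 labels.

    A row is labelled 1 when its probability of class 1 is strictly
    greater than the threshold, and 0 otherwise.
    """
    proba_of_1 = proba[:, 1]
    return (proba_of_1 > threshold).astype(int)

# Each row is [predicted_probability_of_0, predicted_probability_of_1]
proba = np.array([
    [0.35, 0.65],  # 0.65 > 0.5  -> class 1
    [0.80, 0.20],  # 0.20 <= 0.5 -> class 0
    [0.55, 0.45],  # 0.45 <= 0.5 -> class 0
])

print(apply_threshold(proba))  # [1 0 0]
```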

If a different threshold is applied, e.g. 0.3, an observation is classified as 1 when its predicted_probability_of_1 is greater than 0.3, and as 0 otherwise.
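The same idea applied to a fitted scikit-learn model might look like the sketch below. It assumes a binary classifier whose classes_ are [0, 1], so column 1 of predict_proba is the probability of class 1; the helper predict_with_threshold and the toy dataset are hypothetical, chosen only to show how moving the threshold from 0.5 to 0.3 changes the labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def predict_with_threshold(model, X, threshold=0.5):
    """Hypothetical helper: label a sample 1 when P(class 1) > threshold.

    Assumes a fitted binary scikit-learn classifier with classes_ == [0, 1].
    """
    proba_of_1 = model.predict_proba(X)[:, 1]
    return (proba_of_1 > threshold).astype(int)

# Toy data for illustration only
X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression().fit(X, y)

labels_default = predict_with_threshold(model, X, threshold=0.5)
labels_lower = predict_with_threshold(model, X, threshold=0.3)

# Any sample with P(1) > 0.5 also has P(1) > 0.3, so lowering the
# threshold can only turn 0s into 1s, never 1s into 0s
print("ones at 0.5:", labels_default.sum())
print("ones at 0.3:", labels_lower.sum())
```

Because the threshold is applied after prediction, the model itself does not need to be retrained to try a different value.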

So why should we consider a threshold other than the default 0.5?

In case you have an imbalanced target variable (e.g. 80% zeros and 20% ones), a machine learning model learns from data in which ‘1’ is rare, so the predicted probabilities for ‘1’ will often be low, and many of them will be smaller than the default threshold of 0.5. As a result, the default threshold may classify most, or even all, of the predictions as 0. In…
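One way to observe this effect is sketched below, under stated assumptions: a synthetic dataset with roughly 80% zeros and 20% ones, made deliberately hard by overlapping classes (the class_sep and flip_y values are assumptions, not from the article), and a logistic regression standing in for "a machine learning model". The exact numbers depend on the data, so treat the output as illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Roughly 80% zeros and 20% ones, with overlapping classes (assumed settings)
X, y = make_classification(
    n_samples=2000, weights=[0.8, 0.2], flip_y=0.05,
    class_sep=0.5, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42,
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba_of_1 = model.predict_proba(X_test)[:, 1]

# Share of test samples labelled 1 at each candidate threshold
shares = {t: float((proba_of_1 > t).mean()) for t in (0.5, 0.3, 0.2)}

print(f"actual share of ones:    {y_test.mean():.3f}")
print(f"median P(1) on test set: {np.median(proba_of_1):.3f}")
for threshold, share in shares.items():
    print(f"share predicted 1 at {threshold}: {share:.3f}")
```

Comparing the share of predicted ones at 0.5 with the actual share of ones shows whether the default threshold is under-predicting the minority class on this data.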
