Classification: Logistic Regression
Welcome back! In the previous post, we saw how we can use linear regression to predict continuous outputs. What if we want to predict a discrete value, or even simply a True/False outcome?
Supervised learning problems are categorized into “Regression” and “Classification” problems.
In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.
For example, a spam filter that marks an incoming email as “spam” or “not spam” is a classification problem. If the discrete values are either a 0 or a 1, we call this binary classification.
In linear regression, we modeled the relationship using the equation of a straight line. For binary classification, a function whose range is (0, 1) is preferred, and the standard convention in Machine Learning is to use the “logistic” or “sigmoid” function.
Logistic Function
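It is defined as

$$g(z) = \frac{1}{1 + e^{-z}}$$

and its output always lies strictly between 0 and 1, which is exactly what we need for a 0/1 decision.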
The decision between ‘0’ and ‘1’ is made as follows:
g(x) ≥ 0.5 implies ‘1’, and
0 < g(x) < 0.5 implies ‘0’.
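As a quick sketch of this rule in Python (assuming NumPy; the helper names `sigmoid` and `predict_label` are illustrative, not from the original post):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(g_x, threshold=0.5):
    """Decision rule: 1 if g(x) >= 0.5, else 0."""
    return (np.asarray(g_x) >= threshold).astype(int)

print(sigmoid(0.0))                  # 0.5, exactly at the decision threshold
print(predict_label(sigmoid(2.0)))   # 1
print(predict_label(sigmoid(-2.0)))  # 0
```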
Logistic Regression Cost Function
The cost function for logistic regression looks like this:
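$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$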
Or, in general form over the complete dataset of m examples:
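$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\!\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta(x^{(i)})\right) \right]$$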
Decision Boundary
In classification we want to identify the boundary that separates the categories as in the figure below.
Thus we are looking for a line (or curve) that separates the two categories. As seen in the linear regression post, such a function can be expressed as
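$$\theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$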
So, when using the sigmoid function, we are looking for
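$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

The decision boundary itself is the set of points where $\theta^T x = 0$, since that is exactly where $h_\theta(x) = 0.5$.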
Gradient
In order to fit the best curve, our goal, just as in linear regression, is to minimize the cost (or error) function.
We start by differentiating the cost function with respect to each parameter θⱼ:
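$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$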
and, introducing the learning rate α, the update step becomes
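$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$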
Gradient Descent Algorithm
Using the same approach as in the earlier post, we will do the following:
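repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j} \quad \text{(simultaneously for all } j = 0, \dots, n\text{)}$$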
The corresponding matrix (vectorized) form, repeated until convergence, is
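$$\theta := \theta - \frac{\alpha}{m} X^T \left( g(X\theta) - y \right)$$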
Thus we can find the coefficients θ that best fit the curve, i.e., the ones with the least error.
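Putting the pieces together, here is a minimal NumPy sketch of batch gradient descent for logistic regression. It is an illustration under my own assumptions (the function names, toy data, fixed learning rate, and iteration count are all invented for this example), not code from the original post:

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def gradient_descent(X, y, alpha=0.1, num_iters=5000):
    # Vectorized update: theta := theta - (alpha/m) * X^T (g(X theta) - y)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy usage: the leading column of ones models the intercept term theta_0.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print("theta:", theta, "cost:", cost(theta, X, y))
print("labels:", (sigmoid(X @ theta) >= 0.5).astype(int))  # -> [0 0 1 1]
```

For simplicity the sketch runs a fixed number of iterations; in practice you would monitor J(θ) and stop once it no longer decreases meaningfully.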
When we try to classify a new observation, we apply the function with the fitted coefficients: if the resulting value is < 0.5, it implies a binary output of ‘0’, and if it is ≥ 0.5, it implies a binary output of ‘1’.
In this summary we used Logistic Regression to classify binary outputs. The same approach can be extended to multi-class problems as well, for example with the one-vs-all strategy.
Hope you found this and my previous post interesting. I’m even more excited to learn and absorb Machine Learning fundamentals, and I’m looking forward to the next step in this journey. Stay tuned!
**Enjoy Machine Learning!**