Nowadays, logistic regression is one of the most popular and widely used classification algorithms. Here are some classification problems where logistic regression can help us:

- Email: Spam/Not Spam?
- Online Transactions: Fraudulent (Yes/No)?
- Tumor: Malignant/Benign?

In all of these problems, the variable that we are trying to predict is a variable **y** that we can think of as taking on two values: 0 or 1.

y ∈ {0, 1}

0: “Negative Class” (e.g. benign tumor)

1: “Positive Class” (e.g. malignant tumor)

The assignment of the two classes, to positive and negative to 0 and 1, is somewhat arbitrary and it does not really matter, but often there is this intuition that a negative class is conveying the absence of the event, whereas the positive class is conveying the event presence.

This case, in which y (the dependent variable) can take only two values, is called **binomial logistic regression**. According to Wikipedia, binomial logistic regression (often referred to simply as logistic regression) predicts the probability that an observation falls into one of two categories of a dichotomous dependent variable, based on one or more independent variables that can be either continuous or categorical.

**But**… Binary dependent variables are not the only ones that can be predicted using logistic regression. We can also have multi-class classification problems. Therefore, y may take, for example, four values: y ∈ {0, 1, 2, 3}.

E.g. we want to predict which of the following four cities will be the most desired holiday destination in 2020: y ∈ {1 – New York, 2 – London, 3 – Paris, 4 – Moscow}.

**Example.** How do we develop a classification algorithm? For a better understanding, let’s assume we want to classify a tumor as malignant or benign. So, our dependent variable takes only two values: **zero** (no) or **one** (yes).

We can apply linear regression to this data set and just try to fit a straight line to the data. In this case, we will most probably get a hypothesis like the one in the graph below (blue line: h_{θ}(x) = θ^{T}x).

If we want to make predictions, one thing we could try is to threshold the classifier output h_{θ}(x) at 0.5:

- If h_{θ}(x) ≥ 0.5, predict “y = 1”
- If h_{θ}(x) < 0.5, predict “y = 0”
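As a minimal sketch of this idea (in Python with NumPy, using made-up tumor sizes rather than the data behind the graphs), we can fit a least-squares line and threshold its output at 0.5:

```python
import numpy as np

# Hypothetical tumor sizes (feature x) and labels (0 = benign, 1 = malignant)
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit a straight line h_theta(x) = theta0 + theta1 * x by least squares
theta1, theta0 = np.polyfit(x, y, deg=1)

def h(x_new):
    """Linear hypothesis h_theta(x)."""
    return theta0 + theta1 * x_new

def predict(x_new):
    """Turn the regression output into a class by thresholding at 0.5."""
    return 1 if h(x_new) >= 0.5 else 0

print(predict(2.0))  # small tumor -> 0 (benign)
print(predict(8.0))  # large tumor -> 1 (malignant)
```

With this symmetric toy data set the line crosses 0.5 at x = 5, which plays the role of the vertical black line in the graph.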

Therefore, as we can see in the graph above, everything that is to the left of the vertical black line will be predicted as negative (benign), while everything to the right of the black line will be predicted as positive (malignant). In this particular example, linear regression is actually doing something reasonable, even though this is a classification task we’re interested in.

But now, let’s try changing the problem a bit. Let’s extend the horizontal axis and say we get one more training example, as in the graph below:

Notice that the additional training example should not actually change anything. Looking at the training set, it is pretty clear that a good hypothesis will still predict everything to the right of the black vertical line as positive, just as in the first case. This is because, according to this training set, all the tumors larger than a certain value are malignant and all the tumors smaller than that value are benign.

Anyway, once we have added that extra example, if we run linear regression again we will get a straight line fitting the data like the yellow one. Now, if we threshold the hypothesis at 0.5, we end up with a decision boundary around the vertical yellow line. So, everything to the left of that line will be predicted as negative (benign) and everything to the right of it as positive (malignant). In this case, linear regression makes a pretty bad prediction.
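We can reproduce this failure with a small sketch (Python/NumPy, illustrative numbers, not the article’s actual data): adding a single, obviously malignant, very large tumor drags the least-squares line down and shifts the 0.5 crossing point to the right.

```python
import numpy as np

# Hypothetical tumor sizes and labels (0 = benign, 1 = malignant)
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Original fit: the line crosses 0.5 at x = 5 (the black vertical line)
t1, t0 = np.polyfit(x, y, deg=1)
boundary_before = (0.5 - t0) / t1

# Add one more (clearly malignant) example far to the right
x2 = np.append(x, 30.0)
y2 = np.append(y, 1)
t1b, t0b = np.polyfit(x2, y2, deg=1)
boundary_after = (0.5 - t0b) / t1b  # the yellow vertical line, shifted right

print(boundary_before, boundary_after)
# The malignant example at x = 6 now falls below the threshold:
print(t0b + t1b * 6.0)  # h(6) < 0.5, so it is wrongly predicted benign
```

The extra point carries no new information about where the true boundary lies, yet it changes the fitted line enough to misclassify a previously correct example.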

In the first example, before adding the extra training example, the linear regression algorithm was just getting lucky: it gave us a hypothesis that happened to work well for that particular data set. Usually, applying linear regression to a classification problem is not a great idea.

Also, if you use linear regression for classification problems, here is another strange thing that can happen:

**Classification:** y = 0 or 1

h_{θ}(x) can be > 1 or < 0

In plain English: we know that for our classification problem, y can take just two values, 0 or 1. If we use linear regression for our predictions, the hypothesis can output values that are larger than 1 or less than 0, even if all our training examples have labels y ∈ {0, 1}.
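This is easy to see numerically. Using the same kind of made-up tumor data as before, the fitted line is unbounded, so evaluating it far from the training points gives values outside [0, 1]:

```python
import numpy as np

# Hypothetical tumor sizes and binary labels
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
t1, t0 = np.polyfit(x, y, deg=1)

# Even though every label is 0 or 1, the straight line is unbounded:
print(t0 + t1 * 0.5)   # negative output, below 0
print(t0 + t1 * 12.0)  # output above 1
```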

Therefore, in the next articles we will go more in depth into the logistic regression algorithm, which has the property that its output is always between 0 and 1. It may be confusing that the term “regression” appears in the name even though logistic regression is actually a classification algorithm, but that is just a name given for historical reasons.
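The reason logistic regression stays between 0 and 1 is that it passes θ^{T}x through the sigmoid (logistic) function g(z) = 1 / (1 + e^{−z}). A quick sketch:

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), strictly between 0 and 1 for any finite z
    return 1.0 / (1.0 + math.exp(-z))

# Even for fairly extreme inputs, the output stays inside (0, 1)
for z in (-30, -1, 0, 1, 30):
    p = sigmoid(z)
    assert 0.0 < p < 1.0

print(sigmoid(0))  # 0.5: right on the decision boundary
```

Interpreting g(θ^{T}x) as a probability, the threshold g ≥ 0.5 corresponds exactly to θ^{T}x ≥ 0, which is where the sigmoid crosses 0.5.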

