House price prediction
Let’s say we want to forecast the price of a house, and we have two features: the frontage of the house and the depth of the house.
When you’re applying linear regression, you don’t have to use only the features you were given; you can also create new features yourself. Depending on the insight you have into the problem, defining new features can sometimes give you a better model.
For example, instead of predicting the price of a house from frontage and depth separately, we can create a third feature that captures the size of the lot: area = frontage × depth. We can then choose our hypothesis as follows:

hθ(x) = θ₀ + θ₁ · area
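A minimal sketch of this feature-engineering step, using hypothetical toy data (the frontage, depth, and price values below are made up for illustration) and a least-squares fit of the one-feature hypothesis:

```python
import numpy as np

# Hypothetical toy data: frontage and depth in feet, prices in $1000s.
frontage = np.array([50.0, 60.0, 40.0, 80.0, 55.0])
depth = np.array([100.0, 120.0, 90.0, 150.0, 110.0])
prices = np.array([300.0, 430.0, 210.0, 720.0, 360.0])

# Engineer a single new feature: area = frontage * depth.
area = frontage * depth

# Fit h(x) = theta0 + theta1 * area by least squares.
X = np.column_stack([np.ones_like(area), area])
theta, *_ = np.linalg.lstsq(X, prices, rcond=None)

print(theta)  # [theta0, theta1]
```

Collapsing two features into one here isn’t just a simplification: if price is really driven by land area, the one-feature model captures that relationship directly instead of asking the algorithm to discover a product of inputs it cannot represent.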
Closely related to the idea of choosing your own features is the idea of polynomial regression. Let’s say you have a housing price data set that looks like the graph below:
It doesn’t look like a straight line fits this data very well, so linear regression wouldn’t be the best choice, but there are a few different models you might apply to this kind of data.
Let’s take some examples:
1. The orange line shows the fit of a quadratic model. As can be observed from Fig 1.1, a quadratic model doesn’t make sense here, because a quadratic function eventually comes back down, and we don’t expect housing prices to fall as the size grows large.
2. The green line shows the fit of a cubic model. Since a cubic function doesn’t come back down, the green line is a better fit to the data.
So, how do we fit a model like this to our data? Using the machinery of multivariate linear regression, we can do it with a pretty simple modification to our algorithm: define x₁ = size, x₂ = size², x₃ = size³, and fit

hθ(x) = θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₃
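This trick can be sketched in a few lines: build the powers of size as columns and hand them to an ordinary linear least-squares solver. The size and price values below are hypothetical, chosen only to illustrate the shape of the computation:

```python
import numpy as np

# Hypothetical housing data: size in square feet, price in $1000s.
size = np.array([800.0, 1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
price = np.array([150.0, 190.0, 290.0, 360.0, 420.0, 470.0])

# Polynomial regression as linear regression on engineered features:
# x1 = size, x2 = size^2, x3 = size^3.
X = np.column_stack([np.ones_like(size), size, size**2, size**3])

# Solve the least-squares problem directly. Gradient descent would work
# too, but would need feature scaling first, since the columns differ
# wildly in magnitude.
theta, *_ = np.linalg.lstsq(X, price, rcond=None)

print(X @ theta)  # fitted prices
```

The model is still linear in the parameters θ, which is why the whole multivariate linear regression machinery (normal equation or gradient descent) applies unchanged.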
If we choose our features like this, then feature scaling becomes increasingly important. For example, if size ranges from 1 to 1,000, then size² ranges from 1 to 10⁶ and size³ from 1 to 10⁹. These three features take on very different ranges of values, so if you’re using gradient descent it is important to apply feature scaling to bring them into comparable ranges.
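A sketch of that scaling step, under the same hypothetical data as above: standardize each polynomial feature (subtract its mean, divide by its standard deviation) before running gradient descent.

```python
import numpy as np

# Hypothetical housing data: size in square feet, price in $1000s.
size = np.array([800.0, 1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
price = np.array([150.0, 190.0, 290.0, 360.0, 420.0, 470.0])

# Raw polynomial features span wildly different ranges:
# size ~ 10^3, size^2 ~ 10^6, size^3 ~ 10^9.
feats = np.column_stack([size, size**2, size**3])

# Standardize each column so every feature lands in a comparable range.
mu = feats.mean(axis=0)
sigma = feats.std(axis=0)
feats_scaled = (feats - mu) / sigma

# Gradient descent on the scaled features behaves well with an ordinary
# learning rate; on the raw features it would diverge or crawl.
X = np.column_stack([np.ones(len(size)), feats_scaled])
theta = np.zeros(X.shape[1])
alpha, m = 0.1, len(price)
for _ in range(2000):
    theta -= alpha / m * X.T @ (X @ theta - price)
```

Note that to predict the price of a new house, you must scale its features with the same mu and sigma computed from the training data.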