**Normal equation:**

- What if **X**ᵀ**X** is non-invertible? (singular/degenerate)
  - **R:** `ginv(t(X) %*% X) %*% t(X) %*% y` from {MASS}
  - **Octave:** `pinv(X'*X)*X'*y`

The issue of **X**ᵀ**X** being non-invertible should happen pretty rarely. Both R and Octave provide two different functions for inverting matrices:

**R:**
- **solve()** from {base} – ordinary matrix inverse
- **ginv()** from {MASS} – Generalized Inverse of a Matrix (computes the value of θ that you want even if **X**ᵀ**X** is non-invertible)

**Octave:**
- **inv()** – inverse
- **pinv()** – pseudo-inverse (computes the value of θ that you want even if **X**ᵀ**X** is non-invertible)
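As a sketch of the same computation in Python with NumPy (an assumption on my part — the notes above use R/Octave, and the house-price numbers below are invented toy data), `numpy.linalg.pinv` plays the role of `ginv()`/`pinv()`:

```python
import numpy as np

# Toy design matrix: intercept column plus one feature (house size).
X = np.array([[1.0, 2104.0],
              [1.0, 1416.0],
              [1.0, 1534.0],
              [1.0,  852.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Normal equation with an ordinary inverse: theta = (X'X)^(-1) X'y
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Pseudo-inverse version: gives the same answer here, and still
# returns a usable theta when X'X is singular.
theta_pinv = np.linalg.pinv(X.T @ X) @ X.T @ y

print(np.allclose(theta_inv, theta_pinv, rtol=1e-4))  # True
```

When **X**ᵀ**X** is invertible the two routes agree; they only differ in the degenerate cases discussed next.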

**Most common causes for the non-invertibility of X**ᵀ**X:**

**Redundant features** (linearly dependent).
- E.g. *x*₁ = size in feet², *x*₂ = size in m²

Concretely, let's say we want to predict housing prices. If *x*₁ is the size of the house in square feet and *x*₂ is the size of the house in square meters, then there is a linear dependence between these two features: because 1 m = 3.28 feet, they will always satisfy the constraint:

*x*₁ = (3.28)² · *x*₂

If we remove one of these features, the non-invertibility problem should be solved.
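A quick NumPy check of this effect (a sketch with made-up sizes, not part of the original notes) shows that a linearly dependent column makes **X**ᵀ**X** rank-deficient:

```python
import numpy as np

m2 = np.array([100.0, 150.0, 200.0])        # sizes in square meters
ft2 = (3.28 ** 2) * m2                      # the same sizes in square feet
X = np.column_stack([np.ones(3), ft2, m2])  # redundant feature pair

A = X.T @ X
print(np.linalg.matrix_rank(A))             # 2, not 3: A is singular

# np.linalg.inv(A) would fail or be wildly inaccurate here;
# np.linalg.pinv(A) still returns a usable result.
```

Dropping either the `ft2` or the `m2` column restores full rank.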

**Too many features** (e.g. *m ≤ n*: we are trying to run the learning algorithm with more features than observations in the training set).

For example, let's imagine we have *m* = 10 training examples and *n* = 100 features; then we are trying to fit a parameter vector θ which is (*n* + 1)-dimensional. So we would be trying to fit 101 parameters from just 10 training observations, which is obviously not a good idea.

In this situation, a proper solution would be to delete some features or to use regularization.
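The regularized normal equation, θ = (**X**ᵀ**X** + λ·*L*)⁻¹**X**ᵀ*y* (where *L* is the identity with its top-left entry zeroed so the intercept is not penalized), stays invertible even when *m ≤ n*. A sketch in NumPy with random data and an arbitrary λ = 1 (both assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 100                      # fewer examples than features
X = np.column_stack([np.ones(m), rng.standard_normal((m, n))])
y = rng.standard_normal(m)

lam = 1.0
L = np.eye(n + 1)
L[0, 0] = 0.0                       # do not penalize the intercept term

# Regularized normal equation: X'X + lam*L is invertible despite m <= n
theta = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
print(theta.shape)                  # (101,)
```

All 101 parameters are fitted, though with so few observations regularization only makes the problem solvable, not well-determined.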
