A cost function is a function that maps an event, or values of one or more variables, onto a real number intuitively representing some “cost” associated with the event.

An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (a reward function, profit function, etc.), in which case it is to be maximized.

**In statistics, a loss function is used for parameter estimation, and the event in question is some function of the difference between the estimated and true values for an instance of data.**

In the context of economics, the loss function is usually economic cost or regret. In classification, it is the penalty for incorrectly classifying an example. In actuarial science, it is used in an insurance context to model benefits paid over premiums. In optimal control, the loss function is the penalty for failing to achieve a desired value. In financial risk management, the function is precisely mapped to a monetary loss.
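Two of the losses just mentioned can be written down in a couple of lines. A minimal sketch (the function names are illustrative, not from any particular library):

```python
def zero_one_loss(y_true, y_pred):
    """Classification: penalty of 1 for an incorrect label, 0 otherwise."""
    return 0 if y_true == y_pred else 1

def squared_error(y_true, y_pred):
    """Regression: penalizes deviation from the true value quadratically."""
    return (y_true - y_pred) ** 2

print(zero_one_loss("cat", "dog"))  # incorrect classification -> 1
print(squared_error(3.0, 2.5))      # (3.0 - 2.5)^2 -> 0.25
```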

Parameter estimation for supervised learning tasks, such as regression or classification, can be formulated as the minimization of a loss function over a training set. The goal of estimation is to find a function that models its input well: applied to the training set, it should predict the values or class labels associated with the samples in the set. **The cost function quantifies the amount by which the prediction deviates from the actual values.**
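This “minimize the cost over a training set” idea can be sketched with the simplest possible model: a single constant prediction, fitted by gradient descent on the mean squared error (the names `cost` and `training_set` are illustrative; for this cost the minimizer is just the sample mean):

```python
training_set = [2.0, 4.0, 6.0, 8.0]  # observed target values

def cost(theta, data):
    """Mean squared deviation of the constant prediction `theta`."""
    return sum((y - theta) ** 2 for y in data) / len(data)

# Gradient descent on the cost.
theta = 0.0
lr = 0.1
for _ in range(500):
    grad = sum(2 * (theta - y) for y in training_set) / len(training_set)
    theta -= lr * grad

print(round(theta, 4))  # converges to the sample mean, 5.0
```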

Let’s start with the definition according to Wikipedia. We consider some family of distributions for a random variable X, indexed by some *θ (theta :D).*

Actually, we can think of X as our data, so:

**X = (X1, …, Xn)** where **Xi ~ Fθ** *i.i.d.*

X – the data on which the decision rule will be making decisions.

Fθ – possible ways to model our data.

**Finite number of models ⇒** we can think of θ as an index into this family of probability models.

**Infinite number of models ⇒** θ is a set of parameters to the family of distributions.
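For instance, take the family Fθ to be Gaussian, so θ = (μ, σ) is a pair of parameters; the negative log-likelihood of the data then plays the role of the loss function. A minimal sketch (function and variable names are illustrative):

```python
import math

def neg_log_likelihood(theta, data):
    """Negative log-likelihood of i.i.d. data under a Gaussian F_theta."""
    mu, sigma = theta
    nll = 0.0
    for x in data:
        nll += 0.5 * math.log(2 * math.pi * sigma ** 2) \
               + (x - mu) ** 2 / (2 * sigma ** 2)
    return nll

data = [1.0, 2.0, 3.0]
# For fixed sigma, the NLL is minimized at mu = sample mean (here 2.0):
print(neg_log_likelihood((2.0, 1.0), data) < neg_log_likelihood((0.0, 1.0), data))  # True
```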

**Acknowledgments:**

Wikipedia
