# Statistical Learning Theory (for own reference)

For my own reference:

Given $X \in \mathbb{R}^{p}$ as a real-valued random input vector and $Y \in \mathbb{R}$ as a real-valued quantitative output, our objective is to find a function f(x) that predicts the output from the input. To do this, we need a loss function, and we choose f(x) so that the expected loss is minimized.

The squared error loss function is the obvious choice, i.e. we penalize $(Y - f(X))^2$. To see which f this leads to, let EPE(f) be the expected prediction error of f: $EPE(f) = E\left[(Y - f(X))^2\right] = E_X E_{Y|X}\left[(Y - f(X))^2 \mid X\right]$. The outer expectation only averages the inner conditional expectation over X, and f is free to take any value at each point x, so it suffices to minimize $E_{Y|X}\left[(Y - f(x))^2 \mid X = x\right]$ pointwise, separately for each x.

Fix x and let f(x) = c. Expanding gives a quadratic function of c: $E_{Y|X}\left[(Y - c)^2 \mid X = x\right] = E(Y^2 \mid X = x) - 2c\,E(Y \mid X = x) + c^2$. Matching this to the quadratic form $\alpha c^2 + \beta c + \gamma$, we have $\alpha = 1$, $\beta = -2E(Y \mid X = x)$ and $\gamma = E(Y^2 \mid X = x)$. Since $\alpha > 0$, the minimum is attained at $c = -\beta/(2\alpha) = E(Y \mid X = x)$. So the minimizer is $f(x) = E(Y \mid X = x)$. This conditional expectation is the regression function.
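As a quick numerical sanity check of the pointwise argument, here is a minimal sketch in Python. It assumes a toy model that is not part of the notes above: at a fixed x, Y is drawn from a normal distribution with mean sin(x) and standard deviation 0.5. The helper `conditional_risk` and all variable names are made up for illustration. The idea is to estimate $E((Y-c)^2 \mid X = x)$ on a grid of constants c and see that the best c sits at the sample mean of Y.

```python
import numpy as np


def conditional_risk(y_samples, c):
    """Empirical estimate of E[(Y - c)^2 | X = x] from samples of Y drawn at a fixed x."""
    return np.mean((y_samples - c) ** 2)


rng = np.random.default_rng(0)

# Assumed toy model (illustration only): Y | X = x ~ Normal(sin(x), 0.5^2)
x = 1.5
y_samples = np.sin(x) + 0.5 * rng.standard_normal(100_000)

# Candidate constants c = f(x), on a grid with spacing 0.01
candidates = np.linspace(-1.0, 3.0, 401)
risks = np.array([conditional_risk(y_samples, c) for c in candidates])

best_c = candidates[np.argmin(risks)]  # grid point with the smallest estimated risk
sample_mean = y_samples.mean()         # empirical estimate of E(Y | X = x)

print("best c on grid:", best_c)
print("sample mean   :", sample_mean)
print("true E(Y|X=x) :", np.sin(x))
```

Under these assumptions, the grid minimizer should agree with the sample mean up to the grid spacing, and both should be close to sin(1.5), which is what the derivation predicts.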