Definition:
- A scalar $y$ and an $n$-vector $x$ are related by a model $y \approx f(x)$, where $f: \mathbb{R}^n \to \mathbb{R}$
- $x$ is the independent variable, the feature vector
- $y$ is the outcome variable, something we want to predict
- $x^{(i)}, y^{(i)}$ is the $i$th data pair
- $x^{(i)}_j$ is the $j$th component of the $i$th data point $x^{(i)}$
- We choose a model $\hat{f}: \mathbb{R}^n \to \mathbb{R}$ built from basis functions $f_1, \dots, f_p: \mathbb{R}^n \to \mathbb{R}$ that we choose
- $\theta_i$ are the model parameters we can choose
- $\hat{f}(x) = \theta_1 f_1(x) + \cdots + \theta_p f_p(x)$
- $\hat{y}^{(i)} = \hat{f}(x^{(i)})$ is the model's prediction of $y^{(i)}$
- Define:
- $y^\mathrm{d} = (y^{(1)}, \dots, y^{(N)})$ as the vector of outcomes
- $\hat{y}^\mathrm{d} = (\hat{y}^{(1)}, \dots, \hat{y}^{(N)})$ as the vector of predictions
- $r^\mathrm{d} = (r^{(1)}, \dots, r^{(N)})$ as the vector of residuals
- where $r^{(i)} = y^{(i)} - \hat{y}^{(i)}$
- $\mathrm{rms}(r^\mathrm{d}) = \left( \dfrac{(r^{(1)})^2 + \cdots + (r^{(N)})^2}{N} \right)^{1/2}$ is the RMS prediction error
- Define the matrix $A \in \mathbb{R}^{N \times p}$ with $A_{ij} = f_j(x^{(i)})$, so $\hat{y}^\mathrm{d} = A\theta$
- Then we choose $\theta$ to minimize $\|r^\mathrm{d}\|^2 = \|y^\mathrm{d} - \hat{y}^\mathrm{d}\|^2 = \|y^\mathrm{d} - A\theta\|^2 = \|A\theta - y^\mathrm{d}\|^2$
- This is an ordinary least squares problem: $\hat{\theta} = (A^\mathsf{T} A)^{-1} A^\mathsf{T} y^\mathrm{d}$ (if the columns of $A$ are linearly independent); see the sketch after this list
- $\|A\hat{\theta} - y^\mathrm{d}\|^2 / N$ is the minimum mean-square error
- We take the first basis function to be the constant $f_1(x) = 1$, so $\theta_1$ is an offset and the other coefficients act relative to it
- For example, the affine function $\hat{y}(x) = ax + b$ fits this form: append the offset $b$ as a last entry of the coefficient vector and a constant 1 to each feature vector, giving the matrix form $\hat{y}^\mathrm{d} = A\theta$ with a column of ones in $A$
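A minimal NumPy sketch of this procedure, using made-up toy data and two hypothetical basis functions (the constant $1$ and the identity): it builds $A$ column by column from the basis functions, solves the least-squares problem, and reports the RMS error.

```python
import numpy as np

# Hypothetical basis functions f_1, ..., f_p; f_1 is the constant 1,
# so theta_1 plays the role of the offset.
basis = [lambda x: 1.0, lambda x: x]

def fit_theta(x_data, y_data):
    """Build A with A_ij = f_j(x^(i)) and solve min ||A theta - y||^2."""
    A = np.array([[f(x) for f in basis] for x in x_data])
    theta, *_ = np.linalg.lstsq(A, y_data, rcond=None)
    return theta, A

# Toy data (made up for illustration): y is roughly 1 + 2x plus noise.
x_d = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_d = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

theta_hat, A = fit_theta(x_d, y_d)
residuals = y_d - A @ theta_hat        # r^(i) = y^(i) - yhat^(i)
rms = np.sqrt(np.mean(residuals**2))   # RMS prediction error
print(theta_hat, rms)
```

Note that `np.linalg.lstsq` solves the problem without explicitly forming $(A^\mathsf{T}A)^{-1}$, which is preferable numerically.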
For p=1:
- Then $f_1(x) = 1$, so the model $\hat{f}(x) = \theta_1$ is a constant (derivation below)
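In this case the least-squares solution has a simple closed form: the optimal constant is the average of the outcomes. A one-line derivation, setting the derivative of the squared error to zero:

```latex
% Minimize the squared error of the constant model \hat{f}(x) = \theta_1
\frac{d}{d\theta_1} \sum_{i=1}^{N} \bigl(\theta_1 - y^{(i)}\bigr)^2
  = 2 \sum_{i=1}^{N} \bigl(\theta_1 - y^{(i)}\bigr) = 0
\quad\Longrightarrow\quad
\hat{\theta}_1 = \frac{1}{N} \sum_{i=1}^{N} y^{(i)} = \operatorname{avg}(y^{\mathrm{d}})
```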
For p=2:
- $f_1(x) = 1$, $f_2(x) = x$
- The model has the form $\hat{f}(x) = \theta_1 + \theta_2 x$, a straight-line fit
- The matrix $A$ has the form $A = \begin{bmatrix} 1 & x^{(1)} \\ 1 & x^{(2)} \\ \vdots & \vdots \\ 1 & x^{(N)} \end{bmatrix} = \begin{bmatrix} \mathbf{1} & x^\mathrm{d} \end{bmatrix}$
- Then $\hat{\theta}_1, \hat{\theta}_2$ give the closed form $\hat{f}(x) = \operatorname{avg}(y^\mathrm{d}) + \rho \, \dfrac{\operatorname{std}(y^\mathrm{d})}{\operatorname{std}(x^\mathrm{d})} \bigl(x - \operatorname{avg}(x^\mathrm{d})\bigr)$, where $\rho$ is the correlation coefficient of $x^\mathrm{d}$ and $y^\mathrm{d}$ (checked numerically below)
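A quick numerical check of this closed form, reusing the toy data from the sketch above: `np.corrcoef` supplies $\rho$, and the slope and intercept recovered from the formula should agree with the least-squares solution.

```python
import numpy as np

x_d = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_d = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Closed-form straight-line fit: slope = rho * std(y)/std(x).
rho = np.corrcoef(x_d, y_d)[0, 1]
slope = rho * np.std(y_d) / np.std(x_d)
intercept = np.mean(y_d) - slope * np.mean(x_d)

# Same fit via least squares on A = [1  x_d].
A = np.column_stack([np.ones_like(x_d), x_d])
theta_hat, *_ = np.linalg.lstsq(A, y_d, rcond=None)

print(intercept, slope)   # should match theta_hat[0], theta_hat[1]
print(theta_hat)
```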
For polynomial fits:
- $f_i(x) = x^{i-1}$, $i = 1, \dots, p$ (with $x^0 = 1$)
- The model is a polynomial of degree $p-1$: $\hat{f}(x) = \theta_1 + \theta_2 x + \cdots + \theta_p x^{p-1}$; here $x^{p-1}$ is the scalar $x$ raised to a power, not a component index
- $A$ is then a Vandermonde matrix, with $A_{ij} = (x^{(i)})^{j-1}$ (see the sketch below)
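A sketch of a polynomial fit on made-up data: `np.vander` with `increasing=True` produces exactly the matrix $A_{ij} = (x^{(i)})^{j-1}$ above, with columns $x^0, x^1, \dots, x^{p-1}$.

```python
import numpy as np

x_d = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y_d = np.array([0.1, 0.4, 1.1, 2.4, 4.2, 6.3])

p = 3  # number of basis functions -> polynomial of degree p-1 = 2

# Vandermonde matrix: columns are x^0, x^1, ..., x^(p-1).
A = np.vander(x_d, N=p, increasing=True)
theta_hat, *_ = np.linalg.lstsq(A, y_d, rcond=None)

def fhat(x):
    """Evaluate fhat(x) = theta_1 + theta_2 x + ... + theta_p x^(p-1)."""
    return np.vander(np.atleast_1d(x), N=p, increasing=True) @ theta_hat

print(theta_hat)
print(fhat(1.25))
```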