The coefficients $b$ chosen by the least squares criterion are called the least squares estimates of $\beta$, denoted by $\hat{\beta}$.
The deviations $\hat{\epsilon}_j = y_j - \hat{\beta}_0 - \hat{\beta}_1 z_{j1} - \cdots - \hat{\beta}_r z_{jr}$, $j = 1, \ldots, n$, are called residuals. The vector of residuals $\hat{\epsilon} = y - Z\hat{\beta}$ contains the information about the remaining unknown parameter $\sigma^2$.
Proposition:
Let $Z$ have full rank $r+1 \le n$. The least squares estimate of $\beta$ is given by $\hat{\beta} = (Z^T Z)^{-1} Z^T y$.
Let $\hat{y} = Z\hat{\beta} = Hy$ denote the fitted values of $y$, where $H = Z(Z^T Z)^{-1} Z^T$ is called the hat matrix.
Then the residuals $\hat{\epsilon} = y - \hat{y} = (I - H)y$ satisfy $Z^T \hat{\epsilon} = 0$ and $\hat{y}^T \hat{\epsilon} = 0$.
The residual sum of squares is $\sum_{j=1}^n (y_j - \hat{\beta}_0 - \hat{\beta}_1 z_{j1} - \cdots - \hat{\beta}_r z_{jr})^2 = \hat{\epsilon}^T \hat{\epsilon} = y^T y - y^T Z\hat{\beta}$.
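The proposition above can be checked numerically. The sketch below fits a least squares line to a small made-up data set (all values are illustrative) and verifies the orthogonality of the residuals and the two expressions for the residual sum of squares.

```python
import numpy as np

# Hypothetical data: n = 5 observations, r = 1 predictor.
rng = np.random.default_rng(0)
z1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * z1 + rng.normal(0, 0.1, size=5)

# Design matrix Z = [1, z1], full rank r + 1 = 2.
Z = np.column_stack([np.ones_like(z1), z1])

# Least squares estimate: beta_hat = (Z^T Z)^{-1} Z^T y.
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Hat matrix, fitted values, and residuals.
H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
y_hat = H @ y
resid = y - y_hat

# Residuals are orthogonal to the columns of Z and to the fitted values.
print(np.allclose(Z.T @ resid, 0))    # Z^T eps_hat = 0
print(np.allclose(y_hat @ resid, 0))  # y_hat^T eps_hat = 0

# Residual sum of squares via both equivalent expressions.
print(np.isclose(resid @ resid, y @ y - y @ Z @ beta_hat))
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse; the hat matrix is built here only to mirror the notation of the proposition.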
The coefficient of determination $R^2 = \dfrac{\sum_{j=1}^n (\hat{y}_j - \bar{y})^2}{\sum_{j=1}^n (y_j - \bar{y})^2}$ gives the proportion of the total variation in the $y_j$'s that is explained by, or attributable to, the predictor variables $z_1, \ldots, z_r$.
Here $R^2$ equals 1 if the fitted equation passes through all the data points, so that $\hat{\epsilon}_j = 0$ for all $j$.
At the other extreme, $R^2 = 0$ if $\hat{\beta}_0 = \bar{y}$ and $\hat{\beta}_1 = \cdots = \hat{\beta}_r = 0$; in that case the predictor variables $z_1, \ldots, z_r$ have no influence on the response.
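As a quick sketch on toy data (names and numbers are made up), $R^2$ can be computed either from the regression sum of squares or, equivalently when an intercept is fitted, as one minus the ratio of residual to total sum of squares.

```python
import numpy as np

# Toy data with a clear linear trend (illustrative only).
rng = np.random.default_rng(1)
z1 = np.linspace(0.0, 10.0, 20)
y = 1.0 + 2.0 * z1 + rng.normal(0, 1.0, size=20)

# Fit y = beta_0 + beta_1 z1 by least squares.
Z = np.column_stack([np.ones_like(z1), z1])
beta_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
y_hat = Z @ beta_hat

# R^2 = regression sum of squares / total sum of squares.
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

# Equivalent form when the model contains an intercept.
r2_alt = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.isclose(r2, r2_alt))
```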
The least squares estimator $\hat{\beta} = (Z^T Z)^{-1} Z^T Y$ has $E(\hat{\beta}) = \beta$ and $\mathrm{Cov}(\hat{\beta}) = \sigma^2 (Z^T Z)^{-1}$.
The residuals $\hat{\epsilon}$ have the properties $E(\hat{\epsilon}) = 0$ and $\mathrm{Cov}(\hat{\epsilon}) = \sigma^2 (I - H)$.
Let $Y = Z\beta + \epsilon$, where $Z$ has full rank $r+1$ and $\epsilon$ is distributed as $N_n(0, \sigma^2 I)$. Then the maximum likelihood estimator of $\beta$ is the same as the least squares estimator $\hat{\beta}$.
Moreover, $\hat{\beta} = (Z^T Z)^{-1} Z^T Y \sim N_{r+1}(\beta, \sigma^2 (Z^T Z)^{-1})$ and is distributed independently of the residuals $\hat{\epsilon} = Y - Z\hat{\beta}$.
Further, $n\hat{\sigma}^2 = \hat{\epsilon}^T \hat{\epsilon}$ is distributed as $\sigma^2 \chi^2_{n-r-1}$, where $\hat{\sigma}^2$ is the maximum likelihood estimator of $\sigma^2$.
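The $\chi^2_{n-r-1}$ distribution of $\hat{\epsilon}^T\hat{\epsilon}/\sigma^2$ can be illustrated by simulation: since a $\chi^2_{n-r-1}$ variable has mean $n-r-1$, the Monte Carlo average below should land near that value. The design, coefficients, and replication count are all arbitrary choices for the sketch.

```python
import numpy as np

# Simulate eps_hat^T eps_hat / sigma^2 under the normal linear model.
rng = np.random.default_rng(42)
n, r, sigma = 30, 2, 1.5
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])  # fixed design
beta = np.array([1.0, 0.5, -0.3])
H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T

draws = []
for _ in range(2000):
    y = Z @ beta + rng.normal(0, sigma, size=n)
    resid = (np.eye(n) - H) @ y
    draws.append(resid @ resid / sigma**2)

# Sample mean should be close to n - r - 1 = 27,
# the mean of a chi-square with n - r - 1 degrees of freedom.
print(np.mean(draws))
```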
Let $Y = Z\beta + \epsilon$, where $Z$ has full rank $r+1$ and $\epsilon$ is $N_n(0, \sigma^2 I)$. Then a $100(1-\alpha)\%$ confidence region for $\beta$ is given by $(\beta - \hat{\beta})^T Z^T Z (\beta - \hat{\beta}) \le (r+1)\, s^2 F_{r+1,\, n-r-1}(\alpha)$, where $s^2 = \hat{\epsilon}^T \hat{\epsilon} / (n - r - 1)$.
Also, simultaneous $100(1-\alpha)\%$ confidence intervals for the $\beta_i$ are given by $\hat{\beta}_i \mp \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_i)} \sqrt{(r+1) F_{r+1,\, n-r-1}(\alpha)}$, $i = 0, \ldots, r$, where $\widehat{\mathrm{Var}}(\hat{\beta}_i)$ is the diagonal element of $s^2 (Z^T Z)^{-1}$ corresponding to $\hat{\beta}_i$.
The confidence ellipsoid is centered at the maximum likelihood estimate $\hat{\beta}$, and its orientation and size are determined by the eigenvalues and eigenvectors of $Z^T Z$.
If an eigenvalue is nearly zero, the confidence ellipsoid will be very long in the direction of the corresponding eigenvector.
Practitioners often use the one-at-a-time intervals $\hat{\beta}_i \mp t_{n-r-1}(\alpha/2) \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_i)}$ when searching for important predictor variables.
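The two kinds of intervals can be compared directly. The sketch below (toy data, illustrative parameter values) computes the simultaneous $F$-based half-widths and the one-at-a-time $t$ half-widths for the same fit; the simultaneous intervals come out wider, which is the price of joint coverage.

```python
import numpy as np
from scipy import stats

# Toy regression with n = 25 observations and r = 2 predictors.
rng = np.random.default_rng(3)
n, r = 25, 2
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
y = Z @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, size=n)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid = y - Z @ beta_hat
s2 = resid @ resid / (n - r - 1)
se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))  # sqrt of Var-hat(beta_i)

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, r + 1, n - r - 1)
t_crit = stats.t.ppf(1 - alpha / 2, n - r - 1)

simul_half = se * np.sqrt((r + 1) * f_crit)  # simultaneous half-widths
oneat_half = se * t_crit                     # one-at-a-time half-widths
print(simul_half > oneat_half)               # wider in every coordinate
```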
Likelihood ratio tests for regression parameters:
Part of regression analysis is concerned with assessing the effects of particular predictor variables on the response variable. One null hypothesis of interest states that certain of the $z_j$'s do not influence the response $Y$.
These predictors will be labeled $z_{q+1}, \ldots, z_r$. The statement that $z_{q+1}, \ldots, z_r$ do not influence $Y$ translates into the statistical hypothesis $H_0: \beta_{q+1} = \cdots = \beta_r = 0$, or $H_0: \beta_{(2)} = 0$, where $\beta_{(2)}^T = [\beta_{q+1}, \ldots, \beta_r]$.
Partitioning $Z = [Z_1 \mid Z_2]$ and $\beta$ accordingly, we can express the general linear model as $Y = Z\beta + \epsilon = Z_1 \beta_{(1)} + Z_2 \beta_{(2)} + \epsilon$.
Define the extra sum of squares $SS_{\mathrm{res}}(Z_1) - SS_{\mathrm{res}}(Z)$ to be $(y - Z_1 \hat{\beta}_{(1)})^T (y - Z_1 \hat{\beta}_{(1)}) - (y - Z\hat{\beta})^T (y - Z\hat{\beta})$,
where $\hat{\beta}_{(1)} = (Z_1^T Z_1)^{-1} Z_1^T y$.
Let $Z$ have full rank $r+1$ and $\epsilon$ be distributed as $N_n(0, \sigma^2 I)$. The likelihood ratio test rejects $H_0$ if $\dfrac{\left(SS_{\mathrm{res}}(Z_1) - SS_{\mathrm{res}}(Z)\right) / (r - q)}{s^2} > F_{r-q,\, n-r-1}(\alpha)$.
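A minimal sketch of this extra-sum-of-squares $F$ test, on simulated data where $H_0$ is true by construction (the split $q = 1$, $r = 3$ and all coefficients are made up for illustration):

```python
import numpy as np
from scipy import stats

# Full model: r = 3 predictors; reduced model keeps the first q = 1.
rng = np.random.default_rng(7)
n, r, q = 40, 3, 1
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
# Only z_1 matters here, so H0: beta_2 = beta_3 = 0 actually holds.
y = Z @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(0, 1.0, size=n)

def rss(Zm, y):
    """Residual sum of squares of a least squares fit of y on Zm."""
    b = np.linalg.lstsq(Zm, y, rcond=None)[0]
    e = y - Zm @ b
    return e @ e

Z1 = Z[:, : q + 1]                       # columns for [1, z_1, ..., z_q]
ss_full, ss_red = rss(Z, y), rss(Z1, y)  # SSres(Z) and SSres(Z1)
s2 = ss_full / (n - r - 1)

# Test statistic and p-value from the F_{r-q, n-r-1} reference distribution.
F = (ss_red - ss_full) / (r - q) / s2
p_value = stats.f.sf(F, r - q, n - r - 1)
print(F, p_value)
```

Dropping columns can never decrease the residual sum of squares, so the extra sum of squares `ss_red - ss_full` is always nonnegative.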
Inferences from the estimated regression function
Once an investigator is satisfied with the fitted regression model, the model can be used to solve two prediction problems.
Let $z_0^T = [1, z_{01}, \ldots, z_{0r}]$ be selected values for the predictor variables.
Let $Y_0$ denote the value of the response when the predictor variables have values $z_0$. According to the classical linear regression model, $E(Y_0 \mid z_0) = \beta_0 + \beta_1 z_{01} + \cdots + \beta_r z_{0r} = z_0^T \beta$.
Its least squares estimate is $z_0^T \hat{\beta}$.
Here $z_0^T \hat{\beta}$ is the linear unbiased estimator of $E(Y_0 \mid z_0)$ with minimum variance, $\mathrm{Var}(z_0^T \hat{\beta}) = z_0^T (Z^T Z)^{-1} z_0 \, \sigma^2$.
If the errors $\epsilon$ are normally distributed, then a $100(1-\alpha)\%$ confidence interval for $E(Y_0 \mid z_0) = z_0^T \beta$ is $z_0^T \hat{\beta} \mp t_{n-r-1}(\alpha/2) \sqrt{z_0^T (Z^T Z)^{-1} z_0 \, s^2}$.
Prediction of a new observation, such as Y0 at z0, is more uncertain than estimating the expected value of Y0.
$Y_0 = z_0^T \beta + \epsilon_0$, where $\epsilon_0 \sim N(0, \sigma^2)$ and is independent of $\epsilon$ and, hence, of $\hat{\beta}$ and $s^2$.
A new observation $Y_0$ has the unbiased predictor $z_0^T \hat{\beta} = \hat{\beta}_0 + \hat{\beta}_1 z_{01} + \cdots + \hat{\beta}_r z_{0r}$.
The variance of the forecast error $Y_0 - z_0^T \hat{\beta}$ is $\mathrm{Var}(Y_0 - z_0^T \hat{\beta}) = \sigma^2 (1 + z_0^T (Z^T Z)^{-1} z_0)$.
When the errors $\epsilon$ have a normal distribution, a $100(1-\alpha)\%$ prediction interval for $Y_0$ is given by $z_0^T \hat{\beta} \mp t_{n-r-1}(\alpha/2) \sqrt{s^2 (1 + z_0^T (Z^T Z)^{-1} z_0)}$.
The prediction interval for $Y_0$ is wider than the confidence interval for estimating the value of the regression function $E(Y_0 \mid z_0)$.
The additional uncertainty in forecasting $Y_0$, represented by the extra term $s^2$ in the expression $s^2 (1 + z_0^T (Z^T Z)^{-1} z_0)$, comes from the presence of the unknown error term $\epsilon_0$.
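Both intervals can be computed from the same fit. The sketch below (toy data; the point $z_0$ and all coefficients are illustrative) builds the half-widths of the confidence interval for $E(Y_0 \mid z_0)$ and of the prediction interval for a new $Y_0$, and confirms the latter is wider.

```python
import numpy as np
from scipy import stats

# Toy simple regression: n = 30 observations, r = 1 predictor.
rng = np.random.default_rng(11)
n, r = 30, 1
z1 = rng.uniform(0, 10, size=n)
Z = np.column_stack([np.ones(n), z1])
y = 2.0 + 0.7 * z1 + rng.normal(0, 1.0, size=n)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid = y - Z @ beta_hat
s2 = resid @ resid / (n - r - 1)

z0 = np.array([1.0, 5.0])                    # [1, z_01]: new predictor values
lev = z0 @ np.linalg.inv(Z.T @ Z) @ z0       # z0^T (Z^T Z)^{-1} z0
t_crit = stats.t.ppf(0.975, n - r - 1)       # alpha = 0.05

ci_half = t_crit * np.sqrt(s2 * lev)         # CI half-width for E(Y0 | z0)
pi_half = t_crit * np.sqrt(s2 * (1 + lev))   # PI half-width for a new Y0
print(ci_half < pi_half)                     # the extra s^2 widens the PI
```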