The coefficients $b$ chosen by the least squares criterion are called the least squares estimates of $\beta$, denoted by $\hat{\beta}$.
The deviations $\hat{\epsilon}_j = y_j - \hat{\beta}_0 - \hat{\beta}_1 z_{j1} - \cdots - \hat{\beta}_r z_{jr}$, $j = 1, \ldots, n$, are called residuals. The vector of residuals $\hat{\epsilon} = y - Z\hat{\beta}$ contains the information about the remaining unknown parameter $\sigma^2$.
Proposition:
Let $Z$ have full rank $r+1 \le n$. The least squares estimate of $\beta$ is given by $\hat{\beta} = (Z^T Z)^{-1} Z^T y$.
Let $\hat{y} = Z\hat{\beta} = Hy$ denote the fitted values of $y$, where $H = Z(Z^T Z)^{-1} Z^T$ is called the hat matrix.
Then the residuals $\hat{\epsilon} = y - \hat{y} = (I - H)y$ satisfy $Z^T \hat{\epsilon} = 0$ and $\hat{y}^T \hat{\epsilon} = 0$.
The residual sum of squares is $\sum_{j=1}^n (y_j - \hat{\beta}_0 - \hat{\beta}_1 z_{j1} - \cdots - \hat{\beta}_r z_{jr})^2 = \hat{\epsilon}^T \hat{\epsilon} = y^T y - y^T Z\hat{\beta}$.
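The proposition above can be checked numerically. The sketch below fits a least squares line to a small made-up data set (all values are illustrative) and verifies the orthogonality of the residuals and the two expressions for the residual sum of squares.

```python
import numpy as np

# Hypothetical data: n = 5 observations, r = 1 predictor.
rng = np.random.default_rng(0)
z1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * z1 + rng.normal(0, 0.1, size=5)

# Design matrix Z = [1, z1], full rank r + 1 = 2.
Z = np.column_stack([np.ones_like(z1), z1])

# Least squares estimate: beta_hat = (Z^T Z)^{-1} Z^T y.
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Hat matrix, fitted values, and residuals.
H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
y_hat = H @ y
resid = y - y_hat

# Residuals are orthogonal to the columns of Z and to the fitted values.
print(np.allclose(Z.T @ resid, 0))    # Z^T eps_hat = 0
print(np.allclose(y_hat @ resid, 0))  # y_hat^T eps_hat = 0

# Residual sum of squares via both equivalent expressions.
print(np.isclose(resid @ resid, y @ y - y @ Z @ beta_hat))
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse; the hat matrix is built here only to mirror the notation of the proposition.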
The coefficient of determination $R^2 = \dfrac{\sum_{j=1}^n (\hat{y}_j - \bar{y})^2}{\sum_{j=1}^n (y_j - \bar{y})^2}$ gives the proportion of the total variation in the $y_j$'s that is explained by, or attributable to, the predictor variables $z_1, \ldots, z_r$.
Here $R^2$ equals 1 if the fitted equation passes through all the data points, so that $\hat{\epsilon}_j = 0$ for all $j$.
At the other extreme, $R^2 = 0$ if $\hat{\beta}_0 = \bar{y}$ and $\hat{\beta}_1 = \cdots = \hat{\beta}_r = 0$; in that case the predictor variables $z_1, \ldots, z_r$ have no influence on the response.
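As a quick sketch on toy data (names and numbers are made up), $R^2$ can be computed either from the regression sum of squares or, equivalently when an intercept is fitted, as one minus the ratio of residual to total sum of squares.

```python
import numpy as np

# Toy data with a clear linear trend (illustrative only).
rng = np.random.default_rng(1)
z1 = np.linspace(0.0, 10.0, 20)
y = 1.0 + 2.0 * z1 + rng.normal(0, 1.0, size=20)

# Fit y = beta_0 + beta_1 z1 by least squares.
Z = np.column_stack([np.ones_like(z1), z1])
beta_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
y_hat = Z @ beta_hat

# R^2 = regression sum of squares / total sum of squares.
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

# Equivalent form when the model contains an intercept.
r2_alt = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.isclose(r2, r2_alt))
```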
The least squares estimator $\hat{\beta} = (Z^T Z)^{-1} Z^T Y$ has $E(\hat{\beta}) = \beta$ and $\mathrm{Cov}(\hat{\beta}) = \sigma^2 (Z^T Z)^{-1}$.
The residuals $\hat{\epsilon}$ have the properties $E(\hat{\epsilon}) = 0$ and $\mathrm{Cov}(\hat{\epsilon}) = \sigma^2 (I - H)$.
Let $Y = Z\beta + \epsilon$, where $Z$ has full rank $r+1$ and $\epsilon$ is distributed as $N_n(0, \sigma^2 I)$. Then the maximum likelihood estimator of $\beta$ is the same as the least squares estimator $\hat{\beta}$.
Moreover, $\hat{\beta} = (Z^T Z)^{-1} Z^T Y \sim N_{r+1}(\beta, \sigma^2 (Z^T Z)^{-1})$ and is distributed independently of the residuals $\hat{\epsilon} = Y - Z\hat{\beta}$.
Further, $n\hat{\sigma}^2 = \hat{\epsilon}^T \hat{\epsilon}$ is distributed as $\sigma^2 \chi^2_{n-r-1}$, where $\hat{\sigma}^2$ is the maximum likelihood estimator of $\sigma^2$.
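The $\chi^2_{n-r-1}$ distribution of $\hat{\epsilon}^T\hat{\epsilon}/\sigma^2$ can be illustrated by simulation: since a $\chi^2_{n-r-1}$ variable has mean $n-r-1$, the Monte Carlo average below should land near that value. The design, coefficients, and replication count are all arbitrary choices for the sketch.

```python
import numpy as np

# Simulate eps_hat^T eps_hat / sigma^2 under the normal linear model.
rng = np.random.default_rng(42)
n, r, sigma = 30, 2, 1.5
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])  # fixed design
beta = np.array([1.0, 0.5, -0.3])
H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T

draws = []
for _ in range(2000):
    y = Z @ beta + rng.normal(0, sigma, size=n)
    resid = (np.eye(n) - H) @ y
    draws.append(resid @ resid / sigma**2)

# Sample mean should be close to n - r - 1 = 27,
# the mean of a chi-square with n - r - 1 degrees of freedom.
print(np.mean(draws))
```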
Let $Y = Z\beta + \epsilon$, where $Z$ has full rank $r+1$ and $\epsilon$ is $N_n(0, \sigma^2 I)$. Then a $100(1-\alpha)\%$ confidence region for $\beta$ is given by $(\beta - \hat{\beta})^T Z^T Z (\beta - \hat{\beta}) \le (r+1)\, s^2 F_{r+1,\, n-r-1}(\alpha)$, where $s^2 = \hat{\epsilon}^T \hat{\epsilon} / (n - r - 1)$.
Also, simultaneous $100(1-\alpha)\%$ confidence intervals for the $\beta_i$ are given by $\hat{\beta}_i \mp \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_i)} \sqrt{(r+1) F_{r+1,\, n-r-1}(\alpha)}$, $i = 0, \ldots, r$, where $\widehat{\mathrm{Var}}(\hat{\beta}_i)$ is the diagonal element of $s^2 (Z^T Z)^{-1}$ corresponding to $\hat{\beta}_i$.
The confidence ellipsoid is centered at the maximum likelihood estimate $\hat{\beta}$, and its orientation and size are determined by the eigenvalues and eigenvectors of $Z^T Z$.
If an eigenvalue is nearly zero, the confidence ellipsoid will be very long in the direction of the corresponding eigenvector.
Practitioners often use the one-at-a-time intervals $\hat{\beta}_i \mp t_{n-r-1}(\alpha/2) \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_i)}$ when searching for important predictor variables.
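The two kinds of intervals can be compared directly. The sketch below (toy data, illustrative parameter values) computes the simultaneous $F$-based half-widths and the one-at-a-time $t$ half-widths for the same fit; the simultaneous intervals come out wider, which is the price of joint coverage.

```python
import numpy as np
from scipy import stats

# Toy regression with n = 25 observations and r = 2 predictors.
rng = np.random.default_rng(3)
n, r = 25, 2
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
y = Z @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, size=n)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid = y - Z @ beta_hat
s2 = resid @ resid / (n - r - 1)
se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))  # sqrt of Var-hat(beta_i)

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, r + 1, n - r - 1)
t_crit = stats.t.ppf(1 - alpha / 2, n - r - 1)

simul_half = se * np.sqrt((r + 1) * f_crit)  # simultaneous half-widths
oneat_half = se * t_crit                     # one-at-a-time half-widths
print(simul_half > oneat_half)               # wider in every coordinate
```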
Likelihood ratio tests for regression parameters:
Part of regression analysis is concerned with assessing the effects of particular predictor variables on the response variable. One null hypothesis of interest states that certain of the $z_j$'s do not influence the response $Y$.
These predictors will be labeled $z_{q+1}, \ldots, z_r$. The statement that $z_{q+1}, \ldots, z_r$ do not influence $Y$ translates into the statistical hypothesis $H_0: \beta_{q+1} = \cdots = \beta_r = 0$, or $H_0: \beta_{(2)} = 0$, where $\beta_{(2)}^T = [\beta_{q+1}, \ldots, \beta_r]$.
Partitioning $Z = [Z_1 \mid Z_2]$ and $\beta$ accordingly, we can express the general linear model as $Y = Z\beta + \epsilon = Z_1 \beta_{(1)} + Z_2 \beta_{(2)} + \epsilon$.
Define the extra sum of squares $SS_{\mathrm{res}}(Z_1) - SS_{\mathrm{res}}(Z)$ to be $(y - Z_1 \hat{\beta}_{(1)})^T (y - Z_1 \hat{\beta}_{(1)}) - (y - Z\hat{\beta})^T (y - Z\hat{\beta})$,
where $\hat{\beta}_{(1)} = (Z_1^T Z_1)^{-1} Z_1^T y$.
Let $Z$ have full rank $r+1$ and $\epsilon$ be distributed as $N_n(0, \sigma^2 I)$. The likelihood ratio test rejects $H_0$ if $\dfrac{\left(SS_{\mathrm{res}}(Z_1) - SS_{\mathrm{res}}(Z)\right) / (r - q)}{s^2} > F_{r-q,\, n-r-1}(\alpha)$.
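A minimal sketch of this extra-sum-of-squares $F$ test, on simulated data where $H_0$ is true by construction (the split $q = 1$, $r = 3$ and all coefficients are made up for illustration):

```python
import numpy as np
from scipy import stats

# Full model: r = 3 predictors; reduced model keeps the first q = 1.
rng = np.random.default_rng(7)
n, r, q = 40, 3, 1
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
# Only z_1 matters here, so H0: beta_2 = beta_3 = 0 actually holds.
y = Z @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(0, 1.0, size=n)

def rss(Zm, y):
    """Residual sum of squares of a least squares fit of y on Zm."""
    b = np.linalg.lstsq(Zm, y, rcond=None)[0]
    e = y - Zm @ b
    return e @ e

Z1 = Z[:, : q + 1]                       # columns for [1, z_1, ..., z_q]
ss_full, ss_red = rss(Z, y), rss(Z1, y)  # SSres(Z) and SSres(Z1)
s2 = ss_full / (n - r - 1)

# Test statistic and p-value from the F_{r-q, n-r-1} reference distribution.
F = (ss_red - ss_full) / (r - q) / s2
p_value = stats.f.sf(F, r - q, n - r - 1)
print(F, p_value)
```

Dropping columns can never decrease the residual sum of squares, so the extra sum of squares `ss_red - ss_full` is always nonnegative.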
Inferences from the estimated regression function
Once an investigator is satisfied with the fitted regression model, the model can be used to solve two prediction problems.
Let $z_0^T = [1, z_{01}, \ldots, z_{0r}]$ be selected values for the predictor variables.
Let $Y_0$ denote the value of the response when the predictor variables have values $z_0$. According to the classical linear regression model, $E(Y_0 \mid z_0) = \beta_0 + \beta_1 z_{01} + \cdots + \beta_r z_{0r} = z_0^T \beta$.
Its least squares estimate is $z_0^T \hat{\beta}$.
Here $z_0^T \hat{\beta}$ is the linear unbiased estimator of $E(Y_0 \mid z_0)$ with minimum variance, $\mathrm{Var}(z_0^T \hat{\beta}) = z_0^T (Z^T Z)^{-1} z_0 \, \sigma^2$.
If the errors $\epsilon$ are normally distributed, then a $100(1-\alpha)\%$ confidence interval for $E(Y_0 \mid z_0) = z_0^T \beta$ is $z_0^T \hat{\beta} \mp t_{n-r-1}(\alpha/2) \sqrt{z_0^T (Z^T Z)^{-1} z_0 \, s^2}$.
Prediction of a new observation, such as Y0 at z0, is more uncertain than estimating the expected value of Y0.
$Y_0 = z_0^T \beta + \epsilon_0$, where $\epsilon_0 \sim N(0, \sigma^2)$ and is independent of $\epsilon$ and, hence, of $\hat{\beta}$ and $s^2$.
A new observation $Y_0$ has the unbiased predictor $z_0^T \hat{\beta} = \hat{\beta}_0 + \hat{\beta}_1 z_{01} + \cdots + \hat{\beta}_r z_{0r}$.
The variance of the forecast error $Y_0 - z_0^T \hat{\beta}$ is $\mathrm{Var}(Y_0 - z_0^T \hat{\beta}) = \sigma^2 (1 + z_0^T (Z^T Z)^{-1} z_0)$.
When the errors $\epsilon$ have a normal distribution, a $100(1-\alpha)\%$ prediction interval for $Y_0$ is given by $z_0^T \hat{\beta} \mp t_{n-r-1}(\alpha/2) \sqrt{s^2 (1 + z_0^T (Z^T Z)^{-1} z_0)}$.
The prediction interval for $Y_0$ is wider than the confidence interval for estimating the value of the regression function $E(Y_0 \mid z_0)$.
The additional uncertainty in forecasting $Y_0$, represented by the extra term $s^2$ in the expression $s^2 (1 + z_0^T (Z^T Z)^{-1} z_0)$, comes from the presence of the unknown error term $\epsilon_0$.
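Both intervals can be computed from the same fit. The sketch below (toy data; the point $z_0$ and all coefficients are illustrative) builds the half-widths of the confidence interval for $E(Y_0 \mid z_0)$ and of the prediction interval for a new $Y_0$, and confirms the latter is wider.

```python
import numpy as np
from scipy import stats

# Toy simple regression: n = 30 observations, r = 1 predictor.
rng = np.random.default_rng(11)
n, r = 30, 1
z1 = rng.uniform(0, 10, size=n)
Z = np.column_stack([np.ones(n), z1])
y = 2.0 + 0.7 * z1 + rng.normal(0, 1.0, size=n)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid = y - Z @ beta_hat
s2 = resid @ resid / (n - r - 1)

z0 = np.array([1.0, 5.0])                    # [1, z_01]: new predictor values
lev = z0 @ np.linalg.inv(Z.T @ Z) @ z0       # z0^T (Z^T Z)^{-1} z0
t_crit = stats.t.ppf(0.975, n - r - 1)       # alpha = 0.05

ci_half = t_crit * np.sqrt(s2 * lev)         # CI half-width for E(Y0 | z0)
pi_half = t_crit * np.sqrt(s2 * (1 + lev))   # PI half-width for a new Y0
print(ci_half < pi_half)                     # the extra s^2 widens the PI
```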