beta1 <- 1.2
beta2 <- 0.5
x <- seq(1,3,by = 0.5)
set.seed(231)
u <- rnorm(n = 5)
y <- beta1 +beta2*x +u
dt <- tibble(x = x, y = y)4.14: 系数方差
1 系数方差矩阵
我们将 \(\boldsymbol{V}_{\widehat{\beta}} \stackrel{\text { def }}{=} \operatorname{var}[\widehat{\beta} \mid \boldsymbol{X}]\) 定义为回归系数估计量的条件协方差矩阵。
我们现在推导出它的形式。
1.1 随机干扰项方差
\(n \times 1\) 回归误差 \(\boldsymbol{e}\) 的条件协方差矩阵是 \(n \times n\) 矩阵
\[ \operatorname{var}[\boldsymbol{e} \mid \boldsymbol{X}]=\mathbb{E}\left[\boldsymbol{e} \boldsymbol{e}^{\prime} \mid \boldsymbol{X}\right] \stackrel{\text { def }}{=} \boldsymbol{D} . \]
\(\boldsymbol{D}\) 的 \(i^{t h}\) 对角线元素是
\[ \mathbb{E}\left[e_{i}^{2} \mid \boldsymbol{X}\right]=\mathbb{E}\left[e_{i}^{2} \mid X_{i}\right]=\sigma_{i}^{2} \]
而 \(\boldsymbol{D}\) 的 \(i j^{t h}\) 非对角线元素是
\[ \mathbb{E}\left[e_{i} e_{j} \mid \boldsymbol{X}\right]=\mathbb{E}\left(e_{i} \mid X_{i}\right) \mathbb{E}\left[e_{j} \mid X_{j}\right]=0 \]
因此 \(\boldsymbol{D}\) 是一个对角矩阵,其中 \(i^{t h}\) 对角元素为 \(\sigma_{i}^{2}\):
\[ \boldsymbol{D}=\operatorname{diag}\left(\sigma_{1}^{2}, \ldots, \sigma_{n}^{2}\right)=\left(\begin{array}{cccc} \sigma_{1}^{2} & 0 & \cdots & 0 \\ 0 & \sigma_{2}^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{n}^{2} \end{array}\right) \]
1.2 回归系数方差
命题 1 (方差变换) 对于任意 \(n \times r\) 矩阵 \(\boldsymbol{A}=\boldsymbol{A}(\boldsymbol{X})\),
\[ \operatorname{var}\left[\boldsymbol{A}^{\prime} \boldsymbol{Y} \mid \boldsymbol{X}\right]=\operatorname{var}\left[\boldsymbol{A}^{\prime} \boldsymbol{e} \mid \boldsymbol{X}\right]=\boldsymbol{A}^{\prime} \boldsymbol{D} \boldsymbol{A} . \]
因此,可以把最小二乘估计系数写为
\[ \begin{aligned} \boldsymbol{\widehat{\beta}{}} &= \left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \boldsymbol{X}\boldsymbol{Y}\\ &= \left( \boldsymbol{X}\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\right)^{\prime}\boldsymbol{Y}\\ &=\boldsymbol{A}^{\prime} \boldsymbol{Y} \end{aligned} \]
其中:\(\boldsymbol{A}=\boldsymbol{X}\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\) 。
因此
\[ \boldsymbol{V}_{\widehat{\beta}}=\operatorname{var}[\widehat{\beta} \mid \boldsymbol{X}]=\boldsymbol{A}^{\prime} \boldsymbol{D} \boldsymbol{A}=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\prime} \boldsymbol{D} \boldsymbol{X}\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} . \]
其中,下面的等式将会在计算中比较有用:
\[ \boldsymbol{X}^{\prime} \boldsymbol{D} \boldsymbol{X}=\sum_{i=1}^{n} X_{i} X_{i}^{\prime} \sigma_{i}^{2}, \]
本质上,它就是对方阵\(\boldsymbol{X}^{\prime} \boldsymbol{X}\) 进行了一种加权计算。
2 同方差下的协方差矩阵
\[ \mathbb{E}\left[\widehat{\boldsymbol{V}}_{\widehat{\beta}}^0 \mid \boldsymbol{X}\right]=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \mathbb{E}\left[s^2 \mid \boldsymbol{X}\right]=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \sigma^2=\boldsymbol{V}_{\widehat{\beta}} \]
3 异方差下的协方差矩阵
3.1 理论表达
系数协方差矩阵的理论表达式为:
\[ \boldsymbol{V}_{\widehat{\beta}}=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\boldsymbol{X}^{\prime} \boldsymbol{D} \boldsymbol{X}\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} . \]
其中,矩阵\(\boldsymbol{D}\) 可以写为:
\[ \boldsymbol{D}=\operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_n^2\right)=\mathbb{E}\left[\boldsymbol{e} \boldsymbol{e}^{\prime} \mid \boldsymbol{X}\right]=\mathbb{E}[\widetilde{\boldsymbol{D}} \mid \boldsymbol{X}] \]
其中 \(\widetilde{\boldsymbol{D}}=\operatorname{diag}\left(e_1^2, \ldots, e_n^2\right)\).
矩阵\(\widetilde{\boldsymbol{D}}\) 是一个对 \(\boldsymbol{D}\)的无偏估计量。因此,如果残差平方\(e_i^2\) 可以观测得到,那么就可以得到\(V_{\widehat{\beta}}\)的无偏估计量:
\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widetilde{\beta}}^{\text {ideal }} & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\boldsymbol{X}^{\prime} \widetilde{\boldsymbol{D}} \boldsymbol{X}\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} e_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \end{aligned} \]
论证 (无偏性证明). \[ \begin{aligned} \mathbb{E}\left[\widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\text {ideal }} \mid \boldsymbol{X}\right] & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \mathbb{E}\left[e_i^2 \mid \boldsymbol{X}\right]\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \sigma_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\boldsymbol{X}^{\prime} \boldsymbol{D} \boldsymbol{X}\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}=\boldsymbol{V}_{\widehat{\beta}} \end{aligned} \]
3.2 HC0公式
\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 0}=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \widehat{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \end{aligned} \]
\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 0}&=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} {e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\\ &=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \otimes \left[ \left(\boldsymbol{X}\odot(\boldsymbol{e\otimes\overrightarrow{1}_{1*k}}) \right)^{\prime} \otimes \left(\boldsymbol{X}\odot(\boldsymbol{e\otimes\overrightarrow{1}_{1*k}}) \right) \right] \otimes \left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\\ \end{aligned} \]
\(\otimes\)代表矩阵相乘(也即我们正常理解的矩阵相乘,要求矩阵维度相容)
\(\odot\)代表矩阵元素相乘(也即两个矩阵的对应元素直接相乘)
\(\boldsymbol{\overrightarrow{1}_{1*k}}\)表示行向量(1行k列)。
3.3 HC1公式
\[ \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 1}=\left(\frac{n}{n-k}\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \hat{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \]
3.4 HC2公式
HC2公式使用标准化残差\(\overline{e}_i\):
\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 2} & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \bar{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n\left(1-h_{i i}\right)^{-1} X_i X_i^{\prime} \widehat{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \end{aligned} \]
其中:
\[ \begin{aligned} \bar{e}_i&=\left(1-h_{i i}\right)^{-1 / 2} \widehat{e}_i \\ \overline{\boldsymbol{e}}&=\left(\bar{e}_1, \ldots, \bar{e}_n\right)^{\prime}=\boldsymbol{M}^{* 1 / 2} \boldsymbol{M e} \end{aligned} \]
3.5 HC3公式
HC3公式使用删一法(leave-one-out)预测残差\(\tilde{e}_i\):
\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 3} & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \tilde{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n\left(1-h_{i i}\right)^{-2} X_i X_i^{\prime} \widehat{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \end{aligned} \]
其中预测残差:
\[ \widetilde{e}_{i}=\left(1-h_{i i}\right)^{-1} \widehat{e}_{i} \]
4 数值复现
4.1 案例背景
蒙特卡洛模拟数据生成过程(DGP):
\[ \begin{aligned} Y_i = \beta_1 +\beta_2 X_i +u_i\\ \beta_1 = 1.2; \quad \beta_2 =0.5\\ n = 5 \end{aligned} \]
模拟数据结果如下:
| x | y |
|---|---|
| 1.0 | 1.1669 |
| 1.5 | -0.3617 |
| 2.0 | 1.2458 |
| 2.5 | 2.7125 |
| 3.0 | 2.2266 |
4.2 OLS回归
smry.lm <- summary(lm(y~x,data = dt))
smry.lm
Call:
lm(formula = y ~ x, data = dt)
Residuals:
1 2 3 4 5
0.8076 -1.2403 -0.1522 0.7951 -0.2101
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.6794 1.3169 -0.516 0.642
x 1.0387 0.6208 1.673 0.193
Residual standard error: 0.9815 on 3 degrees of freedom
Multiple R-squared: 0.4828, Adjusted R-squared: 0.3103
F-statistic: 2.8 on 1 and 3 DF, p-value: 0.1929
4.3 同方差情形
X <- matrix(c(rep(1,5),x),ncol = 2)
Y <- matrix(y)
# get estimators
b <- solve(t(X)%*%X)%*%(t(X)%*%Y)
# step 1: get residual
e <- Y-X%*%b
# step 2: get regression variance
n <- nrow(Y)
k <- ncol(X)
sig2 <- c(t(e) %*% e)/(n-k)
# step 3: coefficients' standard error
## --- base case: with homoskedastic formula (OLS)
## --- we have done this before!
XX <- solve(t(X)%*%X)
v0 <- XX*sig2
(s0 <- sqrt(diag(v0))) # Homoskedastic formula[1] 1.3168530 0.6207705
4.4 HC0复现(怀特公式)
## --- base case: HC0
u1 <- X*(e%*%matrix(1,1,k))
v1 <- XX %*% (t(u1)%*%u1) %*% XX # White formula
(s1 <- sqrt(diag(v1)) )[1] 1.1172076 0.4452449
具体矩阵过程如下:
残差矩阵\(\boldsymbol{e}\)
e [,1]
[1,] 0.8075928
[2,] -1.2403359
[3,] -0.1522368
[4,] 0.7951100
[5,] -0.2101301
复制残差 \(\boldsymbol{e\otimes\overrightarrow{1}_{1*k}}\)
e%*%matrix(1,1,k) [,1] [,2]
[1,] 0.8075928 0.8075928
[2,] -1.2403359 -1.2403359
[3,] -0.1522368 -0.1522368
[4,] 0.7951100 0.7951100
[5,] -0.2101301 -0.2101301
矩阵元素相乘:\(\boldsymbol{X}\odot(\boldsymbol{e\otimes\overrightarrow{1}_{1*k}})\)
(u1 <- X*(e%*%matrix(1,1,k))) [,1] [,2]
[1,] 0.8075928 0.8075928
[2,] -1.2403359 -1.8605038
[3,] -0.1522368 -0.3044736
[4,] 0.7951100 1.9877749
[5,] -0.2101301 -0.6303903
\(\boldsymbol{X'\widetilde{D}X}\):
t(u1)%*%(u1) [,1] [,2]
[1,] 2.890170 4.719172
[2,] 4.719172 8.555026
方差协方差矩阵:\(\boldsymbol{(X'X)^{-1}(X'\widetilde{D}X)(X'X)^{-1}}\)
XX %*%(t(u1)%*%(u1)) %*%XX [,1] [,2]
[1,] 1.2481528 -0.4813796
[2,] -0.4813796 0.1982431
4.5 HC1复现
## --- base case: HC1
a <- n/(n-k)
v1a <- a*v1
(s1a <- sqrt(diag(v1a))) # HC1 formula[1] 1.4423088 0.5748087
4.6 HC2复现
leverage <- rowSums(X*(X%*%solve(t(X)%*%X)))
u2 <- X*((e/sqrt(1-leverage))%*%matrix(1,1,k))
v2 <- XX %*% (t(u2)%*%u2) %*% XX
(s2 <- sqrt(diag(v2))) # HC2 formula[1] 1.5807382 0.6344923
其中影响力值的计算可以参看03章的节内容影响力值。
leverage[1] 0.6 0.3 0.2 0.3 0.6
影响力值
R代码中,后面部分是正常的矩阵运算\(\otimes\),然后再进行矩阵元素相乘运算\(\odot\)。u2的R代码用了开根号sqrt(1-leverage),而最后t(u2)%*%u2会得到\((1-h_{ii})^{-1}\),从而满足理论公式。
4.7 HC3复现
u3 <- X*((e/(1-leverage))%*%matrix(1,1,k))
v3 <- XX %*% (t(u3)%*%u3) %*% XX
(s3 <- sqrt(diag(v3))) # HC3 formula[1] 2.3149450 0.9346418
4.8 结果比较
tbl_result <- tibble(
vars = c("intercept", "x"),
s0 = s0, hc0 =s1, hc1 = s1a,
hc2 = s2, hc3 = s3)
kable(tbl_result, digits = rep(4,7), align = "c")| vars | s0 | hc0 | hc1 | hc2 | hc3 |
|---|---|---|---|---|---|
| intercept | 1.3169 | 1.1172 | 1.4423 | 1.5807 | 2.3149 |
| x | 0.6208 | 0.4452 | 0.5748 | 0.6345 | 0.9346 |