4.14: 系数方差

1 系数方差矩阵

我们将 \(\boldsymbol{V}_{\widehat{\beta}} \stackrel{\text { def }}{=} \operatorname{var}[\widehat{\beta} \mid \boldsymbol{X}]\) 定义为回归系数估计量的条件协方差矩阵。

我们现在推导出它的形式。

1.1 随机干扰项方差

\(n \times 1\) 回归误差 \(\boldsymbol{e}\) 的条件协方差矩阵是 \(n \times n\) 矩阵

\[ \operatorname{var}[\boldsymbol{e} \mid \boldsymbol{X}]=\mathbb{E}\left[\boldsymbol{e} \boldsymbol{e}^{\prime} \mid \boldsymbol{X}\right] \stackrel{\text { def }}{=} \boldsymbol{D} . \]

\(\boldsymbol{D}\)\(i^{t h}\) 对角线元素是

\[ \mathbb{E}\left[e_{i}^{2} \mid \boldsymbol{X}\right]=\mathbb{E}\left[e_{i}^{2} \mid X_{i}\right]=\sigma_{i}^{2} \]

\(\boldsymbol{D}\)\(i j^{t h}\) 非对角线元素是

\[ \mathbb{E}\left[e_{i} e_{j} \mid \boldsymbol{X}\right]=\mathbb{E}\left(e_{i} \mid X_{i}\right) \mathbb{E}\left[e_{j} \mid X_{j}\right]=0 \]

因此 \(\boldsymbol{D}\) 是一个对角矩阵,其中 \(i^{t h}\) 对角元素为 \(\sigma_{i}^{2}\)

\[ \boldsymbol{D}=\operatorname{diag}\left(\sigma_{1}^{2}, \ldots, \sigma_{n}^{2}\right)=\left(\begin{array}{cccc} \sigma_{1}^{2} & 0 & \cdots & 0 \\ 0 & \sigma_{2}^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{n}^{2} \end{array}\right) \]

1.2 回归系数方差

命题 1 (方差变换) 对于任意 \(n \times r\) 矩阵 \(\boldsymbol{A}=\boldsymbol{A}(\boldsymbol{X})\)

\[ \operatorname{var}\left[\boldsymbol{A}^{\prime} \boldsymbol{Y} \mid \boldsymbol{X}\right]=\operatorname{var}\left[\boldsymbol{A}^{\prime} \boldsymbol{e} \mid \boldsymbol{X}\right]=\boldsymbol{A}^{\prime} \boldsymbol{D} \boldsymbol{A} . \]

因此,可以把最小二乘估计系数写为

\[ \begin{aligned} \boldsymbol{\widehat{\beta}{}} &= \left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \boldsymbol{X}\boldsymbol{Y}\\ &= \left( \boldsymbol{X}\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\right)^{\prime}\boldsymbol{Y}\\ &=\boldsymbol{A}^{\prime} \boldsymbol{Y} \end{aligned} \]

其中:\(\boldsymbol{A}=\boldsymbol{X}\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\)

因此

\[ \boldsymbol{V}_{\widehat{\beta}}=\operatorname{var}[\widehat{\beta} \mid \boldsymbol{X}]=\boldsymbol{A}^{\prime} \boldsymbol{D} \boldsymbol{A}=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\prime} \boldsymbol{D} \boldsymbol{X}\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} . \]

提示

其中,下面的等式将会在计算中比较有用:

\[ \boldsymbol{X}^{\prime} \boldsymbol{D} \boldsymbol{X}=\sum_{i=1}^{n} X_{i} X_{i}^{\prime} \sigma_{i}^{2}, \]

本质上,它就是对方阵\(\boldsymbol{X}^{\prime} \boldsymbol{X}\) 进行了一种加权计算。

2 同方差下的协方差矩阵

\[ \mathbb{E}\left[\widehat{\boldsymbol{V}}_{\widehat{\beta}}^0 \mid \boldsymbol{X}\right]=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \mathbb{E}\left[s^2 \mid \boldsymbol{X}\right]=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \sigma^2=\boldsymbol{V}_{\widehat{\beta}} \]

3 异方差下的协方差矩阵

3.1 理论表达

系数协方差矩阵的理论表达式为:

\[ \boldsymbol{V}_{\widehat{\beta}}=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\boldsymbol{X}^{\prime} \boldsymbol{D} \boldsymbol{X}\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} . \]

其中,矩阵\(\boldsymbol{D}\) 可以写为:

\[ \boldsymbol{D}=\operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_n^2\right)=\mathbb{E}\left[\boldsymbol{e} \boldsymbol{e}^{\prime} \mid \boldsymbol{X}\right]=\mathbb{E}[\widetilde{\boldsymbol{D}} \mid \boldsymbol{X}] \]

其中 \(\widetilde{\boldsymbol{D}}=\operatorname{diag}\left(e_1^2, \ldots, e_n^2\right)\).

矩阵\(\widetilde{\boldsymbol{D}}\) 是一个对 \(\boldsymbol{D}\)的无偏估计量。因此,如果残差平方\(e_i^2\) 可以观测得到,那么就可以得到\(V_{\widehat{\beta}}\)的无偏估计量:

\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widetilde{\beta}}^{\text {ideal }} & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\boldsymbol{X}^{\prime} \widetilde{\boldsymbol{D}} \boldsymbol{X}\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} e_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \end{aligned} \]

论证 (无偏性证明). \[ \begin{aligned} \mathbb{E}\left[\widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\text {ideal }} \mid \boldsymbol{X}\right] & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \mathbb{E}\left[e_i^2 \mid \boldsymbol{X}\right]\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \sigma_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\boldsymbol{X}^{\prime} \boldsymbol{D} \boldsymbol{X}\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}=\boldsymbol{V}_{\widehat{\beta}} \end{aligned} \]

3.2 HC0公式

\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 0}=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \widehat{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \end{aligned} \]

\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 0}&=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} {e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\\ &=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \otimes \left[ \left(\boldsymbol{X}\odot(\boldsymbol{e\otimes\overrightarrow{1}_{1*k}}) \right)^{\prime} \otimes \left(\boldsymbol{X}\odot(\boldsymbol{e\otimes\overrightarrow{1}_{1*k}}) \right) \right] \otimes \left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\\ \end{aligned} \]

  • \(\otimes\)代表矩阵相乘(也即我们正常理解的矩阵相乘,要求矩阵维度相容)

  • \(\odot\)代表矩阵元素相乘(也即两个矩阵的对应元素直接相乘)

  • \(\boldsymbol{\overrightarrow{1}_{1*k}}\)表示行向量(1行k列)。

3.3 HC1公式

\[ \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 1}=\left(\frac{n}{n-k}\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \hat{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \]

3.4 HC2公式

HC2公式使用标准化残差\(\overline{e}_i\)

\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 2} & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \bar{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n\left(1-h_{i i}\right)^{-1} X_i X_i^{\prime} \widehat{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \end{aligned} \]

其中:

\[ \begin{aligned} \bar{e}_i&=\left(1-h_{i i}\right)^{-1 / 2} \widehat{e}_i \\ \overline{\boldsymbol{e}}&=\left(\bar{e}_1, \ldots, \bar{e}_n\right)^{\prime}=\boldsymbol{M}^{* 1 / 2} \boldsymbol{M e} \end{aligned} \]

3.5 HC3公式

HC3公式使用删一法(leave-one-out)预测残差\(\tilde{e}_i\)

\[ \begin{aligned} \widehat{\boldsymbol{V}}_{\widehat{\beta}}^{\mathrm{HC} 3} & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n X_i X_i^{\prime} \tilde{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \\ & =\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}\left(\sum_{i=1}^n\left(1-h_{i i}\right)^{-2} X_i X_i^{\prime} \widehat{e}_i^2\right)\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \end{aligned} \]

其中预测残差:

\[ \widetilde{e}_{i}=\left(1-h_{i i}\right)^{-1} \widehat{e}_{i} \]

4 数值复现

4.1 案例背景

蒙特卡洛模拟数据生成过程(DGP):

\[ \begin{aligned} Y_i = \beta_1 +\beta_2 X_i +u_i\\ \beta_1 = 1.2; \quad \beta_2 =0.5\\ n = 5 \end{aligned} \]

beta1 <- 1.2
beta2 <- 0.5
x <- seq(1,3,by = 0.5)
set.seed(231)
u <- rnorm(n = 5)
y <- beta1 +beta2*x +u
dt <- tibble(x = x, y = y)

模拟数据结果如下:

表 1: 模拟数据
x y
1.0 1.1669
1.5 -0.3617
2.0 1.2458
2.5 2.7125
3.0 2.2266

4.2 OLS回归

smry.lm <- summary(lm(y~x,data = dt))
smry.lm

Call:
lm(formula = y ~ x, data = dt)

Residuals:
      1       2       3       4       5 
 0.8076 -1.2403 -0.1522  0.7951 -0.2101 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.6794     1.3169  -0.516    0.642
x             1.0387     0.6208   1.673    0.193

Residual standard error: 0.9815 on 3 degrees of freedom
Multiple R-squared:  0.4828,    Adjusted R-squared:  0.3103 
F-statistic:   2.8 on 1 and 3 DF,  p-value: 0.1929

4.3 同方差情形

X <- matrix(c(rep(1,5),x),ncol = 2)
Y <- matrix(y)

# get estimators
b <- solve(t(X)%*%X)%*%(t(X)%*%Y)

# step 1: get residual
e <- Y-X%*%b

# step 2: get regression variance 
n <- nrow(Y)
k <- ncol(X)
sig2 <- c(t(e) %*% e)/(n-k)

# step 3: coefficients' standard error 
## --- base case: with homoskedastic formula (OLS)
## --- we have done this before!
XX <- solve(t(X)%*%X)
v0 <- XX*sig2
(s0 <- sqrt(diag(v0))) # Homoskedastic formula
[1] 1.3168530 0.6207705

4.4 HC0复现(怀特公式)

## --- base case: HC0
u1 <- X*(e%*%matrix(1,1,k))
v1 <- XX %*% (t(u1)%*%u1) %*% XX # White formula
(s1 <- sqrt(diag(v1)) )
[1] 1.1172076 0.4452449

具体矩阵过程如下:

残差矩阵\(\boldsymbol{e}\)

e
           [,1]
[1,]  0.8075928
[2,] -1.2403359
[3,] -0.1522368
[4,]  0.7951100
[5,] -0.2101301

复制残差 \(\boldsymbol{e\otimes\overrightarrow{1}_{1*k}}\)

e%*%matrix(1,1,k)
           [,1]       [,2]
[1,]  0.8075928  0.8075928
[2,] -1.2403359 -1.2403359
[3,] -0.1522368 -0.1522368
[4,]  0.7951100  0.7951100
[5,] -0.2101301 -0.2101301

矩阵元素相乘:\(\boldsymbol{X}\odot(\boldsymbol{e\otimes\overrightarrow{1}_{1*k}})\)

(u1 <- X*(e%*%matrix(1,1,k)))
           [,1]       [,2]
[1,]  0.8075928  0.8075928
[2,] -1.2403359 -1.8605038
[3,] -0.1522368 -0.3044736
[4,]  0.7951100  1.9877749
[5,] -0.2101301 -0.6303903

\(\boldsymbol{X'\widetilde{D}X}\)

t(u1)%*%(u1)
         [,1]     [,2]
[1,] 2.890170 4.719172
[2,] 4.719172 8.555026

方差协方差矩阵:\(\boldsymbol{(X'X)^{-1}(X'\widetilde{D}X)(X'X)^{-1}}\)

XX %*%(t(u1)%*%(u1)) %*%XX
           [,1]       [,2]
[1,]  1.2481528 -0.4813796
[2,] -0.4813796  0.1982431

4.5 HC1复现

## --- base case: HC1
a <- n/(n-k)
v1a <- a*v1
(s1a <- sqrt(diag(v1a))) # HC1 formula
[1] 1.4423088 0.5748087

4.6 HC2复现

leverage <- rowSums(X*(X%*%solve(t(X)%*%X)))
u2 <- X*((e/sqrt(1-leverage))%*%matrix(1,1,k))
v2 <- XX %*% (t(u2)%*%u2) %*% XX
(s2 <- sqrt(diag(v2))) # HC2 formula
[1] 1.5807382 0.6344923

其中影响力值的计算可以参看03章的节内容影响力值

leverage
[1] 0.6 0.3 0.2 0.3 0.6
注释
  • 影响力值R代码中,后面部分是正常的矩阵运算\(\otimes\),然后再进行矩阵元素相乘运算\(\odot\)

  • u2R代码用了开根号sqrt(1-leverage),而最后t(u2)%*%u2会得到\((1-h_{ii})^{-1}\),从而满足理论公式。

4.7 HC3复现

u3 <- X*((e/(1-leverage))%*%matrix(1,1,k))
v3 <- XX %*% (t(u3)%*%u3) %*% XX
(s3 <- sqrt(diag(v3))) # HC3 formula
[1] 2.3149450 0.9346418

4.8 结果比较

tbl_result <- tibble(
  vars = c("intercept", "x"), 
  s0 = s0, hc0 =s1, hc1 = s1a,
  hc2 = s2, hc3 = s3)

kable(tbl_result, digits = rep(4,7), align = "c")
vars s0 hc0 hc1 hc2 hc3
intercept 1.3169 1.1172 1.4423 1.5807 2.3149
x 0.6208 0.4452 0.5748 0.6345 0.9346