Distribution of the Least-Squares Slope

Problem

Consider the linear regression model with independent, normally distributed errors:

\[y_i = \alpha + \beta x_i + \varepsilon_i\] \[\varepsilon_i \sim \mathcal N(0, \sigma^2)\]

The line fitted by least squares is

\[y = a + b x\]

Write $a = \hat \alpha$ and $b = \hat \beta$. Find the distribution of $b$.

Answer

Let $s$ be the residual standard deviation, where $\hat u_i = y_i - a - b x_i$ denotes the $i$-th residual:

\[s = \sqrt{\frac 1 {n-2} \sum_{i=1}^n \hat u_i^2}\]

It can be shown that the standard error of the slope is

\[\text{SE}(b) = \frac s {\sqrt{\sum (x_i - \bar x)^2}}\]

The AP textbook version, with $s_x$ the sample standard deviation of the $x_i$, is:

\[\text{SE}(b) = \frac s {s_x \sqrt{n - 1}}\]
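The two forms of $\text{SE}(b)$ agree because $s_x \sqrt{n-1} = \sqrt{\sum (x_i - \bar x)^2}$. The following is a minimal sketch that computes both on simulated data; the function name `slope_standard_errors`, the seed, and the values of $\alpha$, $\beta$, $\sigma$ are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def slope_standard_errors(x, y):
    """Return (b, SE via sum of squares, SE via s_x * sqrt(n - 1))."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Least-squares slope and intercept.
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    # Residual standard deviation with n - 2 degrees of freedom.
    residuals = [yi - a - b * xi for xi, yi in zip(x, y)]
    s = math.sqrt(sum(u ** 2 for u in residuals) / (n - 2))
    se_sum_of_squares = s / math.sqrt(sxx)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    se_textbook = s / (statistics.stdev(x) * math.sqrt(n - 1))
    return b, se_sum_of_squares, se_textbook


rng = random.Random(0)  # arbitrary seed, for reproducibility only
x = [float(i) for i in range(1, 11)]
# Hypothetical true parameters: alpha = 1.0, beta = 2.0, sigma = 1.5.
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.5) for xi in x]
b, se1, se2 = slope_standard_errors(x, y)
print(b, se1, se2)
```

The last two printed values should match up to floating-point rounding.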

It can be shown that

\[\boxed{ \frac {b - \beta} {\text{SE}(b)} \sim t_{n-2} }\]

Proof

First express $\hat \beta$ in terms of $\beta$:

\[\begin{aligned} & y_i - \bar y \\ =& (\alpha + \beta x_i + \varepsilon_i) - (\alpha + \beta \bar x + \bar \varepsilon) \\ =& \alpha + \beta x_i + \varepsilon_i - \alpha - \beta \bar x - \bar \varepsilon \\ =& \beta (x_i - \bar x) + \varepsilon_i - \bar \varepsilon \\ \end{aligned}\] \[\begin{aligned} \hat \beta &= \frac {\sum (x_i - \bar x) (y_i - \bar y)} {\sum (x_i - \bar x)^2} \\ &= \frac {\sum (x_i - \bar x) (\beta (x_i - \bar x) + \varepsilon_i - \bar \varepsilon)} {\sum (x_i - \bar x)^2} \\ &= \beta + \frac {\sum (x_i - \bar x) (\varepsilon_i - \bar \varepsilon)} {\sum (x_i - \bar x)^2} \\ &= \beta + \frac {\sum (x_i - \bar x) \varepsilon_i} {\sum (x_i - \bar x)^2} \\ \end{aligned}\]
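The decomposition $\hat \beta = \beta + \sum (x_i - \bar x) \varepsilon_i / \sum (x_i - \bar x)^2$ is an exact algebraic identity, so it can be checked numerically when the errors are known. The sketch below does this on simulated data; the helper names `slope` and `slope_from_errors` and all parameter values are assumptions made for this example.

```python
import random


def slope(x, y):
    """Least-squares slope computed from the data (x, y)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


def slope_from_errors(x, eps, beta):
    """beta + sum((x_i - x_bar) * eps_i) / sum((x_i - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return beta + sum((xi - x_bar) * ei for xi, ei in zip(x, eps)) / sxx


rng = random.Random(1)  # arbitrary seed
alpha, beta, sigma = 0.5, -1.2, 2.0  # hypothetical true parameters
x = [rng.uniform(0.0, 10.0) for _ in range(20)]
eps = [rng.gauss(0.0, sigma) for _ in x]
y = [alpha + beta * xi + ei for xi, ei in zip(x, eps)]

# The two numbers should agree up to floating-point rounding.
print(slope(x, y), slope_from_errors(x, eps, beta))
```

Note that neither $\alpha$ nor $\bar \varepsilon$ appears in `slope_from_errors`: both cancel because $\sum (x_i - \bar x) = 0$.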

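The last step uses $\sum (x_i - \bar x) = 0$, so the $\bar \varepsilon$ term drops out. Now compute the variance, treating the $x_i$ as fixed and using the independence of the $\varepsilon_i$: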

\[\begin{aligned} \text{Var}(\hat \beta) &= \text{Var}\left( \frac {\sum (x_i - \bar x) \varepsilon_i} {\sum (x_i - \bar x)^2} \right) \\ &= \frac {\text{Var}(\sum (x_i - \bar x) \varepsilon_i)} {(\sum (x_i - \bar x)^2)^2} \\ &= \frac {\sigma^2 \sum (x_i - \bar x)^2} {(\sum (x_i - \bar x)^2)^2} \\ &= \frac {\sigma^2} {\sum (x_i - \bar x)^2} \\ \end{aligned}\] \[\begin{aligned} \text{SD}(\hat \beta) &= \sqrt{\frac {\sigma^2} {\sum (x_i - \bar x)^2}} \\ &= \frac \sigma {\sqrt{\sum (x_i - \bar x)^2}} \\ \end{aligned}\]
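The variance formula $\text{Var}(\hat \beta) = \sigma^2 / \sum (x_i - \bar x)^2$ can be compared against a Monte Carlo estimate. This is a rough sketch rather than a proof; the function `simulate_slopes`, the design points, the seed, and the parameter values are all assumptions picked for illustration.

```python
import random
import statistics


def simulate_slopes(x, alpha, beta, sigma, reps, seed):
    """Refit the least-squares slope on `reps` simulated samples with fixed x."""
    rng = random.Random(seed)
    x_bar = statistics.fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slopes = []
    for _ in range(reps):
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = statistics.fmean(y)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slopes.append(sxy / sxx)
    return slopes, sxx


x = [float(i) for i in range(10)]  # fixed design points
sigma = 2.0  # hypothetical error standard deviation
slopes, sxx = simulate_slopes(x, alpha=1.0, beta=0.5, sigma=sigma,
                              reps=20000, seed=2)

empirical_var = statistics.variance(slopes)
theoretical_var = sigma ** 2 / sxx
print(empirical_var, theoretical_var)
```

With this many replications the two printed values should be close, and the mean of `slopes` should sit near the true slope.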

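Next, estimate $\sigma$. The residuals $\hat u_i$ satisfy two linear constraints:

\[\begin{cases} \sum \hat u_i = 0 \\ \sum x_i \hat u_i = 0 \\ \end{cases}\]

The residual vector $\hat u$ therefore does not range over all of $\mathbb R^n$; it is confined to an $(n-2)$-dimensional subspace. It thus has $n-2$ degrees of freedom, and the unbiased estimator of the variance is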

\[\boxed{ s^2 = \frac 1 {n-2} \sum_{i=1}^n \hat u_i^2 }\]
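Both the residual constraints and the unbiasedness of $s^2$ can be illustrated by simulation. The sketch below tracks the largest violation of $\sum \hat u_i = 0$ and $\sum x_i \hat u_i = 0$ across many fits, and averages $s^2$ for comparison with $\sigma^2$. The helper `fit_residuals`, the seed, and the parameter values are assumptions for this example only.

```python
import random
import statistics


def fit_residuals(x, y):
    """Residuals y_i - a - b * x_i of the least-squares line."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    return [yi - a - b * xi for xi, yi in zip(x, y)]


rng = random.Random(3)  # arbitrary seed
x = [float(i) for i in range(8)]
n = len(x)
sigma = 1.5  # hypothetical error standard deviation

s2_values = []
worst_constraint = 0.0  # largest |sum u| or |sum x * u| seen so far
for _ in range(20000):
    # Hypothetical true line: alpha = 2.0, beta = -0.7.
    y = [2.0 - 0.7 * xi + rng.gauss(0.0, sigma) for xi in x]
    u = fit_residuals(x, y)
    worst_constraint = max(
        worst_constraint,
        abs(sum(u)),
        abs(sum(xi * ui for xi, ui in zip(x, u))),
    )
    s2_values.append(sum(ui ** 2 for ui in u) / (n - 2))

print(worst_constraint)  # zero up to floating-point rounding
print(statistics.fmean(s2_values), sigma ** 2)
```

Dividing by $n$ instead of $n-2$ would scale the average by $(n-2)/n$, which is why that version underestimates $\sigma^2$.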

Replacing $\sigma$ with $s$ in $\text{SD}(\hat \beta)$ gives the standard error:

\[\boxed{ \text{SE}(\hat \beta) = \frac s {\sqrt{\sum (x_i - \bar x)^2}} }\]

Next, we find the distribution of the following quantity:

\[\frac {\hat \beta - \beta} {\text{SE}(\hat \beta)} = \frac {\frac {\hat \beta - \beta} {\text{SD}(\hat \beta)}} {\frac {\text{SE}(\hat \beta)} {\text{SD}(\hat \beta)}}\]

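The numerator follows the standard normal distribution, because $\hat \beta$ is a linear combination of independent normal errors with mean $\beta$ and standard deviation $\text{SD}(\hat \beta)$: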

\[\frac {\hat \beta - \beta} {\text{SD}(\hat \beta)} \sim \mathcal N(0, 1)\]

For the denominator:

\[\begin{aligned} & \frac {\text{SE}(\hat \beta)} {\text{SD}(\hat \beta)} \\ =& \sqrt{\frac {s^2} {\sigma^2}} \\ =& \sqrt{\frac 1 {n-2} \sum \left( \frac {\hat u_i} \sigma \right)^2} \end{aligned}\]

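Because the residual vector has $n-2$ degrees of freedom, $\sum \hat u_i^2 / \sigma^2$ follows a chi-squared distribution with $n-2$ degrees of freedom. Therefore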

\[(n-2) \left( \frac {\text{SE}(\hat \beta)} {\text{SD}(\hat \beta)} \right)^2 \sim \chi^2_{n-2}\]

Putting the two together, the ratio has the form

\[\frac {\mathcal N(0, 1)} {\sqrt{\frac {\chi^2_{n-2}} {n-2}}}\]

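It can be shown that these two random variables are independent, although I do not know the proof. The ratio therefore matches the definition of Student's $t$ distribution: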

\[\boxed{ \frac {\hat \beta - \beta} {\text{SE}(b)} \sim t_{n-2} }\]
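As a rough empirical check of the boxed result, one can simulate the statistic many times and see how often it falls beyond the two-sided 5% critical value of $t_{10}$ (about 2.228 from a standard $t$ table) versus the standard normal value 1.960. This is a sketch under assumed settings: $n = 12$ design points, an arbitrary seed, and hypothetical values of $\alpha$, $\beta$, $\sigma$; the function `t_statistic` is a name introduced here.

```python
import math
import random
import statistics


def t_statistic(x, y, beta):
    """(b - beta) / SE(b) for the least-squares fit of y on x."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    s = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    return (b - beta) / (s / math.sqrt(sxx))


rng = random.Random(4)  # arbitrary seed
x = [float(i) for i in range(12)]  # n = 12, so n - 2 = 10 degrees of freedom
alpha, beta, sigma = 1.0, 0.3, 2.0  # hypothetical true parameters
T_CRIT = 2.228  # two-sided 5% critical value of t with 10 df (table value)
Z_CRIT = 1.960  # two-sided 5% critical value of N(0, 1)

reps = 20000
t_values = []
for _ in range(reps):
    y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
    t_values.append(t_statistic(x, y, beta))

frac_beyond_t = sum(abs(t) > T_CRIT for t in t_values) / reps
frac_beyond_z = sum(abs(t) > Z_CRIT for t in t_values) / reps
print(frac_beyond_t, frac_beyond_z)
```

If the statistic follows $t_{10}$, the first fraction should be near 0.05, while the second should be noticeably larger, reflecting the heavier tails of the $t$ distribution compared with the normal.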