Distribution of the Least-Squares Slope
Problem
For the linear regression model
\[y_i = \alpha + \beta x_i + \varepsilon_i\] \[\varepsilon_i \sim \mathcal N(0, \sigma^2)\]
where the errors $\varepsilon_i$ are independent, the line fitted by least squares is
\[y = a + b x\]
where $a = \hat \alpha$ and $b = \hat \beta$. Find the distribution of $b$.
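To make the question concrete, here is a minimal sketch that simulates the model and refits the line several times. The parameter values, the design points `x`, and the helper name `fit_line` are assumptions chosen for illustration and are not part of the original problem. Each new draw of the noise gives a different slope $b$, which is why $b$ is a random variable with a distribution of its own.

```python
import random


def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical true parameters, chosen only for illustration.
alpha, beta, sigma = 1.0, 2.0, 0.5
x = [0.1 * i for i in range(10)]  # fixed design points
rng = random.Random(0)

# Refit the line on five independent noisy samples.
slopes = []
for _ in range(5):
    y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
    a, b = fit_line(x, y)
    slopes.append(b)

print(slopes)  # five different estimates scattered around beta
```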
Answer
Let $s$ be the standard deviation of the residuals $\hat u_i = y_i - a - b x_i$, defined as
\[s = \sqrt{\frac 1 {n-2} \sum_{i=1}^n \hat u_i^2}\]
The standard error of the slope is then
\[\text{SE}(b) = \frac s {\sqrt{\sum (x_i - \bar x)^2}}\]
The version in the AP textbook is:
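\[\text{SE}(b) = \frac s {s_x \sqrt{n - 1}}\]
Here $s_x$ is the sample standard deviation of the $x_i$, so $s_x \sqrt{n - 1} = \sqrt{\sum (x_i - \bar x)^2}$ and the two forms agree. It can be shown that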
\[\boxed{ \frac {b - \beta} {\text{SE}(b)} \sim t_{n-2} }\]
Proof
Express $\hat \beta$ in terms of $\beta$:
\[\begin{aligned} & y_i - \bar y \\ =& (\alpha + \beta x_i + \varepsilon_i) - (\alpha + \beta \bar x + \bar \varepsilon) \\ =& \alpha + \beta x_i + \varepsilon_i - \alpha - \beta \bar x - \bar \varepsilon \\ =& \beta (x_i - \bar x) + \varepsilon_i - \bar \varepsilon \\ \end{aligned}\] \[\begin{aligned} \hat \beta &= \frac {\sum (x_i - \bar x) (y_i - \bar y)} {\sum (x_i - \bar x)^2} \\ &= \frac {\sum (x_i - \bar x) (\beta (x_i - \bar x) + \varepsilon_i - \bar \varepsilon)} {\sum (x_i - \bar x)^2} \\ &= \beta + \frac {\sum (x_i - \bar x) (\varepsilon_i - \bar \varepsilon)} {\sum (x_i - \bar x)^2} \\ &= \beta + \frac {\sum (x_i - \bar x) \varepsilon_i} {\sum (x_i - \bar x)^2} \\ \end{aligned}\]
The last step uses $\sum (x_i - \bar x) = 0$, so the $\bar \varepsilon$ term drops out. Treating the $x_i$ as fixed and using the independence of the $\varepsilon_i$, compute the variance:
\[\begin{aligned} \text{Var}(\hat \beta) &= \text{Var}\left( \frac {\sum (x_i - \bar x) \varepsilon_i} {\sum (x_i - \bar x)^2} \right) \\ &= \frac {\text{Var}(\sum (x_i - \bar x) \varepsilon_i)} {(\sum (x_i - \bar x)^2)^2} \\ &= \frac {\sigma^2 \sum (x_i - \bar x)^2} {(\sum (x_i - \bar x)^2)^2} \\ &= \frac {\sigma^2} {\sum (x_i - \bar x)^2} \\ \end{aligned}\] \[\begin{aligned} \text{SD}(\hat \beta) &= \sqrt{\frac {\sigma^2} {\sum (x_i - \bar x)^2}} \\ &= \frac \sigma {\sqrt{\sum (x_i - \bar x)^2}} \\ \end{aligned}\]
Now consider estimating $\sigma$. The residuals $\hat u_i = y_i - a - b x_i$ satisfy two linear constraints:
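\[\begin{cases} \sum \hat u_i = 0 \\ \sum x_i \hat u_i = 0 \\ \end{cases}\]
The residual vector $\hat u$ therefore does not range over all of $\mathbb R^n$; it is confined to an $(n-2)$-dimensional subspace. This is why the unbiased estimator of the variance divides by $n - 2$: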
\[\boxed{ s^2 = \frac 1 {n-2} \sum_{i=1}^n \hat u_i^2 }\]
Replacing $\sigma$ with $s$ in $\text{SD}(\hat \beta)$ gives the standard error:
\[\boxed{ \text{SE}(\hat \beta) = \frac s {\sqrt{\sum (x_i - \bar x)^2}} }\]
Next we work out the distribution of the following quantity:
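\[\frac {\hat \beta - \beta} {\text{SE}(\hat \beta)} = \frac {\frac {\hat \beta - \beta} {\text{SD}(\hat \beta)}} {\frac {\text{SE}(\hat \beta)} {\text{SD}(\hat \beta)}}\]
Since $\hat \beta$ is a linear combination of the normal errors $\varepsilon_i$, with mean $\beta$ and standard deviation $\text{SD}(\hat \beta)$, the numerator follows a standard normal distribution: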
\[\frac {\hat \beta - \beta} {\text{SD}(\hat \beta)} \sim \mathcal N(0, 1)\]
Now expand the denominator:
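\[\begin{aligned} & \frac {\text{SE}(\hat \beta)} {\text{SD}(\hat \beta)} \\ =& \sqrt{\frac {s^2} {\sigma^2}} \\ =& \sqrt{\frac 1 {n-2} \sum \left( \frac {\hat u_i} \sigma \right)^2} \end{aligned}\]
Since $\sum \hat u_i^2 / \sigma^2$ follows a $\chi^2$ distribution with $n - 2$ degrees of freedom (a standard result, consistent with the $n - 2$ dimensions available to the residuals), we have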
\[(n-2) \left( \frac {\text{SE}(\hat \beta)} {\text{SD}(\hat \beta)} \right)^2 \sim \chi^2_{n-2}\]
Putting these together, the quantity has the form
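\[\frac {\mathcal N(0, 1)} {\sqrt{\frac {\chi^2_{n-2}} {n-2}}}\]
It can be shown that these two random variables are independent; I do not know the proof, so it is not given here. The ratio therefore matches the definition of Student's $t$ distribution: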
\[\boxed{ \frac {\hat \beta - \beta} {\text{SE}(\hat \beta)} \sim t_{n-2} }\]
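The three results boxed above can be checked numerically. The following is a rough Monte Carlo sketch, not a proof: the parameter values, the design points, the trial count, and the helper name `slope_t_statistic` are all assumptions made for illustration. It compares the empirical variance of the slope with $\sigma^2 / \sum (x_i - \bar x)^2$, the average of $s^2$ with $\sigma^2$, and the tail probability of the $t$ statistic with the nominal 5% level, using the tabulated two-sided critical value 2.306 for 8 degrees of freedom.

```python
import math
import random


def slope_t_statistic(x, y, beta):
    """Return (b, s2, t), where t = (b - beta) / SE(b) for the least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    # Residual variance estimate with n - 2 degrees of freedom.
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se = math.sqrt(s2 / sxx)
    return b, s2, (b - beta) / se


# Hypothetical true parameters and design, for illustration only.
alpha, beta, sigma = 1.0, 2.0, 0.5
x = [0.1 * i for i in range(10)]  # n = 10, so n - 2 = 8
rng = random.Random(0)
trials = 20000
T_CRIT = 2.306  # two-sided 5% critical value of t with 8 degrees of freedom

slopes, s2_values, exceed = [], [], 0
for _ in range(trials):
    y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
    b, s2, t = slope_t_statistic(x, y, beta)
    slopes.append(b)
    s2_values.append(s2)
    if abs(t) > T_CRIT:
        exceed += 1

x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
mean_b = sum(slopes) / trials
var_b = sum((b - mean_b) ** 2 for b in slopes) / (trials - 1)

print("empirical Var(b):", var_b, "theory:", sigma ** 2 / sxx)
print("mean of s^2:", sum(s2_values) / trials, "sigma^2:", sigma ** 2)
print("P(|T| > 2.306):", exceed / trials, "nominal: 0.05")
```

If the derivation holds, each printed pair should agree up to simulation noise. Replacing `T_CRIT` with the normal value 1.96 should give a tail probability noticeably above 5%, which reflects the heavier tails of $t_{n-2}$ for small $n$.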