We will use the mtcars dataset for demonstration. Let's start by fitting a full model that includes every available predictor of hp.
```r
##### Fit - 1 #####
base_model <- lm(hp ~ ., data = mtcars)
summary(base_model)
```
```
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  79.0484   184.5041   0.428  0.67270   
mpg          -2.0631     2.0906  -0.987  0.33496   
cyl           8.2037    10.0861   0.813  0.42513   
disp          0.4390     0.1492   2.942  0.00778 **
drat         -4.6185    16.0829  -0.287  0.77680   
wt          -27.6600    19.2704  -1.435  0.16591   
qsec         -1.7844     7.3639  -0.242  0.81089   
vs           25.8129    19.8512   1.300  0.20758   
am            9.4863    20.7599   0.457  0.65240   
gear          7.2164    14.6160   0.494  0.62662   
carb         18.7487     7.0288   2.667  0.01441 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25.97 on 21 degrees of freedom
Multiple R-squared:  0.9028,    Adjusted R-squared:  0.8565
F-statistic:  19.5 on 10 and 21 DF,  p-value: 1.898e-08
```
We can see that the predictor disp is significant at \(\alpha = 0.01\), and the adjusted \(R^2\) is 0.8565. So, let's try fitting a nested model in which we exclude disp.
```r
##### Fit - 2 #####
nested_model <- lm(hp ~ . - disp, data = mtcars)
summary(nested_model)
```
```
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  108.034    213.915   0.505   0.6186  
mpg           -1.711      2.423  -0.706   0.4876  
cyl           21.127     10.542   2.004   0.0575 .
drat          -1.059     18.620  -0.057   0.9551  
wt            14.697     14.874   0.988   0.3339  
qsec         -10.564      7.817  -1.351   0.1903  
vs            28.943     23.015   1.258   0.2217  
am             8.491     24.100   0.352   0.7280  
gear          13.370     16.796   0.796   0.4345  
carb           7.812      6.926   1.128   0.2715  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 30.15 on 22 degrees of freedom
Multiple R-squared:  0.8627,    Adjusted R-squared:  0.8066
F-statistic: 15.36 on 9 and 22 DF,  p-value: 1.47e-07
```
We can see that the adjusted \(R^2\) of the second model (0.8066) is lower than that of the first (0.8565). This is not surprising, since disp is a significant predictor. Both values can be pulled out programmatically, as in the snippet below.
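Rather than reading the numbers off the printed summaries, we can extract the adj.r.squared component from each summary object; a quick sketch:

```r
## Compare adjusted R-squared values directly
summary(base_model)$adj.r.squared    # 0.8565
summary(nested_model)$adj.r.squared  # 0.8066
```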
But then, how can we be certain that the first model is better than the second? Since the two models are nested (the second model's predictors are a subset of the first's), we can compare them with anova().
```r
##### Comparison #####
anova(base_model, nested_model, test = "Chisq")
```
```
> anova(base_model, nested_model, test = "Chisq")
Analysis of Variance Table

Model 1: hp ~ mpg + cyl + disp + drat + wt + qsec + vs + am + gear + carb
Model 2: hp ~ (mpg + cyl + disp + drat + wt + qsec + vs + am + gear + carb) - disp
  Res.Df   RSS Df Sum of Sq Pr(>Chi)   
1     21 14165                         
2     22 20004 -1   -5839.6 0.003257 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
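To see where that p-value comes from: with test = "Chisq", anova() scales the increase in RSS by the dispersion estimate of the larger model (its RSS divided by its residual degrees of freedom) and refers the result to a chi-squared distribution. A minimal sketch reproducing it by hand, assuming that scaling:

```r
## Reproduce the chi-squared p-value by hand
rss_full   <- sum(residuals(base_model)^2)        # ~14165
rss_nested <- sum(residuals(nested_model)^2)      # ~20004
scale      <- rss_full / df.residual(base_model)  # dispersion estimate

chi_stat <- (rss_nested - rss_full) / scale   # ~8.66
pchisq(chi_stat, df = 1, lower.tail = FALSE)  # ~0.003257
```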
We can see that the RSS of the base model (14165) is much lower than that of the second model (20004). Moreover, the test says the two fits differ significantly at \(\alpha = 0.05\) (p = 0.003257), meaning that dropping disp significantly worsens the fit. So, yep, base_model is the better model.
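As a cross-check, anova() on two nested lm fits runs a partial F test by default (no test argument needed), and it leads to the same conclusion here. Because the models differ by a single term, this F test is equivalent to the squared t test for disp, so its p-value should match the 0.00778 reported for disp in the full-model summary:

```r
## Default partial F test for nested lm fits
anova(base_model, nested_model)
# F = (5839.6 / 1) / (14165 / 21) ≈ 8.66 on 1 and 21 df, p ≈ 0.0078
```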