Assume that we have the following dataset.
```r
data <- data.frame(weight = c(160,170,180,190,200),
                   Fat_Consumption = c(38,16,40,21,29),
                   Daily_Workout = c(43,33,37,44,35))
data
```
```
> data
  weight Fat_Consumption Daily_Workout
1    160              38            43
2    170              16            33
3    180              40            37
4    190              21            44
5    200              29            35
```
Calculating the regression coefficients with lm() is very easy.
```r
lm(weight ~ ., data)
```
```
> lm(weight ~ ., data)

Call:
lm(formula = weight ~ ., data = data)

Coefficients:
    (Intercept)  Fat_Consumption    Daily_Workout
       202.2095          -0.2531          -0.3886
```
Then we plug the coefficients into the formula.
$$\hat{y}=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\dots+\beta_{p}x_{p}$$
But how are \(\beta_{0}\), \(\beta_{f}\) (fat consumption, \(x_{1}\)), and \(\beta_{w}\) (daily workout, \(x_{2}\)) calculated? The means \(\bar{x}_{p}\), the correlations \(\rho\), and the standard deviations \(\sigma_{p}\) are the building blocks.
```r
##### Stats #####
mean_weight <- mean(data$weight)
mean_fat <- mean(data$Fat_Consumption)
mean_workout <- mean(data$Daily_Workout)

sd_weight <- sd(data$weight)
sd_fat <- sd(data$Fat_Consumption)
sd_workout <- sd(data$Daily_Workout)

cor_x1_x2 <- cor(data$Fat_Consumption, data$Daily_Workout)
cor_x1_y <- cor(data$Fat_Consumption, data$weight)
cor_x2_y <- cor(data$Daily_Workout, data$weight)
```
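Now we are ready to calculate the slope coefficients. The correlation term gives the standardized coefficient, and multiplying by the ratio of standard deviations converts it back to the original units.
$$\beta_{f}=\frac{\rho_{y,1}-\rho_{y,2}\cdot\rho_{1,2}}{1-\rho_{1,2}^2}\cdot\frac{\sigma_{y}}{\sigma_{1}}=-0.253$$
$$\beta_{w}=\frac{\rho_{y,2}-\rho_{y,1}\cdot\rho_{1,2}}{1-\rho_{1,2}^2}\cdot\frac{\sigma_{y}}{\sigma_{2}}=-0.389$$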
Then we can calculate the intercept \(\beta_{0}\) from the mean of the response, \(\bar{y}\), and the means of the predictors.
$$\beta_{0}=\bar{y}-(\beta_{f}\cdot\bar{x}_{f})-(\beta_{w}\cdot\bar{x}_{w})=202.21$$
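As a sanity check, the whole calculation can be sketched in a few lines of R. This is a minimal sketch rather than a definitive implementation: the short names (`r_y1`, `s_y`, `beta_f`, and so on) are chosen here for illustration and are not from the code above. Each slope is the correlation-based term rescaled by `sd(y) / sd(x)`, and the result is compared against `coef(lm(...))`.

```r
# Same dataset as in the post
data <- data.frame(weight = c(160, 170, 180, 190, 200),
                   Fat_Consumption = c(38, 16, 40, 21, 29),
                   Daily_Workout = c(43, 33, 37, 44, 35))

# Building blocks: pairwise correlations and standard deviations
r_y1 <- cor(data$weight, data$Fat_Consumption)
r_y2 <- cor(data$weight, data$Daily_Workout)
r_12 <- cor(data$Fat_Consumption, data$Daily_Workout)
s_y <- sd(data$weight)
s_1 <- sd(data$Fat_Consumption)
s_2 <- sd(data$Daily_Workout)

# Slopes: standardized coefficient, rescaled to the original units
beta_f <- (r_y1 - r_y2 * r_12) / (1 - r_12^2) * s_y / s_1
beta_w <- (r_y2 - r_y1 * r_12) / (1 - r_12^2) * s_y / s_2

# Intercept: the fitted plane passes through the point of means
beta_0 <- mean(data$weight) -
  beta_f * mean(data$Fat_Consumption) -
  beta_w * mean(data$Daily_Workout)

# Compare the manual result with lm()
manual <- c(beta_0, beta_f, beta_w)
from_lm <- unname(coef(lm(weight ~ ., data)))
print(round(manual, 4))
print(all.equal(manual, from_lm))
```

If the manual calculation is right, `all.equal()` reports that the two coefficient vectors agree up to floating-point tolerance.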
Now we are ready to predict.
$$\hat{y}=202.21-0.253x_{1}-0.389x_{2}$$
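To make the prediction step concrete, here is a small sketch that plugs a new observation into the fitted equation by hand and compares the result with `predict()`. The new observation (`Fat_Consumption = 30`, `Daily_Workout = 40`) is a made-up value for illustration only, and the variable names `new_obs`, `manual_pred`, and `lm_pred` are assumptions of this example.

```r
# Same dataset as in the post
data <- data.frame(weight = c(160, 170, 180, 190, 200),
                   Fat_Consumption = c(38, 16, 40, 21, 29),
                   Daily_Workout = c(43, 33, 37, 44, 35))

fit <- lm(weight ~ ., data)
b <- coef(fit)

# Hypothetical new observation (values invented for this example)
new_obs <- data.frame(Fat_Consumption = 30, Daily_Workout = 40)

# Manual prediction: y_hat = b0 + b1 * x1 + b2 * x2
manual_pred <- b[["(Intercept)"]] +
  b[["Fat_Consumption"]] * new_obs$Fat_Consumption +
  b[["Daily_Workout"]] * new_obs$Daily_Workout

# The same prediction from the fitted model object
lm_pred <- unname(predict(fit, newdata = new_obs))

print(manual_pred)
print(all.equal(manual_pred, lm_pred))
```

Using the unrounded coefficients from `coef()` avoids the small rounding error that comes from typing 202.21, -0.253, and -0.389 by hand.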
TL;DR Manually fitting a multiple regression is bearable with two predictors. With more than two, there is no need to reinvent the wheel; let’s just use lm(). 🙂