Multiple Regression Calculation

Manually calculate multiple regression.

Assume that we have the following dataset.

data <- data.frame(weight = c(160,170,180,190,200),
                   Fat_Consumption = c(38,16,40,21,29),
                   Daily_Workout = c(43,33,37,44,35))

data

> data
  weight Fat_Consumption Daily_Workout
1    160              38            43
2    170              16            33
3    180              40            37
4    190              21            44
5    200              29            35

Using lm() to calculate the regression coefficients is very easy.

lm(weight ~ ., data)

> lm(weight ~ ., data)

Call:
lm(formula = weight ~ ., data = data)

Coefficients:
    (Intercept)  Fat_Consumption    Daily_Workout  
       202.2095          -0.2531          -0.3886

Then we plug the coefficients into the regression equation.

$$\hat{y}=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+…+\beta_{p}x_{p}$$
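A short sketch of that plug-in step in R. It recreates the example data so it runs on its own, reads the coefficients with coef(), and evaluates the equation for every row; the result should reproduce lm()'s own fitted values.

```r
# Recreate the example data so this block is self-contained
data <- data.frame(weight = c(160,170,180,190,200),
                   Fat_Consumption = c(38,16,40,21,29),
                   Daily_Workout = c(43,33,37,44,35))

fit <- lm(weight ~ ., data)
b <- coef(fit)  # named vector: (Intercept), Fat_Consumption, Daily_Workout

# y-hat = b0 + b1 * x1 + b2 * x2, evaluated for every row
y_hat <- b[["(Intercept)"]] +
  b[["Fat_Consumption"]] * data$Fat_Consumption +
  b[["Daily_Workout"]] * data$Daily_Workout

# Should be TRUE: the hand-evaluated equation matches lm()'s fitted values
all.equal(unname(y_hat), unname(fitted(fit)))
```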

But how are $\beta_{0}$, $\beta_{f}$, and $\beta_{w}$ calculated? The building blocks are the means $\bar{x}$, the standard deviations $\sigma$, and the pairwise correlations $\rho$ of the variables.

##### Stats #####
mean_weight <- mean(data$weight)
mean_fat <- mean(data$Fat_Consumption)
mean_workout <- mean(data$Daily_Workout)
 
sd_weight <- sd(data$weight)
sd_fat <- sd(data$Fat_Consumption)
sd_workout <- sd(data$Daily_Workout)
 
cor_x1_x2 <- cor(data$Fat_Consumption, data$Daily_Workout)
cor_x1_y <- cor(data$Fat_Consumption, data$weight)
cor_x2_y <- cor(data$Daily_Workout, data$weight)
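If you prefer, the same building blocks can be computed for all columns at once. This is a minimal alternative sketch using base R's colMeans(), sapply() and cor(); the names means, sds and cors are illustrative only.

```r
data <- data.frame(weight = c(160,170,180,190,200),
                   Fat_Consumption = c(38,16,40,21,29),
                   Daily_Workout = c(43,33,37,44,35))

means <- colMeans(data)    # weight 180, Fat_Consumption 28.8, Daily_Workout 38.4
sds   <- sapply(data, sd)  # sample standard deviations (n - 1 denominator)
cors  <- cor(data)         # 3 x 3 correlation matrix

# e.g. cors["Fat_Consumption", "weight"] is the same value as cor_x1_y
round(cors, 3)
```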

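Now we are ready to calculate the slope coefficients. Here $y$ denotes weight, 1 ($f$) denotes Fat_Consumption, and 2 ($w$) denotes Daily_Workout. The ratio of correlations on its own gives the standardized coefficient; multiplying it by $\sigma_{y}/\sigma_{1}$ (or $\sigma_{y}/\sigma_{2}$) converts it back to the original units.

$$\beta_{f}=\frac{\rho_{y,1}-\rho_{y,2}\cdot\rho_{1,2}}{1-\rho_{1,2}^2}\cdot\frac{\sigma_{y}}{\sigma_{1}}=-0.253$$

$$\beta_{w}=\frac{\rho_{y,2}-\rho_{y,1}\cdot\rho_{1,2}}{1-\rho_{1,2}^2}\cdot\frac{\sigma_{y}}{\sigma_{2}}=-0.389$$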
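Here is a sketch of the slope calculation in R, reusing the variable names from the Stats block. It shows that the correlation ratio alone gives the standardized coefficients, and that rescaling by the standard deviations recovers the values reported by lm().

```r
data <- data.frame(weight = c(160,170,180,190,200),
                   Fat_Consumption = c(38,16,40,21,29),
                   Daily_Workout = c(43,33,37,44,35))

# Building blocks: standard deviations and pairwise correlations
sd_weight  <- sd(data$weight)
sd_fat     <- sd(data$Fat_Consumption)
sd_workout <- sd(data$Daily_Workout)
cor_x1_x2 <- cor(data$Fat_Consumption, data$Daily_Workout)
cor_x1_y  <- cor(data$Fat_Consumption, data$weight)
cor_x2_y  <- cor(data$Daily_Workout, data$weight)

# Standardized coefficients from the correlations alone
std_f <- (cor_x1_y - cor_x2_y * cor_x1_x2) / (1 - cor_x1_x2^2)
std_w <- (cor_x2_y - cor_x1_y * cor_x1_x2) / (1 - cor_x1_x2^2)

# Rescale to the original units of weight per unit of each predictor
beta_f <- std_f * sd_weight / sd_fat
beta_w <- std_w * sd_weight / sd_workout

round(c(std_f, std_w), 3)    # about -0.167 and -0.120 (standardized)
round(c(beta_f, beta_w), 4)  # -0.2531 and -0.3886, as in the lm() output
```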

Then we can calculate the intercept $\beta_{0}$ from the means, where $\bar{y}$ is the mean weight.

$$\beta_{0}=\bar{y}-(\beta_{f}\cdot\bar{x}_{f})-(\beta_{w}\cdot\bar{x}_{w})=202.21$$
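The intercept step is easy to check in R. For brevity, this sketch takes the two slopes from coef() (they could equally come from the correlation-based calculation) and then applies the mean-based formula.

```r
data <- data.frame(weight = c(160,170,180,190,200),
                   Fat_Consumption = c(38,16,40,21,29),
                   Daily_Workout = c(43,33,37,44,35))

fit <- lm(weight ~ ., data)
# Slopes taken from lm() here only to keep the example short
beta_f <- coef(fit)[["Fat_Consumption"]]
beta_w <- coef(fit)[["Daily_Workout"]]

# Intercept: mean of y minus each slope times the mean of its predictor
beta_0 <- mean(data$weight) -
  beta_f * mean(data$Fat_Consumption) -
  beta_w * mean(data$Daily_Workout)

round(beta_0, 4)  # 202.2095, the same as the lm() intercept
```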

Now we are ready to predict.

$$\hat{y}=202.21-0.253x_{1}-0.389x_{2}$$
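As an illustration, the sketch below predicts the weight of one new person. The input values (Fat_Consumption = 30, Daily_Workout = 40) are hypothetical and chosen only for the example. It compares the hand-plugged equation, using full-precision coefficients, with predict().

```r
data <- data.frame(weight = c(160,170,180,190,200),
                   Fat_Consumption = c(38,16,40,21,29),
                   Daily_Workout = c(43,33,37,44,35))

fit <- lm(weight ~ ., data)
b <- coef(fit)

# Hypothetical new observation, for illustration only
new_person <- data.frame(Fat_Consumption = 30, Daily_Workout = 40)

# Manual prediction: b0 + b1 * x1 + b2 * x2
manual <- b[["(Intercept)"]] +
  b[["Fat_Consumption"]] * new_person$Fat_Consumption +
  b[["Daily_Workout"]] * new_person$Daily_Workout

# Same prediction through predict()
auto <- predict(fit, newdata = new_person)

round(c(manual = manual, predict = unname(auto)), 2)  # both about 179.07
```

Plugging the rounded coefficients from the equation above into the same inputs gives a value that differs slightly in the second decimal, so keep full precision when the exact number matters.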

TL;DR Manually fitting a multiple regression with two predictors is bearable. With more than two predictors, there is no need to reinvent the wheel: just use lm(). 🙂
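The correlation formulas above only cover two predictors. For more predictors, the usual route is the matrix form of least squares. This minimal sketch solves the normal equations $(X^{\top}X)\beta = X^{\top}y$ with model.matrix(), crossprod() and solve(). Note that lm() itself uses a QR decomposition rather than solving the normal equations directly, which is numerically more stable, so treat this as an illustration only.

```r
data <- data.frame(weight = c(160,170,180,190,200),
                   Fat_Consumption = c(38,16,40,21,29),
                   Daily_Workout = c(43,33,37,44,35))

# Design matrix: a column of 1s for the intercept plus every predictor
X <- model.matrix(weight ~ ., data)
y <- data$weight

# Normal equations: (X'X) b = X'y; works for any number of predictors
b <- solve(crossprod(X), crossprod(X, y))

b  # same three coefficients as lm(weight ~ ., data)
```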