I’ll implement the gradient descent algorithm on a linear regression. First, we need to set up an objective equation.
Objective Equation
Since the algorithm seeks to minimize error, we need an error equation. I’ll use the Sum of Squared Errors, or SSE, as the objective equation.
$$SSE=\sum_{i=1}^N\left(y_{i}-(mx_{i}+\beta)\right)^2$$
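As a quick reference point, the objective translates directly into R. This is only a sketch for checking results later; sse_of is a helper name of my own and is not part of the main code below.

# SSE for a given slope m and intercept b (my own helper, used only for sanity checks)
sse_of <- function(m, b, x, y) {
  sum((y - (m * x + b))^2)
}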
Let’s suppose we have this data set:
data <- data.frame(x = (1:3), y = (4:6))
Next, we supply the starting points, in this case \(m\) and \(\beta\).
We can then write the first for loop to calculate the SSE, where \(m = 0\) and \(\beta = 0\).
for (i in 1:1){
  for (i in 1:nrow(data)){
    x <- data[i,1]
    y <- data[i,2]
    # Squared error for this observation
    sse <- (y - ((m * x) + b))^2
    sum_sse <<- sum_sse + sse
    # After the last row, store b, m and the summed SSE, then reset the accumulator
    if(i == nrow(data)){
      temp <<- cbind(b, m, sum_sse)
      storage <<- rbind(storage, temp)
      sum_sse <<- 0
    }
  }
}
The loop runs through every observation in the data, assigns each squared error to \(sse\), and adds it to the running total \(sum\_sse\). After the last row, it appends \(\beta\), \(m\), and \(sum\_sse\) as a new row in the storage data frame.
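As a quick check on the loop, the same quantity collapses to one vectorized line using the sse_of helper sketched earlier:

# Vectorized SSE at the starting values m = 0 and beta = 0
sse_of(0, 0, data$x, data$y)   # 77, the same value the loop accumulates in sum_sse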
Up to this point, it is the usual \(SSE\) calculation. Now comes the bread and butter of gradient descent: partial derivatives. A partial derivative is the derivative of a function of more than one variable, taken with respect to one of those variables while holding the others fixed. That is exactly our situation, since the error depends on both \(m\) and \(\beta\).
$$\frac{\partial}{\partial m}=\frac{2}{N}\sum_{i=1}^{N}-x_{i}(y_{i}-(mx_{i}+\beta))$$
$$\frac{\partial}{\partial \beta}=\frac{2}{N}\sum_{i=1}^{N}-(y_{i}-(mx_{i}+\beta))$$
Paul’s Online Math Notes gives a very detailed explanation of partial derivatives (link).
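Note that the \(2/N\) factor means these are really the partial derivatives of the mean squared error; dividing the SSE gradient by \(N\) only rescales the step, it does not change where the minimum is. As a sketch, a quick finite-difference check (with helper names I made up) agrees with the formulas:

# Analytic gradients as written above (note the 2/N factor: these are MSE gradients)
grad_m <- function(m, b, x, y) (2 / length(x)) * sum(-x * (y - (m * x + b)))
grad_b <- function(m, b, x, y) (2 / length(x)) * sum(-(y - (m * x + b)))

# Numerical check against the mean squared error at m = 0, beta = 0
mse_of <- function(m, b, x, y) mean((y - (m * x + b))^2)
eps <- 1e-6
(mse_of(eps, 0, data$x, data$y) - mse_of(-eps, 0, data$x, data$y)) / (2 * eps)   # matches grad_m(0, 0, data$x, data$y)
(mse_of(0, eps, data$x, data$y) - mse_of(0, -eps, data$x, data$y)) / (2 * eps)   # matches grad_b(0, 0, data$x, data$y)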
#Calculating the gradient
for (j in 1:steps){
  for (i in 1:nrow(data)){
    x <- data[i,1]
    y <- data[i,2]
    #Partial derivative with respect to m
    dm <- -((2/nrow(data)) * x * (y - (m*x + b)))
    #Partial derivative with respect to beta
    db <- -((2/nrow(data)) * (y - (m*x + b)))
    #Accumulate the gradients
    m_gradient <<- m_gradient + dm
    b_gradient <<- b_gradient + db
  }
  #Update m and b: step against the gradient, scaled by the learning rate
  m <- (m_base - (learningrate * m_gradient))
  b <- (b_base - (learningrate * b_gradient))
}
First, we calculate the derivatives for each observation and add them to the gradient totals. After that, we assign the new values to \(m\) and \(\beta\). Another important quantity in the calculation is the learning rate \(\alpha\): its value controls how far we move down the slope at each step.
After we get the new values of \(m\) and \(\beta\), we recalculate the \(SSE\), and we can repeat this for as many steps as we want.
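To make the update rule concrete, here is a single hand-rolled step, using the helper functions sketched above and the same starting values and learning rate as the call further down (\(m = 2\), \(b = 3\), \(\alpha = 0.0001\)):

m <- 2
b <- 3
learningrate <- 0.0001

# One gradient-descent update: step against the gradient, scaled by the learning rate
m_new <- m - learningrate * grad_m(m, b, data$x, data$y)
b_new <- b - learningrate * grad_b(m, b, data$x, data$y)
c(m_new, b_new)                        # roughly 1.999067 and 2.999600
sse_of(m_new, b_new, data$x, data$y)   # just under the starting SSE of 14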
So, let’s put all of that into a function.
gradient_descent <- function(m, b, data, steps, learningrate){
  # The <<- assignments write to the global environment so that storage
  # can be inspected after the function runs
  sse <<- 0
  sum_sse <<- 0
  storage <<- data.frame(b = c(99), m = c(99), sum_sse = c(99))
  m_gradient <<- 0
  b_gradient <<- 0
  dm <<- 0
  db <<- 0
  m_base <<- m
  b_base <<- b

  for (i in 1:1){
    #Initial SSE pass with the starting values of m and b
    for (i in 1:nrow(data)){
      x <- data[i,1]
      y <- data[i,2]
      sse <- (y - ((m * x) + b))^2
      sum_sse <<- sum_sse + sse
      if(i == nrow(data)){
        temp <<- cbind(b, m, sum_sse)
        storage <<- rbind(storage, temp)
        sum_sse <<- 0
      }
    }

    #Calculating the gradient
    for (j in 1:steps){
      for (i in 1:nrow(data)){
        x <- data[i,1]
        y <- data[i,2]
        #Partial derivative with respect to m
        dm <- -((2/nrow(data)) * x * (y - (m*x + b)))
        #Partial derivative with respect to beta
        db <- -((2/nrow(data)) * (y - (m*x + b)))
        #Accumulate the gradients across all steps
        m_gradient <<- m_gradient + dm
        b_gradient <<- b_gradient + db
      }
      #The accumulated gradient is applied to the starting values,
      #which is equivalent to stepping from the previous estimate
      m <- (m_base - (learningrate * m_gradient))
      b <- (b_base - (learningrate * b_gradient))

      #Recalculate the SSE with the updated m and b and store the result
      for (i in 1:nrow(data)){
        x <- data[i,1]
        y <- data[i,2]
        sse <- (y - ((m * x) + b))^2
        sum_sse <<- sum_sse + sse
        if(i == nrow(data)){
          temp <<- cbind(b, m, sum_sse)
          storage <<- rbind(storage, temp)
          sum_sse <<- 0
        }
      }
    }
  }
}
Let’s give it a try.
gradient_descent(m = 2, b = 3, steps = 10, data = data, learningrate = 0.0001)
library(dplyr)

storage %>%
  filter(b != 99)
> storage %>%
+   filter(b != 99)
          b        m  sum_sse
1  3.000000 2.000000 14.00000
2  2.999600 1.999067 13.96908
3  2.999200 1.998134 13.93824
4  2.998801 1.997203 13.90746
5  2.998403 1.996273 13.87675
6  2.998005 1.995344 13.84610
7  2.997607 1.994415 13.81553
8  2.997210 1.993488 13.78502
9  2.996813 1.992562 13.75458
10 2.996416 1.991637 13.72421
11 2.996020 1.990713 13.69390
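With a learning rate this small and only 10 steps, the coefficients barely move, but the SSE drops at every step. As a sanity check on where the descent should eventually settle, we can compare against R’s closed-form least-squares fit:

# Ordinary least squares on the same data for comparison
coef(lm(y ~ x, data = data))
# For this toy data the fit is exact: intercept 3, slope 1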