SVM is not only applicable to classification but also to regression. The steps are mostly the same as in the classification case; we just need a very slight modification to the syntax.
##### Load Goodies #####
library(e1071)
library(tidyverse)
Again, we will use the diamonds dataset, as we did in the classification example.
##### Load Data #####
data <- diamonds

##### Glimpse #####
glimpse(data)

##### Subset #####
data <- data[1:500, ]
If we’d like to use the default settings, we can use svm().
##### Fit #####
svm_1 <- svm(price ~ .,
             data = data
             #kernel = 'radial',
             #cost = 1,
             #gamma = 1,
             #epsilon = 1
             )

summary(svm_1)
> summary(svm_1)

Call:
svm(formula = price ~ ., data = data)

Parameters:
   SVM-Type:  eps-regression
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.04166667
    epsilon:  0.1

Number of Support Vectors:  103
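As a quick sanity check on this default fit, we could also look at its in-sample predictions. The snippet below is just a sketch (the object name pred_1 and the RMSE check are not part of the original walkthrough); it assumes we simply want a rough error on the same 500 rows we trained on.

##### Quick Check (sketch) #####
pred_1 <- predict(svm_1, data)          # in-sample predictions from the default fit
sqrt(mean((data$price - pred_1)^2))     # rough in-sample RMSE, for orientation only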
But if we would like to try combinations of the hyperparameters, we can use tune() as we did in classification. The only small difference is that we also need to supply a range for epsilon, since it is used in the regression calculation.
##### Tune #####
set.seed(1)
svm_tune <- tune(svm, price ~ ., data = data,
                 kernel = "radial",
                 ranges = list(cost = seq(0.1, 1, 0.1),
                               gamma = seq(0.1, 1, 0.1),
                               epsilon = seq(0.1, 1, 0.1)))

summary(svm_tune)
> summary(svm_tune)

Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation

- best parameters:
 cost gamma epsilon
    1   0.1     0.1

- best performance: 13669.51

- Detailed performance results:
   cost gamma epsilon    error dispersion
1   0.1   0.1     0.1 32058.04  14952.548
2   0.2   0.1     0.1 21044.66   9624.365
3   0.3   0.1     0.1 17604.88   8021.423
4   0.4   0.1     0.1 16243.15   7311.850
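If we prefer to pull the winning settings out programmatically instead of reading them off the summary, the tune object stores them directly; a minimal sketch:

##### Best Parameters (sketch) #####
svm_tune$best.parameters    # data frame with the chosen cost, gamma and epsilon
svm_tune$best.performance   # cross-validated error of the best combination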
svm_tune$best.model
> svm_tune$best.model

Call:
best.tune(method = svm, train.x = price ~ ., data = data,
    ranges = list(cost = seq(0.1, 1, 0.1), gamma = seq(0.1, 1, 0.1),
    epsilon = seq(0.1, 1, 0.1)), kernel = "radial")

Parameters:
   SVM-Type:  eps-regression
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.1
    epsilon:  0.1

Number of Support Vectors:  97
Next, we fit a separate model with svm() using the best parameters, since predict() cannot be applied to the tune object itself.
##### Refit #####
svm_2 <- svm(price ~ ., data = data, cost = 1, gamma = 0.1, epsilon = 0.1, kernel = 'radial')

summary(predict(svm_2, data))
> summary(predict(svm_2, data))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    277    2661    2749    2225    2799    2963
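As a side note, the tuned object already carries the refitted best model in svm_tune$best.model, which is an ordinary svm object, so an alternative to refitting by hand is to predict from it directly. This is only a sketch of that alternative; it should produce the same kind of summary as above.

##### Alternative (sketch) #####
summary(predict(svm_tune$best.model, data))   # predict straight from the stored best model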