As usual, we need to load the goodies.
```r
##### Load Libraries #####
library(e1071)
library(tidyverse)
```
Instead of creating a new dataset, we will just use the diamonds data from ggplot2.
```r
##### Load Data #####
data <- diamonds
```
We will try to predict the quality of the cut using the other nine variables as predictors.
svm() needs the dependent variable to be a factor before it goes into the function. So, let's check.
```r
##### Glimpse #####
glimpse(data)
```
```
> glimpse(data)
Observations: 500
Variables: 10
$ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.30, 0.23, 0.22, 0.3...
$ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Very Good, Fair, Very ...
$ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I, E, H, J, J, G, I, ...
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, SI1, SI2, SI2, I1, ...
$ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64.0, 62.8, 60.4, 62....
$ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58, 54, 54, 56, 59, 5...
$ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 342, 344, 345, 345, 3...
$ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.25, 3.93, 3.88, 4.3...
$ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.28, 3.90, 3.84, 4.3...
$ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.73, 2.46, 2.33, 2.7...
```
Nice! cut is already a factor variable, so that's one task we can skip.
As my laptop is not that powerful and svm() takes a long time to run, let's use only the first 500 observations.
```r
##### Subset #####
data <- data[1:500, ]
```
Next, we fit the data.
```r
##### Fit #####
svm_1 <- svm(cut ~ .,
             data = data
             # kernel = "radial",
             # cost = 1,
             # gamma = 1
             )

summary(svm_1)
```
```
> summary(svm_1)

Call:
svm(formula = cut ~ ., data = data)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.04761905

Number of Support Vectors:  427

 ( 107 124 51 124 21 )

Number of Classes:  5

Levels:
 Fair Good Very Good Premium Ideal
```
Kernel, cost, and gamma are three critical parameters in svm(). If we don't specify cost and gamma, the function falls back to a single default for each: cost = 1, and gamma = 1 divided by the number of columns in the expanded model matrix (hence the 0.04761905 above). As for the kernel, a simple EDA should give a glimpse into the data distribution; chances are the classes are not linearly separable, but it's worth checking. If we don't specify a kernel, the function defaults to the radial one.
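To see how much the kernel choice matters, here is a quick sketch (my addition, not from the original walkthrough) that fits the same model with the default radial kernel and with a linear one, then compares training accuracy; the exact numbers depend on the subset used:

```r
library(e1071)
library(ggplot2)  # for the diamonds dataset

data <- diamonds[1:500, ]

# Defaults when unspecified: kernel = "radial", cost = 1,
# gamma = 1 / (number of columns in the expanded model matrix)
svm_radial <- svm(cut ~ ., data = data)
svm_linear <- svm(cut ~ ., data = data, kernel = "linear")

# Training accuracy for each kernel (optimistic, but fine for a quick look)
mean(predict(svm_radial, data) == data$cut)
mean(predict(svm_linear, data) == data$cut)
```

Training accuracy alone won't pick the kernel for us, but a large gap is a hint about how non-linear the decision boundary is.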
With svm() itself we can try only one value per parameter at a time. But the package provides a handy function for searching over a whole grid: tune().
```r
##### Tune #####
set.seed(1)
svm_tune <- tune(svm, cut ~ ., data = data,
                 kernel = "radial",
                 ranges = list(cost = seq(0.1, 1, 0.1),
                               gamma = seq(0.1, 1, 0.1)))
```
We need to call set.seed() because tune() performs 10-fold cross-validation by default, and the folds are assigned at random; the seed makes the results reproducible.
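The resampling scheme can be changed through tune.control(). For example, a sketch (my addition, with a smaller grid than the post's so it runs quickly) using 5-fold instead of 10-fold CV:

```r
library(e1071)
library(ggplot2)  # for the diamonds dataset

data <- diamonds[1:500, ]

set.seed(1)  # folds are drawn at random, so seed for reproducibility
svm_tune_5 <- tune(svm, cut ~ ., data = data,
                   kernel = "radial",
                   ranges = list(cost = c(0.1, 1), gamma = c(0.1, 0.3)),
                   tunecontrol = tune.control(sampling = "cross", cross = 5))

svm_tune_5$best.performance  # CV error of the winning combination
```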
```r
summary(svm_tune)
```
```
> summary(svm_tune)

Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation

- best parameters:
 cost gamma
    1   0.3

- best performance: 0.418

- Detailed performance results:
   cost gamma error dispersion
1   0.1   0.1 0.498 0.05452828
2   0.2   0.1 0.464 0.04299871
3   0.3   0.1 0.434 0.04221637
4   0.4   0.1 0.436 0.04402020
```
Yep, the function evaluates every combination of cost and gamma in the grid. Eyeballing the full table for the best model is troublesome, so we can ask for it directly.
```r
svm_tune$best.model
```
```
> svm_tune$best.model

Call:
best.tune(method = svm, train.x = cut ~ ., data = data,
    ranges = list(cost = seq(0.1, 1, 0.1), gamma = seq(0.1, 1, 0.1)),
    kernel = "radial")

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.3

Number of Support Vectors:  434
```
When it comes to prediction, unfortunately, predict() cannot deal with an object of class "tune".
```r
##### Predict #####
predict(svm_tune, data)
```
```
> predict(svm_tune, data)
Error in UseMethod("predict") :
  no applicable method for 'predict' applied to an object of class "tune"
```
Therefore we refit an svm() model with the best values from the tune and plug that into predict(). (Alternatively, svm_tune$best.model is itself an svm object, so it can be passed to predict() directly.)
```r
##### Refit and Predict #####
svm_2 <- svm(cut ~ ., data = data, cost = 1, gamma = 0.3)
summary(predict(svm_2, data))
```
```
> summary(predict(svm_2, data))
     Fair      Good Very Good   Premium     Ideal
       24        31       119       157       169
```
Now it works!
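The class counts alone don't tell us how many predictions are actually right. A confusion matrix against the true labels is more informative; here's a sketch (my addition; training accuracy only, so it will be optimistic):

```r
library(e1071)
library(ggplot2)  # for the diamonds dataset

data <- diamonds[1:500, ]
svm_2 <- svm(cut ~ ., data = data, cost = 1, gamma = 0.3)

# Rows are predicted classes, columns the actual ones
conf <- table(predicted = predict(svm_2, data), actual = data$cut)
conf

sum(diag(conf)) / sum(conf)  # training accuracy
```

For an honest error estimate, hold out a test set or lean on the cross-validation error that tune() already reported.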