As usual, we need to load the goodies.
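A minimal setup sketch; the package names are an assumption based on the functions used below (`svm()` and `tune()` live in e1071, and the diamonds data ships with ggplot2):

```r
# e1071 provides svm() and tune(); ggplot2 ships the diamonds dataset
library(e1071)
library(ggplot2)
```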

Instead of creating a new dataset, we will just use the diamonds data from ggplot2.

We will try to predict the quality of the cut using the other nine predictors.

svm()  needs the dependent variable to be a factor before we put it in the function. So, let’s check.
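A quick check, assuming the data has been loaded via ggplot2:

```r
library(ggplot2)

# svm() treats a factor response as a classification problem,
# so verify the class of cut before fitting
class(diamonds$cut)      # an ordered factor
is.factor(diamonds$cut)  # TRUE
```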

Nice! cut is already a factor variable. That’s one task we can skip.

As my laptop is not that powerful and svm()  takes a long time to run, let’s use only the first 500 observations.
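One way to subset; the name `small` is just a placeholder I’m introducing here:

```r
library(ggplot2)

# Keep only the first 500 rows so training stays fast
small <- diamonds[1:500, ]
dim(small)  # 500 rows, 10 columns
```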

Next, we fit the data.
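A sketch of the fit with all defaults, assuming the 500-row subset is called `small`:

```r
library(e1071)
library(ggplot2)

small <- diamonds[1:500, ]

# With a factor response, svm() performs classification;
# kernel, cost, and gamma all take their default values here
fit <- svm(cut ~ ., data = small)
summary(fit)
```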

Kernel, cost, and gamma are three critical parameters of svm() . If we don’t specify cost and gamma, the function falls back to a single default value for each. As for the kernel, a simple EDA should give a glimpse into the data distribution; chances are it is not linear. Haha. But it’s worth checking. If we don’t specify one, as usual, the function will just pick a kernel for us.
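The same fit with the three knobs spelled out; the radial kernel is svm()’s default, and these cost/gamma values are illustrative, not recommendations:

```r
library(e1071)
library(ggplot2)

small <- diamonds[1:500, ]

# One fixed value per parameter -- a single point, not a search
fit_rbf <- svm(cut ~ ., data = small,
               kernel = "radial", cost = 1, gamma = 0.1)
```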

With svm()  we can specify only one value per parameter. But the package authors did provide a handy function: tune() .

We need to call set.seed()  first because tune()  performs 10-fold cross-validation by default, and the folds are assigned randomly.
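A sketch of the grid search; the seed and the candidate values are arbitrary choices of mine:

```r
library(e1071)
library(ggplot2)

small <- diamonds[1:500, ]

# Fix the RNG so the 10-fold CV splits are reproducible
set.seed(42)
tuned <- tune(svm, cut ~ ., data = small,
              ranges = list(cost  = c(0.1, 1, 10),
                            gamma = c(0.01, 0.1, 1)))
summary(tuned)
```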

Yep, the function will evaluate every possible combination of cost and gamma. It’s too troublesome to eyeball the output for the best model, so we can pull it out directly.
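Instead of reading the full summary, the tune object exposes the winner directly (repeating the grid search here so the snippet stands on its own):

```r
library(e1071)
library(ggplot2)

small <- diamonds[1:500, ]
set.seed(42)
tuned <- tune(svm, cut ~ ., data = small,
              ranges = list(cost = c(0.1, 1, 10), gamma = c(0.01, 0.1, 1)))

# tune() keeps track of the winner for us
tuned$best.parameters   # best cost/gamma pair as a one-row data frame
tuned$best.performance  # its cross-validated error
```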

When it comes to prediction, unfortunately, predict()  cannot deal with an object of class “tune.”

Therefore we have to manually create a new svm()  model with the best values from the tuning, then plug that model into predict() .
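A sketch of the refit (repeating the grid search so the snippet is self-contained); worth knowing that tune() also stores a refit model in $best.model, which predict() accepts directly:

```r
library(e1071)
library(ggplot2)

small <- diamonds[1:500, ]
set.seed(42)
tuned <- tune(svm, cut ~ ., data = small,
              ranges = list(cost = c(0.1, 1, 10), gamma = c(0.01, 0.1, 1)))

# Refit with the winning parameters, then predict as usual
best_fit <- svm(cut ~ ., data = small,
                cost  = tuned$best.parameters$cost,
                gamma = tuned$best.parameters$gamma)
pred <- predict(best_fit, newdata = small)

# Shortcut: tune() already refit the best model for us
pred2 <- predict(tuned$best.model, newdata = small)
```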

Now it works!