“Work-life balance” can be written in many forms: “wlb,” “work life balance,” “work/life balance,” and so on. They all mean the same thing, but the variability can cause problems in Natural Language Processing, because each spelling is treated as a different term. The solution is to normalize them all to a single form. Writing a separate str_replace_all() call for each variant is repetitive, though. A for loop can help.
```r
##### Load Library #####
library(stringr)

##### Creating Sample Data #####
Name <- as.character(c("Abbey", "Brian", "Connie", "Dan", "Ethan"))
Pros <- as.character(c("WLB", "Work life balance", "work-life balance", "Work/Life Balance", "worklife"))
data <- data.frame(Name, Pros, stringsAsFactors = FALSE)
data
```
```
> data
    Name              Pros
1  Abbey               WLB
2  Brian Work life balance
3 Connie work-life balance
4    Dan Work/Life Balance
5  Ethan          worklife
```
Let’s assume that Pros holds each person’s comments about their company from Glassdoor. The first step is to convert the comments to lowercase, so that we only have to handle one casing of each variant.

```r
##### Converting to Lowercase #####
# str_to_lower() lowercases every element of the column
data$Pros2 <- str_to_lower(data$Pros)
```
Next, we create an array storing the variations.
```r
##### Creating an Array #####
worklife <- array(c("work life balance", "work-life balance", "work/life balance", "worklife", "wlb"))
```
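Before replacing anything, it can be useful to check whether the list of variations actually covers every comment. The following is a minimal sketch, not part of the original workflow: it rebuilds the sample comments, adds one made-up comment ("Work & life balance") purely to show what a miss looks like, and uses str_detect() to flag comments that no variation matches. The names `comments`, `covered`, and `uncovered` are illustrative.

```r
library(stringr)

# Sample comments from the post, plus one hypothetical comment
# ("Work & life balance") added only to demonstrate a miss
comments <- str_to_lower(c("WLB", "Work life balance", "work-life balance",
                           "Work/Life Balance", "worklife",
                           "Work & life balance"))

# The same variations as in the array above, as a plain character vector
worklife <- c("work life balance", "work-life balance",
              "work/life balance", "worklife", "wlb")

# For each comment: does at least one variation occur in it?
covered <- vapply(
  comments,
  function(x) any(str_detect(x, worklife)),
  logical(1),
  USE.NAMES = FALSE
)

# Comments that no variation matches; these would need a new entry
uncovered <- comments[!covered]
print(uncovered)
```

Any comment that shows up in `uncovered` points to a spelling that should probably be added to the list of variations.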
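We then create a for loop to go over the array and replace each variation with “wlb”.

```r
##### For Loop #####
# Outer loop: one pass per variation in the array
for (i in seq_along(worklife)) {
  # Inner loop: every column from the third onward (here only Pros2)
  for (j in 3:ncol(data)) {
    data[[j]] <- str_replace_all(data[[j]], worklife[[i]], "wlb")
  }
}
data
```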
```
> data
    Name              Pros Pros2
1  Abbey               WLB   wlb
2  Brian Work life balance   wlb
3 Connie work-life balance   wlb
4    Dan Work/Life Balance   wlb
5  Ethan          worklife   wlb
```
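As an alternative to the explicit loop, str_replace_all() also accepts a named character vector, where the names are the patterns and the values are the replacements. The sketch below assumes the same sample comments. The names `comments`, `variants`, `replacements`, and `normalized` are illustrative and not from the code above. Note that the names are interpreted as regular expressions, so a variation containing special characters such as `.` or `(` would need escaping.

```r
library(stringr)

# Same sample comments as in the post
comments <- c("WLB", "Work life balance", "work-life balance",
              "Work/Life Balance", "worklife")

# Variations to normalize ("wlb" itself needs no replacement)
variants <- c("work life balance", "work-life balance",
              "work/life balance", "worklife")

# Named vector: names are the patterns, values are the replacement
replacements <- setNames(rep("wlb", length(variants)), variants)

# Lowercase first, then apply all replacements in one call
normalized <- str_replace_all(str_to_lower(comments), replacements)
print(normalized)
```

With the sample data, every element of `normalized` should come out as "wlb", matching the Pros2 column produced by the loop.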