We have seen what people said about Airbnb (link) and Microsoft (link, and link.) Now is the time for Google. I scraped a total of 3,320 reviews from 2015 to 2017. So, let’s take a look what Googlers said. But as usual, we need to load the goodies.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
##### Load Goodies ##### library(tidyverse) library(stringr) library(tidytext) library(ggraph) library(igraph) ##### Create Theme for GGPLOT2 ##### theme_moma <- function(base_size = 12, base_family = "Helvetica") { theme( plot.background = element_rect(fill = "#F7F6ED"), legend.key = element_rect(fill = "#F7F6ED"), legend.background = element_rect(fill = "#F7F6ED"), panel.background = element_rect(fill = "#F7F6ED"), panel.border = element_rect(colour = "black", fill = NA, linetype = "dashed"), panel.grid.minor = element_line(colour = "#7F7F7F", linetype = "dotted"), panel.grid.major = element_line(colour = "#7F7F7F", linetype = "dotted") ) } |
As I use ‘xx’ as the first observation in the initial data frame and some other features, we need to exclude them first.
1 2 3 4 |
##### Processing ##### data2 <- data[2:6] data2 <- data2 %>% filter(Summary != "xx") |
Next is date conversion.
1 2 |
##### Date Management ##### data2$Posted_Date <- as.Date(data2$Posted_Date, format = " %b %d, %Y") |
For Google, I didn’t apply any filter in Glassdoor. So there are both domestic (US) and international reviews. As Glassdoor include employment status, title, and location in Title column. We need to split them.
1 2 3 4 5 6 |
##### Separating Employee Type and Locations ##### data2 <- data2%>% separate(col = Title, into = c("Employee_Type","Location"),sep = ' in ') %>% separate(col = Employee_Type, into = c("Employee_Type", "Title"), sep = 'ee - ') data2$Employee_Type <- str_replace_all(data2$Employee_Type, "Employ","Employee") |
Next, we classify the reviews into Domestic and International. It’s very straightforward as the international review will have a country abbreviation in parentheses. So, we can just use “(” to distinguish between the two.
1 2 3 |
##### US or International ##### data2 <- data2 %>% mutate(Location_2 = ifelse(str_detect(Location, "[(]")==TRUE,"International","Domestic")) |
Let’s visualize.
1 2 3 4 5 |
##### Visualize - 1 ##### ggplot(data2, aes(x = Location_2)) + geom_bar() + theme_moma() + geom_text(stat='count',aes(label=..count..), vjust = -0.5) |
Well, why wouldn’t Googlers want to disclose where they work at? 😐 I doubt if someone is going to chase them. Oh well, since NA consists of almost half of the reviews, let’s just ignore the location of the reviews, for now anyway.
So, it’s time to process their comments.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
##### Creating New Variables for Processing ##### data2<- data2 %>% mutate(Summary_2 = Summary, Pros_2 = Pros, Cons_2 = Cons) ##### Change to lower ##### data2$Summary_2 <- str_replace_all(data2$Summary_2, "[:alpha:]",tolower) data2$Pros_2 <- str_replace_all(data2$Pros_2, "[:alpha:]",tolower) data2$Cons_2 <- str_replace_all(data2$Cons_2, "[:alpha:]",tolower) ##### Remove Punct ##### data2$Summary_2 <- str_replace_all(data2$Summary_2, "[:punct:]","") data2$Pros_2 <- str_replace_all(data2$Pros_2, "[:punct:]","") data2$Cons_2 <- str_replace_all(data2$Cons_2, "[:punct:]","") ##### Remove Numbers ##### data2$Summary_2 <- str_replace_all(data2$Summary_2, "[:digit:]","") data2$Pros_2 <- str_replace_all(data2$Pros_2, "[:digit:]","") data2$Cons_2 <- str_replace_all(data2$Cons_2, "[:digit:]","") |
Next, many variations of work/life balance. We need to change them to have the same word: ‘wlb.’
1 2 3 4 5 6 7 8 9 10 11 |
##### Work Life Balance & Other Words ##### worklife <- array(c("work life balance", "work-life balance", "work/life balance", "work life", "work&life", "worklife")) for (i in 1:nrow(worklife)){ print(i) for (j in 8:ncol(data2)) { data2[[j]] <- str_replace_all(data2[[j]],worklife[[i]],"wlb") } } |
Alright, let’s count the titles.
1 2 3 4 5 |
##### Title ##### data2 %>% group_by(Title) %>% summarise(count = n()) %>% arrange(desc(count)) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
# A tibble: 590 x 2 Title count <chr> <int> 1 Anonymous Employee 1843 2 Software Engineer 233 3 Senior Software Engineer 58 4 Product Manager 37 5 Program Manager 34 6 Account Manager 29 7 Account Strategist 26 8 Staff Software Engineer 26 9 Software Engineer III 21 10 Intern 20 # ... with 580 more rows |
There are 3,321 observations. But Anonymous Employee accounted for 55.5%. Well, I’ll just ignore the review distribution by title then. Seriously, what are these Googlers afraid of? 😐
Okay, then let’s move to creating a bigram chart.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
#### Overall - Pros ##### #Step 1: Unnest data_pros <- data2 %>% select(Pros_2) %>% unnest_tokens(words, Pros_2, token = 'ngrams',n = 2) #Step 2: Separate data_pros_split <- data_pros %>% separate(words, c("from","to",sep = " ")) %>% select(1:2) #Step 3: Remove stopwords data_pros_clean <- data_pros_split %>% filter(!from %in% stop_words$word) %>% filter(!to %in% stop_words$word) #Step 4: Count data_pros_counts <- data_pros_clean %>% count(from, to) pros_bigram <- data_pros_counts %>% filter(n > 15) %>% graph_from_data_frame() arrow_control <- grid::arrow(type = "closed", length = unit(.15, "inches")) Pros_chart <- ggraph(pros_bigram) + geom_edge_link(aes(edge_alpha = n), show.legend = FALSE, arrow = arrow_control) + geom_node_point(color = "lightgreen", size = 3) + geom_node_text(aes(label = name), vjust = 1, hjust = 1) + theme_void() + theme( plot.background = element_rect(fill = "#F7F6ED")) + ggtitle("Pros") Pros_chart |
Isn’t that almost about the same as that of Microsoft? Smart Colleague, Competitive Pay. But there is apparently one thing that Microsoft lacks… FREE FOODS!
Let’s move to Cons.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
#### Overall - Cons ##### #Step 1: Unnest data_cons <- data2 %>% select(Cons_2) %>% unnest_tokens(words, Cons_2, token = 'ngrams',n = 2) #Step 2: Separate data_cons_split <- data_cons %>% separate(words, c("from","to",sep = " ")) %>% select(1:2) #Step 3: Remove stopwords data_cons_clean <- data_cons_split %>% filter(!from %in% stop_words$word) %>% filter(!to %in% stop_words$word) #Step 4: Count data_cons_counts <- data_cons_clean %>% count(from, to) cons_bigram <- data_cons_counts %>% filter(n > 10) %>% graph_from_data_frame() arrow_control <- grid::arrow(type = "closed", length = unit(.15, "inches")) cons_chart <- ggraph(cons_bigram) + geom_edge_link(aes(edge_alpha = n), show.legend = FALSE, arrow = arrow_control) + geom_node_point(color = "lightgreen", size = 3) + geom_node_text(aes(label = name), vjust = 1, hjust = 1) + theme_void() + theme( plot.background = element_rect(fill = "#F7F6ED")) + ggtitle("cons") cons_chart |
Yeah, that looks familiar: management, review process, politics, and growth. But one interesting comment is “reallyno cons.” Wow. Even Microsoft doesn’t have that compliment. Mountain View also showed up; I’d think that’s because the commute can be brutal for those living in San Francisco.
If you could scroll up to the Pros chart, you could see that “wlb” shows up both in Pros and Cons. Hm, well, let’s take a look at those who complain about wlb. Could the lack of work-life balance come from a specific title?
1 2 3 4 5 6 |
##### wlb in Cons ##### data2 %>% filter(str_detect(Cons_2,'wlb') == TRUE) %>% group_by(Title) %>% summarise(count = n()) %>% arrange(desc(count)) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
# A tibble: 43 x 2 Title count <chr> <int> 1 Anonymous Employee 71 2 Software Engineer 11 3 Administrative 2 4 Analyst 2 5 Associate Account Strategist 2 6 Associate Product Marketing Manager 2 7 Contracts Manager 2 8 Senior Analyst 2 9 Senior Software Engineer 2 10 Software Engineer III 2 # ... with 33 more rows |
Well, “Anonymous Employee” doesn’t help.
TL;DR Googlers are afraid of telling their title. Not sure what they are so scared of. The Pros and Cons are quite similar to those of Microsoft. Pros: smart colleagues, excellent pay, and free foods. Cons: management, long commute, politics.