A couple of decades ago, we could only learn about a company after we started working there, or from rumors. Those are certainly not the best ways. Let’s use R to systematically harvest company reviews from Glassdoor and visualize them with its great NLP and graphics packages.

As I mainly use R, I used Hadley Wickham’s rvest package. First, we need to obtain a URL and figure out how to move between pages. I found that we can change the page we view just by modifying a page number in the URL (much easier than I thought).

The URL is specific to each company on Glassdoor and has three parts. The first part is unique to the company. The page number comes next. The third part depends on which filters you set. For example, if you only want reviews from full-time employees, filter.employmentStatus=REGULAR will be included in the URL.
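To make the three-part structure concrete, here is a minimal sketch of assembling page URLs. The base URL, the "_P" page separator, and the helper name build_page_url are all placeholders I made up for illustration; copy the real pieces from your browser's address bar.

```r
# Part 1: company-specific part (hypothetical placeholder, not a real company page)
base_url <- "https://www.glassdoor.com/Reviews/Example-Company-Reviews-E0000"

# Part 3: filter part (the filter itself is the one mentioned above)
filter_part <- ".htm?filter.employmentStatus=REGULAR"

# Part 2: the page number goes in between.
# The "_P" separator is an assumption; check it against a real URL.
build_page_url <- function(page, base = base_url, suffix = filter_part) {
  paste0(base, "_P", page, suffix)
}

# URLs for the first three pages of reviews
urls <- vapply(1:3, build_page_url, character(1))
print(urls)
```

Keeping the page number as the only moving part means the same function works for any company once the first and third parts are swapped in.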

The next thing to do before scraping is to figure out the CSS selectors for the parts of the page we want. I used SelectorGadget (Creator’s Web). As I am only interested in a few attributes, figuring them out was very simple. Now we are ready to scrape reviews with this code:
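As an illustration of the selector step, here is a minimal sketch that pulls two attributes out of a tiny stand-in page. The HTML and the class names (.summary, .pros) are hypothetical; the real ones are whatever SelectorGadget reports for the live Glassdoor page.

```r
library(rvest)

# A small stand-in for one page of reviews (made-up markup and class names)
page <- read_html('
  <div class="review">
    <span class="summary">Great team</span>
    <p class="pros">Flexible hours</p>
  </div>
  <div class="review">
    <span class="summary">Long days</span>
    <p class="pros">Good pay</p>
  </div>
')

# Select the nodes matching each CSS selector and extract their text
summaries <- html_text2(html_elements(page, ".summary"))
pros      <- html_text2(html_elements(page, ".pros"))

# One row per review
reviews <- data.frame(summary = summaries, pros = pros,
                      stringsAsFactors = FALSE)
print(reviews)
```

Extracting each attribute into its own vector and binding them afterwards assumes every review has every attribute; if some are missing, it is safer to loop over the review containers instead.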

I added a delay between requests in the code so that it wouldn’t be too hard on the server, so it may take some time to get the reviews.
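The idea of pausing between requests can be sketched like this. The function name scrape_pages, its arguments, and the two-second default are my own assumptions for illustration, not values taken from the original code.

```r
library(rvest)

# Fetch each URL in turn, extract the text matching `selector`,
# and wait `delay` seconds between requests.
# `fetch` defaults to read_html; it is a parameter only so the
# function can be tried without hitting a real server.
scrape_pages <- function(urls, selector, delay = 2, fetch = read_html) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    # A failed request yields NULL instead of stopping the whole run
    page <- tryCatch(fetch(urls[i]), error = function(e) NULL)
    if (!is.null(page)) {
      results[[i]] <- html_text2(html_elements(page, selector))
    }
    Sys.sleep(delay)  # the lag time: go easy on the server
  }
  unlist(results)
}

# Example call (urls and the ".summary" selector are hypothetical):
# summaries <- scrape_pages(urls, ".summary", delay = 2)
```

With a two-second delay, 100 pages take a little over three minutes, which is the price of being polite.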