As usual, let’s create sample datasets
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
##### Load Goodies ##### library(dplyr) ##### Create Sample Data ##### Name <- as.character(c("Abbey", "Brian", "Connie", "Dan", "Ethan")) GPA <- c(3.0, 2.8, 2.1, 4.0, NA) Grade <- c(3, 4, NA, NA, 6) State <- c("AL", NA, NA, NA, "CA") data <- data.frame(Name, GPA, Grade, State, stringsAsFactors = FALSE) Name <- as.character(c("Abbey", "Brian", "Connie", "Dan", "Ethan")) Nombre <- as.character(c("Abbey", "Brian", "Connie", "Dan", "Ethan")) Dream_Vacation <- as.character(c("Maldives", "Paris", "London", "Tokyo", "Hawaii")) Dream_College <- as.character(c("Harvard", "Stanford", "MIT", "NYU", "UCLA")) dream <- data.frame(Name, Nombre, Dream_Vacation, Dream_College, stringsAsFactors = FALSE |
Case 1: Joining data and dream
We can easily use
left_join() in dplyr.
1 |
left_join(data, dream, by = "Name") |
1 2 3 4 5 6 7 |
> left_join(data, dream, by = "Name") Name GPA Grade State Nombre Dream_Vacation Dream_College 1 Abbey 3.0 3 AL Abbey Maldives Harvard 2 Brian 2.8 4 <NA> Brian Paris Stanford 3 Connie 2.1 NA <NA> Connie London MIT 4 Dan 4.0 NA <NA> Dan Tokyo NYU 5 Ethan NA 6 CA Ethan Hawaii UCLA |
Case 2: But I only want Dream_Vacation
Left_join() can do about the same as
Vlookup() . But the former will bring every column in the other data set. All we need to do is to subset in the
left_join() statement.
1 |
left_join(data, dream[,c("Name", "Dream_Vacation")], by = "Name") |
1 2 3 4 5 6 7 |
> left_join(data, dream[,c("Name", "Dream_Vacation")], by = "Name") Name GPA Grade State Dream_Vacation 1 Abbey 3.0 3 AL Maldives 2 Brian 2.8 4 <NA> Paris 3 Connie 2.1 NA <NA> London 4 Dan 4.0 NA <NA> Tokyo 5 Ethan NA 6 CA Hawaii |
Case 3: What if column names are different…
Suppose the other dataset only has “Nombre” column, using “Name” wouldn’t work. But making it works is very easy.
1 |
left_join(data, dream[,c("Nombre", "Dream_Vacation")], by = c("Name" = "Nombre")) |
1 2 3 4 5 6 7 |
> left_join(data, dream[,c("Nombre", "Dream_Vacation")], by = c("Name" = "Nombre")) Name GPA Grade State Dream_Vacation 1 Abbey 3.0 3 AL Maldives 2 Brian 2.8 4 <NA> Paris 3 Connie 2.1 NA <NA> London 4 Dan 4.0 NA <NA> Tokyo 5 Ethan NA 6 CA Hawaii |
Case 4: I only need Name, Grade and Dream Vacation
We need to subset both data sets.
1 2 3 |
left_join(data[,c("Name", "Grade")], dream[,c("Nombre", "Dream_Vacation")], by = c("Name" = "Nombre")) |
1 2 3 4 5 6 7 8 9 |
> left_join(data[,c("Name", "Grade")], + dream[,c("Nombre", "Dream_Vacation")], + by = c("Name" = "Nombre")) Name Grade Dream_Vacation 1 Abbey 3 Maldives 2 Brian 4 Paris 3 Connie NA London 4 Dan NA Tokyo 5 Ethan 6 Hawaii |