We can select columns both the dplyr way and the base R way.
First, let’s create a sample dataset.
##### Load Libraries #####
library(dplyr)

##### Creating Sample Data #####
Name <- c("Abbey", "Brian", "Connie", "Dan", "Ethan")
GPA <- c(3.0, 2.8, 2.1, NA, NA)
Grade <- c(3, 4, NA, NA, 6)
State <- c("AL", NA, NA, NA, NA)
Year <- c(2013, 2014, 2015, 2016, 2017)
data <- data.frame(Name, GPA, Grade, State, Year)
Case 1: Selecting One Column by Index Number
#Base
data[,2]

#Dplyr
select(data, 2)
> data[,2] #Base
[1] 3.0 2.8 2.1  NA  NA
> select(data, 2) #Dplyr
  GPA
1 3.0
2 2.8
3 2.1
4  NA
5  NA
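Notice in the output above that the base R call prints a plain vector, while select() prints a one-column data frame. If you want base R to keep the data frame shape as well, one option is the drop = FALSE argument of the [ operator. The snippet below is a small sketch of that; the variable names as_vector and as_frame are made up for illustration.

```r
# Recreate the sample data so this snippet runs on its own
data <- data.frame(
  Name  = c("Abbey", "Brian", "Connie", "Dan", "Ethan"),
  GPA   = c(3.0, 2.8, 2.1, NA, NA),
  Grade = c(3, 4, NA, NA, 6),
  State = c("AL", NA, NA, NA, NA),
  Year  = c(2013, 2014, 2015, 2016, 2017)
)

# Default behaviour: a single column is simplified to a vector
as_vector <- data[, 2]

# drop = FALSE keeps the result as a one-column data frame
as_frame <- data[, 2, drop = FALSE]

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
```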
Case 2: Selecting One Column by Name
We can also select the column by spelling out its name.
#Base
data[,"GPA"]

#Dplyr
select(data, "GPA")
> #Base
> data[,"GPA"]
[1] 3.0 2.8 2.1  NA  NA
>
> #Dplyr
> select(data, "GPA")
  GPA
1 3.0
2 2.8
3 2.1
4  NA
5  NA
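If what you actually want from dplyr is the vector rather than a one-column data frame, pull() is one way to get it. Here is a minimal sketch comparing it with the base R [[ operator; gpa_base and gpa_dplyr are just illustrative names.

```r
library(dplyr)

# Recreate the sample data so this snippet runs on its own
data <- data.frame(
  Name  = c("Abbey", "Brian", "Connie", "Dan", "Ethan"),
  GPA   = c(3.0, 2.8, 2.1, NA, NA),
  Grade = c(3, 4, NA, NA, 6),
  State = c("AL", NA, NA, NA, NA),
  Year  = c(2013, 2014, 2015, 2016, 2017)
)

# Base: [[ (or data$GPA) extracts the column as a vector
gpa_base <- data[["GPA"]]

# Dplyr: pull() extracts the column as a vector
gpa_dplyr <- pull(data, GPA)

# Both approaches should give the same numeric vector
identical(gpa_base, gpa_dplyr)
```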
Case 3: Selecting Multiple Columns
#Base
data[,1:3]

#Dplyr
select(data, 1:3)
> #Base
> data[,1:3]
    Name GPA Grade
1  Abbey 3.0     3
2  Brian 2.8     4
3 Connie 2.1    NA
4    Dan  NA    NA
5  Ethan  NA     6
>
> #Dplyr
> select(data, 1:3)
    Name GPA Grade
1  Abbey 3.0     3
2  Brian 2.8     4
3 Connie 2.1    NA
4    Dan  NA    NA
5  Ethan  NA     6
Another way to select only the first three columns is to drop the fourth and fifth columns with a negative index.
#Base
data[,-c(4:5)]

#Dplyr
select(data, -(4:5))
> #Base
> data[,-c(4:5)]
    Name GPA Grade
1  Abbey 3.0     3
2  Brian 2.8     4
3 Connie 2.1    NA
4    Dan  NA    NA
5  Ethan  NA     6
>
> #Dplyr
> select(data, -(4:5))
    Name GPA Grade
1  Abbey 3.0     3
2  Brian 2.8     4
3 Connie 2.1    NA
4    Dan  NA    NA
5  Ethan  NA     6
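Dropping by position works, but it can break if the column order changes. As a rough sketch, the same result can be reached by dropping columns by name. The helper variable drop_cols and the result names kept_base and kept_dplyr are assumptions for this example, not part of the code above.

```r
library(dplyr)

# Recreate the sample data so this snippet runs on its own
data <- data.frame(
  Name  = c("Abbey", "Brian", "Connie", "Dan", "Ethan"),
  GPA   = c(3.0, 2.8, 2.1, NA, NA),
  Grade = c(3, 4, NA, NA, 6),
  State = c("AL", NA, NA, NA, NA),
  Year  = c(2013, 2014, 2015, 2016, 2017)
)

# Base: keep every column whose name is not in drop_cols
drop_cols <- c("State", "Year")
kept_base <- data[, !(names(data) %in% drop_cols)]

# Dplyr: a minus sign in front of a column name drops that column
kept_dplyr <- select(data, -State, -Year)

names(kept_base)
names(kept_dplyr)
```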
Or we can specify the column names instead.
#Base
data[, c("Name", "GPA", "Grade")]

#Dplyr
select(data, "Name", "GPA", "Grade")
> #Base
> data[, c("Name", "GPA", "Grade")]
    Name GPA Grade
1  Abbey 3.0     3
2  Brian 2.8     4
3 Connie 2.1    NA
4    Dan  NA    NA
5  Ethan  NA     6
>
> #Dplyr
> select(data, "Name", "GPA", "Grade")
    Name GPA Grade
1  Abbey 3.0     3
2  Brian 2.8     4
3 Connie 2.1    NA
4    Dan  NA    NA
5  Ethan  NA     6
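When the column names already live in a character vector, you can pass that vector along instead of typing each name again. The sketch below assumes a vector called cols (a made-up name) and uses all_of(), which dplyr provides for exactly this case. It also shows the Name:Grade range shorthand that select() accepts.

```r
library(dplyr)

# Recreate the sample data so this snippet runs on its own
data <- data.frame(
  Name  = c("Abbey", "Brian", "Connie", "Dan", "Ethan"),
  GPA   = c(3.0, 2.8, 2.1, NA, NA),
  Grade = c(3, 4, NA, NA, 6),
  State = c("AL", NA, NA, NA, NA),
  Year  = c(2013, 2014, 2015, 2016, 2017)
)

# Column names stored in a character vector
cols <- c("Name", "GPA", "Grade")

# Base: index directly with the vector
by_vector_base <- data[, cols]

# Dplyr: wrap the vector in all_of()
by_vector_dplyr <- select(data, all_of(cols))

# Dplyr: select a run of adjacent columns by name
by_range_dplyr <- select(data, Name:Grade)

names(by_vector_base)
names(by_vector_dplyr)
names(by_range_dplyr)
```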