NAs are part of data… like it or not :). This entry uses both dplyr and base R functions to remove rows containing NA from a dataset.
Suppose you have the following dataset with NAs in three columns (GPA, Grade, and State).
    ##### Load Libraries #####
    library(dplyr)

    ##### Creating Sample Data #####
    Name <- c("Abbey", "Brian", "Connie", "Dan", "Ethan")
    GPA <- c(3.0, 2.8, 2.1, 4.0, NA)
    Grade <- c(3, 4, NA, 6, 6)
    State <- c("AL", NA, "AZ", "AR", "CA")

    # Merge Them Together #
    data <- data.frame(Name, GPA, Grade, State)

    # Let's Take a Look #
    data
    > data
        Name GPA Grade State
    1  Abbey 3.0     3    AL
    2  Brian 2.8     4  <NA>
    3 Connie 2.1    NA    AZ
    4    Dan 4.0     6    AR
    5  Ethan  NA     6    CA
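Before removing anything, it can help to see where the NAs actually are. The following is a minimal base R sketch, not part of the original walkthrough; the variable names `na_per_column` and `rows_with_na` are made up for illustration.

```r
# Recreate the sample data so this snippet runs on its own
Name  <- c("Abbey", "Brian", "Connie", "Dan", "Ethan")
GPA   <- c(3.0, 2.8, 2.1, 4.0, NA)
Grade <- c(3, 4, NA, 6, 6)
State <- c("AL", NA, "AZ", "AR", "CA")
data  <- data.frame(Name, GPA, Grade, State)

# is.na() on a data frame gives a logical matrix;
# summing each column counts the NAs per column
na_per_column <- colSums(is.na(data))
na_per_column

# Rows that contain at least one NA (hypothetical helper variable)
rows_with_na <- which(rowSums(is.na(data)) > 0)
rows_with_na
```

With this sample data, GPA, Grade, and State each hold one NA, and rows 2, 3, and 5 are the ones affected.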
Case 1: Specifying a Column as a Criterion
We will use GPA as the criterion for removing NA: any row with NA in the GPA column is removed.
    data %>%
      filter(is.na(GPA) == FALSE)
    > data %>%
    +   filter(is.na(GPA) == FALSE)
        Name GPA Grade State
    1  Abbey 3.0     3    AL
    2  Brian 2.8     4  <NA>
    3 Connie 2.1    NA    AZ
    4    Dan 4.0     6    AR
The fifth row is now excluded.
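The same result can be reached without dplyr. Here is a rough base R sketch of the single-column case; `keep`, `by_index`, and `by_subset` are illustrative names, not something from the original post. It also uses `!is.na(GPA)`, which is the more common way to write `is.na(GPA) == FALSE`.

```r
# Recreate the sample data so this snippet runs on its own
Name  <- c("Abbey", "Brian", "Connie", "Dan", "Ethan")
GPA   <- c(3.0, 2.8, 2.1, 4.0, NA)
Grade <- c(3, 4, NA, 6, 6)
State <- c("AL", NA, "AZ", "AR", "CA")
data  <- data.frame(Name, GPA, Grade, State)

# Option 1: logical indexing. TRUE marks rows whose GPA is not NA.
keep     <- !is.na(data$GPA)
by_index <- data[keep, ]

# Option 2: subset() evaluates the condition inside the data frame
by_subset <- subset(data, !is.na(GPA))

by_index
```

Both options keep the first four rows and leave the NAs in Grade and State untouched, just like the `filter()` call above.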
Case 2: Specifying Multiple Columns as Criteria
We will use both the GPA and Grade columns as criteria: a row is kept only if neither column is NA.
    data %>%
      filter(is.na(GPA) == FALSE &
             is.na(Grade) == FALSE)
    > data %>%
    +   filter(is.na(GPA) == FALSE &
    +          is.na(Grade) == FALSE)
       Name GPA Grade State
    1 Abbey 3.0     3    AL
    2 Brian 2.8     4  <NA>
    3   Dan 4.0     6    AR
The third and fifth rows (Connie and Ethan) are excluded.
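For a base R take on the multi-column case, one possible sketch uses `complete.cases()` on just the columns of interest. The `criteria` and `cleaned` names are hypothetical, chosen only for this example.

```r
# Recreate the sample data so this snippet runs on its own
Name  <- c("Abbey", "Brian", "Connie", "Dan", "Ethan")
GPA   <- c(3.0, 2.8, 2.1, 4.0, NA)
Grade <- c(3, 4, NA, 6, 6)
State <- c("AL", NA, "AZ", "AR", "CA")
data  <- data.frame(Name, GPA, Grade, State)

# Columns that must not contain NA (illustrative variable)
criteria <- c("GPA", "Grade")

# complete.cases() returns TRUE for rows with no NA in the given columns
cleaned <- data[complete.cases(data[, criteria]), ]
cleaned
```

One difference worth knowing: base R indexing keeps the original row names (1, 2, 4 here), whereas the dplyr output above is renumbered 1 to 3. Adding a column to the criteria only means adding its name to the vector.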
Case 3: Removing All NAs from the Dataset
Naming each column in filter works fine for a few columns, but what if we have 90 columns and each of them has NAs? Typing out all 90 would be too troublesome. Base R's na.omit drops every row that contains an NA in any column.
    na.omit(data)
    > na.omit(data)
       Name GPA Grade State
    1 Abbey   3     3    AL
    4   Dan   4     6    AR
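Since `na.omit()` can silently drop a lot of rows, it may be worth checking what was removed. This is a small sketch under the same sample data; `cleaned`, `dropped`, and `same_rows` are illustrative names.

```r
# Recreate the sample data so this snippet runs on its own
Name  <- c("Abbey", "Brian", "Connie", "Dan", "Ethan")
GPA   <- c(3.0, 2.8, 2.1, 4.0, NA)
Grade <- c(3, 4, NA, 6, 6)
State <- c("AL", NA, "AZ", "AR", "CA")
data  <- data.frame(Name, GPA, Grade, State)

cleaned <- na.omit(data)

# na.omit() records the removed rows in the "na.action" attribute
dropped <- attr(cleaned, "na.action")
as.integer(dropped)

# How many rows were lost
nrow(data) - nrow(cleaned)

# complete.cases() over the whole data frame selects the same rows
same_rows <- data[complete.cases(data), ]
```

Here rows 2, 3, and 5 are dropped, leaving only Abbey and Dan. If losing three of five rows is too aggressive for your analysis, filtering on selected columns as in Case 1 or Case 2 keeps more data.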