Let's load two cool libraries, stringr and stringi, and create some sample text.

    ##### Load Libraries #####
    library(stringr)
    library(stringi)  # provides stri_stats_latex, used further down

    ##### Create Sample Data Frame #####
    sample <- data.frame(text = c("Hello", "Good Morning", "that's cool.", "thats cool."))
    # Convert to character in case the column was read in as a factor
    (sample$text <- as.character(sample$text))

    > (sample$text <- as.character(sample$text))
    [1] "Hello"        "Good Morning" "that's cool." "thats cool."
Number of Characters
    str_count(sample$text)

    > str_count(sample$text)
    [1]  5 12 12 11
Number of Words
    str_count(sample$text, '\\w+')

    > str_count(sample$text, '\\w+')
    [1] 1 2 3 2
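The way str_count handles the apostrophe can give a misleading result. The pattern \\w+ matches runs of word characters only, and an apostrophe is not a word character, so "that's" is split into "that" and "s" and counted as two words. To count words correctly, we probably need to strip the apostrophes with str_replace_all before calling str_count.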
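Here is a minimal sketch of that fix, assuming the same four sample strings as above. The variable names (text, stripped, counts_stripped, counts_class) are made up for illustration. The second variant is an alternative I am adding, not something from the steps above: it leaves the text alone and lets the pattern itself accept apostrophes.

```r
library(stringr)

text <- c("Hello", "Good Morning", "that's cool.", "thats cool.")

# Option 1: remove apostrophes first, then count runs of word characters
stripped <- str_replace_all(text, "'", "")
counts_stripped <- str_count(stripped, "\\w+")
counts_stripped
# [1] 1 2 2 2

# Option 2 (alternative): treat the apostrophe as part of a word in the pattern
counts_class <- str_count(text, "[\\w']+")
counts_class
# [1] 1 2 2 2
```

Both give 2 for "that's cool.", but neither is a general tokenizer. Option 2 would also count a stray apostrophe standing on its own as a word.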
Stringi Function
Stringi has a cool function: stri_stats_latex.
    stri_stats_latex(sample[1,])

    > stri_stats_latex(sample[1,])
        CharsWord CharsCmdEnvir    CharsWhite         Words          Cmds        Envirs 
                5             0             0             1             0             0 
With just one function, you get it all. The only disadvantage of stri_stats_latex is that it adds everything it is given into a single set of totals, so we need to specify which observation to count. If we were to…
    stri_stats_latex(sample$text)

    > stri_stats_latex(sample$text)
        CharsWord CharsCmdEnvir    CharsWhite         Words          Cmds        Envirs 
               34             0             6             8             0             0 
Yep… not too good: we get one set of totals for the whole column instead of one per row. One way around this is to use apply.
    apply(sample, 1, stri_stats_latex)

    > apply(sample, 1, stri_stats_latex)
                  [,1] [,2] [,3] [,4]
    CharsWord        5   11    9    9
    CharsCmdEnvir    0    0    0    0
    CharsWhite       0    1    3    2
    Words            1    2    3    2
    Cmds             0    0    0    0
    Envirs           0    0    0    0
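The per-row idea can also be sketched with sapply on the character vector itself, which labels each column with its source string instead of [,1], [,2] and so on. This is only an illustration under the same sample data, and per_text and char_counts are made-up names. The last two calls are alternatives I am suggesting rather than part of the walkthrough: stri_length and stri_count_words are vectorised stringi functions, and stri_count_words follows ICU word-boundary rules, so its counts may differ from the \\w+ approach.

```r
library(stringi)

text <- c("Hello", "Good Morning", "that's cool.", "thats cool.")

# One column of LaTeX-style statistics per string, named after the string
per_text <- sapply(text, stri_stats_latex)
per_text

# Pull out a single statistic for every string
per_text["Words", ]

# Vectorised alternatives that need no apply/sapply
char_counts <- stri_length(text)   # characters per string
char_counts
stri_count_words(text)             # words per string, by ICU word boundaries
```

If only character and word counts are needed, the vectorised calls are probably the simpler choice. stri_stats_latex is aimed at LaTeX documents, which is why it also reports commands and environments.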