Here is a quick note on how to split a column when you use the mutate function from the dplyr package in R.
All of the approaches come from a discussion on Stack Overflow. I am keeping a record of them here for future reference.
First of all, I will show one wrong way that I have used before. Suppose you have the dummy data below and would like to split each string and keep its first half (the part before the underscore).
library(tidyverse)

data <- tibble(
  label = c("a_1", "b_2", "c_3", "d_4", "e_5")
)
In the past, I was used to splitting the label column with str_split(label, "_")[[1]][1]. But that does not give the correct output: every value comes out as "a". You can see the result below or try it yourself.

data %>% mutate(sublabel = str_split(label, "_")[[1]][1])
# A tibble: 5 × 2
  label sublabel
  <chr> <chr>
1 a_1   a
2 b_2   a
3 c_3   a
4 d_4   a
5 e_5   a
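To see why every row ends up as "a", it helps to look at what str_split() returns outside of mutate(). The sketch below is only my own illustration on the same labels; the variable names pieces and first_only are made up for the example.

```r
library(stringr)

label <- c("a_1", "b_2", "c_3", "d_4", "e_5")

# str_split() returns a list with one character vector per input string
pieces <- str_split(label, "_")
length(pieces)   # one list element per label
pieces[[1]]      # the pieces of the first label only

# Indexing the list once and then the vector once leaves a single string,
# not one value per row (first_only is a hypothetical name)
first_only <- pieces[[1]][1]
first_only
```

Because the expression collapses to one string, mutate() recycles that single value down the whole column, which is how the first label's "a" lands in every row.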
That is clearly wrong: the expression picks out only the first piece of the first label, and mutate() repeats that single value for every row. The correct ways are listed below; I have summarized them from that Stack Overflow discussion.
First, use the simplify = TRUE argument, which makes str_split() return a character matrix instead of a list, so that I can use [, 1] to extract the first half of every row.
data %>% mutate(sublabel = str_split(label, "_", simplify = TRUE)[, 1])
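If the matrix indexing feels opaque, a small check outside of mutate() may help. This is just a sketch on the same labels, with parts as a made-up variable name.

```r
library(stringr)

label <- c("a_1", "b_2", "c_3", "d_4", "e_5")

# With simplify = TRUE the result is a character matrix:
# one row per input string, one column per piece
parts <- str_split(label, "_", simplify = TRUE)
dim(parts)

# Column 1 holds the first piece of every label, column 2 the second
parts[, 1]
parts[, 2]
```

Since parts[, 1] has one value per row, it lines up with the rows of the tibble and nothing gets recycled.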
Second, use the separate() function from tidyr instead of str_split(), a very clever way to avoid the problem altogether.
data %>% separate(label, c("sublabel1", "sublabel2"))
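By default separate() guesses the separator and drops the original column. If you would rather spell both choices out, a variant like the following should work; sep and remove are regular separate() arguments, while result is just a name I picked for the example.

```r
library(tidyverse)

data <- tibble(label = c("a_1", "b_2", "c_3", "d_4", "e_5"))

result <- data %>%
  separate(
    label,
    into = c("sublabel1", "sublabel2"),
    sep = "_",       # split on the underscore explicitly
    remove = FALSE   # keep the original label column as well
  )

result
```

Keeping the label column makes it easy to compare each split value against its source string.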
Third, similar to the first way, but extract the first half in a more direct and explicit manner with the map_chr() function, which applies a function to each element of a list and returns a character vector. So if I want to select the first piece of every list element, I can just use map_chr(., 1).
data %>% mutate(sublabel = str_split(label, "_") %>% map_chr(., 1))
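The 1 passed to map_chr() is shorthand: purrr turns an integer into a function that extracts the element at that position. As a rough illustration (the names pieces, by_position and by_function are mine), the shorthand and a spelled-out function give the same answer.

```r
library(stringr)
library(purrr)

label <- c("a_1", "b_2", "c_3", "d_4", "e_5")
pieces <- str_split(label, "_")

# An integer in place of a function extracts the element at that position
by_position <- map_chr(pieces, 1)

# The same extraction written out as an anonymous function
by_function <- map_chr(pieces, function(x) x[1])

by_position
identical(by_position, by_function)
```

Either form returns one string per list element, so the result has the right length for mutate().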
This is a brief post, and I hope it will serve as a reminder for me when I forget the details.