首页 > 解决方案 > Merging two list columns based on a condition

问题描述

I have data which looks like:

                           extractedNames1                 extractedNames2
1                                                                         
2                                                                 synopsys
3                                                                         
4                                                                   somero
5   cbiz medical management professionals,    cbiz, cbiz, cbiz, cbiz, cbiz
6                                                                         
7                                                                         
8                                                johnson, johnson, johnson
9                                                                  iridium
10       skillsoft,  skillsoft,  skillsoft skillsoft, skillsoft, skillsoft

It has two columns which contain lists. I want to merge the two columns into a single column.

First I would like to only keep the unique values. So cbiz medical management professionals would be a unique value in column extractedNames1 but cbiz, cbiz, cbiz, cbiz, cbiz would be collapsed down to cbiz in column extractedNames2.

Second I would like to merge based on a condition. If there is no result in extractedNames1 then take the unique value from extractedNames2. If there is a result in both columns then just keep the result from extractedNames1. (So I want to merge the two columns but only when there is an empty space in the column extractedNames1.)

Expected output:

extractedNamesFina

l
1
2   synopsys
3
4   somero
5   cbiz medical management professions,
6
7
8   Johnson
9   iridium
10  skillsoft

Data

structure(list(extractedNames1 = list(character(0), character(0), 
    character(0), character(0), " cbiz medical management professionals,", 
    character(0), character(0), character(0), character(0), c(" skillsoft", 
    " skillsoft", " skillsoft")), extractedNames2 = list(character(0), 
    "synopsys", character(0), "somero", c("cbiz", "cbiz", "cbiz", 
    "cbiz", "cbiz"), character(0), character(0), c("johnson", 
    "johnson", "johnson"), "iridium", c("skillsoft", "skillsoft", 
    "skillsoft"))), row.names = c(NA, -10L), class = "data.frame")

标签: r

解决方案


You can use the map() family in purrr.

library(purrr)
df2 <- as.data.frame(matrix(nrow = nrow(df)))
df2[[1]] <- pmap(map(df, map, unique), ~ if(length(.x)) .x else .y)
df2

#                                         V1
# 1                                         
# 2                                 synopsys
# 3                                         
# 4                                   somero
# 5   cbiz medical management professionals,
# 6                                         
# 7                                         
# 8                                  johnson
# 9                                  iridium
# 10                               skillsoft

If you want to insert a list into a data.frame, you need to create a data.frame with the same length as the list. A more convenient way is to use tibble(), which can take a list as input.

library(tibble)
tibble(new = pmap(map(df, map, unique), ~ if(length(.x)) .x else .y))

# # A tibble: 10 x 1
#    new      
#    <list>   
#  1 <chr [0]>
#  2 <chr [1]>
#  3 <chr [0]>
#  4 <chr [1]>
#  5 <chr [1]>
#  6 <chr [0]>
#  7 <chr [0]>
#  8 <chr [1]>
#  9 <chr [1]>
# 10 <chr [1]>

or

as_tibble_col(pmap(map(df, map, unique), ~ if(length(.x)) .x else .y), column_name = "new")

推荐阅读