Partially matching two columns from different data tables by string

Problem description

I have two large data tables. df1 has a single column (full.name):

full.name  
brad pitt
shah rukh khan       
salman khan
taylor swift
justin bieber
xyz abc

and df2 has two columns, name and age:

name         age
brad         10
shah         15
salman khan  20
taylor       30
justin       25

The output I want is:

full.name            name          age
brad pitt            brad          10
shah rukh khan       shah          15
salman khan          salman khan   20
taylor swift         taylor        30
justin bieber        justin        25

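So far I have tried joining the two tables with inner_join(), but it only works for values that match exactly. I would like to match on partial strings instead.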

Tags: r, merge

Solution


Sample data

library( data.table )

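# sep = "%" is a character that never occurs in the data, so each line
# (spaces included) is read into a single column
dt1 <- fread("full.name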
brad pitt
             shah rukh khan       
             salman khan
             taylor swift
             justin bieber
             xyz abc", sep = "%")

dt2 <- fread('name,         age
brad,         10
shah,         15
salman khan,  20
taylor,       30
justin,       25')

Code

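library( fuzzyjoin )
# treat dt2$name as a regular expression matched against dt1$full.name;
# the left join keeps every row of dt1, including those without a match
regex_left_join( dt1, dt2, by = c( full.name = "name" ) )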

Output

#         full.name        name age
# 1:      brad pitt        brad  10
# 2: shah rukh khan        shah  15
# 3:    salman khan salman khan  20
# 4:   taylor swift      taylor  30
# 5:  justin bieber      justin  25
# 6:        xyz abc        <NA>  NA
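
If adding the fuzzyjoin dependency is not an option, roughly the same left-join behaviour can be approximated in base R. The following is only a minimal sketch and not part of the original answer. It assumes that the values in df2$name should be treated as literal substrings rather than regular expressions, and that taking the first matching row is acceptable when several names match. The names match_idx and result are illustrative.

```r
# Sample data mirroring the tables in the question (assumed to be plain data frames)
df1 <- data.frame(
  full.name = c("brad pitt", "shah rukh khan", "salman khan",
                "taylor swift", "justin bieber", "xyz abc"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("brad", "shah", "salman khan", "taylor", "justin"),
  age  = c(10, 15, 20, 30, 25),
  stringsAsFactors = FALSE
)

# For each full name, find the index of the first df2$name contained in it.
# fixed = TRUE makes grepl() do a literal substring test instead of a regex match.
match_idx <- vapply(df1$full.name, function(fn) {
  hits <- which(vapply(df2$name,
                       function(p) grepl(p, fn, fixed = TRUE),
                       logical(1), USE.NAMES = FALSE))
  if (length(hits) > 0) hits[1] else NA_integer_
}, integer(1), USE.NAMES = FALSE)

# Indexing with NA yields NA, so unmatched rows are kept, as in a left join
result <- data.frame(
  full.name = df1$full.name,
  name      = df2$name[match_idx],
  age       = df2$age[match_idx],
  stringsAsFactors = FALSE
)
print(result)
```

This loops over every pair of rows, so on very large tables it may be slow. Unlike regex_left_join(), it also returns at most one row per full name.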
