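r - Merge values from another data frame based on partial string matching, when the matching columns are in a different order
Problem description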
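I want to merge a column from df2 into df1 by matching df1$District_name against df2$Districts. However, the character values in df1$District_name and df2$Districts are in a different order, and df1 and df2 have different lengths. The values also do not match exactly because of spelling differences. df1 has more rows than df2, so the extra district names should get a value of zero.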
df1=data.frame(State_name=c("Maharashtra","Andhra Pradesh","Bihar","Bihar","West Bengal","Gujarat","Gujarat","Assam"),
District_name=c("Nashik","Chittoor","Madhepura","Kishanganj","Howrah","Gandhinagar","Ahmadabad","Sivasagar"),
Value1=c(5,3,6,4,4,3,2,4))
df2=data.frame(Districts=c("Nashik","Chitoor","Kishanganj","Madhepur","Sibhasagar","Ahmadabad"),
FinanceIndex=c(0.20975,0.12187,0.37155,0.66128,0.10918,0.54730))
# df1
State_name District_name Value1
1 Maharashtra Nashik 5
2 Andhra Pradesh Chittoor 3
3 Bihar Madhepura 6
4 Bihar Kishanganj 4
5 West Bengal Howrah 4
6 Gujarat Gandhinagar 3
7 Gujarat Ahmadabad 2
8 Assam Sivasagar 4
# df2
Districts FinanceIndex
1 Nashik 0.20975
2 Chitoor 0.12187
3 Kishanganj 0.37155
4 Madhepur 0.66128
5 Sibhasagar 0.10918
6 Ahmadabad 0.54730
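I used the match function, but because of the spelling differences most of the rows ended up with a value of zero.
index <- match(df1$District_name, df2$Districts)
df1$finindex <- df2$FinanceIndex[index]
# Unmatched rows come back as NA; replace them with zero
df1$finindex[is.na(df1$finindex)] <- 0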
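To make the failure concrete, here is a minimal sketch of what match() returns for these names. It uses plain character vectors (district_name and districts are illustrative names, not part of the original code) holding the same values as the two columns.

```r
# Same values as df1$District_name and df2$Districts
district_name <- c("Nashik", "Chittoor", "Madhepura", "Kishanganj",
                   "Howrah", "Gandhinagar", "Ahmadabad", "Sivasagar")
districts <- c("Nashik", "Chitoor", "Kishanganj", "Madhepur",
               "Sibhasagar", "Ahmadabad")

# match() compares strings exactly, so any spelling difference yields NA
index <- match(district_name, districts)
index

# Only the identically spelled names are found
district_name[!is.na(index)]
```

Only three of the eight names are spelled identically in both vectors, so the remaining five rows become NA and are then set to zero.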
For string matching, I found this function, which compares words that sound alike:
library(RecordLinkage)
soundex('Nellore')==soundex('Vellore')
#FALSE
The output should be:
# df1
State_name District_name Value1 finindex
1 Maharashtra Nashik 5 0.20975
2 Andhra Pradesh Chittoor 3 0.12187
3 Bihar Madhepura 6 0.66128
4 Bihar Kishanganj 4 0.37155
5 West Bengal Howrah 4 0.00000
6 Gujarat Gandhinagar 3 0.00000
7 Gujarat Ahmadabad 2 0.54730
8 Assam Sivasagar 4 0.10918
Is there a way to use these two functions together to solve this? Or is there another way to approach it?
Solution
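One option is a partial match based on string distance, using stringdist_left_join from the fuzzyjoin package (%>% and select come from dplyr):
library(fuzzyjoin)
library(dplyr)
# Left join that keeps every row of df1 and matches names within a small string distance
stringdist_left_join(df1, df2, by = c("District_name" = "Districts")) %>%
  select(-Districts)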
# State_name District_name Value1 FinanceIndex
#1 Maharashtra Nashik 5 0.20975
#2 Andhra Pradesh Chittoor 3 0.12187
#3 Bihar Madhepura 6 0.66128
#4 Bihar Kishanganj 4 0.37155
#5 West Bengal Howrah 4 NA
#6 Gujarat Gandhinagar 3 NA
#7 Gujarat Ahmadabad 2 0.54730
#8 Assam Sivasagar 4 0.10918
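The joined output above leaves NA for districts that have no counterpart, whereas the question asks for zero in a column called finindex. As a rough alternative that needs no extra packages, the sketch below uses adist() from base R's utils package to compute edit distances and fills unmatched rows with 0. The threshold max_dist <- 2 is an assumption chosen for this sample data, and adist() uses a generalized Levenshtein distance, which may differ from the distance method used by the join above on other data.

```r
df1 <- data.frame(
  State_name = c("Maharashtra", "Andhra Pradesh", "Bihar", "Bihar",
                 "West Bengal", "Gujarat", "Gujarat", "Assam"),
  District_name = c("Nashik", "Chittoor", "Madhepura", "Kishanganj",
                    "Howrah", "Gandhinagar", "Ahmadabad", "Sivasagar"),
  Value1 = c(5, 3, 6, 4, 4, 3, 2, 4)
)
df2 <- data.frame(
  Districts = c("Nashik", "Chitoor", "Kishanganj", "Madhepur",
                "Sibhasagar", "Ahmadabad"),
  FinanceIndex = c(0.20975, 0.12187, 0.37155, 0.66128, 0.10918, 0.54730)
)

# Edit distance between every df1 district (rows) and every df2 district (columns)
d <- adist(df1$District_name, df2$Districts)

# For each df1 row: index of the closest df2 name, and its distance
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_len(nrow(d)), best)]

# Assumed threshold: accept a match only if at most 2 edits apart
max_dist <- 2

# Take the matched FinanceIndex, or 0 where nothing is close enough
df1$finindex <- ifelse(best_dist <= max_dist, df2$FinanceIndex[best], 0)
df1
```

A tighter threshold reduces the risk of false matches between short names, at the cost of missing pairs such as Sivasagar and Sibhasagar, which are two edits apart.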