r - Compare within rows across multiple columns, remove mismatches, and create new rows
Problem description
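I am trying to count identical addresses and group them by row. I am fairly close, but for some addresses there are slight differences between the columns. The aim is to remove any non-matching addresses from a row and add them to the df as new rows. The differences are usually in the street number or the block number. I have already extracted these numbers into separate columns, and I am trying to find the ones that do not match, remove them, create a new row for them, and adjust the count accordingly. The count can be updated afterwards simply by counting the non-missing values in each row.
The real dataset has 5000 rows, with up to 50 buildings per row. Here is a sample.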
df<-data.frame(bldg1 = c("26 this street, big district","block8, fancy estate, small district", "11 normal lane, district"),
bldg2 = c("27 this street, big district","block8, fancy estate, small district", "11 normal lane, district"),
bldg3 = c("26 this street, big district","block6, fancy estate, small district", "11 normal lane, district"),
bldg4 = c("26 this street, big district","block8, fancy estate, small district", NA),
bldg5 = c("26 this street, big district","block6, fancy estate, small district", "11 normal lane, district"),
bldg1strnum = c("26",NA, "11"),
bldg2strnum = c("27",NA, "11"),
bldg3strnum = c("26",NA, "11"),
bldg4strnum = c("26",NA, "11"),
bldg5strnum = c("26",NA, "11"),
bldg1blck = c(NA,"8", NA),
bldg2blck = c(NA,"8", NA),
bldg3blck = c(NA,"6", NA),
bldg4blck = c(NA,"8", NA),
bldg5blck = c(NA,"6", NA),
count = c(5,5,4))
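I am considering using dplyr and across with length(unique), but I don't know how to run it correctly, especially how to use mutate to convert the result into long format with new rows.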
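As a rough sketch of that idea (not a complete solution), the number of distinct addresses in each row can be computed with rowwise() and c_across(); n_distinct() plays the role of length(unique()) here. The reduced sample with only the five address columns and the column name n_addresses are assumptions made for illustration.

```r
library(dplyr)

# Reduced sample: only the five address columns from the question
df <- data.frame(
  bldg1 = c("26 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg2 = c("27 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg3 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district"),
  bldg4 = c("26 this street, big district", "block8, fancy estate, small district", NA),
  bldg5 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district")
)

flagged <- df %>%
  rowwise() %>%
  # n_addresses (hypothetical name): distinct non-missing addresses in this row
  mutate(n_addresses = n_distinct(c_across(bldg1:bldg5), na.rm = TRUE)) %>%
  ungroup()

flagged
```

Rows where n_addresses is greater than 1 are the ones that contain a mismatch and would need to be split.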
The result I would like is shown below. (The street numbers and names are not needed after the mutate.)
df<-data.frame(bldg1 = c("26 this street, big district","block8, fancy estate, small district", "11 normal lane, district", "27 this street, big district","block6, fancy estate, small district"),
bldg2 = c(NA, "block8, fancy estate, small district", "11 normal lane, district",NA,"block6, fancy estate, small district"),
bldg3 = c("26 this street, big district",NA, "11 normal lane, district", NA, NA),
bldg4 = c("26 this street, big district","block8, fancy estate, small district", NA,NA,NA),
bldg5 = c("26 this street, big district",NA, "11 normal lane, district",NA,NA),
count = c("4","3","4","1","2"))
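Solution
Is this what you want? It reshapes the address columns to long format and counts how often each address occurs: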
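library(dplyr)
library(tidyr)

df %>%
  # keep only the address columns
  select(bldg1, bldg2, bldg3, bldg4, bldg5) %>%
  # one row per (column name, address) pair
  pivot_longer(cols = everything()) %>%
  arrange(value) %>%
  # n = occurrences of each address across the whole data frame
  add_count(value)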
Output:
   name  value                                    n
   <chr> <chr>                                <int>
 1 bldg1 11 normal lane, district                 4
 2 bldg2 11 normal lane, district                 4
 3 bldg3 11 normal lane, district                 4
 4 bldg5 11 normal lane, district                 4
 5 bldg1 26 this street, big district             4
 6 bldg3 26 this street, big district             4
 7 bldg4 26 this street, big district             4
 8 bldg5 26 this street, big district             4
 9 bldg2 27 this street, big district             1
10 bldg3 block6, fancy estate, small district     2
11 bldg5 block6, fancy estate, small district     2
12 bldg1 block8, fancy estate, small district     3
13 bldg2 block8, fancy estate, small district     3
14 bldg4 block8, fancy estate, small district     3
15 bldg4 NA                                       1
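The output above counts each address across the whole data frame and stays in long format. If the counts should instead be per original row, with each distinct address split into its own wide row as in the desired result, one possible sketch is to tag every row with an id before reshaping, count per row and address, and then pivot back to wide. The names row_id, address, group and result are introduced here for illustration. Addresses are compared as exact strings (an assumption), and split-off addresses keep their original column positions rather than being shifted left as in the question's desired output.

```r
library(dplyr)
library(tidyr)

# Reduced sample: only the five address columns from the question
df <- data.frame(
  bldg1 = c("26 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg2 = c("27 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg3 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district"),
  bldg4 = c("26 this street, big district", "block8, fancy estate, small district", NA),
  bldg5 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district")
)

result <- df %>%
  select(bldg1:bldg5) %>%
  # remember which original row each address came from
  mutate(row_id = row_number()) %>%
  pivot_longer(cols = -row_id, names_to = "name", values_to = "address") %>%
  # missing buildings do not form a group of their own
  filter(!is.na(address)) %>%
  # count occurrences of each address within its original row
  add_count(row_id, address, name = "count") %>%
  # keep a copy of the address as a grouping key for the wide rows
  mutate(group = address) %>%
  pivot_wider(
    id_cols = c(row_id, group, count),
    names_from = name,
    values_from = address
  ) %>%
  # most frequent address of each original row first
  arrange(row_id, desc(count), group) %>%
  select(row_id, count, starts_with("bldg"))

result
```

On the sample data this gives five rows: the two original rows that contained a mismatch are each split in two, and the third row is left as it was. The row_id column shows which original row each result row came from and can be dropped if it is not needed.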