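r - Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 2 rows: when using unnest_tokens and spread
Problem description
This is the data frame I am working with: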
# A tibble: 268 x 5
Horodateur Gender Age Time Social
<dttm> <chr> <chr> <chr> <chr>
1 2021-04-23 09:59:16 Male [18,24[ 1-5 ho~ Facebook, Instagram, Twitter, Snapcha~
2 2021-04-23 10:11:35 Female [10,18[ 1-5 ho~ Reddit
3 2021-04-23 10:18:24 Male [18,24[ >10 ho~ Facebook, Instagram, Twitter, Linkedi~
4 2021-04-23 10:42:28 Female [18,24[ 5-10 h~ Facebook, Instagram, Twitter, Snapchat
5 2021-04-23 10:42:37 Female [18,24[ 5-10 h~ Facebook, Instagram, TikTok, Snapchat
6 2021-04-23 10:45:35 Female [24,34[ 1-5 ho~ Facebook, Instagram, Twitter, Linkedi~
7 2021-04-23 10:48:09 Male [18,24[ 5-10 h~ Facebook, Instagram, TikTok, Linkedin~
8 2021-04-23 10:49:56 Male [18,24[ 5-10 h~ Facebook, Instagram, Snapchat
9 2021-04-23 10:50:39 Male [24,34[ 0 hours Linkedin, Reddit
10 2021-04-23 10:51:36 Male [18,24[ 5-10 h~ Facebook, Instagram, Twitter, TikTok
# ... with 258 more rows
> str(Survey[1:5])
tibble [268 x 5] (S3: tbl_df/tbl/data.frame)
$ Horodateur: POSIXct[1:268], format: "2021-04-23 09:59:16" "2021-04-23 10:11:35" ...
$ Gender : chr [1:268] "Male" "Female" "Male" "Female" ...
$ Age : chr [1:268] "[18,24[" "[10,18[" "[18,24[" "[18,24[" ...
$ Time : chr [1:268] "1-5 hours" "1-5 hours" ">10 hours" "5-10 hours" ...
$ Social : chr [1:268] "Facebook, Instagram, Twitter, Snapchat, Reddit, Signal" "Reddit" "Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora" "Facebook, Instagram, Twitter, Snapchat" ...
I am trying to split the Social column to get something like this:
Id Facebook Instagram Reddit Signal Snapchat TikTok Twitter
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 No No No No No No Yes
2 2 Yes Yes No No Yes No Yes
3 3 No Yes No Yes No Yes No
4 4 No Yes No No Yes No No
5 5 No Yes No Yes Yes Yes Yes
6 6 No Yes No No No No No
7 7 No No Yes Yes No Yes Yes
8 8 No No Yes No No No Yes
9 9 No No Yes No Yes Yes No
10 10 No Yes Yes Yes Yes No Yes
So I wrote this code:
Survey %>%
  mutate(Id = row_number(), HasAccount = "Yes") %>%
  unnest_tokens(Survey, Social, to_lower = FALSE) %>%
  spread(Survey, HasAccount, fill = "No")
But I get the error mentioned in the title:
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 2 rows:
* 440, 441
I thought adding Id = row_number() would fix the error, but it did not (the same error appears when I remove it). Does anyone know how to solve this?
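The row numbers in the message (440, 441) refer to rows of the tokenised data, so one way to see what is going on is to count how often each key combination occurs before calling spread(). The sketch below uses a small hypothetical tibble, `tokens`, as a stand-in for the output of unnest_tokens(); the assumption that one respondent ends up with the same token twice is illustrative, not taken from the actual survey.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the tokenised survey:
# respondent 2 ends up with the token "Facebook" twice.
tokens <- tibble(
  Id = c(1L, 1L, 2L, 2L, 2L),
  Survey = c("Facebook", "Reddit", "Facebook", "Facebook", "Twitter"),
  HasAccount = "Yes"
)

# Any Id/token pair that occurs more than once is a duplicated key,
# which is the situation spread() refuses to reshape.
dupes <- tokens %>%
  count(Id, Survey) %>%
  filter(n > 1)

dupes
```

On the real data, the same count() and filter() applied after unnest_tokens() should list the respondents and tokens behind the error.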
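Solution
The cause is duplicated rows: after unnest_tokens(), some rows share the same values in every key column. We can make them unique by creating a sequence column with row_number() within each token: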
library(dplyr)
library(tidyr)
library(tidytext)
Survey %>%
  mutate(HasAccount = "Yes") %>%
  unnest_tokens(Survey, Social, to_lower = FALSE) %>%
  # number the occurrences of each token so every row gets a unique key
  group_by(Survey) %>%
  mutate(Id = row_number()) %>%
  ungroup() %>%
  spread(Survey, HasAccount, fill = "No")
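If the goal is one row per respondent, as in the desired output in the question, a possible alternative is to keep a per-respondent Id and drop repeated tokens before reshaping. This is a different technique from the one above: it swaps unnest_tokens() and spread() for tidyr's separate_rows() and pivot_wider(). The tibble `survey_toy` and its values are hypothetical, and the sketch assumes the networks are always separated by commas.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical miniature of the survey; respondent 1 lists Facebook twice.
survey_toy <- tibble(
  Gender = c("Male", "Female", "Male"),
  Social = c("Facebook, Instagram, Facebook", "Reddit", "Linkedin, Reddit")
)

wide <- survey_toy %>%
  mutate(Id = row_number(), HasAccount = "Yes") %>%
  # one row per respondent and network, splitting on commas
  separate_rows(Social, sep = ",\\s*") %>%
  # remove repeated networks within a respondent so keys stay unique
  distinct(Id, Social, .keep_all = TRUE) %>%
  pivot_wider(
    names_from = Social,
    values_from = HasAccount,
    values_fill = list(HasAccount = "No")
  ) %>%
  arrange(Id)

wide
```

Here Id still identifies the respondent, so each output row can be traced back to one survey answer.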
Using a reproducible example:
library(janeaustenr)
d <- tibble(txt = prideprejudice)
d %>%
  mutate(HasAccount = "Yes") %>%
  unnest_tokens(word, txt) %>%
  slice(1:50) %>%
  # number the occurrences of each word
  group_by(word) %>%
  mutate(Id = row_number()) %>%
  ungroup() %>%
  spread(word, HasAccount, fill = "No")
# A tibble: 6 x 41
Id `1` a acknowledged and austen be by chapter entering feelings first fortune good his however `in` is it
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
2 2 No Yes No No No Yes No No No No No No No No No Yes No No
3 3 No Yes No No No No No No No No No No No No No No No No
4 4 No Yes No No No No No No No No No No No No No No No No
5 5 No Yes No No No No No No No No No No No No No No No No
6 6 No Yes No No No No No No No No No No No No No No No No
# … with 22 more variables: jane <chr>, known <chr>, little <chr>, man <chr>, may <chr>, must <chr>, neighbourhood <chr>, of <chr>,
# on <chr>, or <chr>, possession <chr>, prejudice <chr>, pride <chr>, single <chr>, such <chr>, that <chr>, the <chr>, truth <chr>,
# universally <chr>, views <chr>, want <chr>, wife <chr>
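Note that in this output Id is not a line number: because row_number() is computed within each word, Id is the occurrence index of that word, which is why row 1 is "Yes" everywhere and later rows are "Yes" only for repeated words. A minimal sketch of the same effect, using a hypothetical five-word tibble `words` in place of the tokenised text:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical token list: "a" occurs three times, the others once.
words <- tibble(
  word = c("a", "truth", "a", "man", "a"),
  HasAccount = "Yes"
)

wide <- words %>%
  group_by(word) %>%
  mutate(Id = row_number()) %>%  # 1st, 2nd, 3rd occurrence of each word
  ungroup() %>%
  spread(word, HasAccount, fill = "No") %>%
  arrange(Id)

wide
```

Every word has a first occurrence, so the Id 1 row is all "Yes"; only "a" reaches Id 2 and Id 3.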