Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 2 rows: when using unnest_tokens and spread

Problem description

This is the data frame I am working with:

# A tibble: 268 x 5
   Horodateur          Gender Age     Time    Social                                
   <dttm>              <chr>  <chr>   <chr>   <chr>                                 
 1 2021-04-23 09:59:16 Male   [18,24[ 1-5 ho~ Facebook, Instagram, Twitter, Snapcha~
 2 2021-04-23 10:11:35 Female [10,18[ 1-5 ho~ Reddit                                
 3 2021-04-23 10:18:24 Male   [18,24[ >10 ho~ Facebook, Instagram, Twitter, Linkedi~
 4 2021-04-23 10:42:28 Female [18,24[ 5-10 h~ Facebook, Instagram, Twitter, Snapchat
 5 2021-04-23 10:42:37 Female [18,24[ 5-10 h~ Facebook, Instagram, TikTok, Snapchat 
 6 2021-04-23 10:45:35 Female [24,34[ 1-5 ho~ Facebook, Instagram, Twitter, Linkedi~
 7 2021-04-23 10:48:09 Male   [18,24[ 5-10 h~ Facebook, Instagram, TikTok, Linkedin~
 8 2021-04-23 10:49:56 Male   [18,24[ 5-10 h~ Facebook, Instagram, Snapchat         
 9 2021-04-23 10:50:39 Male   [24,34[ 0 hours Linkedin, Reddit                      
10 2021-04-23 10:51:36 Male   [18,24[ 5-10 h~ Facebook, Instagram, Twitter, TikTok  
# ... with 258 more rows
> str(Survey[1:5])
tibble [268 x 5] (S3: tbl_df/tbl/data.frame)
 $ Horodateur: POSIXct[1:268], format: "2021-04-23 09:59:16" "2021-04-23 10:11:35" ...
 $ Gender    : chr [1:268] "Male" "Female" "Male" "Female" ...
 $ Age       : chr [1:268] "[18,24[" "[10,18[" "[18,24[" "[18,24[" ...
 $ Time      : chr [1:268] "1-5 hours" "1-5 hours" ">10 hours" "5-10 hours" ...
 $ Social    : chr [1:268] "Facebook, Instagram, Twitter, Snapchat, Reddit, Signal" "Reddit" "Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora" "Facebook, Instagram, Twitter, Snapchat" ...

I am trying to split the Social column to get something like this:

      Id Facebook Instagram Reddit Signal Snapchat TikTok Twitter
   <int> <chr>    <chr>     <chr>  <chr>  <chr>    <chr>  <chr>  
 1     1 No       No        No     No     No       No     Yes    
 2     2 Yes      Yes       No     No     Yes      No     Yes    
 3     3 No       Yes       No     Yes    No       Yes    No     
 4     4 No       Yes       No     No     Yes      No     No     
 5     5 No       Yes       No     Yes    Yes      Yes    Yes    
 6     6 No       Yes       No     No     No       No     No     
 7     7 No       No        Yes    Yes    No       Yes    Yes    
 8     8 No       No        Yes    No     No       No     Yes    
 9     9 No       No        Yes    No     Yes      Yes    No     
10    10 No       Yes       Yes    Yes    Yes      No     Yes

So I wrote this code:

Survey %>%
  mutate(Id = row_number(), HasAccount = "Yes") %>%
  unnest_tokens(Survey, Social, to_lower = F) %>%
  spread(Survey, HasAccount, fill = "No")

But I get the error mentioned in the title:

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 2 rows:
* 440, 441

I thought adding Id = row_number() would fix the error, but it did not (the same error persists when I remove it). Does anyone know how to solve this?

Tags: r, dataframe, dplyr, compiler-errors, data-manipulation

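Solution

The error occurs because there are duplicate rows, so some key combinations are not unique. We can make them unique by creating a sequence column with row_number() within each token group: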

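library(dplyr)
library(tidyr)
library(tidytext)

Survey %>%
  mutate(HasAccount = "Yes") %>%
  # one row per token of the Social column, keeping the original case
  unnest_tokens(Survey, Social, to_lower = FALSE) %>%
  # number the occurrences of each token so that key combinations are unique
  group_by(Survey) %>%
  mutate(Id = row_number()) %>%
  ungroup() %>%
  spread(Survey, HasAccount, fill = "No")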
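To see which rows trigger the error before reshaping, it can help to count the key combinations directly. The sketch below is only an illustration: `toy`, its contents and the output column name `Network` are hypothetical stand-ins for the real `Survey` data, and the repeated "Facebook" entry is an assumed cause of duplication that the question does not confirm.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for Survey: respondent 2 lists Facebook twice
toy <- tibble(
  Id = 1:3,
  Social = c("Facebook, Reddit", "Facebook, Twitter, Facebook", "Reddit")
)

# One row per token, keeping the original case
tokens <- toy %>%
  unnest_tokens(Network, Social, to_lower = FALSE)

# Any (Id, Network) pair that occurs more than once cannot be spread
dupes <- tokens %>%
  count(Id, Network) %>%
  filter(n > 1)

dupes
```

If `dupes` is not empty, the listed pairs are the ones spread() cannot place in a single cell.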

The row_number() approach, using a reproducible example:

library(janeaustenr)
d <- tibble(txt = prideprejudice)
d %>%
  mutate(HasAccount = "Yes") %>%
  unnest_tokens(word, txt) %>%
  slice(1:50) %>%
  group_by(word) %>%
  mutate(Id = row_number()) %>%
  ungroup() %>%
  spread(word, HasAccount, fill = "No")
# A tibble: 6 x 41
     Id `1`   a     acknowledged and   austen be    by    chapter entering feelings first fortune good  his   however `in`  is    it   
  <int> <chr> <chr> <chr>        <chr> <chr>  <chr> <chr> <chr>   <chr>    <chr>    <chr> <chr>   <chr> <chr> <chr>   <chr> <chr> <chr>
1     1 Yes   Yes   Yes          Yes   Yes    Yes   Yes   Yes     Yes      Yes      Yes   Yes     Yes   Yes   Yes     Yes   Yes   Yes  
2     2 No    Yes   No           No    No     Yes   No    No      No       No       No    No      No    No    No      Yes   No    No   
3     3 No    Yes   No           No    No     No    No    No      No       No       No    No      No    No    No      No    No    No   
4     4 No    Yes   No           No    No     No    No    No      No       No       No    No      No    No    No      No    No    No   
5     5 No    Yes   No           No    No     No    No    No      No       No       No    No      No    No    No      No    No    No   
6     6 No    Yes   No           No    No     No    No    No      No       No       No    No      No    No    No      No    No    No   
# … with 22 more variables: jane <chr>, known <chr>, little <chr>, man <chr>, may <chr>, must <chr>, neighbourhood <chr>, of <chr>,
#   on <chr>, or <chr>, possession <chr>, prejudice <chr>, pride <chr>, single <chr>, such <chr>, that <chr>, the <chr>, truth <chr>,
#   universally <chr>, views <chr>, want <chr>, wife <chr>
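If the goal is one row per respondent, as in the desired output of the question, a different technique may fit better than the row_number() approach above. This is a minimal sketch and not the answer's method: it splits on the comma separator with tidyr's separate_rows(), drops repeated pairs with distinct(), and reshapes with pivot_wider(), which supersedes spread() in current tidyr. The `toy` data is hypothetical, and it assumes an `Id` column already identifies each respondent (for the real data, something like mutate(Id = row_number()) before splitting).

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for Survey: respondent 2 lists Facebook twice
toy <- tibble(
  Id = 1:3,
  Social = c("Facebook, Reddit", "Facebook, Twitter, Facebook", "Reddit")
)

wide <- toy %>%
  # split on commas (plus optional whitespace) rather than on words
  separate_rows(Social, sep = ",\\s*") %>%
  # keep each (Id, network) pair once so the keys are unique
  distinct(Id, Social) %>%
  mutate(HasAccount = "Yes") %>%
  # one column per network, "No" where a respondent did not list it
  pivot_wider(
    names_from = Social,
    values_from = HasAccount,
    values_fill = list(HasAccount = "No")
  )

wide
```

Splitting on the separator instead of tokenizing into words also keeps any multi-word entry together as a single column.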
