Create tables based on unique and non-unique values in a single column

Problem description

Given a CSV with the following structure:

id, postCode, someThing, someOtherThing
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales

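I would like to create two tables depending on whether the value in the postCode column is unique. The values in the other columns don't matter to me, but they must be copied into the new tables.

My final data should look like this, with one table for the rows whose postCode is unique: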

id, postCode, someThing, someOtherThing
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales

and another for the rows whose postCode value is repeated:

id, postCode, someThing, someOtherThing
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep

So far I can load the data, but I'm not sure about the next step:

myData <- read.csv("path/to/my.csv",
  header=TRUE,
  sep=",",
  stringsAsFactors=FALSE
)

I'm new to R, so any help is much appreciated.

The data in dput format:

df <-
structure(list(id = 1:4, postCode = structure(c(1L, 1L, 2L, 3L
), .Label = c("E3 4AX", "E8 KAK", "VH3 2K2"), class = "factor"), 
someThing = structure(c(1L, 2L, 4L, 3L), .Label = c(" cats", 
" elephants", " humans", " mice"), class = "factor"), 
someOtherThing = structure(c(1L, 3L, 2L, 4L), 
.Label = c(" dogs", " rats", " sheep", " whales               "
), class = "factor")), class = "data.frame", 
row.names = c(NA, -4L))

Tags: r

Solution

Suppose df is the name of your data.frame, which can be created as follows:

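df <- read.csv(
  text = "id, postCode, someThing, someOtherThing
1, E3 4AX, cats, dogs
2, E3 4AX, elephants, sheep
3, E8 KAK, mice, rats
4, VH3 2K2, humans, whales",
  header = TRUE,
  strip.white = TRUE,        # drop the spaces that follow each comma
  stringsAsFactors = FALSE   # keep postCode as character, not factor
)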

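You can then separate the unique and the duplicated rows by grouping on postCode and using dplyr's n() function, which returns the number of observations in each group: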

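library(dplyr)

# Rows whose postCode occurs exactly once
uniques <- df %>%
  group_by(postCode) %>%
  filter(n() == 1) %>%
  ungroup()

# Rows whose postCode occurs more than once
dupes <- df %>%
  group_by(postCode) %>%
  filter(n() > 1) %>%
  ungroup()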
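
To see what the group size test is working with, the per-postCode row counts can be inspected in base R. This is only a minimal sketch: it rebuilds the example data by hand (an assumption, with the leading spaces already stripped) so that it runs on its own, and the name counts is made up for illustration.

```r
# Rebuild the example data so this snippet runs on its own
# (assumption: whitespace around the values has already been stripped).
df <- data.frame(
  id = 1:4,
  postCode = c("E3 4AX", "E3 4AX", "E8 KAK", "VH3 2K2"),
  someThing = c("cats", "elephants", "mice", "humans"),
  someOtherThing = c("dogs", "sheep", "rats", "whales"),
  stringsAsFactors = FALSE
)

# Number of rows per postCode; this is the same per-group size
# that n() reports inside a grouped filter().
counts <- table(df$postCode)
print(counts)
```

A postCode with a count of 1 ends up in the unique table, and anything with a larger count ends up in the duplicates table.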

I'm not sure why someone edited this answer. Maybe they hate tribbles.
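
If you would rather not load a package, the same split can probably be done in base R with duplicated(). The following is a sketch under assumptions, not part of the original answer: the data is rebuilt by hand so the snippet is self-contained, and is_dupe is a name introduced here for illustration.

```r
# Rebuild the example data so this snippet runs on its own
# (assumption: whitespace around the values has already been stripped).
df <- data.frame(
  id = 1:4,
  postCode = c("E3 4AX", "E3 4AX", "E8 KAK", "VH3 2K2"),
  someThing = c("cats", "elephants", "mice", "humans"),
  someOtherThing = c("dogs", "sheep", "rats", "whales"),
  stringsAsFactors = FALSE
)

# duplicated() on its own flags only the second and later occurrences,
# so the scan is repeated from the end to catch the first occurrence too.
is_dupe <- duplicated(df$postCode) | duplicated(df$postCode, fromLast = TRUE)

uniques <- df[!is_dupe, ]  # rows whose postCode appears exactly once
dupes   <- df[is_dupe, ]   # rows whose postCode appears more than once
```

Note that the comparison is exact, so post codes that differ only in case or stray whitespace would be treated as distinct values; cleaning the column first (for example with trimws()) may be needed on real data.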

