Creating a group key across two columns

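Problem description

I am trying to solve the following problem, but I find it hard to explain. I want to assign an incremental value based on the links between two columns (Colours and Letters).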

Colours <- c("Green","Red","Green","Green","Blue","Red","Brown")
Letters <- c("X","C","Y","A","C","T","P")
df <- data.frame(Colours,Letters)
df

    Colours Letters
1   Green       X
2     Red       C
3   Green       Y
4   Green       A
5    Blue       C
6     Red       T
7   Brown       P

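I want to assign a value to Group so that all rows with the same colour, together with any other colours that share a letter with that colour, end up in the same group. For example, group 2 contains Red and Blue because both are linked to the letter C.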

Group <- c(1,2,1,1,2,2,3)
df <- data.frame(df,Group)
df
    Colours Letters Group
1   Green       X     1
2     Red       C     2
3   Green       Y     1
4   Green       A     1
5    Blue       C     2
6     Red       T     2
7   Brown       P     3

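If an extra row were added with Colours = Green and Letters = C, the Group column would change to the values below. All Green rows are then grouped with any other colour that shares a letter with Green (here Red, through the letter C). In addition, any colour that shares a letter with Red also joins the same group as Green (here Blue, which shares the letter C with Red).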

  Colours Letters Group
1   Green       X     1
2     Red       C     1
3   Green       Y     1
4   Green       A     1
5    Blue       C     1
6     Red       T     1
7   Brown       P     2
8   Green       C     1

Can anyone help?

Tags: r

Solution


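As @Frank said above, you are describing a graph problem: you want your group labels to reflect connected components, that is, colours that share a letter. By converting the columns into a graph object, you can work out what the separate components are and return them as groups: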

Colours <- c("Green","Red","Green","Green","Blue","Red","Brown")
Letters <- c("X","C","Y","A","C","T","P")
df <- data.frame(Colours,Letters)

Group <- c(1,2,1,1,2,2,3)
df <- data.frame(df,Group)

# load the igraph package for working with graphs
library(igraph)
adj.mat <- table(df$Colours, df$Letters) %*% t(table(df$Colours, df$Letters))

# visual inspection makes it clear what the components are
g <- graph_from_adjacency_matrix(adj.mat, mode = 'undirected', diag = FALSE)
plot(g)

# we create a dataframe that matches each color to a component
mdf <- data.frame(Group_test = components(g)$membership,
                  Colours = names(components(g)$membership))

mdf
#>       Group_test Colours
#> Blue           1    Blue
#> Brown          2   Brown
#> Green          3   Green
#> Red            1     Red

# Then we just match them together
dplyr::left_join(df, mdf)
#> Joining, by = "Colours"
#>   Colours Letters Group Group_test
#> 1   Green       X     1          3
#> 2     Red       C     2          1
#> 3   Green       Y     1          3
#> 4   Green       A     1          3
#> 5    Blue       C     2          1
#> 6     Red       T     2          1
#> 7   Brown       P     3          2

The groups are numbered differently, but the colours are partitioned in the same way.
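If the numbering itself matters, the component ids can be relabelled in order of first appearance. This is only a sketch: the vector below is typed in by hand from the Group_test column printed above (so it runs without igraph), and `group_renumbered` is a name introduced here for illustration.

```r
# Component ids copied by hand from the Group_test column above
group_test <- c(3, 1, 3, 3, 1, 1, 2)

# Relabel so that the first id seen becomes 1, the next new id 2, and so on
group_renumbered <- match(group_test, unique(group_test))
group_renumbered
#> [1] 1 2 1 1 2 2 3

# Compare with the key from the question
all(group_renumbered == c(1, 2, 1, 1, 2, 2, 3))
#> [1] TRUE
```

This only changes the labels; which rows end up together is still decided by the components.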

We can treat the extended case as a sanity check: adding a row that links Green to the letter C reduces the set of components to 2:

# examining the extended case as a check
df2 <- data.frame(Colours = c(Colours, "Green"), Letters = c(Letters, "C"))
df2
#>   Colours Letters
#> 1   Green       X
#> 2     Red       C
#> 3   Green       Y
#> 4   Green       A
#> 5    Blue       C
#> 6     Red       T
#> 7   Brown       P
#> 8   Green       C

# let's wrap the procedure in a function for convenience
getGroup <- function(col, let, plot = FALSE){
  adj.mat <- table(col, let) %*% table(let, col)
  g <- graph_from_adjacency_matrix(adj.mat, mode = 'undirected',
                                   diag = FALSE)
  if (plot) {plot(g)}
  comps <- components(g)$membership
  mdf <- data.frame(Group = comps, Colours = names(comps))
  mdf
}

# we get our desired group key (which we can merge back to the dataframe)
getGroup(df2$Colours, df2$Letters)
#>       Group Colours
#> Blue      1    Blue
#> Brown     2   Brown
#> Green     1   Green
#> Red       1     Red
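One way to merge that key back onto the rows, sketched in base R. The `key` data frame is typed in by hand from the output above so the snippet runs without igraph; in practice it would be the value returned by getGroup().

```r
df2 <- data.frame(
  Colours = c("Green", "Red", "Green", "Green", "Blue", "Red", "Brown", "Green"),
  Letters = c("X", "C", "Y", "A", "C", "T", "P", "C")
)

# Stand-in for getGroup(df2$Colours, df2$Letters), copied from the output above
key <- data.frame(Group = c(1, 2, 1, 1),
                  Colours = c("Blue", "Brown", "Green", "Red"))

# Look up each row's colour in the key; the row order of df2 is preserved
df2$Group <- key$Group[match(df2$Colours, key$Colours)]
df2
#>   Colours Letters Group
#> 1   Green       X     1
#> 2     Red       C     1
#> 3   Green       Y     1
#> 4   Green       A     1
#> 5    Blue       C     1
#> 6     Red       T     1
#> 7   Brown       P     2
#> 8   Green       C     1
```

match() is used here rather than merge() because merge() sorts the result by the key column by default, which would shuffle the rows.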

Created on 2018-11-07 by the reprex package (v0.2.1)
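If igraph is not available, the same connected-components idea can be sketched in base R by repeatedly spreading the smallest label across rows that share a colour or a letter. This swaps in a simple label-propagation loop instead of the adjacency-matrix approach above, and `group_key()` is a hypothetical helper name, not a function from any package.

```r
# group_key() is a hypothetical helper, not part of igraph or base R
group_key <- function(col, let) {
  # Start with one label per distinct colour
  grp <- match(col, unique(col))
  repeat {
    # Rows sharing a colour take the smallest label among them,
    # then rows sharing a letter do the same
    new <- ave(grp, col, FUN = min)
    new <- ave(new, let, FUN = min)
    # Stop once a full pass changes nothing
    if (identical(new, grp)) break
    grp <- new
  }
  # Renumber the surviving labels 1, 2, ... in order of first appearance
  match(grp, unique(grp))
}

Colours <- c("Green", "Red", "Green", "Green", "Blue", "Red", "Brown")
Letters <- c("X", "C", "Y", "A", "C", "T", "P")

group_key(Colours, Letters)
#> [1] 1 2 1 1 2 2 3

# Extended case: the extra Green/C row pulls Red and Blue into Green's group
group_key(c(Colours, "Green"), c(Letters, "C"))
#> [1] 1 1 1 1 1 1 2 1
```

Labels can only decrease on each pass, so the loop terminates. On large data with long chains of links it may need many passes, and igraph's components() is likely the better choice there.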

