首页 > 解决方案 > r - 查找所有组合,不包括来自同一组的组合

问题描述

我有一组点。我想要一个点与其他所有点的组合,不包括与它在同一组中的点。

考虑示例数据:

Group  Longitude  Latitude
Group1 1.5        4.0
Group1 3.0        5.0
Group1 1.3        3.0
Group2 2.0        8.0
Group2 0.7        6.5
Group3 2.0        3.0

然后我想要的输出将是这样的:

GroupA  LongitudeA  LatitudeA  GroupB  LongitudeB  LatitudeB
Group1  1.5         4.0        Group2  2.0         8.0
Group1  1.5         4.0        Group2  0.7         6.5
Group1  1.5         4.0        Group3  2.0         3.0
Group1  3.0         5.0        Group2  2.0         8.0 
Group1  3.0         5.0        Group2  0.7         6.5
Group1  3.0         5.0        Group3  2.0         3.0

等等

可重现的数据:

structure(list(Group = c("Group 1", "Group 1", "Group 1", "Group 2", 
"Group 2", "Group 3"), Longitude = c(1.5, 3, 1.3, 2, 0.7, 2), 
Latitude = c(4, 5, 3, 8, 6.5, 3)), class = "data.frame", row.names = c(NA, 
-6L), spec = structure(list(cols = list(Group = structure(list(), class = 
c("collector_character", 
"collector")), Longitude = structure(list(), class = c("collector_double", 
"collector")), Latitude = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec")) 

标签: rjoin

解决方案


我们可以通过过滤所有行的笛卡尔 SQL 连接来解决这个问题Group = Group_B。这是对sqldf包的简单操作。

data_a <-
     structure(
          list(
               Group = c("Group 1", "Group 1", "Group 1", "Group 2",
                         "Group 2", "Group 3"),
               Longitude = c(1.5, 3, 1.3, 2, 0.7, 2),
               Latitude = c(4, 5, 3, 8, 6.5, 3)
          ),
          class = "data.frame",
          row.names = c(NA,-6L),
          spec = structure(list(
               cols = list(
                    Group = structure(list(), class =
                                           c("collector_character",
                                             "collector")),
                    Longitude = structure(list(), class = c("collector_double",
                                                            "collector")),
                    Latitude = structure(list(), class = c("collector_double",
                                                           "collector"))
               ),
               default = structure(list(), class = c("collector_guess",
                                                     "collector")),
               skip = 1
          ), class = "col_spec")
     ) 

读取数据后,我们将其复制到第二个数据框中,并通过附加_B到每个列名称来重命名第二个数据框中的列。

data_b <- data_a 

colnames(data_b) <- paste(colnames(data_b),"_B",sep="")

然后我们加载sqldf并执行一个 SQL 查询,该查询创建 和 的笛卡尔积,将 中的每一行data_a与中的每一行data_b匹配。该子句消除了结果集中等于的每一行。data_adata_bwhereGroupGroup_B

library(sqldf)

sqlStmt <- c('select * from data_a, data_b 
             where "Group" <> "Group_B"')
joined <- sqldf(sqlStmt)

...和输出:

> joined

     Group Longitude Latitude Group_B Longitude_B Latitude_B
1  Group 1       1.5      4.0 Group 2         2.0        8.0
2  Group 1       1.5      4.0 Group 2         0.7        6.5
3  Group 1       1.5      4.0 Group 3         2.0        3.0
4  Group 1       3.0      5.0 Group 2         2.0        8.0
5  Group 1       3.0      5.0 Group 2         0.7        6.5
6  Group 1       3.0      5.0 Group 3         2.0        3.0
7  Group 1       1.3      3.0 Group 2         2.0        8.0
8  Group 1       1.3      3.0 Group 2         0.7        6.5
9  Group 1       1.3      3.0 Group 3         2.0        3.0
10 Group 2       2.0      8.0 Group 1         1.5        4.0
11 Group 2       2.0      8.0 Group 1         3.0        5.0
12 Group 2       2.0      8.0 Group 1         1.3        3.0
13 Group 2       2.0      8.0 Group 3         2.0        3.0
14 Group 2       0.7      6.5 Group 1         1.5        4.0
15 Group 2       0.7      6.5 Group 1         3.0        5.0
16 Group 2       0.7      6.5 Group 1         1.3        3.0
17 Group 2       0.7      6.5 Group 3         2.0        3.0
18 Group 3       2.0      3.0 Group 1         1.5        4.0
19 Group 3       2.0      3.0 Group 1         3.0        5.0
20 Group 3       2.0      3.0 Group 1         1.3        3.0
21 Group 3       2.0      3.0 Group 2         2.0        8.0
22 Group 3       2.0      3.0 Group 2         0.7        6.5

如果输出需要消除组被反转的行(即Group值 2 与Group_B等于 1 匹配的点),可以按如下方式执行此操作:

sqlStmt <- c('select * from data_a, data_b 
             where "Group" < "Group_B"')
joined <- sqldf(sqlStmt)

joined

...和输出:

> joined
     Group Longitude Latitude Group_B Longitude_B Latitude_B
1  Group 1       1.5      4.0 Group 2         2.0        8.0
2  Group 1       1.5      4.0 Group 2         0.7        6.5
3  Group 1       1.5      4.0 Group 3         2.0        3.0
4  Group 1       3.0      5.0 Group 2         2.0        8.0
5  Group 1       3.0      5.0 Group 2         0.7        6.5
6  Group 1       3.0      5.0 Group 3         2.0        3.0
7  Group 1       1.3      3.0 Group 2         2.0        8.0
8  Group 1       1.3      3.0 Group 2         0.7        6.5
9  Group 1       1.3      3.0 Group 3         2.0        3.0
10 Group 2       2.0      8.0 Group 3         2.0        3.0
11 Group 2       0.7      6.5 Group 3         2.0        3.0

推荐阅读