Finding the frequency of all factor combinations for all column combinations

Question

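I have a data frame with n variables, all of which are factors. I want to select m columns (m < n) from this data frame and find the frequency of every factor combination for every possible selection of columns.

I have searched for this, but I only found how to get the frequency of factor combinations once specific columns have been chosen. In my case there can be many column combinations, because m < n.

Here is our data; every variable holds factor values.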

company <- data.frame("country" = c("USA", "China", "France", "Germany"),
                    "category" = c("C-corp", "S-corp", "C-corp", "LLC"),
                    "Type" = c("Public", "Private", "Private", "Private"),
                    "Profit" = c("High", "High", "High", "Low"),
                    stringsAsFactors = TRUE)  # since R 4.0.0 strings are not converted to factors by default

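Now I want to select 2 columns (m = 2) and find the frequency of the factor combinations for every possible selection of variables.

In this case I could have "country = USA & category = S-corp", "country = USA & category = C-corp", or "country = China & category = LLC". But I could also pick other columns and get "country = USA & Profit = Low" or "country = China & Type = Public". I want to know the frequency of all of these combinations.

Edit: my expected output is something like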

country = USA, category = C-corp  freq 1
country = USA, category = S-corp  freq 0
country = USA, category = LLC  freq 0
country = China, category = LLC  freq 0
country = France, category = C-corp  freq 1
country = USA, Type = Public    freq 1
country = China, Type = Public    freq 0
Type = Private, Profit = High   freq 2
Type = Public, category = LLC  freq 0
Type = Private, Profit = Low freq 1

If I need to select 2 columns, I need every possible combination of columns; the order does not matter.

Tags: r, dataframe

Answer

The combination part sounds like a job for expand.grid():

expand.grid(company[, 1:2])

   country category
1      USA   C-corp
2    China   C-corp
3   France   C-corp
4  Germany   C-corp
5      USA   S-corp
6    China   S-corp
7   France   S-corp
8  Germany   S-corp
9      USA   C-corp
10   China   C-corp
11  France   C-corp
12 Germany   C-corp
13     USA      LLC
14   China      LLC
15  France      LLC
16 Germany      LLC

# or if you want 4 columns with all countries, do a cross join:

merge(company[, 1, drop = FALSE], company[, -1], by = NULL)

#or if you want 4 columns with all possible results, do expand.grid without subsetting:

expand.grid(company)
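
Note that expand.grid() works on the raw column values, so a value that occurs more than once (C-corp above) produces repeated rows. As a minimal sketch, assuming the columns are factors, the grid can be built from each column's distinct levels instead. The name distinct_grid is only illustrative.

```r
# Same data as in the question; stringsAsFactors = TRUE makes every column a factor
company <- data.frame(country = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type = c("Public", "Private", "Private", "Private"),
                      Profit = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

# Build the grid from the distinct levels of each column, not the raw values
distinct_grid <- expand.grid(lapply(company[, 1:2], levels))

nrow(distinct_grid)  # 12: 4 countries x 3 categories, no repeated rows
```

This only enumerates the possible combinations; it does not count how often each one occurs in the data.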

The second part sounds like table(). You can call it directly on the company data.frame:

table(company)

, , Type = Private, Profit = High

         category
country   C-corp LLC S-corp
  China        0   0      1
  France       1   0      0
  Germany      0   0      0
  USA          0   0      0

, , Type = Public, Profit = High

         category
country   C-corp LLC S-corp
  China        0   0      0
  France       0   0      0
  Germany      0   0      0
  USA          1   0      0

, , Type = Private, Profit = Low

         category
country   C-corp LLC S-corp
  China        0   0      0
  France       0   0      0
  Germany      0   1      0
  USA          0   0      0

, , Type = Public, Profit = Low

         category
country   C-corp LLC S-corp
  China        0   0      0
  France       0   0      0
  Germany      0   0      0
  USA          0   0      0
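
The full table above crosses all four columns at once. To get frequencies for every choice of m columns, as the question asks, one possible approach (not part of the original answer) is to enumerate the column sets with combn() and tabulate each set separately. In this sketch the names m, col_sets and pair_freqs are illustrative.

```r
# Same data as in the question; stringsAsFactors = TRUE makes every column a factor
company <- data.frame(country = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type = c("Public", "Private", "Private", "Private"),
                      Profit = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

m <- 2  # number of columns to select at a time

# Every unordered selection of m column names
col_sets <- combn(names(company), m, simplify = FALSE)

# For each selection, cross-tabulate those columns and flatten the table
# into a long data frame; combinations that never occur get freq 0
pair_freqs <- lapply(col_sets, function(cols) {
  as.data.frame(table(company[cols]), responseName = "freq")
})
names(pair_freqs) <- vapply(col_sets, paste, character(1), collapse = " & ")

# Inspect one selection, e.g. Type x Profit (Private/High has freq 2)
pair_freqs[["Type & Profit"]]
```

With four columns and m = 2 this gives six data frames, one per column pair. Changing m to 3 tabulates every triple of columns the same way.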
