Finding all possible unique combinations of 3 columns in pandas

Problem description

I am trying to find all possible combinations of 3 variable columns in pandas. A sample df looks like this:

          Variable_Name Variable1 Variable2 Variable3
0                  X      6.0%      8.0%     10.0%
1                  Y      3.0%      4.0%      5.0%
2                  Z      1.0%      3.0%      5.0%

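Each variable can only take values from its own set of options, and values cannot be moved to another variable; for example, using 4.0% for "X" is incorrect.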

I tried itertools.combinations, itertools.product, and itertools.permutations, but these gave all possible combinations rather than only the ones I need.

I would like the result to look like this, giving 27 possible combinations:

     Y      X     Z
0   3.0%   6.0%  1.0%
1   3.0%   6.0%  3.0%
2   3.0%   6.0%  5.0%
3   3.0%   8.0%  1.0%
4   3.0%   8.0%  3.0%
5   3.0%   8.0%  5.0%
6   3.0%  10.0%  1.0%
7   3.0%  10.0%  3.0%
8   3.0%  10.0%  5.0%
9   4.0%   8.0%  3.0%
10  4.0%   8.0%  1.0%
11  4.0%   8.0%  5.0%
12  4.0%   6.0%  1.0%
13  4.0%   6.0%  3.0%
14  4.0%   6.0%  5.0%
15  4.0%  10.0%  1.0%
16  4.0%  10.0%  3.0%
17  4.0%  10.0%  5.0%
18  5.0%  10.0%  5.0%
19  5.0%  10.0%  1.0%
20  5.0%  10.0%  3.0%
21  5.0%   8.0%  1.0%
22  5.0%   8.0%  3.0%
23  5.0%   8.0%  5.0%
24  5.0%   6.0%  1.0%
25  5.0%   6.0%  3.0%
26  5.0%   6.0%  5.0%

Any help would be greatly appreciated.

Tags: python, pandas, dataframe

Solution


Let's try successive cross merges of each variable's values (note that how='cross' requires pandas 1.2 or later):

from functools import reduce

import pandas as pd

df = pd.DataFrame({'Variable_Name': {0: 'X', 1: 'Y', 2: 'Z'},
                   'Variable1': {0: '6.0%', 1: '3.0%', 2: '1.0%'},
                   'Variable2': {0: '8.0%', 1: '4.0%', 2: '3.0%'},
                   'Variable3': {0: '10.0%', 1: '5.0%', 2: '5.0%'}})

# Save Var Names for later
var_names = df['Variable_Name']

# Get Variables Options in Own Rows
new_df = df.set_index('Variable_Name').stack() \
    .droplevel(1, 0) \
    .reset_index()

# Get Collection of DataFrames each with its own variable
dfs = tuple(new_df[new_df['Variable_Name'].eq(v)]
            .drop(columns=['Variable_Name']) for v in var_names)
# Successive Cross Merges
new_df = reduce(lambda left, right: pd.merge(left, right, how='cross'), dfs)
# Fix Column Names
new_df.columns = var_names
# Fix Axis Names
new_df = new_df.rename_axis(None, axis=1)

# For Display
print(new_df.to_string())

Output:

        X     Y     Z
0    6.0%  3.0%  1.0%
1    6.0%  3.0%  3.0%
2    6.0%  3.0%  5.0%
3    6.0%  4.0%  1.0%
4    6.0%  4.0%  3.0%
5    6.0%  4.0%  5.0%
6    6.0%  5.0%  1.0%
7    6.0%  5.0%  3.0%
8    6.0%  5.0%  5.0%
9    8.0%  3.0%  1.0%
10   8.0%  3.0%  3.0%
11   8.0%  3.0%  5.0%
12   8.0%  4.0%  1.0%
13   8.0%  4.0%  3.0%
14   8.0%  4.0%  5.0%
15   8.0%  5.0%  1.0%
16   8.0%  5.0%  3.0%
17   8.0%  5.0%  5.0%
18  10.0%  3.0%  1.0%
19  10.0%  3.0%  3.0%
20  10.0%  3.0%  5.0%
21  10.0%  4.0%  1.0%
22  10.0%  4.0%  3.0%
23  10.0%  4.0%  5.0%
24  10.0%  5.0%  1.0%
25  10.0%  5.0%  3.0%
26  10.0%  5.0%  5.0%
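
As an alternative sketch (not part of the original answer), the same 27 rows can probably be produced with itertools.product, provided each variable's options are passed as a separate iterable rather than as one flat list of values. The names options and combos below are made up for illustration, and the example assumes the same sample df as above:

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'Variable_Name': ['X', 'Y', 'Z'],
                   'Variable1': ['6.0%', '3.0%', '1.0%'],
                   'Variable2': ['8.0%', '4.0%', '3.0%'],
                   'Variable3': ['10.0%', '5.0%', '5.0%']})

# One row of candidate values per variable, indexed by variable name
# ("options" is an illustrative name, not from the original answer)
options = df.set_index('Variable_Name')

# product(*rows) picks exactly one value from each variable's own row,
# so a value never moves to a different variable
combos = pd.DataFrame(list(product(*options.to_numpy().tolist())),
                      columns=options.index.tolist())

print(combos.to_string())
```

Because product receives one list per variable, the number of rows is the product of the option counts (3 x 3 x 3 here), and the column order follows the order of Variable_Name in df.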
