Is there a way to determine which combinations of values lead to the lowest sum in a column?

Problem description

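I have a DataFrame with 6 columns, 5 of which contain the values 0 or 1. I want to know which columns, alone or in combination (when each row == 1 or == 0), lead to the lowest sum in the column "Target DF". For example, for the data below, if col1 == 1 and col2 == 1, the sum of the values in "Target DF" is 0.37%, but if col1 == 1 and col2 == 1 and col3 == 1, the sum of "Target DF" is 0.08%. Is there a good way to test which combination gives the lowest sum?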

    Target DF   col1    col2    col3    col4    col5
1   -0.0002908206429779181  1   0   0   0   1
2   -0.00020275213991660657 1   0   0   0   0
3   0.00020279325668326464  1   1   1   1   0
4   0.00010578372517389134  0   0   0   1   1
5   -0.00029087447444264836 0   1   0   0   0
6   0.0002027896806502394   0   1   0   0   1
7   -0.00012341217020306328 1   0   0   0   1
8   -0.0003350172357550196  0   1   1   1   1
9   0.0003439487075465042   0   0   1   1   1
10  -8.816165320779668e-06  0   1   0   0   0
11  0.000440812152309622    1   1   0   0   0
12  0.000890048203600724    0   1   0   0   1
13  0.0007043617602000563   0   1   0   0   0
14  0.00036952964155623214  0   0   1   0   1
15  0.00032541776605099493  0   0   1   1   0
16  -0.00036048075824046677 0   0   0   1   1
17  0.0001407261469179577   1   0   0   0   0
18  -2.638243984798283e-05  1   1   1   0   1
19  0.0003429807666939233   0   1   0   0   1
20  0.00011428772374011409  1   0   1   0   0
21  0.000290081838239864    0   0   1   1   1
22  0.0007293881926990764   1   1   0   0   0
23  -0.00010537685397415952 0   0   1   1   1
24  0.0001229526193300412   0   1   1   0   1
25  0.0007727500241483387   1   0   1   1   1
26  1.754893960548287e-05   0   0   1   1   1
27  -3.509726329098406e-05  1   1   1   0   1
28  -0.00025446408985230207 0   1   0   0   0
29  -0.00046517343069785166 1   0   1   1   0
30  -0.00017561883687644553 1   0   0   0   0
31  -0.00013173726320236945 0   1   1   0   0
32  -0.0004391820673177449  0   0   1   1   0
33  -0.0002812000210901111  0   1   0   0   0
34  0.00014942953079133403  0   0   1   1   0
35  -0.00014061854582836375 1   1   0   0   0
36  0.00020216758814073366  0   0   0   1   1
37  -0.0004745583970471312  1   0   1   1   0
38  -0.00014946894562850588 1   0   1   0   0
39  -0.00010552326348267638 1   1   1   1   0
40  -0.0004836993324949246  0   1   1   0   0
41  0.0003343539928906125   0   1   0   1   1
42  0.00035183393438287247  0   1   1   0   0
43  8.792754770059474e-06   0   0   0   0   0
44  -0.0007737556163227577  1   1   1   0   1
45  -0.0004751722499405364  0   1   0   1   0
46  0.0006074531864879162   1   1   0   0   0
47  0.00013197487198435276  1   0   0   0   1
48  -0.0005982071380187248  0   0   1   0   1
49  0.000668984639760728    1   1   0   1   1
50  0.0007037235773789119   1   1   1   1   1
51  0.0005105588859253274   1   1   1   1   1
52  0.0005278948433471431   0   0   0   1   0
53  0.00041373603640870726  0   1   1   0   0
54  0.0005988445822178168   0   0   0   1   0
55  0.0006512937863052404   1   1   0   0   1
56  6.156877231866531e-05   0   0   1   0   1
57  0.00029906673586244104  1   1   1   0   0

Tags: python, pandas, numpy, scipy

Solution


You need to group by every combination of the columns, using itertools.combinations:

from itertools import combinations

cols = ['col1','col2','col3','col4','col5']

max_val, max_combine, max_name = None, None, None
for i in range(len(cols)):
    for sub in combinations(cols,i+1):
        k = df.groupby(list(sub))['Target DF'].sum()
        
        # find the maximum; for the lowest sum, use k.min() and k.idxmin() and reverse the comparison
        val = k.max()
        if max_val is None or max_val < val:
            max_val, max_combine, max_name = val, k.idxmax(), sub
        
print('max Target', max_val, 'at', max_name, 'with value',max_combine)

Output:

max Target 0.004834 at ('col2',) with value 1
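
The loop above tracks the maximum, and groupby considers every 0/1 pattern of the selected columns. The question asks for the lowest sum over rows where the chosen columns all equal 1, so a minimal sketch of that variant might look like the following. It assumes "combination" means every selected column == 1, it runs on a small made-up DataFrame rather than the question's data, and the helper name lowest_sum_combination is hypothetical.

```python
from itertools import combinations

import pandas as pd

# Small hypothetical sample (NOT the question's data), same column layout.
df = pd.DataFrame({
    "Target DF": [0.5, -0.2, 0.3, -0.4, 0.1, -0.3],
    "col1": [1, 1, 0, 1, 0, 0],
    "col2": [1, 0, 1, 1, 0, 1],
    "col3": [0, 1, 1, 1, 0, 0],
})
cols = ["col1", "col2", "col3"]


def lowest_sum_combination(df, cols, target="Target DF"):
    """Return (combo, total) for the column subset whose all-ones rows sum lowest."""
    best_combo, best_total = None, None
    for r in range(1, len(cols) + 1):
        for combo in combinations(cols, r):
            # Keep only rows where every column in the combination equals 1.
            mask = df[list(combo)].eq(1).all(axis=1)
            if not mask.any():
                continue  # no matching rows; skip instead of reporting an empty sum of 0
            total = df.loc[mask, target].sum()
            if best_total is None or total < best_total:
                best_combo, best_total = combo, total
    return best_combo, best_total


best_combo, best_total = lowest_sum_combination(df, cols)
print("lowest sum", best_total, "for", best_combo)
```

With 5 indicator columns there are only 31 non-empty combinations, so the brute-force loop is cheap. Skipping combinations that match no rows is a design choice: otherwise an empty selection would report a sum of 0 and could be mistaken for a real result.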
