Mapping from a MultiIndex to multiple columns

Problem description

I have a huge DataFrame with a MultiIndex. I want to create new columns based on parts of the MultiIndex. This is what I have:

import numpy as np
import pandas as pd

# Three parallel lists, one per index level (p1, p2, p3)
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
          ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black']]
s = pd.DataFrame(np.random.randn(10), index=arrays)
s.index.names = ['p1', 'p2', 'p3']

s
                         0
p1  p2    p3              
bar one   green  -0.676472
    two   green  -0.030377
    three blue   -0.957517
baz one   blue    0.710764
    four  black   0.404377
foo one   black  -0.286358
    two   orange -1.620832
    eight green   0.316170
qux one   blue   -0.433310
    two   black   1.127754

This is what I want:

                         0  x1  x2  x3
p1  p2    p3                          
bar one   green   1.563381   1   0   1
    two   green   0.193622   0   0   0
    three blue    0.046728   1   0   0
baz one   blue    0.098216   0   0   0
    four  black   1.826574   0   1   0
foo one   black  -0.120856   1   1   1
    two   orange  0.605020   0   0   0
    eight green   0.693606   0   0   0
qux one   blue    0.588244   1   1   1
    two   black  -0.872104   1   1   1

In pseudocode, I want:

if (p1 =='bar') & (p2 == 'one') & (p3 == 'green'): s['x1'] = 1, s['x3'] = 1
if (p1 == 'bar') & (p3 == 'blue'): s['x1'] = 1
if (p1 == 'baz') & (p3 == 'black'): s['x2'] = 1
if (p1 =='foo') & (p2 == 'one') & (p3 == 'black'): s['x1'] = 1, s['x2'] = 1, s['x3'] = 1
if (p1 == 'qux'): s['x1'] = 1, s['x2'] = 1, s['x3'] = 1

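That is, based on the values of the MultiIndex levels, I want to assign 1 to the new x columns. I am looking for a vectorized approach like numpy.select(condition, choice), but I could not get numpy.select to set more than one column per condition.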

Since I have 14 index levels, I would like to refer to the level names explicitly in my conditions (i.e. (p1 == 'bar') & (p2 == 'one') is preferred over ['bar','one',]).

Any guidance would be greatly appreciated!

Thanks for your help!

Tags: python, pandas, mapping, multi-index

Solution


You can select the rows with index slicing (pd.IndexSlice) and set the columns to 1 like this:

idx = pd.IndexSlice
# Start with every flag column set to 0
s = s.assign(x1=0, x2=0, x3=0)
# Row selectors list the levels in order (p1, p2, p3); ':' matches any value
s.loc[idx['bar','one','green'], ['x1','x3']] = 1
s.loc[idx['bar',:,'blue'], ['x1']] = 1
s.loc[idx['baz',:,'black'], ['x2']] = 1
s.loc[idx['foo','one','black'], ['x1','x2','x3']] = 1
s.loc[idx['qux',:,:], ['x1','x2','x3']] = 1

print(s)
                         0  x1  x2  x3
p1  p2    p3                          
bar one   green   0.152556   1   0   1
    two   green   0.488762   0   0   0
    three blue    0.037346   1   0   0
baz one   blue    1.903518   0   0   0
    four  black   0.589922   0   1   0
foo one   black   0.871984   1   1   1
    two   orange  0.514062   0   0   0
    eight green  -0.177246   0   0   0
qux one   blue    0.740046   1   1   1
    two   black   0.755664   1   1   1
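
With 14 levels, spelling out every position in the slicer gets unwieldy. One possible workaround, sketched below, is to build the same row selector from level names. `level_slicer` is a hypothetical helper written for this illustration, not part of pandas; it relies only on the fact that `pd.IndexSlice[...]` returns a plain tuple, so a tuple built by hand behaves the same way in `.loc`. The random values use a fixed seed so the run is repeatable.

```python
import numpy as np
import pandas as pd


def level_slicer(index, **conditions):
    """Hypothetical helper: build a .loc row selector from level names.

    Levels that are not mentioned get slice(None), i.e. ':' (match anything).
    """
    unknown = set(conditions) - set(index.names)
    if unknown:
        raise KeyError(f"unknown index levels: {sorted(unknown)}")
    return tuple(conditions.get(name, slice(None)) for name in index.names)


arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
          ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black']]
s = pd.DataFrame(np.random.default_rng(0).standard_normal(10), index=arrays)
s.index.names = ['p1', 'p2', 'p3']

s = s.assign(x1=0, x2=0, x3=0)
# Same rules as above, but each condition names its level explicitly
s.loc[level_slicer(s.index, p1='bar', p2='one', p3='green'), ['x1', 'x3']] = 1
s.loc[level_slicer(s.index, p1='bar', p3='blue'), ['x1']] = 1
s.loc[level_slicer(s.index, p1='baz', p3='black'), ['x2']] = 1
s.loc[level_slicer(s.index, p1='foo', p2='one', p3='black'), ['x1', 'x2', 'x3']] = 1
s.loc[level_slicer(s.index, p1='qux'), ['x1', 'x2', 'x3']] = 1

print(s)
```

The keyword arguments only work when the level names are valid Python identifiers, as p1, p2 and p3 are here; for other names the helper could take a dict instead.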

Edit: to refer to the index levels by name, build boolean masks with get_level_values:

def get_i(lev, val):
    # Boolean mask: True where index level `lev` of s equals `val`
    return s.index.get_level_values(lev) == val

s = s.assign(x1=0, x2=0, x3=0)
s.loc[get_i('p1','bar') & get_i('p2','one') & get_i('p3','green'), ['x1','x3']] = 1
s.loc[get_i('p1','bar') & get_i('p3','blue'), ['x1']] = 1
s.loc[get_i('p1','baz') & get_i('p3','black'), ['x2']] = 1
s.loc[get_i('p1','foo') & get_i('p2','one') & get_i('p3','black'), ['x1','x2','x3']] = 1
s.loc[get_i('p1','qux'), ['x1','x2','x3']] = 1

print(s)
                         0  x1  x2  x3
p1  p2    p3                          
bar one   green  -0.029773   1   0   1
    two   green  -1.505461   0   0   0
    three blue    1.819085   1   0   0
baz one   blue    0.645498   0   0   0
    four  black  -1.119554   0   1   0
foo one   black   1.002072   1   1   1
    two   orange -0.461030   0   0   0
    eight green  -2.565080   0   0   0
qux one   blue    0.286967   1   1   1
    two   black  -0.522340   1   1   1
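
If the conditions should read exactly like the pseudocode in the question, another option is to expose the index levels as ordinary columns with `MultiIndex.to_frame()` and combine named masks per output column. This is a sketch of a plain boolean-mask technique, not of numpy.select; the names `lv`, `bar_one_green`, `bar_blue` and so on are made up for this illustration, and the seed is fixed only to make the run repeatable.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
          ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black']]
s = pd.DataFrame(np.random.default_rng(0).standard_normal(10), index=arrays)
s.index.names = ['p1', 'p2', 'p3']

# Index levels as columns; the frame keeps s's index, so masks align with s
lv = s.index.to_frame()
p1, p2, p3 = lv['p1'], lv['p2'], lv['p3']

# One named mask per rule from the question
bar_one_green = (p1 == 'bar') & (p2 == 'one') & (p3 == 'green')
bar_blue = (p1 == 'bar') & (p3 == 'blue')
baz_black = (p1 == 'baz') & (p3 == 'black')
foo_one_black = (p1 == 'foo') & (p2 == 'one') & (p3 == 'black')
qux = p1 == 'qux'

# Each x column is 1 wherever any rule that sets it matches
s = s.assign(
    x1=(bar_one_green | bar_blue | foo_one_black | qux).astype(int),
    x2=(baz_black | foo_one_black | qux).astype(int),
    x3=(bar_one_green | foo_one_black | qux).astype(int),
)

print(s)
```

Compared with repeated `.loc` assignments, each column is written once, at the cost of listing a rule once for every column it affects.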
