Getting a fixed number of rows per group in a multi-index DataFrame

Problem description

In the following multi-indexed DataFrame...:

import pandas as pd
import numpy as np

sz=100
typ = [np.random.choice(['ABC', 'DEF', 'GHI', 'JKL']) for _ in range(sz)]
sub_typ = [np.random.choice(['Up', 'Down']) for _ in range(sz)]
field1 = [np.random.randint(0, 9) for _ in range(sz)]
field2 = [np.random.uniform() for _ in range(sz)]

df = pd.DataFrame({'typ': typ, 'sub_typ': sub_typ, 'field1': field1, 'field2': field2})

max_rows = {'ABC': 2, 'JKL': 3}  # Maximum number of rows to be selected per typ

...I would like to extract the first few rows of each group into a DataFrame, according to this dictionary:

max_rows = {'ABC': 2, 'JKL': 3} # Maximum no of rows to be selected by typ

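I can view the indices with df.groupby(['typ', 'sub_typ']).apply(lambda g: g.index), but I don't know how to retrieve the required rows. The expected result would be the rows with index 42 and 65 for ABC.Down, 27 and 48 for ABC.Up, 2, 10 and 20 for JKL.Down...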

typ  sub_typ
ABC  Down             Int64Index([42, 65, 70, 77], dtype='int64')
     Up         Int64Index([27, 48, 54, 57, 63, 83, 89, 98], d...
DEF  Down       Int64Index([3, 4, 5, 8, 12, 13, 16, 23, 28, 36...
     Up         Int64Index([14, 15, 19, 21, 29, 35, 40, 46, 49...
GHI  Down       Int64Index([0, 6, 7, 9, 11, 30, 34, 37, 38, 53...
     Up         Int64Index([1, 17, 41, 43, 52, 56, 59, 62, 71,...
JKL  Down       Int64Index([2, 10, 20, 22, 24, 26, 45, 47, 51,...
     Up         Int64Index([18, 25, 31, 32, 33, 61, 79, 84, 85...
dtype: object

Is there a way to do this?

Tags: pandas

Solution


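First keep only the rows whose typ is a key of the dictionary, using Series.isin with boolean indexing. Then group by typ and sub_typ and, inside a lambda function, look up each group's limit in the dictionary and pass it to DataFrame.head: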

import pandas as pd
import numpy as np

np.random.seed(20)  # fixed seed so the sample data is reproducible

sz=100
typ = [np.random.choice(['ABC', 'DEF', 'GHI', 'JKL']) for _ in range(sz)]
sub_typ = [np.random.choice(['Up', 'Down']) for _ in range(sz)]
field1 = [np.random.randint(0, 9) for _ in range(sz)]
field2 = [np.random.uniform() for _ in range(sz)]

df = pd.DataFrame({'typ': typ, 'sub_typ': sub_typ, 'field1': field1, 'field2': field2})

max_rows = {'ABC': 2, 'JKL': 3} # Maximum no of rows to be selected by type


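# Keep only rows whose typ is a key of max_rows
mask = df['typ'].isin(max_rows.keys())
# x.name is the (typ, sub_typ) key of each group, so x.name[0] is the typ
df = df[mask].groupby(['typ', 'sub_typ']).apply(lambda x: x.head(max_rows[x.name[0]]))
print(df)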
                typ sub_typ  field1    field2
typ sub_typ                                  
ABC Down    7   ABC    Down       1  0.618937
            14  ABC    Down       4  0.415884
    Up      4   ABC      Up       5  0.822250
            12  ABC      Up       3  0.138418
JKL Down    8   JKL    Down       8  0.203591
            18  JKL    Down       4  0.779920
            19  JKL    Down       3  0.843493
    Up      0   JKL      Up       1  0.648625
            2   JKL      Up       3  0.920890
            3   JKL      Up       6  0.437324
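
To make it clearer what the lambda does with x.name, here is a minimal sketch that spells out the same per-group head as an explicit loop. The small `toy` frame and the names `pieces` and `result` are made up for illustration and are not part of the original answer. Unlike the apply output above, the concatenated result keeps the original row labels and has no extra index levels.

```python
import pandas as pd

# Hypothetical toy data, chosen so the result is easy to check by hand
toy = pd.DataFrame({
    'typ':     ['ABC', 'JKL', 'ABC', 'DEF', 'ABC', 'JKL',
                'ABC', 'JKL', 'JKL', 'JKL', 'ABC'],
    'sub_typ': ['Up', 'Up', 'Up', 'Up', 'Up', 'Down',
                'Down', 'Up', 'Up', 'Up', 'Down'],
})
max_rows = {'ABC': 2, 'JKL': 3}

# Drop rows whose typ has no entry in max_rows
filtered = toy[toy['typ'].isin(list(max_rows))]

# Iterating a groupby yields (key, sub-frame) pairs; with two grouping
# columns the key is a (typ, sub_typ) tuple, so key[0] is the typ
pieces = [
    group.head(max_rows[key[0]])
    for key, group in filtered.groupby(['typ', 'sub_typ'])
]

# Stack the per-group pieces; the original row labels are preserved
result = pd.concat(pieces)
print(result)
```

Here `list(result.index)` is `[6, 10, 0, 2, 5, 1, 7, 8]`: the rows come out in group order (ABC/Down, ABC/Up, JKL/Down, JKL/Up), each with its original label.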

To remove the MultiIndex, use:

df = df.reset_index(drop=True)
print(df)
   typ sub_typ  field1    field2
0  ABC    Down       1  0.618937
1  ABC    Down       4  0.415884
2  ABC      Up       5  0.822250
3  ABC      Up       3  0.138418
4  JKL    Down       8  0.203591
5  JKL    Down       4  0.779920
6  JKL    Down       3  0.843493
7  JKL      Up       1  0.648625
8  JKL      Up       3  0.920890
9  JKL      Up       6  0.437324
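
Since the question asks for the rows under their original index labels, an alternative worth considering is to avoid apply altogether. The sketch below swaps in a different technique from the answer above: it numbers the rows within each group using groupby(...).cumcount() and compares that number with the per-typ limit obtained through Series.map. The `toy` frame and the names `limit`, `rank` and `result` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data, chosen so the result is easy to check by hand
toy = pd.DataFrame({
    'typ':     ['ABC', 'JKL', 'ABC', 'DEF', 'ABC', 'JKL',
                'ABC', 'JKL', 'JKL', 'JKL', 'ABC'],
    'sub_typ': ['Up', 'Up', 'Up', 'Up', 'Up', 'Down',
                'Down', 'Up', 'Up', 'Up', 'Down'],
})
max_rows = {'ABC': 2, 'JKL': 3}

# Per-row limit; typ values missing from max_rows map to NaN
limit = toy['typ'].map(max_rows)

# 0-based position of each row within its (typ, sub_typ) group
rank = toy.groupby(['typ', 'sub_typ']).cumcount()

# Comparisons against NaN are False, so unlisted typ values are dropped
result = toy[rank < limit]
print(result)
```

With this toy data `list(result.index)` is `[0, 1, 2, 5, 6, 7, 8, 10]`. The rows stay in their original order and keep their original labels, so no reset_index is needed. If the grouped ordering matters, sorting the result by typ and sub_typ afterwards is one option.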
