Map values in a DataFrame based on ranges

Problem description

I have a DataFrame df:

import pandas
df = pandas.DataFrame(data=[1,2,3,2,2,2,3,3,4,5,10,11,12,1,2,1,1], columns=['codes'])

    codes
0       1
1       2
2       3
3       2
4       2
5       2
6       3
7       3
8       4
9       5
10     10
11     11
12     12
13      1
14      2
15      1
16      1

I want to group the values in the codes column according to specific logic:

values == 0 become A
values in the range 1 to 4 (inclusive) become B
values == 5 become C
values in the range 6 to 16 (inclusive) become D

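Is there a way to keep the logic separate from the DataFrame, so that the grouping rules are easy to change in the future? I want to avoid writing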

df.loc[df['codes'] == 0, 'codes'] = 'A'
df.loc[(df['codes'] >= 1) & (df['codes'] <= 4), 'codes'] = 'B'

Tags: python, pandas, dataframe, grouping

Solution


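The first idea is to use Series.map with a merged dictionary; the second is to use pd.cut with right=False, so that each bin includes its left edge and excludes its right edge.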

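import pandas as pd

df = pd.DataFrame(data=[0,1,2,3,2,2,2,3,3,4,5,10,11,12,16,2,17,1], columns=['codes'])

# Idea 1: build one lookup dict from the rules and map it onto the column
d1 = {0: 'A', 5: 'C'}
d2 = dict.fromkeys(range(1, 5), 'B')    # 1..4 -> 'B'
d3 = dict.fromkeys(range(6, 17), 'D')   # 6..16 -> 'D'

d = {**d1, **d2, **d3}
df['codes1'] = df['codes'].map(d)   # values missing from d become NaN

# Idea 2: half-open bins [0,1), [1,5), [5,6), [6,17); values outside become NaN
df['codes2'] = pd.cut(df['codes'], bins=(0,1,5,6,17), labels=list('ABCD'), right=False)
print(df)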
    codes codes1 codes2
0       0      A      A
1       1      B      B
2       2      B      B
3       3      B      B
4       2      B      B
5       2      B      B
6       2      B      B
7       3      B      B
8       3      B      B
9       4      B      B
10      5      C      C
11     10      D      D
12     11      D      D
13     12      D      D
14     16      D      D
15      2      B      B
16     17    NaN    NaN
17      1      B      B
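
To keep the rules fully separate from the DataFrame, as the question asks, both ideas above could be driven from a single rule table. The following is only a minimal sketch: RULES, build_mapping, and the column names group_map and group_cut are hypothetical names introduced for illustration. It assumes integer codes and inclusive range ends, and the pd.cut variant additionally assumes the rules are sorted and contiguous (no gaps between ranges).

```python
import pandas as pd

# Hypothetical rule table: (low, high, label), both ends inclusive.
# Changing the grouping later only means editing this list.
RULES = [
    (0, 0, 'A'),
    (1, 4, 'B'),
    (5, 5, 'C'),
    (6, 16, 'D'),
]

def build_mapping(rules):
    """Expand inclusive integer ranges into a flat {value: label} dict."""
    mapping = {}
    for low, high, label in rules:
        mapping.update(dict.fromkeys(range(low, high + 1), label))
    return mapping

df = pd.DataFrame({'codes': [0, 1, 4, 5, 6, 16, 17]})

# Variant 1: Series.map with a dict generated from the rules.
df['group_map'] = df['codes'].map(build_mapping(RULES))

# Variant 2: pd.cut with bin edges derived from the same rules.
# Assumes sorted, contiguous rules, so each low edge closes the previous bin.
bins = [low for low, _, _ in RULES] + [RULES[-1][1] + 1]
labels = [label for _, _, label in RULES]
df['group_cut'] = pd.cut(df['codes'], bins=bins, labels=labels, right=False)

print(df)
```

In both variants a code that no rule covers (17 here) ends up as NaN. The map variant yields an ordinary object column, while pd.cut returns a categorical column, which may matter for later processing.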
