How to label values with a cycle number in Python pandas

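Problem description

I have a column whose values iterate from 1 to 3. I need a cycle number for each row, as shown in the middle column below. How can I get that second column using pandas?

This is the table: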

column  | I need   |Note
-----------------------------------------------------------------------
2       | 1        |first cycle although not starting from 1
3       | 1        |first cycle although not starting from 1
-----------------------------------------------------------------------
1       | 2        |second cycle
2       | 2        |second cycle
3       | 2        |second cycle
-----------------------------------------------------------------------
1       | 3        |
2       | 3        |
3       | 3        |
-----------------------------------------------------------------------
1       | 4        |
2       | 4        |
3       | 4        |
-----------------------------------------------------------------------
1       | 5        |
2       | 5        |
3       | 5        |
-----------------------------------------------------------------------
1       | 6        |
2       | 6        |
3       | 6        |
-----------------------------------------------------------------------
1       | 7        |7th cycle, which does not have to end in 3
2       | 7        |    

Tags: python, pandas

Solution


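With your sample data, first take the difference between consecutive rows with Series.diff, then compare it for less than 0 with Series.lt, and finally take the cumulative sum with Series.cumsum. Each drop in value marks the start of a new cycle, so the running count of drops, plus 1, is the cycle number: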

df['new'] = df['column'].diff().lt(0).cumsum() + 1
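
To make the one-liner above easier to follow, here is a minimal sketch that rebuilds the sample column from the question and runs the three steps separately. The intermediate names `step` and `restart` are only for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question: values cycle through 1..3,
# starting mid-cycle at 2 and ending mid-cycle at 2.
df = pd.DataFrame({"column": [2, 3] + [1, 2, 3] * 5 + [1, 2]})

# Step 1: difference to the previous row (NaN for the first row).
step = df["column"].diff()

# Step 2: True wherever the value dropped, i.e. a new cycle started.
# NaN compares as False, so the first row never counts as a restart.
restart = step.lt(0)

# Step 3: running count of restarts, shifted so the first cycle is 1.
df["new"] = restart.cumsum() + 1

print(df)
```

This relies on the values strictly increasing inside each cycle, which holds for the sample data; a cycle that repeats a value would not trigger a restart.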

If the values are strings, you can use Series.map with a dictionary to encode them as numbers first:

df['new'] = df['column'].map({'1':1, '2':2, '3':3}).diff().lt(0).cumsum() + 1

print (df)
    column  I need  new
0        2       1    1
1        3       1    1
2        1       2    2
3        2       2    2
4        3       2    2
5        1       3    3
6        2       3    3
7        3       3    3
8        1       4    4
9        2       4    4
10       3       4    4
11       1       5    5
12       2       5    5
13       3       5    5
14       1       6    6
15       2       6    6
16       3       6    6
17       1       7    7
18       2       7    7
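
For the string case, a small self-contained sketch may help. It assumes the column holds the strings '1', '2' and '3'; the names `encoding` and `encoded` are illustrative. One thing worth checking is that Series.map returns NaN for any value missing from the dictionary, which would silently hide a restart.

```python
import pandas as pd

# Same sample data as in the question, but stored as strings.
values = ["2", "3"] + ["1", "2", "3"] * 5 + ["1", "2"]
df = pd.DataFrame({"column": values})

# Encode each label as a number that reflects its position in the cycle.
encoding = {"1": 1, "2": 2, "3": 3}
encoded = df["column"].map(encoding)

# Values not present in the dictionary become NaN; count them as a sanity check.
unmapped = int(encoded.isna().sum())

# Same diff / less-than-zero / cumulative-sum chain as for numeric data.
df["new"] = encoded.diff().lt(0).cumsum() + 1

print(unmapped)
print(df)
```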

Edit: You can create the dictionary for map from the full list of values in a cycle using enumerate:

d = {v:k for k, v in enumerate(['1','2','3'])}
#alternatively, build the mapping from all unique values - check their order first
#print (df['column'].unique())
#d = {v:k for k, v in enumerate(df['column'].unique())}
df['new'] = df['column'].map(d).diff().lt(0).cumsum() + 1
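
The warning about checking the order matters for this particular sample. Series.unique returns values in order of first appearance, and the data starts mid-cycle at '2'. The sketch below contrasts an explicitly ordered list with the order taken from the data; `order`, `d_from_data` and `shifted` are hypothetical names used only here.

```python
import pandas as pd

df = pd.DataFrame({"column": ["2", "3"] + ["1", "2", "3"] * 5 + ["1", "2"]})

# unique() keeps first-appearance order, so here it starts with '2', not '1'.
seen_order = df["column"].unique().tolist()
print(seen_order)

# Mapping built from an explicit cycle order (assumed to be known in advance).
order = ["1", "2", "3"]
d = {v: k for k, v in enumerate(order)}
df["new"] = df["column"].map(d).diff().lt(0).cumsum() + 1

# Mapping built from the data's own order: the cycle boundary moves from
# '1' to '2', so the labels no longer match the wanted column.
d_from_data = {v: k for k, v in enumerate(seen_order)}
shifted = df["column"].map(d_from_data).diff().lt(0).cumsum() + 1

print(df.assign(shifted=shifted))
```

If the true cycle order is not known, sorting the unique values may be a reasonable fallback, but only when the sort order matches the cycle order.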
