python - How to label cycle numbers for values in python pandas

Problem description

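I have a column whose values iterate from 1 to 3. I need the cycle number for each row, as shown in the middle column of the table. How can I compute that second column using pandas?

Here is the table: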

column  | I need   |Note
-----------------------------------------------------------------------
2       | 1        |first cycle although not starting from 1
3       | 1        |first cycle although not starting from 1
-----------------------------------------------------------------------
1       | 2        |second cycle
2       | 2        |second cycle
3       | 2        |second cycle
-----------------------------------------------------------------------
1       | 3        |
2       | 3        |
3       | 3        |
-----------------------------------------------------------------------
1       | 4        |
2       | 4        |
3       | 4        |
-----------------------------------------------------------------------
1       | 5        |
2       | 5        |
3       | 5        |
-----------------------------------------------------------------------
1       | 6        |
2       | 6        |
3       | 6        |
-----------------------------------------------------------------------
1       | 7        |7th cycle, and it does not have to end in 3
2       | 7        |

Tags: python, pandas

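With your sample data, take the first difference with Series.diff, test where it is less than 0 with Series.lt (a drop in value marks the start of a new cycle), and finish with a cumulative sum using Series.cumsum: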

df['new'] = df['column'].diff().lt(0).cumsum() + 1
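
To make the one-liner easier to follow, here is a minimal sketch that rebuilds the question's sample column and splits the chain into its three steps. The names `step` and `is_reset` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question: starts mid-cycle at 2 and ends mid-cycle at 2
df = pd.DataFrame({"column": [2, 3] + [1, 2, 3] * 5 + [1, 2]})

# Difference from the previous row; the first row has no predecessor, so it is NaN
step = df["column"].diff()

# True wherever the value drops, i.e. a new cycle begins; NaN < 0 evaluates to False
is_reset = step.lt(0)

# Count the resets seen so far, then shift so that the first cycle is labelled 1
df["new"] = is_reset.cumsum() + 1

print(df)
```

Because the comparison on the first row is False, a partial first cycle (here 2, 3) still gets label 1 without any special handling.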

If the values are strings, you can first encode them as numbers with Series.map and a dictionary:

df['new'] = df['column'].map({'1':1, '2':2, '3':3}).diff().lt(0).cumsum() + 1

print(df)
    column  I need  new
0        2       1    1
1        3       1    1
2        1       2    2
3        2       2    2
4        3       2    2
5        1       3    3
6        2       3    3
7        3       3    3
8        1       4    4
9        2       4    4
10       3       4    4
11       1       5    5
12       2       5    5
13       3       5    5
14       1       6    6
15       2       6    6
16       3       6    6
17       1       7    7
18       2       7    7
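
As a quick check of the string variant, the sketch below rebuilds the sample with string values, applies the mapping, and compares the result with the expected "I need" column. The way the frame is constructed and the name `matches` are assumptions made for illustration only.

```python
import pandas as pd

# Same sample as in the question, but with the cycle values stored as strings
df = pd.DataFrame({
    "column": ["2", "3"] + ["1", "2", "3"] * 5 + ["1", "2"],
    "I need": [1] * 2 + [n for n in range(2, 7) for _ in range(3)] + [7] * 2,
})

# Encode the strings as numbers that preserve the order within a cycle
codes = df["column"].map({"1": 1, "2": 2, "3": 3})

# A negative difference marks the start of a new cycle
df["new"] = codes.diff().lt(0).cumsum() + 1

# True when every computed label equals the expected one
matches = df["new"].eq(df["I need"]).all()
print(matches)
```

Any mapping works as long as the numbers increase in the same order as the values appear inside one cycle.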

Edit: you can build the dictionary for map from all the values of one cycle using enumerate:

d = {v:k for k, v in enumerate(['1','2','3'])}
# if possible, build the dictionary from all unique values - check their order first
#print (df['column'].unique())
#d = {v:k for k, v in enumerate(df['column'].unique())}
df['new'] = df['column'].map(d).diff().lt(0).cumsum() + 1
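
The "check order" warning matters because unique() returns values in order of first appearance, and the sample starts mid-cycle. A minimal sketch of one way around that, assuming the strings are numeric so they can be sorted with `key=int` (the name `ordered` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"column": ["2", "3"] + ["1", "2", "3"] * 5 + ["1", "2"]})

# Order of first appearance, which here is NOT the order within a cycle
print(list(df["column"].unique()))

# Assumption: the values are numeric strings, so sorting by int gives cycle order
ordered = sorted(df["column"].unique(), key=int)

# Map each value to its position within the cycle
d = {v: k for k, v in enumerate(ordered)}

df["new"] = df["column"].map(d).diff().lt(0).cumsum() + 1
print(df)
```

For values that are not numeric strings, the cycle order has to be supplied by hand, as in the explicit list passed to enumerate above.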
