Python: get the min and max across two columns over a sequence of consecutive numbers

Problem description

My dataframe looks like this:

id  start  end
1   101    102
1   102    104
1   104    110
1   125    128
2   100    102
2   102    104
2   110    115  
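
For anyone reproducing this, the frame above could be built roughly as follows. This is only a sketch: the variable name `df` is an assumption chosen to match the name used in the answer, and the integer column types are assumed from the values shown.

```python
import pandas as pd

# Sample data copied from the table above; `df` is the name the answer uses.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "start": [101, 102, 104, 125, 100, 102, 110],
    "end":   [102, 104, 110, 128, 102, 104, 115],
})
print(df)
```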

I would like the output to be:

id  start  end
1   101    110
1   125    128
2   100    104
2   110    115  

Tags: python, pandas

Solution


Here is one approach:

import numpy as np

a = df[['start', 'end']].values
# Compare each row's (start, end) with the next row's (end, start).
# A break is flagged only when both pairs differ, so a row whose end equals
# the start of the row below stays in the same group.
m = (a[:-1] != a[1:, ::-1]).all(1)
# array([False, False,  True,  True, False,  True])
# Prepend False for the first row, then take the cumulative sum to get a group label per row
g = np.cumsum(np.r_[False, m])
# array([0, 0, 0, 1, 2, 2, 3])
# Group the rows by label, keeping the first id and start and the last end
out = df.groupby(g).agg({'id':'first', 'start':'first', 'end':'last'})

print(out)

   id  start  end
0   1    101  110
1   1    125  128
2   2    100  104
3   2    110  115
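
Note that the approach above never looks at `id`, so two rows with different ids would be merged if one row's end happened to equal the next row's start. A possible pandas-only variant that treats each id separately is sketched below. It is not the original answer's method: the names `prev_end`, `new_run` and `run_id` are illustrative, the rows are assumed to be already sorted so that consecutive ranges are adjacent, and named aggregation assumes pandas 0.25 or later.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "start": [101, 102, 104, 125, 100, 102, 110],
    "end":   [102, 104, 110, 128, 102, 104, 115],
})

# Previous row's end within the same id (NaN for the first row of each id).
prev_end = df.groupby("id")["end"].shift()

# A new run starts wherever start does not continue from the previous end.
# NaN never equals a number, so the first row of every id also starts a run.
new_run = df["start"].ne(prev_end)

# The running count of run starts gives one label per run.
run_id = new_run.cumsum().rename("run")

# Keep the first id and start and the last end of each run.
out = (
    df.groupby(run_id)
      .agg(id=("id", "first"), start=("start", "first"), end=("end", "last"))
      .reset_index(drop=True)
)
print(out)
```

On the sample data this yields the same four rows as the desired output in the question.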
