Iterate rows and find sum of rows not exceeding a number

Problem description

I have a dataframe of coordinate ranges given by the columns from and to, where each row has a corresponding value.

I want to merge consecutive coordinate ranges so that the sum of the value column within each merged range does not exceed 5. The input dataframe is built below.

import pandas as pd

From=[10,20,30,40,50,60,70]
to=[20,30,40,50,60,70,80]
value=[2,3,5,6,1,3,1]


df=pd.DataFrame({'from':From, 'to':to, 'value':value})
print(df)

I want to convert this table into one in which consecutive rows are merged, as described below.

Further explanation:

  1. Coordinates 10 to 30 are joined, and the value becomes 5, the sum of the values from 10 to 30 (not exceeding 5).

  2. Coordinates 30 to 40 have a value of 5.

  3. Coordinates 40 to 50 have a value of 6 (more than 5, but the row is kept because it cannot be divided further).

  4. The remaining coordinates, 50 to 80, sum to a value of 5.

What code is required to achieve the above?

Tags: python, pandas

Solution

We can do a groupby on the cumsum:

s = df['value'].ge(5)
(df.groupby([~s, s.cumsum()], as_index=False, sort=False)
   .agg({'from':'min','to':'max', 'value':'sum'})
)

Output:

   from  to  value
0    10  30      5
1    30  40      5
2    40  50      6
3    50  80      5
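To see why this grouping produces the output above, it may help to look at the two keys side by side. The sketch below rebuilds the question's dataframe and prints the keys; the column names is_small and block are labels made up here for illustration. Note that this approach only isolates rows whose value is at least 5 and sums everything between them, so it does not enforce the limit in general (for example, values of 3, 3, 3 would end up in a single group with sum 9).

```python
import pandas as pd

df = pd.DataFrame({
    'from': [10, 20, 30, 40, 50, 60, 70],
    'to': [20, 30, 40, 50, 60, 70, 80],
    'value': [2, 3, 5, 6, 1, 3, 1],
})

# True for rows that are already at or above the limit on their own
s = df['value'].ge(5)

# 'is_small' and 'block' are illustrative names, not part of the original answer.
# 'block' increases at every large row, so each large row starts a new block;
# 'is_small' separates a large row from the small rows that follow it in that block.
keys = pd.DataFrame({
    'value': df['value'],
    'is_small': ~s,
    'block': s.cumsum(),
})
print(keys)
```

Rows that share both is_small and block end up in the same group, which gives the four groups shown in the output.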

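Update: It looks like you want to accumulate the values so that the sum of each new group does not exceed 5. Several threads on SO say this can only be done with a for loop, so we can do the following: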

thresh = 5

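# partial starts at thresh so that the first positive value always opens group 1
groups, partial, curr_grp = [], thresh, 0
for v in df['value']:
    if partial + v > thresh:
        # adding v would exceed the limit: start a new group with v alone
        curr_grp += 1
        partial = v
    else:
        # v still fits: keep it in the current group
        partial += v

    # record the group label for this row
    groups.append(curr_grp)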

df.groupby(groups).agg({'from':'min','to':'max', 'value':'sum'})
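
As a minimal sketch, the loop above could be wrapped in a reusable function. The function name merge_rows_under_threshold and its thresh parameter are hypothetical, chosen here for illustration; the column names from, to and value are taken from the question.

```python
import pandas as pd

def merge_rows_under_threshold(df, thresh=5):
    """Greedily merge consecutive rows while the running sum stays <= thresh.

    Hypothetical helper: assumes columns 'from', 'to' and 'value' as in the question.
    A single row whose value exceeds thresh is kept as its own group.
    """
    group_ids = []
    running, group = 0, -1
    for v in df['value']:
        if group < 0 or running + v > thresh:
            # First row, or v does not fit: open a new group
            group += 1
            running = v
        else:
            running += v
        group_ids.append(group)

    # Align the labels with the dataframe's index before grouping
    keys = pd.Series(group_ids, index=df.index, name='group')
    return (df.groupby(keys)
              .agg({'from': 'min', 'to': 'max', 'value': 'sum'})
              .reset_index(drop=True))

df = pd.DataFrame({
    'from': [10, 20, 30, 40, 50, 60, 70],
    'to': [20, 30, 40, 50, 60, 70, 80],
    'value': [2, 3, 5, 6, 1, 3, 1],
})
result = merge_rows_under_threshold(df, thresh=5)
print(result)
```

On the question's data this should give the four merged ranges described in the explanation: 10 to 30, 30 to 40, 40 to 50 and 50 to 80.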
