python - Iterate rows and find sum of rows not exceeding a number
Problem description
Below is a dataframe of coordinate ranges (from and to), where each row has a corresponding value.
I want to merge consecutive coordinate ranges so that the summed value of each merged range does not exceed 5. The input dataframe is:
import pandas as pd
From=[10,20,30,40,50,60,70]
to=[20,30,40,50,60,70,80]
value=[2,3,5,6,1,3,1]
df=pd.DataFrame({'from':From, 'to':to, 'value':value})
print(df)
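Hence, I want to convert the following table:
   from  to  value
0    10  20      2
1    20  30      3
2    30  40      5
3    40  50      6
4    50  60      1
5    60  70      3
6    70  80      1
to the following outcome:
   from  to  value
0    10  30      5
1    30  40      5
2    40  50      6
3    50  80      5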
Further explanation:
- Coordinates 10 to 30 are joined and the value becomes 5, the sum of the values from 10 to 30 (not exceeding 5)
- Coordinates 30 to 40 have a value of 5
- Coordinates 40 to 50 have a value of 6 (more than 5, but the row is kept as is because it cannot be divided further)
- The remaining coordinates (50 to 80) sum to a value of 5
What code is required to achieve the above?
Solution
We can do a groupby on the cumsum:
s = df['value'].ge(5)
(df.groupby([~s, s.cumsum()], as_index=False, sort=False)
.agg({'from':'min','to':'max', 'value':'sum'})
)
Output:
   from  to  value
0    10  30      5
1    30  40      5
2    40  50      6
3    50  80      5
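To see why this grouping produces the output above, it can help to print the two grouping keys side by side. This is only an illustrative sketch that rebuilds the sample frame; the column names is_small and run_id are labels made up for this example and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'from': [10, 20, 30, 40, 50, 60, 70],
                   'to': [20, 30, 40, 50, 60, 70, 80],
                   'value': [2, 3, 5, 6, 1, 3, 1]})

# True where a single row already reaches the threshold of 5
s = df['value'].ge(5)

# Hypothetical helper frame, only for inspecting the two groupby keys
keys = pd.DataFrame({'value': df['value'],
                     'is_small': ~s,          # first key: row is below 5
                     'run_id': s.cumsum()})   # second key: count of rows >= 5 seen so far
print(keys)
```

For the sample data the (is_small, run_id) pairs are (True, 0), (True, 0), (False, 1), (False, 2), (True, 2), (True, 2), (True, 2), which gives the four groups. Note that this only isolates rows that reach 5 on their own and merges everything between them without checking the merged sum: values such as 3, 3, 3 would end up in a single group with a sum of 9.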
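Update: it looks like you want to accumulate values so that each new group does not exceed 5. Several threads on SO say this can only be done with a for loop, so we can do something like this: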
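thresh = 5
# partial starts at thresh so that the first positive value opens a new group
groups, partial, curr_grp = [], thresh, 0
for v in df['value']:
    if partial + v > thresh:
        # adding v would exceed the threshold: start a new group with v
        curr_grp += 1
        partial = v
    else:
        partial += v
    groups.append(curr_grp)
df.groupby(groups).agg({'from':'min','to':'max', 'value':'sum'})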
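The loop can also be wrapped in a small function so that it is easy to rerun with a different threshold. The following is a minimal self-contained sketch, assuming positive values; the function name running_sum_groups and the Series name group are hypothetical and chosen only for this example.

```python
import pandas as pd

def running_sum_groups(values, thresh):
    """Greedy labels: open a new group when adding the next value would exceed thresh."""
    groups, partial, curr_grp = [], thresh, 0
    for v in values:
        if partial + v > thresh:
            curr_grp += 1      # start a new group with v as its first member
            partial = v
        else:
            partial += v       # v still fits into the current group
        groups.append(curr_grp)
    return groups

df = pd.DataFrame({'from': [10, 20, 30, 40, 50, 60, 70],
                   'to': [20, 30, 40, 50, 60, 70, 80],
                   'value': [2, 3, 5, 6, 1, 3, 1]})

labels = running_sum_groups(df['value'], thresh=5)
# Align the labels with df's index before grouping
key = pd.Series(labels, index=df.index, name='group')
result = (df.groupby(key)
            .agg({'from': 'min', 'to': 'max', 'value': 'sum'})
            .reset_index(drop=True))
print(result)
```

For the sample data the labels come out as 1, 1, 2, 3, 4, 4, 4, so the result has the four rows requested in the question. A single value above the threshold, such as the 6 in row 40 to 50, always ends up in a group of its own.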