python - Conditional addition in pandas
Problem description
I have the following dataframe in pandas:
Code  Sum  Quantity
   0  -12         0
   1   23         0
   2  -10         0
   3  -12         0
   4  100         0
   5  102       201
   6   34         0
   7  -34         0
   8  -23         0
   9  100         0
  10  100         0
  11  102       300
The dataframe I want is:
Code  Sum  Quantity  new_sum
   0  -12         0      -12
   1   23         0       23
   2  -10         0      -10
   3  -12         0      -12
   4  100         0        0
   5  102       201      202
   6   34         0       34
   7  -34         0      -34
   8  -23         0      -23
   9  100         0        0
  10  100         0        0
  11  102       300      302
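The logic is:
First, I check for non-zero values in Quantity. In the sample data above, the first non-zero Quantity is 201, at index 5.
Then I want to add to that row's Sum the Sum values of the rows directly above it, for as long as those values are positive. Here that is only index 4, which is why index 5 becomes 202 and index 4 becomes 0.
I have written code that uses an if inside a loop, but I need to scan more than 1 million rows, and it does not give me the desired output.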
for i in range(len(final_df)):
    if(final_df['Quantity'][i] != 0):
        final_df['new_sum'][i] = final_df['Sum'][i].shift(1).sum()
    else:
        final_df['new_sum'][i] = final_df['Sum'][i]
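To pin down the expected output, here is a minimal sketch of one reading of the rule above, written as a plain backward scan over Python lists. The function name `new_sum_reference` is hypothetical, and the sketch assumes that the scan upward also stops at any row whose Quantity is non-zero. It is meant as a reference for checking results on small inputs, not as something fast enough for a million rows.

```python
def new_sum_reference(sums, quantities):
    """Plain-loop reading of the rule: each row with a non-zero Quantity
    absorbs the positive Sum values directly above it; absorbed rows become 0."""
    result = list(sums)
    for i, qty in enumerate(quantities):
        if qty == 0:
            continue
        j = i - 1
        # Walk upward while the rows are positive and are not themselves
        # non-zero-Quantity rows (assumption, see the note above).
        while j >= 0 and quantities[j] == 0 and sums[j] > 0:
            result[i] += sums[j]
            result[j] = 0
            j -= 1
    return result


sums = [-12, 23, -10, -12, 100, 102, 34, -34, -23, 100, 100, 102]
quantities = [0, 0, 0, 0, 0, 201, 0, 0, 0, 0, 0, 300]
print(new_sum_reference(sums, quantities))
# [-12, 23, -10, -12, 0, 202, 34, -34, -23, 0, 0, 302]
```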
Solution
Answer edited after clarification...
This one will be a bit slow because of the list comprehensions and the loop.
Setup:
import pandas as pd
import numpy as np
data = [[ 0, -12, 0],
[ 1, 23, 0],
[ 2, -10, 0],
[ 3, -12, 0],
[ 4, 100, 0],
[ 5, 102, 201],
[ 6, 34, 0],
[ 7, -34, 0],
[ 8, -23, 0],
[ 9, 100, 0],
[ 10, 100, 0],
[ 11, 102, 300]]
df = pd.DataFrame(data, columns=['Code', 'Sum', 'Quantity'])
print(df)
    Code  Sum  Quantity
0      0  -12         0
1      1   23         0
2      2  -10         0
3      3  -12         0
4      4  100         0
5      5  102       201
6      6   34         0
7      7  -34         0
8      8  -23         0
9      9  100         0
10    10  100         0
11    11  102       300
Code:
# copy columns from input dataframe and invert the row order
df1 = df[['Sum', 'Quantity']][::-1].copy()
# make an array to hold result column values
new_sum_array = np.zeros(len(df1)).astype(int)
df_sum = df1.Sum.values
# locate the positions of the positive values in "Quantity".
# these will be used for segmenting the "Sum" column values
posQ = np.where(df1.Quantity > 0)[0]
# don't want zero or last index value in posQ for splitting
if posQ[0] == 0:
    posQ = posQ[1:]
if posQ[-1] == len(df) - 1:
    posQ = posQ[:-1]
# se (start-end)
# add first and last index to be used for indexing segments of df_sum
se = posQ.copy()
se = np.insert(se, 0, 0)
se = np.append(se, len(df))
starts = se[:-1]
ends = se[1:]
# keep only positive values from the df_sum array (others become 0).
# this is used with numpy argmin to find the first non-positive number
# within segments
only_pos = np.where(df_sum > 0, df_sum, 0)
# split the only_pos array at the Quantity locations.
# keep the result as a list: segments can differ in length, and wrapping
# a ragged list in np.array raises an error in recent NumPy versions
segs = np.split(only_pos, posQ)
# find the position of the first non-positive number within each segment
tgts = [np.argmin(x) for x in segs]
# use those positions to slice each segment and copy the untouched
# "Sum" values into the result array
for start, end, tgt in zip(starts, ends, tgts):
    idx = np.arange(start, end)
    np.put(new_sum_array, idx[tgt:], df_sum[idx][tgt:])
# to add a lookback limit for adding consecutive positive df_sums,
# assign an integer value to max_lookback in next line.
# use "None" to ignore any limit
max_lookback = None
if max_lookback is not None:
    tgts = np.clip(tgts, 0, max_lookback)
# add up the values of the positive numbers in the sliced
# df_sum segments
sums = [np.sum(x[:n]) for x, n in zip(segs, tgts)]
# put those totals into the result array at positive "Quantity" locations
np.put(new_sum_array, starts, sums)
# add the results to the df as "New Sum"
df1['New Sum'] = new_sum_array
# flip the dataframe back upright
df1 = df1[::-1]
# insert calculated column into original dataframe
df['New Sum'] = df1['New Sum']
Result:
print(df)
    Code  Sum  Quantity  New Sum
0      0  -12         0      -12
1      1   23         0       23
2      2  -10         0      -10
3      3  -12         0      -12
4      4  100         0        0
5      5  102       201      202
6      6   34         0       34
7      7  -34         0      -34
8      8  -23         0      -23
9      9  100         0        0
10    10  100         0        0
11    11  102       300      302
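Since the answer notes that the list comprehensions and the loop make it slow, here is a loop-free sketch of the same idea using pandas group operations. It is an alternative technique, not the answer's method. The function name `conditional_sum` and its parameters are hypothetical. It assumes that every row with a non-zero Quantity ends a block, that such a row always keeps its own Sum (even when that Sum is not positive), and that trailing rows with no non-zero Quantity after them are left unchanged.

```python
import pandas as pd


def conditional_sum(df, value="Sum", qty="Quantity"):
    """Vectorized sketch: rows with non-zero `qty` absorb the positive
    `value` rows directly above them; absorbed rows become 0."""
    s = df[value].reset_index(drop=True)
    anchor = df[qty].reset_index(drop=True).ne(0)

    # Block id grows right after each anchor, so an anchor is the last row of its block.
    block = anchor.astype(int).shift(fill_value=0).cumsum()
    # Only a trailing block can lack an anchor; its id equals the anchor count.
    has_anchor = block.lt(int(anchor.sum()))

    # Scan each block bottom-up: the running minimum stays 1 until the
    # first non-positive row above the anchor is reached.
    passable = (s.gt(0) | anchor).astype(int)
    run = passable[::-1].groupby(block[::-1]).cummin()[::-1].astype(bool)

    absorbed = run & has_anchor & ~anchor
    extra = s.where(absorbed, 0).groupby(block).transform("sum")

    out = s.mask(absorbed, 0)          # absorbed rows become 0
    out = out.mask(anchor, s + extra)  # anchors receive the absorbed total
    return out.to_numpy()


df = pd.DataFrame({
    "Code": range(12),
    "Sum": [-12, 23, -10, -12, 100, 102, 34, -34, -23, 100, 100, 102],
    "Quantity": [0, 0, 0, 0, 0, 201, 0, 0, 0, 0, 0, 300],
})
df["new_sum"] = conditional_sum(df)
print(df)
```

On the sample data this gives the same new_sum column as the desired dataframe in the question. Whether it is actually faster on a million rows depends on the data, so it is worth timing both versions.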