Conditional addition in pandas

Problem description

I have the following dataframe in pandas

   Code      Sum      Quantity
   0         -12      0
   1          23      0
   2         -10      0
   3         -12      0
   4         100      0
   5         102      201
   6          34      0
   7         -34      0
   8         -23      0
   9         100      0
   10        100      0
   11        102      300

The dataframe I want is

   Code      Sum      Quantity    new_sum
   0         -12      0          -12
   1          23      0           23
   2         -10      0          -10
   3         -12      0          -12
   4         100      0           0
   5         102      201         202 
   6          34      0           34
   7         -34      0          -34
   8         -23      0          -23
   9         100      0           0
   10        100      0           0
   11        102      300         302

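The logic is

First, I look for non-zero values in Quantity. In the sample data above, the first non-zero Quantity is 201, at index 5. Starting from that row, I want to add up the Sum values of the rows directly above it for as long as they are positive. So index 5 becomes 102 + 100 = 202, the row that was added in (index 4) becomes 0, and the scan stops at index 3 because its Sum is negative.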
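One reading of this rule that reproduces the desired table can be sketched in plain Python. The function name `conditional_sum` is hypothetical, and two details are assumptions the question does not settle: the upward scan also stops at another row with a non-zero Quantity, and a Quantity row whose own Sum is not positive is left unchanged.

```python
def conditional_sum(sums, quantities):
    """Return the new_sum column as a list (a sketch of the stated rule)."""
    new_sum = list(sums)
    for i, qty in enumerate(quantities):
        # Only rows with a non-zero Quantity and a positive Sum collect values.
        # (Assumption: a Quantity row with a non-positive Sum stays as it is.)
        if qty == 0 or sums[i] <= 0:
            continue
        total = sums[i]
        j = i - 1
        # Absorb the unbroken run of positive Sum values directly above row i.
        # (Assumption: another Quantity row also ends the run.)
        while j >= 0 and sums[j] > 0 and quantities[j] == 0:
            total += sums[j]
            new_sum[j] = 0  # an absorbed row contributes its value and becomes 0
            j -= 1
        new_sum[i] = total
    return new_sum


sums = [-12, 23, -10, -12, 100, 102, 34, -34, -23, 100, 100, 102]
quantities = [0, 0, 0, 0, 0, 201, 0, 0, 0, 0, 0, 300]
print(conditional_sum(sums, quantities))
# [-12, 23, -10, -12, 0, 202, 34, -34, -23, 0, 0, 302]
```

This is only a readable reference for small inputs; a Python-level loop like this is likely to be slow on more than a million rows.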

I have written code that uses a for loop with an if check, but I need to scan more than 1 million rows, and it does not give me the desired output.

for i in range(len(final_df)):
   if(final_df['Quantity'][i] != 0):
      final_df['new_sum'][i] = final_df['Sum'][i].shift(1).sum()
   else:
      final_df['new_sum'][i] = final_df['Sum'][i] 

Tags: python, pandas

Solution


Answer edited after clarification...

This one will be a bit slow because of the list comprehensions and the loop.

Setup:

import pandas as pd
import numpy as np

data = [[  0, -12,   0],
        [  1,  23,   0],
        [  2, -10,   0],
        [  3, -12,   0],
        [  4, 100,   0],
        [  5, 102, 201],
        [  6,  34,   0],
        [  7, -34,   0],
        [  8, -23,   0],
        [  9, 100,   0],
        [ 10, 100,   0],
        [ 11, 102, 300]]

df = pd.DataFrame(data, columns=['Code', 'Sum', 'Quantity'])

print(df)

    Code  Sum  Quantity
0      0  -12         0
1      1   23         0
2      2  -10         0
3      3  -12         0
4      4  100         0
5      5  102       201
6      6   34         0
7      7  -34         0
8      8  -23         0
9      9  100         0
10    10  100         0
11    11  102       300

Code:

# copy columns from input dataframe and invert
df1 = df[['Sum', 'Quantity']][::-1].copy()

# make an array to hold result column values
new_sum_array = np.zeros(len(df1)).astype(int)
df_sum = df1.Sum.values

# locate the indices of the pos values in "Quantity".
# these will be used for segmenting the "Sum" column values
posQ = (np.where(df1.Quantity > 0)[0])

# don't want zero or last index value in posQ for splitting
if posQ[0] == 0:
    posQ = posQ[1:]
if posQ[-1] == len(df)-1:
    posQ = posQ[:-1]

# se (start-end)
# add first and last index to be used for indexing segments of df_sum
se = posQ.copy()
se = np.insert(se, 0, 0)
se = np.append(se, len(df))

starts = se[:-1]
ends = se[1:]

# keep only positive values from the df_sum array (all others become 0).
# this is used to find the first non-positive number
# within segments
only_pos = np.where(df_sum > 0, df_sum, 0)

# split the only_positive array at Quantity locations.
# np.split returns a list of arrays; it is kept as a list because
# segments of unequal length cannot be stacked into a regular array
segs = np.split(only_pos, posQ)

# find the index of the first non-positive number within each segment
# (the whole segment length if every value in it is positive)
tgts = [int(np.argmax(x <= 0)) if (x <= 0).any() else len(x) for x in segs]

# to add a lookback limit for adding consecutive positive df_sums,
# assign an integer value to max_lookback in next line.
# use "None" to ignore any limit.
# the limit is applied before the loop below so that rows beyond it
# keep their own "Sum" values instead of being left at zero
max_lookback = None
if max_lookback is not None:
    tgts = np.clip(tgts, 0, max_lookback)

# rows from the first non-positive value onward in each segment
# keep their own "Sum" values in the result array
for start, end, tgt in zip(starts, ends, tgts):
    new_sum_array[start + tgt:end] = df_sum[start + tgt:end]

# add up the values of the positive numbers in the sliced
# df_sum segments
sums = [np.sum(x[:n]) for x, n in zip(segs, tgts)]

# put those totals into the result array at positive "Quantity" locations
np.put(new_sum_array, starts, sums)

# add the results to the df as "New Sum"
df1['New Sum'] = new_sum_array

# flip the dataframe back upright
df1 = df1[::-1]
# insert calculated column into original dataframe
df['New Sum'] = df1['New Sum']

Result:

print(df)

    Code  Sum  Quantity  New Sum
0      0  -12         0      -12
1      1   23         0       23
2      2  -10         0      -10
3      3  -12         0      -12
4      4  100         0        0
5      5  102       201      202
6      6   34         0       34
7      7  -34         0      -34
8      8  -23         0      -23
9      9  100         0        0
10    10  100         0        0
11    11  102       300      302
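
Since the question mentions more than a million rows, a loop-free variant may be worth trying. The following is a minimal sketch using pandas groupby operations, not a benchmarked or definitive implementation. The function name `add_new_sum` is hypothetical, and it assumes that a Quantity row whose own Sum is not positive keeps its value, and that rows after the last Quantity row are never absorbed.

```python
import pandas as pd


def add_new_sum(df: pd.DataFrame) -> pd.Series:
    """Return the new-sum column without a Python-level loop (a sketch)."""
    is_q = df["Quantity"].ne(0)
    positive = df["Sum"].gt(0)

    # Rows up to and including a Quantity row share one block id.
    block = is_q.shift(fill_value=False).cumsum()

    # Walking upward inside each block, count the non-positive rows seen so far.
    nonpos_seen = (~positive).astype(int)[::-1].groupby(block[::-1]).cumsum()[::-1]

    # A row is absorbed if no non-positive row lies between it and the
    # Quantity row that closes its block (the trailing block may have none).
    ends_with_q = is_q.groupby(block).transform("any")
    absorbed = nonpos_seen.eq(0) & ends_with_q

    # Total of the absorbed values per block, broadcast to every row.
    totals = df["Sum"].where(absorbed, 0).groupby(block).transform("sum")

    # Absorbed rows become 0, then the Quantity row receives the block total.
    new_sum = df["Sum"].mask(absorbed, 0)
    return new_sum.mask(absorbed & is_q, totals)


df = pd.DataFrame({
    "Code": range(12),
    "Sum": [-12, 23, -10, -12, 100, 102, 34, -34, -23, 100, 100, 102],
    "Quantity": [0, 0, 0, 0, 0, 201, 0, 0, 0, 0, 0, 300],
})
df["New Sum"] = add_new_sum(df)
print(df)
```

On the sample data this gives the same column as the table above. The approach relies on each block ending at a Quantity row, so the run of positive values is found with one reversed cumulative sum per block rather than a search per segment.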
