首页 > 解决方案 > Python Dataframe 循环操作效率

问题描述

我有两个输入,一个包含所有产品现有库存的数据框和一个数据框列表,其中每个数据框包含多个产品的订单。订单数据框可以包含同一产品的多个订单,并且订单可以大于现有的可用库存。

样本输入数据:

包含订单的数据框列表

[    prodID    orderQTY
 0   ABC1      456
 1   ABC2      703
 2   ABC3      359
 3   ABC4      492
 4   ABC5      824
 ..  ...       ...
 [100 rows x 2 columns],
     prodID    orderQTY
 0   ABC10     805
 1   ABC11     860
 2   ABC12     651
 3   ABC13     662
 4   ABC14     802
 ..  ...       ...
 [100 rows x 2 columns],
     prodID    orderQTY
 0   ABC100    111
 1   ABC101    834
 2   ABC102    302
 3   ABC103    386
 4   ABC104    318
 ..  ...       ...
 [100 rows x 2 columns]]

手头有库存的数据框

    prodID    onHand
0   ABC1      37200
1   ABC2      38945
2   ABC3      38085
3   ABC4      43775
4   ABC5      10999
..  ...       ...
 [300 rows x 2 columns]

我编写了一个函数来遍历列表中每个数据框中的订单并更新手头的可用库存。然后,该函数将输出一个包含所有订单的数据框,其中显示了开始库存、订单数量和结束库存。

功能:

def fulfill_orders(order_list, stock):
# create empty dataframe where the results will go
combined = pd.DataFrame(columns=['prodID', 'startSTOCK', 'orderQTY', 'endSTOCK'])

# loop through dataframes in order_list
for orders in order_list:
    # create unique list of product ids
    ids = orders['prodID'].unique()
    # loop through unique list of product ids
    for id in ids:
        # create dataframe that consists of a single product id
        df = orders.groupby('prodID').get_group(id)
        # merge stock on hand onto single product id dataframe
        df = df.merge(stock, how='left', on='prodID')
        # rename stock on hand column
        df.rename(columns={'onHAND': 'startSTOCK'}, inplace=True)
        # create ending stock column and calculate value for first row
        df.loc[0,'endSTOCK'] = df.loc[0, 'startSTOCK'] - df.loc[0, 'orderQTY']
        # check if ending stock for first row is less than zero
        if df.loc[0, 'endSTOCK'] < 0:
            # reset negative first row ending stock to starting stock
            df.loc[0, 'endSTOCK'] = df.loc[0, 'startSTOCK']
        # loop through remaining rows
        for i in range(1, len(df)):
            # set starting stock to previous ending stock
            df.loc[i, 'startSTOCK'] = df.loc[i-1, 'endSTOCK']
            # calculate new ending stock
            df.loc[i, 'endSTOCK'] = df.loc[i, 'startSTOCK'] - df.loc[i, 'orderQTY']
            # check if ending stock is less than zero
            if df.loc[i, 'endSTOCK'] < 0:
                # reset negative ending stock to starting stock
                df.loc[i, 'endSTOCK'] = df.loc[i, 'startSTOCK']
        # append results to output dataframe
        combined = combined.append(df, ignore_index=True)
        # remove extra columns
        df = df[['prodID', 'endSTOCK']]
        # rename columns
        df.rename(columns={'endSTOCK': 'onHAND'}, inplace=True)
        # keep only last value
        df = df.iloc[[-1]]
        # update the stock on hand dataframe
        stock = pd.concat([stock, df]).drop_duplicates(['prodID'], keep='last')

# change column types to integer
combined['startSTOCK'] = combined.startSTOCK.astype(int)
combined['endSTOCK'] = combined.endSTOCK.astype(int)

# reorder columns
combined = combined[['prodID', 'startSTOCK', 'orderQTY', 'endSTOCK']]

# reset index
combined.reset_index(drop=True, inplace=True)

# drop orders not fulfilled
combined.drop(combined[(combined.startSTOCK <= 0) | (combined.orderQTY > combined.startSTOCK)].index, inplace=True)

# reset index
combined.reset_index(drop=True, inplace=True)

return combined.sort_values(['prodID', 'startSTOCK'], ascending=[True, False])

样本输出:

    prodID    startSTOCK    orderQTY    endSTOCK
0   ABC1      37200         456         36744
1   ABC2      38945         703         38242
2   ABC3      38085         359         37726
3   ABC4      43775         492         43283
4   ABC5      10999         824         10175
..  ...       ...           ...         ...
[300 rows x 4 columns]

我担心这个函数在较大的输入数据上可能效率低下。有没有更有效的方法来实现我想要的结果?

标签: pythonpandasperformanceloopsdataframe

解决方案


设置

df_order

输出

  prodID  orderQTY
0   ABC1       456
1   ABC2       703
2   ABC3       359
3   ABC4       492
4   ABC5       824

它将 DataFrames ( dfs) 列表作为您显示的第一个 DataFrame 的第一部分,重复了 3 次。

dfs = [df_order] * 3
df_stock = df_stock.set_index('prodID')

过程

df_orders = pd.concat(dfs).groupby('prodID').sum()
df_all = df_stock.join(df_orders)
df_all['endSTOCK'] = df_all['onHand'] - df_all['orderQTY']
df_all

返回

        onHand  orderQTY  endSTOCK
prodID
ABC1     37200      1368     35832
ABC2     38945      2109     36836
ABC3     38085      1077     37008
ABC4     43775      1476     42299
ABC5     10999      2472      8527

推荐阅读