python - Python Dataframe 循环操作效率
问题描述
我有两个输入,一个包含所有产品现有库存的数据框和一个数据框列表,其中每个数据框包含多个产品的订单。订单数据框可以包含同一产品的多个订单,并且订单可以大于现有的可用库存。
样本输入数据:
包含订单的数据框列表
[ prodID orderQTY
0 ABC1 456
1 ABC2 703
2 ABC3 359
3 ABC4 492
4 ABC5 824
.. ... ...
[100 rows x 2 columns],
prodID orderQTY
0 ABC10 805
1 ABC11 860
2 ABC12 651
3 ABC13 662
4 ABC14 802
.. ... ...
[100 rows x 2 columns],
prodID orderQTY
0 ABC100 111
1 ABC101 834
2 ABC102 302
3 ABC103 386
4 ABC104 318
.. ... ...
[100 rows x 2 columns]]
手头有库存的数据框
prodID onHand
0 ABC1 37200
1 ABC2 38945
2 ABC3 38085
3 ABC4 43775
4 ABC5 10999
.. ... ...
[300 rows x 2 columns]
我编写了一个函数来遍历列表中每个数据框中的订单并更新手头的可用库存。然后,该函数将输出一个包含所有订单的数据框,其中显示了开始库存、订单数量和结束库存。
功能:
def fulfill_orders(order_list, stock):
# create empty dataframe where the results will go
combined = pd.DataFrame(columns=['prodID', 'startSTOCK', 'orderQTY', 'endSTOCK'])
# loop through dataframes in order_list
for orders in order_list:
# create unique list of product ids
ids = orders['prodID'].unique()
# loop through unique list of product ids
for id in ids:
# create dataframe that consists of a single product id
df = orders.groupby('prodID').get_group(id)
# merge stock on hand onto single product id dataframe
df = df.merge(stock, how='left', on='prodID')
# rename stock on hand column
df.rename(columns={'onHAND': 'startSTOCK'}, inplace=True)
# create ending stock column and calculate value for first row
df.loc[0,'endSTOCK'] = df.loc[0, 'startSTOCK'] - df.loc[0, 'orderQTY']
# check if ending stock for first row is less than zero
if df.loc[0, 'endSTOCK'] < 0:
# reset negative first row ending stock to starting stock
df.loc[0, 'endSTOCK'] = df.loc[0, 'startSTOCK']
# loop through remaining rows
for i in range(1, len(df)):
# set starting stock to previous ending stock
df.loc[i, 'startSTOCK'] = df.loc[i-1, 'endSTOCK']
# calculate new ending stock
df.loc[i, 'endSTOCK'] = df.loc[i, 'startSTOCK'] - df.loc[i, 'orderQTY']
# check if ending stock is less than zero
if df.loc[i, 'endSTOCK'] < 0:
# reset negative ending stock to starting stock
df.loc[i, 'endSTOCK'] = df.loc[i, 'startSTOCK']
# append results to output dataframe
combined = combined.append(df, ignore_index=True)
# remove extra columns
df = df[['prodID', 'endSTOCK']]
# rename columns
df.rename(columns={'endSTOCK': 'onHAND'}, inplace=True)
# keep only last value
df = df.iloc[[-1]]
# update the stock on hand dataframe
stock = pd.concat([stock, df]).drop_duplicates(['prodID'], keep='last')
# change column types to integer
combined['startSTOCK'] = combined.startSTOCK.astype(int)
combined['endSTOCK'] = combined.endSTOCK.astype(int)
# reorder columns
combined = combined[['prodID', 'startSTOCK', 'orderQTY', 'endSTOCK']]
# reset index
combined.reset_index(drop=True, inplace=True)
# drop orders not fulfilled
combined.drop(combined[(combined.startSTOCK <= 0) | (combined.orderQTY > combined.startSTOCK)].index, inplace=True)
# reset index
combined.reset_index(drop=True, inplace=True)
return combined.sort_values(['prodID', 'startSTOCK'], ascending=[True, False])
样本输出:
prodID startSTOCK orderQTY endSTOCK
0 ABC1 37200 456 36744
1 ABC2 38945 703 38242
2 ABC3 38085 359 37726
3 ABC4 43775 492 43283
4 ABC5 10999 824 10175
.. ... ... ... ...
[300 rows x 4 columns]
我担心这个函数在较大的输入数据上可能效率低下。有没有更有效的方法来实现我想要的结果?
解决方案
设置
df_order
输出
prodID orderQTY
0 ABC1 456
1 ABC2 703
2 ABC3 359
3 ABC4 492
4 ABC5 824
它将 DataFrames ( dfs
) 列表作为您显示的第一个 DataFrame 的第一部分,重复了 3 次。
dfs = [df_order] * 3
df_stock = df_stock.set_index('prodID')
过程
df_orders = pd.concat(dfs).groupby('prodID').sum()
df_all = df_stock.join(df_orders)
df_all['endSTOCK'] = df_all['onHand'] - df_all['orderQTY']
df_all
返回
onHand orderQTY endSTOCK
prodID
ABC1 37200 1368 35832
ABC2 38945 2109 36836
ABC3 38085 1077 37008
ABC4 43775 1476 42299
ABC5 10999 2472 8527
推荐阅读
- docker - 将 DDEV 数据库挂载到主机文件系统,而不是在 docker 卷内
- python - 如何在固定接近度内获取单词周围的所有单词
- python - https 连接超时导致的 AWS Lambda 函数持续时间峰值
- c++ - 构造函数不初始化类元素
- javascript - 如何为玩家分配 2 位数的唯一 ID
- c# - 当敌人看到玩家时让他攻击
- protocols - 是否可以在 STM32F3 上使用 DMA 通过 SPI 传输无限数据?
- jquery - 作为参数发送到 $.ajax() 函数的 URL 未用于处理 POST 请求
- java - Checkstyle:单行 Java 注释在两个斜杠后是否应该有空格?
- python - Python/Pandas:如何对表进行透视,从而指定多索引或列中的唯一值构成结果列名的一部分?