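python - DataFrame filter works with multiple for loops but not with a list comprehension
Question
I have to iterate over 5 different lists of unique values, and my code with 5 nested for loops takes too long. So I used a list comprehension to condense the loops into one line of code. But now I get an error saying it cannot find my first iterator when it tries to create a new DataFrame called df_filter. Creating df_filter works fine with the 5 for loops, but not with the list comprehension. It keeps saying it cannot find the first iterator, "winning_scenario". Any ideas on how to make this work with a list comprehension? Thanks!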
# Original Iteration Code Using 5 For Loops:
# Starting time before Loop:
t1 = time.time()
# Create unique lists of product, geography, segment, winning scenario type, and size:
p_list = list(df_final3['Product'].unique())
g_list = list(df_final3['Geography'].unique())
ctv_list = list(df_final3['Custom_Type_Value'].unique())
wst_list = list(df_final3['Winning Scenario Type'].unique())
cto_list = list(df_final3['Custom Total Ounce Value'].unique())
# Create Empty Dataframe:
df_final4 = pd.DataFrame()
# Create First Loop:
for winning_scenario in wst_list:
    # Create Second Loop:
    for geography in g_list:
        # Create Third Loop:
        for segment_type in ctv_list:
            # Create Fourth Loop:
            for products in p_list:
                # Create Fifth Loop:
                for sizes in cto_list:
                    # Create Filter of Dataframe:
                    df_filter = df_final3[(df_final3['Winning Scenario Type']==winning_scenario) & (df_final3.Geography==geography) & (df_final3.Custom_Type_Value==segment_type) & (df_final3.Product==products) & (df_final3['Custom Total Ounce Value']==sizes)]
##### New Iteration Code Using List Comprehension:
# Starting time before Loop:
t1 = time.time()
# Create Empty Dataframe:
df_final4 = pd.DataFrame()
# Create Unique Lists of Geography, Product, Custom Type Value, Custom Total Ounce Value, Winning Scenario PED Type:
geo_list2 = list(df_final3['Geography'].unique())
product_list2 = list(df_final3['Product'].unique())
segment_list2 = list(df_final3['Custom_Type_Value'].unique())
size_list2 = list(df_final3['Custom Total Ounce Value'].unique())
scenario_list2 = list(df_final3['Winning Scenario Type'].unique())
# Create Loop:
[(winning_scenario, geography, segment_type, products, sizes) for scenario in scenario_list2 for geography in geo_list2 for segment_type in segment_list2 for products in product_list2 for sizes in size_list2]
# Create Filter of Dataframe:
df_filter = df_final3[(df_final3['Winning Scenario Type']==winning_scenario) & (df_final3.Geography==geography) & (df_final3.Custom_Type_Value==segment_type) & (df_final3.Product==products) & (df_final3['Custom Total Ounce Value']==sizes)]
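This is the error I get when I try to use the list comprehension: "NameError: name 'winning_scenario' is not defined"
Edit: here is my full code. What I am doing is iterating over 5 different lists of unique values and calculating price elasticity, then putting that information into a new DataFrame called df_final4. My code works; the problem is that it takes too long because of the 5 for loops. So I am just trying to find a faster approach or a way to condense this code.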
# Starting time before Loop:
t1 = time.time()
# Create unique lists of product, geography, segment, winning scenario type, and size:
p_list = list(df_final3['Product'].unique())
g_list = list(df_final3['Geography'].unique())
ctv_list = list(df_final3['Custom_Type_Value'].unique())
wst_list = list(df_final3['Winning Scenario Type'].unique())
cto_list = list(df_final3['Custom Total Ounce Value'].unique())
# Create Empty Dataframe:
df_final4 = pd.DataFrame()
# Create First Loop:
for winning_scenario in wst_list:
    # Create Second Loop:
    for geography in g_list:
        # Create Third Loop:
        for segment_type in ctv_list:
            # Create Fourth Loop:
            for products in p_list:
                # Create Fifth Loop:
                for sizes in cto_list:
                    # Create Filter of Dataframe:
                    df_filter = df_final3[(df_final3['Winning Scenario Type']==winning_scenario) & (df_final3.Geography==geography) & (df_final3.Custom_Type_Value==segment_type) & (df_final3.Product==products) & (df_final3['Custom Total Ounce Value']==sizes)]
                    # Calculate Price Elasticity of Demand:
                    df_filter['ped'] = np.divide(df_filter['Delta_Demand'].values,df_filter['Delta_Price'].values)
                    # Extract Max ped value:
                    df_filter['ped max'] = df_filter.ped.max()
                    # Extract Min ped value:
                    df_filter['ped min'] = df_filter.ped.min()
                    # Save Columns from df_filter that we only want:
                    df_filter = df_filter[['Geography','Manufacturer Name', 'Brand Name', 'Product','Custom_Type_Value','Custom Total Ounce Value', 'Winning Scenario Type','Price', 'Unit_Sales','ped', 'ped max', 'ped min']]
                    # Map values from df_final3 to df_filter:
                    unmatch2 = df_final3[(~df_final3.Geography.isin(df_filter.Geography)) & (~df_final3['Manufacturer Name'].isin(df_filter['Manufacturer Name'])) & (~df_final3['Brand Name'].isin(df_filter['Brand Name'])) & (~df_final3.Product.isin(df_filter.Product)) & (~df_final3.Custom_Type_Value.isin(df_filter.Custom_Type_Value)) &(~df_final3['Winning Scenario Type'].isin(df_filter['Winning Scenario Type'])) & (~df_final3.Price.isin(df_filter.Price)) & (~df_final3.Unit_Sales.isin(df_filter.Unit_Sales))]
                    # Concat together:
                    i2 = pd.concat([df_filter, unmatch2],ignore_index=False)
                    df_final4 = pd.concat([df_final4, i2], ignore_index=False)
# Time taken by Loop:
t2 = time.time()
df_final4
print("Time taken by Loop: %.6f" %(t2 - t1))
Answer
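On the NameError itself: names bound inside a comprehension exist only inside that comprehension, and the comprehension in the question loops over `scenario` while its tuple refers to `winning_scenario`, so the name is most likely never defined at that point. The sketch below shows one way to keep the filter inside the iteration using `itertools.product`. The toy DataFrame `df` and its two key columns are hypothetical stand-ins for `df_final3`, not the asker's data.

```python
from itertools import product

import pandas as pd

# Hypothetical toy data standing in for df_final3 (two key columns only).
df = pd.DataFrame(
    {
        "Geography": ["East", "East", "West"],
        "Product": ["A", "B", "A"],
        "Price": [1.0, 2.0, 3.0],
    }
)

geo_list = list(df["Geography"].unique())
product_list = list(df["Product"].unique())

# Names bound in a comprehension are local to it, so the filter has to
# live inside the comprehension (or inside a regular loop body).
filtered = {
    (geography, prod): df[(df["Geography"] == geography) & (df["Product"] == prod)]
    for geography, prod in product(geo_list, product_list)
}

# Combinations that never occur in the data yield empty frames.
non_empty = {key: frame for key, frame in filtered.items() if not frame.empty}
print(sorted(non_empty))
```

This only flattens the nesting. It still runs one filter per combination, so it should not be expected to run any faster than the five for loops.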
As tomjn suggested, you probably want to use groupby to calculate the price elasticity for each combination of Product, Geography, Custom_Type_Value, Winning Scenario Type, and Custom Total Ounce Value.
Try something like this:
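import numpy as np
import pandas as pd

def calculate_ped(df: pd.DataFrame):
    # Each group is one existing combination of the five key columns.
    for _, group in df.groupby(
        [
            "Product",
            "Geography",
            "Custom_Type_Value",
            "Winning Scenario Type",
            "Custom Total Ounce Value",
        ]
    ):
        ped = np.divide(
            group["Delta_Demand"].values,
            group["Delta_Price"].values,
        )
        # Write the results back onto the rows of this group.
        df.loc[group.index, "ped"] = ped
        df.loc[group.index, "ped min"] = ped.min()
        df.loc[group.index, "ped max"] = ped.max()
    return df

# Note: the columns are added to df_final3 in place; df_final4 is the same object.
df_final4 = calculate_ped(df_final3)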
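If the Python-level loop over groups is still too slow, the same three columns can probably be computed without any explicit loop. The elasticity is a row-wise division, and `groupby(...).transform` broadcasts each group's minimum and maximum back onto its rows. This is a sketch on hypothetical toy data: only the column names come from the question, and `keys` is shortened to two columns for readability.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame(
    {
        "Product": ["A", "A", "B", "B"],
        "Geography": ["East", "East", "East", "East"],
        "Delta_Demand": [2.0, 6.0, 1.0, 9.0],
        "Delta_Price": [1.0, 2.0, 4.0, 3.0],
    }
)
keys = ["Product", "Geography"]  # the real data would list all five key columns

# Elasticity is computed row by row, so it needs no grouping at all.
df["ped"] = df["Delta_Demand"] / df["Delta_Price"]

# transform broadcasts each group's min/max back onto that group's rows.
grouped = df.groupby(keys)["ped"]
df["ped min"] = grouped.transform("min")
df["ped max"] = grouped.transform("max")
print(df)
```

Like `groupby` in the function above, this only visits combinations that actually occur in the data, rather than every possible product of the five unique lists.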