DataFrame filter works with multiple for loops but not with a list comprehension

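Question

I have to iterate over 5 different lists of unique values, and my code takes too long with 5 nested for loops. So I used a list comprehension to condense the loops into one line of code. But now I get an error saying it cannot find my first iterator when I try to create a new DataFrame called df_filter. Creating df_filter works fine with the 5 for loops, but not with the list comprehension. It keeps saying it cannot find the first iterator, "winning_scenario". Any ideas on how to get this working with a list comprehension? Thanks!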

# Original Iteration Code Using 5 For Loops:
# Starting time before Loop:
t1 = time.time()

# Create unique lists of product, geography, segment, winning scenario type, and size:
p_list = list(df_final3['Product'].unique())
g_list = list(df_final3['Geography'].unique())
ctv_list = list(df_final3['Custom_Type_Value'].unique())
wst_list = list(df_final3['Winning Scenario Type'].unique())
cto_list = list(df_final3['Custom Total Ounce Value'].unique())

# Create Empty Dataframe:
df_final4 = pd.DataFrame()

# Create First Loop:
for winning_scenario in wst_list:
    # Create Second Loop:
    for geography in g_list:
        # Create Third Loop:
        for segment_type in ctv_list:
            # Create Fourth Loop:
            for products in p_list:
                # Create Fifth Loop:
                for sizes in cto_list:
                    # Create Filter of Dataframe:
                    df_filter = df_final3[(df_final3['Winning Scenario Type']==winning_scenario) & (df_final3.Geography==geography) & (df_final3.Custom_Type_Value==segment_type) & (df_final3.Product==products) & (df_final3['Custom Total Ounce Value']==sizes)]


##### New Iteration Code Using List Comprehension:
# Starting time before Loop:
t1 = time.time()

# Create Empty Dataframe:
df_final4 = pd.DataFrame()

# Create Unique Lists of Geography, Product, Custom Type Value, Custom Total Ounce Value, Winning Scenario PED Type:
geo_list2 = list(df_final3['Geography'].unique())
product_list2 = list(df_final3['Product'].unique())
segment_list2 = list(df_final3['Custom_Type_Value'].unique())
size_list2 = list(df_final3['Custom Total Ounce Value'].unique())
scenario_list2 = list(df_final3['Winning Scenario Type'].unique())

# Create Loop:
[(winning_scenario, geography, segment_type, products, sizes) for scenario in scenario_list2 for geography in geo_list2 for segment_type in segment_list2 for products in product_list2 for sizes in size_list2]
# Create Filter of Dataframe:
df_filter = df_final3[(df_final3['Winning Scenario Type']==winning_scenario) & (df_final3.Geography==geography) & (df_final3.Custom_Type_Value==segment_type) & (df_final3.Product==products) & (df_final3['Custom Total Ounce Value']==sizes)]

This is the error I get when trying to use the list comprehension: "NameError: name 'winning_scenario' is not defined"
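
A note on the error: in the comprehension above, the first loop variable is named `scenario`, while the filter refers to `winning_scenario`. In addition, in Python 3 the names bound inside a list comprehension are not visible once it finishes, so the filter on the following line cannot see any of them. The sketch below illustrates both points with two short, made-up lists (`scenario_list` and `geo_list` are assumptions for illustration only), and uses `itertools.product` as one way to flatten nested loops while keeping a loop body. Flattening alone is unlikely to make the code faster, since the number of iterations stays the same.

```python
from itertools import product

# Hypothetical stand-ins for the unique-value lists in the question.
scenario_list = ["s1", "s2"]
geo_list = ["US", "CA"]

# The comprehension builds every combination, but its loop variables
# exist only inside the brackets.
combos = [(scenario, geography)
          for scenario in scenario_list
          for geography in geo_list]

# Referring to a comprehension variable afterwards raises NameError.
try:
    scenario
    leaked = True
except NameError:
    leaked = False

# itertools.product yields the same combinations in a single flat loop,
# and the per-combination work can stay inside the loop body.
visited = []
for scenario, geography in product(scenario_list, geo_list):
    visited.append((scenario, geography))
```

Any per-combination filtering has to happen inside the loop body (or inside the comprehension expression), not on a separate line after it.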

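Edit: here is my full code. What I am doing is iterating over 5 different lists of unique values and calculating the price elasticity, then putting that information into a new DataFrame called df_final4. My code works; the problem is that it takes too long because it has to use 5 for loops. So I am just trying to find a faster way, or a way to condense this code.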

import time

import numpy as np
import pandas as pd

# Starting time before Loop:
t1 = time.time()

# Create unique lists of product, geography, segment, winning scenario type, and size:
p_list = list(df_final3['Product'].unique())
g_list = list(df_final3['Geography'].unique())
ctv_list = list(df_final3['Custom_Type_Value'].unique())
wst_list = list(df_final3['Winning Scenario Type'].unique())
cto_list = list(df_final3['Custom Total Ounce Value'].unique())

# Create Empty Dataframe:
df_final4 = pd.DataFrame()

# Create First Loop:
for winning_scenario in wst_list:
    # Create Second Loop:
    for geography in g_list:
        # Create Third Loop:
        for segment_type in ctv_list:
            # Create Fourth Loop:
            for products in p_list:
                # Create Fifth Loop:
                for sizes in cto_list:
                    # Create Filter of Dataframe:
                    df_filter = df_final3[(df_final3['Winning Scenario Type']==winning_scenario) & (df_final3.Geography==geography) & (df_final3.Custom_Type_Value==segment_type) & (df_final3.Product==products) & (df_final3['Custom Total Ounce Value']==sizes)]
                    # Calculate Price Elasticity of Demand:
                    df_filter['ped'] = np.divide(df_filter['Delta_Demand'].values,df_filter['Delta_Price'].values)
                    # Extract Max ped value:
                    df_filter['ped max'] = df_filter.ped.max()
                    # Extract Min ped value:
                    df_filter['ped min'] = df_filter.ped.min()
                    
                    # Save Columns from df_filter that we only want:
                    df_filter = df_filter[['Geography','Manufacturer Name', 'Brand Name', 'Product','Custom_Type_Value','Custom Total Ounce Value', 'Winning Scenario Type','Price', 'Unit_Sales','ped', 'ped max', 'ped min']]
                    
                    # Map values from df_final3 to df_filter:
                    unmatch2 = df_final3[(~df_final3.Geography.isin(df_filter.Geography)) & (~df_final3['Manufacturer Name'].isin(df_filter['Manufacturer Name'])) & (~df_final3['Brand Name'].isin(df_filter['Brand Name'])) & (~df_final3.Product.isin(df_filter.Product)) & (~df_final3.Custom_Type_Value.isin(df_filter.Custom_Type_Value)) &(~df_final3['Winning Scenario Type'].isin(df_filter['Winning Scenario Type'])) & (~df_final3.Price.isin(df_filter.Price)) & (~df_final3.Unit_Sales.isin(df_filter.Unit_Sales))]
        
                    # Concat together:
                    i2 = pd.concat([df_filter, unmatch2],ignore_index=False)
                    df_final4 = pd.concat([df_final4, i2], ignore_index=False)

# Time taken by Loop:
t2 = time.time()
df_final4
print("Time taken by Loop: %.6f" %(t2 - t1))

Tags: python, pandas, list, loops

Solution

As tomjn suggested, you probably want to use groupby to calculate the price elasticity for each combination of Product, Geography, Custom_Type_Value, Winning Scenario Type, and Custom Total Ounce Value.

Try something like this:

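import numpy as np
import pandas as pd


def calculate_ped(df: pd.DataFrame):
    # Visit each existing combination of the five key columns once,
    # instead of looping over every possible combination.
    for _, group in df.groupby(
        [
            "Product",
            "Geography",
            "Custom_Type_Value",
            "Winning Scenario Type",
            "Custom Total Ounce Value",
        ]
    ):
        # Price elasticity of demand for the rows of this group.
        ped = np.divide(
            group["Delta_Demand"].values,
            group["Delta_Price"].values,
        )
        # Write results back by index label (assumes a unique index).
        df.loc[group.index, "ped"] = ped
        df.loc[group.index, "ped min"] = ped.min()
        df.loc[group.index, "ped max"] = ped.max()
    # Note: df is modified in place and also returned.
    return df

df_final4 = calculate_ped(df_final3)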
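
As a further sketch (not part of the original answer), the Python-level loop over groups can probably be avoided as well: the ratio can be computed for all rows at once, and `groupby(...).transform` can broadcast each group's minimum and maximum back to its rows. This assumes the same column names as in the question; `KEYS`, `calculate_ped_vectorized`, and the small `demo` frame are hypothetical names and data used only for illustration.

```python
import pandas as pd

# The five key columns from the question.
KEYS = [
    "Product",
    "Geography",
    "Custom_Type_Value",
    "Winning Scenario Type",
    "Custom Total Ounce Value",
]


def calculate_ped_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy so the caller's frame is left untouched.
    out = df.copy()
    # Row-wise price elasticity of demand for the whole frame.
    out["ped"] = out["Delta_Demand"] / out["Delta_Price"]
    # Per-group min and max, aligned back to every row of the group.
    grouped = out.groupby(KEYS)["ped"]
    out["ped min"] = grouped.transform("min")
    out["ped max"] = grouped.transform("max")
    return out


# Made-up data: two products, one value for every other key.
demo = pd.DataFrame(
    {
        "Product": ["A", "A", "B", "B"],
        "Geography": ["US", "US", "US", "US"],
        "Custom_Type_Value": ["x", "x", "x", "x"],
        "Winning Scenario Type": ["s1", "s1", "s1", "s1"],
        "Custom Total Ounce Value": [12, 12, 12, 12],
        "Delta_Demand": [2.0, 6.0, 3.0, 9.0],
        "Delta_Price": [1.0, 2.0, 3.0, 3.0],
    }
)
result = calculate_ped_vectorized(demo)
```

Unlike `calculate_ped`, this variant returns a new frame rather than modifying its argument. Rows with a missing value in any key column are dropped by `groupby` by default, so their `ped min` and `ped max` would be NaN.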
