
Problem description

These are the attributes of my dataset.

[Image: dataset]

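My goal is to compute the average apartment price per zip code in Paris (20 arrondissements in total; the column name is "Zipcode"). The original dataset has no avg_zip_price_app column, so I have to create it.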

def get_avg_zip_appartment_price(df, zip):
    price = 0
    if np.where(df["Zipcode"] == zip): # this row's zipcode
        price = 12811 
    elif np.where(df["Zipcode"] == zip):
        price = 11623
    elif np.where(df["Zipcode"] == zip):
        price = 12345
    elif np.where(df["Zipcode"] == zip):
        price = 13197
    elif np.where(df["Zipcode"] == zip):
        price = 12335
    elif np.where(df["Zipcode"] == zip):
        price = 14420
    elif np.where(df["Zipcode"] == zip):
        price = 13899
    elif np.where(df["Zipcode"] == zip):
        price = 11673
    elif np.where(df["Zipcode"] == zip):
        price = 10932
    elif np.where(df["Zipcode"] == zip):
        price = 10301
    elif np.where(df["Zipcode"] == zip):
        price = 9244
    elif np.where(df["Zipcode"] == zip):
        price = 9146
    elif np.where(df["Zipcode"] == zip):
        price = 10032
    elif np.where(df["Zipcode"] == zip):
        price = 9951
    elif np.where(df["Zipcode"] == zip):
        price = 9350
    elif np.where(df["Zipcode"] == zip):
        price = 11079
    elif np.where(df["Zipcode"] == zip):
        price = 10687
    elif np.where(df["Zipcode"] == zip):
        price = 9664
    elif np.where(df["Zipcode"] == zip):
        price = 8385
    elif np.where(df["Zipcode"] == zip):
        price = 8744
    return price 

conditions = [
    (df['Zipcode'] == 75001),
    (df['Zipcode'] == 75002),
    (df['Zipcode'] == 75003),
    (df['Zipcode'] == 75004),
    (df['Zipcode'] == 75005),
    (df['Zipcode'] == 75006),
    (df['Zipcode'] == 75007),
    (df['Zipcode'] == 75008),
    (df['Zipcode'] == 75009),
    (df['Zipcode'] == 75010),
    (df['Zipcode'] == 75011),
    (df['Zipcode'] == 75012),
    (df['Zipcode'] == 75013),
    (df['Zipcode'] == 75014),
    (df['Zipcode'] == 75015),
    (df['Zipcode'] == 75016),
    (df['Zipcode'] == 75017),
    (df['Zipcode'] == 75018),
    (df['Zipcode'] == 75019),
    (df['Zipcode'] == 75020)
]
choices = [
    get_avg_zip_appartment_price(user_df, 75001), get_avg_zip_appartment_price(user_df, 75002),get_avg_zip_appartment_price(user_df, 75003),
    get_avg_zip_appartment_price(user_df, 75004), get_avg_zip_appartment_price(user_df, 75005),get_avg_zip_appartment_price(user_df, 75006),
    get_avg_zip_appartment_price(user_df, 75007),get_avg_zip_appartment_price(user_df, 75008),get_avg_zip_appartment_price(user_df, 75009),
    get_avg_zip_appartment_price(user_df, 75010),get_avg_zip_appartment_price(user_df, 75011),get_avg_zip_appartment_price(user_df, 75012),
    get_avg_zip_appartment_price(user_df, 75013),get_avg_zip_appartment_price(user_df, 75014),get_avg_zip_appartment_price(user_df, 75015),
    get_avg_zip_appartment_price(user_df, 75016),get_avg_zip_appartment_price(user_df, 75017),get_avg_zip_appartment_price(user_df, 75018),
    get_avg_zip_appartment_price(user_df, 75019),get_avg_zip_appartment_price(user_df, 75020)]
user_df['avg_zip_price_app'] = np.select(conditions, choices)
print(user_df.head())

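But I get the same value for every observation. Is it because the row condition in my get_avg_zip_appartment_price(df, zip) method is written incorrectly, so that each call checks the first row, finds it true, and assigns the same price to every row? This is the result I get: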

[Image: output]

Tags: python, dataframe

Solution


The error in your code:

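np.where(df["Zipcode"] == zip) # With a single argument, np.where returns a tuple of index arrays; a non-empty tuple is always truthy.

Because the condition is always truthy, the first branch always runs and get_avg_zip_appartment_price(df, zip) returns 12811 for every zip value, even for zip = -1, which matches no record in df. Every branch also tests the same expression, so the elif branches could not tell zip codes apart even if the check itself worked.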
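To see how the condition is evaluated, it can help to inspect what np.where returns when called with one argument. The sketch below uses a small made-up DataFrame; the values are illustrative and not taken from the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"Zipcode": [75001, 75002, 75003]})

# With a single argument, np.where returns a tuple of index arrays.
match = np.where(df["Zipcode"] == 75002)
no_match = np.where(df["Zipcode"] == -1)

# A non-empty tuple is truthy no matter what the arrays inside contain,
# so `if np.where(...)` passes even when nothing matched.
print(type(match), bool(match))
print(len(no_match[0]), bool(no_match))

# An explicit test for "at least one matching row":
has_match = (df["Zipcode"] == -1).any()
print(has_match)
```

Testing the boolean Series directly with .any() makes the intent explicit, although a per-row lookup does not need such a check at all.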

You can use a dictionary of key-value pairs to supply the price for each zip code.
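A minimal sketch of the dictionary approach follows. It assumes the twenty prices in the question are meant for 75001 through 75020 in the order they appear; the question does not state this pairing, so treat it as an assumption. The name avg_price_by_zip and the sample user_df rows are hypothetical.

```python
import pandas as pd

# Prices copied from the question, in the order they appear there.
# Assumption: they correspond to 75001..75020 in that same order.
prices = [12811, 11623, 12345, 13197, 12335, 14420, 13899, 11673, 10932, 10301,
          9244, 9146, 10032, 9951, 9350, 11079, 10687, 9664, 8385, 8744]
avg_price_by_zip = dict(zip(range(75001, 75021), prices))

# Hypothetical sample rows standing in for the real user_df.
user_df = pd.DataFrame({"Zipcode": [75001, 75020, 75006, 75001]})

# Series.map looks up each row's zip code in the dictionary;
# zip codes missing from the dictionary become NaN.
user_df["avg_zip_price_app"] = user_df["Zipcode"].map(avg_price_by_zip)
print(user_df)
```

This replaces both the if/elif chain and the np.select call with a single vectorized lookup, so each row gets the price for its own zip code.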

