Pandas get_dummies() with a for loop

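Problem description

I want to build a one-hot encoder. It works when I try a single country, but when I run it in a loop it returns an error. Here are a few rows of the DataFrame so you can picture it.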

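import re
import pandas as pd

# Sample rows; the rest of the code refers to this DataFrame as netflix_clean
netflix_clean = pd.DataFrame({"country": ["United States", "India, United States", "Italy, India"],
                              "listed_in": ["Comedies", "Action, Comedies", "Adventure, Action"]})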

Next, I isolated the countries with the following code:

countries = list(netflix_clean["country"].apply(str).unique())

only_first_country = []

for country in countries:
    only_first_country.extend(re.findall(r"^(.+?),", country))

countries_clean = list(set(only_first_country))
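One thing to watch in the snippet above: the pattern `^(.+?),` only matches entries that contain a comma, so a row that lists a single country (such as "United States") contributes nothing to `countries_clean`. If those rows should count too, a rough sketch using pandas string methods could look like the following. The sample data mirrors the rows shown earlier, and `first_countries` is a name made up for this illustration.

```python
import pandas as pd

netflix_clean = pd.DataFrame(
    {"country": ["United States", "India, United States", "Italy, India"]}
)

# Take the text before the first comma; rows without a comma keep their whole value
first_countries = (
    netflix_clean["country"]
    .dropna()
    .str.split(",")
    .str[0]
    .str.strip()
)

# Unique first-listed countries, sorted for a stable order
countries_clean = sorted(first_countries.unique())
print(countries_clean)
```

Whether single-country rows belong in the list depends on what the encoder is meant to cover, so treat this as an option rather than a correction.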

Then I tried passing in just one country to check whether the function works (it does).

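def dummy_country_genre(df, country):
    # Keep only the rows whose country field mentions this country
    country_1 = df[df["country"].str.contains(country, case=False, regex=False, na=False)]

    # sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
    return country_1[["listed_in"]].unstack().str.get_dummies(sep=", ").groupby(level=0).sum().rename(index={"listed_in": country})

dummy_country_genre(netflix_clean, "United States")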

But when I loop, it returns the following error: 'list' object has no attribute 'upper'

This is the code that fails:

def dummy_country_genre_test2(df, country):
    ctry = [c for c in country]  # this is still a list, just a copy of country
    # The error is raised here: str.contains is given the whole list as its pattern
    country_1 = df[df["country"].str.contains(ctry, case=False, regex=False, na=False)]

    return country_1[["listed_in"]].unstack().str.get_dummies(sep=", ").sum(level = 0).rename(index={"listed_in":ctry})

dummy_country_genre_test2(netflix_clean, countries_clean)
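The message most likely comes from `str.contains` itself: with `case=False` and `regex=False` it takes one string pattern and upper-cases it before comparing, and a list has no `.upper()` method. The list comprehension only copies the list, so nothing is actually iterated. A minimal sketch of calling `str.contains` once per country instead, where `wanted` and `masks` are names invented for this example:

```python
import pandas as pd

countries = pd.Series(["United States", "India, United States", "Italy, India"])
wanted = ["India", "Italy"]  # illustrative list of countries to look for

# One boolean mask per country: str.contains receives a single string each time
masks = {
    c: countries.str.contains(c, case=False, regex=False, na=False)
    for c in wanted
}

for name, mask in masks.items():
    print(name, mask.tolist())
```

Each mask can then be used to filter the DataFrame for that country, which is the same per-country step the single-country function performs.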

Edit

I solved the problem with this code, in case anyone else needs it:

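def dummy_country_genre_test2(df, country):
    list_of = []
    for c in country:
        # Filter and encode one country at a time; str.contains needs a single string
        country_1 = df[df["country"].str.contains(c, case=False, regex=False, na=False)]
        # sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
        df2 = country_1[["listed_in"]].unstack().str.get_dummies(sep=", ").groupby(level=0).sum().rename(index={"listed_in": c})
        list_of.append(df2)

    # Genres missing for a country come out of concat as NaN, so fill them with 0
    result = pd.concat(list_of).fillna(0).astype(int)

    return result

dummy_country_genre_test2(netflix_clean, countries_clean)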
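For comparison, the same kind of country-by-genre count table can probably be built without an explicit loop. This is a sketch under one assumption that differs from the code above: each country is matched by its exact name after splitting on ", ", not by a case-insensitive substring search. The names `genre_dummies`, `exploded`, and `genre_counts` are made up for the example.

```python
import pandas as pd

netflix_clean = pd.DataFrame({
    "country": ["United States", "India, United States", "Italy, India"],
    "listed_in": ["Comedies", "Action, Comedies", "Adventure, Action"],
})

# One indicator column per genre, one row per title
genre_dummies = netflix_clean["listed_in"].str.get_dummies(sep=", ")

# One row per (title, country) pair; the original row index is preserved
exploded = netflix_clean["country"].str.split(", ").explode()

# Attach each title's genre indicators to every country it lists, then total per country
genre_counts = exploded.to_frame().join(genre_dummies).groupby("country").sum()
print(genre_counts)
```

Because every country that appears in the data gets a row, this version does not need a prepared list such as `countries_clean`; filter the result afterwards if only some countries are wanted.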

Tags: python, python-3.x, pandas
