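python - Pandas get_dummies() with a for loop
Problem description
I want to build a one-hot encoder. It works when I try a single country, but when I run it in a loop it returns an error. Here are a few rows of the DataFrame so you can picture the data.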
import pandas as pd
df = pd.DataFrame({"country": ["United States", "India, United States", "Italy, India"],
                   "listed_in": ["Comedies", "Action, Comedies", "Adventure, Action"]})
Next, I isolated the countries with the following code:
import re

countries = list(netflix_clean["country"].apply(str).unique())
only_first_country = []
for country in countries:
    # Capture the first country of each comma-separated entry
    only_first_country.extend(re.findall(r"^(.+?),", country))
countries_clean = list(set(only_first_country))
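As a rough illustration of what this extraction yields, here is a sketch that runs the same pattern on the three sample rows with `Series.str.extract` instead of an explicit loop. The name `first_countries` is only illustrative. Note that on the sample data "United States" is not collected, because the pattern only matches text that is followed by a comma.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "India, United States", "Italy, India"],
    "listed_in": ["Comedies", "Action, Comedies", "Adventure, Action"],
})

# Same regex as the loop version: lazily capture everything before the first comma.
# Rows without a comma produce NaN and are dropped.
first_countries = (
    df["country"].astype(str).str.extract(r"^(.+?),")[0].dropna().unique()
)

# Deduplicated list of first-listed countries (hypothetical variable name)
countries_clean = sorted(set(first_countries))
print(countries_clean)
```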
Then I tried passing in a single country to check whether the function works (it does).
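def dummy_country_genre(df, country):
    # Keep only the rows whose country field mentions `country`
    country_1 = df[df["country"].str.contains(country, case=False, regex=False, na=False)]
    # One-hot encode the genres and sum them into one row labelled with the country.
    # groupby(level=0).sum() replaces sum(level=0), which was removed in pandas 2.0.
    return (country_1[["listed_in"]].unstack().str.get_dummies(sep=", ")
            .groupby(level=0).sum().rename(index={"listed_in": country}))

dummy_country_genre(netflix_clean, "United States")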
But when I loop over the countries, it returns the following error: 'list' object has no attribute 'upper'
Here is the code that fails:
def dummy_country_genre_test2(df, country):
    ctry = [c for c in country]
    # Fails here: ctry is a list, but str.contains expects a single string pattern
    country_1 = df[df["country"].str.contains(ctry, case=False, regex=False, na=False)]
    return (country_1[["listed_in"]].unstack().str.get_dummies(sep=", ")
            .groupby(level=0).sum().rename(index={"listed_in": ctry}))

dummy_country_genre_test2(netflix_clean, countries_clean)
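The error most likely comes from `str.contains`, which takes one string pattern rather than a list; with `case=False` and `regex=False`, pandas appears to upper-case the pattern internally, which a list does not support. A minimal sketch on the sample rows, assuming one boolean mask per country is what is wanted (the `masks` dictionary is a hypothetical helper, not part of the original code):

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "India, United States", "Italy, India"],
})

# A single string pattern works and yields one boolean per row.
mask = df["country"].str.contains("India", case=False, regex=False, na=False)

# For several countries, call str.contains once per country
# instead of passing the whole list as the pattern.
masks = {
    c: df["country"].str.contains(c, case=False, regex=False, na=False)
    for c in ["India", "Italy"]
}
print(mask.tolist())
print(masks["Italy"].tolist())
```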
Edit
I solved the problem with the code below, in case anyone else needs it:
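def dummy_country_genre_test2(df, country):
    list_of = []
    for c in country:
        # Build one row of genre counts per country
        country_1 = df[df["country"].str.contains(c, case=False, regex=False, na=False)]
        # groupby(level=0).sum() replaces sum(level=0), which was removed in pandas 2.0
        df2 = (country_1[["listed_in"]].unstack().str.get_dummies(sep=", ")
               .groupby(level=0).sum().rename(index={"listed_in": c}))
        list_of.append(df2)
    # Genres that never occur for a country are NaN after concat; fill them with 0
    result = pd.concat(list_of).fillna(0).astype(int)
    return result

dummy_country_genre_test2(netflix_clean, countries_clean)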
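For comparison, a loop-free variant is possible if exact country names are acceptable. The sketch below is not the approach from the post: it one-hot encodes both columns with `str.get_dummies` and multiplies the two indicator tables, so it counts every country in the data (not just the first-listed ones) and matches whole names rather than case-insensitive substrings. The names `country_flags`, `genre_flags` and `counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "India, United States", "Italy, India"],
    "listed_in": ["Comedies", "Action, Comedies", "Adventure, Action"],
})

# One indicator column per country and per genre, one row per title.
country_flags = df["country"].str.get_dummies(sep=", ")
genre_flags = df["listed_in"].str.get_dummies(sep=", ")

# (countries x titles) dot (titles x genres) = genre counts per country.
counts = country_flags.T.dot(genre_flags)
print(counts)
```

On the three sample rows this should give India two Action titles and the United States two Comedies titles.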