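python - Pandas: creating dummy variables from values within a column
Problem description
I have a dataframe with a column called Actors.
Each cell contains a string like "Abigail Breslin, Greg Kinnear, Paul Dano, Alan Arkin".
I would like to split this string on the comma (",")
so that each cell contains a list of actors, i.e. ["Abigail Breslin", "Greg Kinnear", "Paul Dano", "Alan Arkin"],
and then create a dummy variable for each unique actor. I have not yet found a solution that actually splits the string and puts each actor name into its own new column.
Any help would be greatly appreciated :)
My dataframe (df) looks like this: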
Title (Object)| Actors (Object) | Year (Object)
Pulp Fiction | Bruce Willis, Amanda Plummer, Laura Lovelace, John Travolta | 1994
Fight Club | Edward Norton, Brad Pitt, Helena Bonham Carter, Meat Loaf | 1999
My goal is for the dataframe to look like this:
Title (Object)| Bruce Willis | Amanda Plummer | Laura Lovelace | John Travolta | Edward Norton | Year
Pulp Fiction | 1 | 1 | 1 | 1 | 0 | 1994
Fight Club | 0 | 0 | 0 | 0 | 1 | 1999
I tried:
import pandas as pd
data = 'Imdb_datajson(Cleaned).csv'
df = pd.read_csv(data)
list_of_unique_actors = df.Actors.unique().tolist()
list_of_unique_actors
newlist = []
for actor in list_of_unique_actors:
    actor = actor.split(",")
    newlist.extend(actor)
and received this error:
AttributeError Traceback (most recent call last)
<ipython-input-48-ae50a804fe05> in <module>
5 newlist = []
6 for word in list_of_unique_actors:
----> 7 word = word.split(",")
8 newlist.extend(word)
9 return newlist
AttributeError: 'float' object has no attribute 'split'
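A likely cause of this error is that at least one cell in Actors is empty: pandas reads a missing value as NaN, which is a float and has no split method. The following is a minimal sketch of collecting the unique names while skipping missing cells. The sample rows, including the "Untitled" row with no actors, are hypothetical and only stand in for the real CSV.

```python
import pandas as pd

# Hypothetical sample; the NaN stands in for an empty Actors cell in the CSV.
df = pd.DataFrame({
    "Title": ["Pulp Fiction", "Fight Club", "Untitled"],
    "Actors": [
        "Bruce Willis, Amanda Plummer, Laura Lovelace, John Travolta",
        "Edward Norton, Brad Pitt, Helena Bonham Carter, Meat Loaf",
        float("nan"),
    ],
    "Year": [1994, 1999, 2000],
})

# Drop missing cells first so that every remaining value is a string.
actor_lists = df["Actors"].dropna().str.split(",")

# Flatten the lists, strip surrounding spaces, and keep each name once.
unique_actors = sorted({name.strip() for names in actor_lists for name in names})
print(unique_actors)
```

Stripping each name matters because splitting on "," alone leaves a leading space on every name after the first.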
Solution
Use pd.get_dummies():
from io import StringIO

import pandas as pd

# sample data
s = """Title (Object)|Actors (Object)|Year (Object)
Pulp Fiction|Bruce Willis, Amanda Plummer, Laura Lovelace, John Travolta|1994
Fight Club|Edward Norton, Brad Pitt, Helena Bonham Carter, Meat Loaf|1999"""
# read the pipe-separated sample into a dataframe
df = pd.read_csv(StringIO(s), sep='|')
# split each string of actors into a list
df['Actors (Object)'] = df['Actors (Object)'].str.split(', ')
# set the title and year as the index
df = df.set_index(['Title (Object)', 'Year (Object)'])
# expand the lists to one actor per row, build dummies, then sum per (title, year);
# groupby(level=...).sum() replaces sum(level=...), which was removed in pandas 2.0
dummy_df = pd.get_dummies(df['Actors (Object)'].apply(pd.Series).stack()).groupby(level=[0, 1]).sum()
                              Edward Norton  Amanda Plummer  Brad Pitt  \
Title (Object) Year (Object)
Pulp Fiction   1994                       0               1          0
Fight Club     1999                       1               0          1

                              Bruce Willis  Helena Bonham Carter  \
Title (Object) Year (Object)
Pulp Fiction   1994                      1                     0
Fight Club     1999                      0                     1

                              John Travolta  Laura Lovelace  Meat Loaf
Title (Object) Year (Object)
Pulp Fiction   1994                       1               1          0
Fight Club     1999                       0               0          1
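As an alternative that is not part of the answer above, Series.str.get_dummies can split on a separator and build the indicator columns in one step. The sketch below assumes plain column names Title, Actors and Year, as in the question, and a delimiter of exactly a comma followed by one space.

```python
import pandas as pd

# Sample rows taken from the question; column names are assumed to be plain.
df = pd.DataFrame({
    "Title": ["Pulp Fiction", "Fight Club"],
    "Actors": [
        "Bruce Willis, Amanda Plummer, Laura Lovelace, John Travolta",
        "Edward Norton, Brad Pitt, Helena Bonham Carter, Meat Loaf",
    ],
    "Year": [1994, 1999],
})

# One 0/1 column per distinct actor; sep must match the delimiter exactly.
dummies = df["Actors"].str.get_dummies(sep=", ")

# Put Title first and Year last, matching the layout asked for in the question.
result = pd.concat([df[["Title"]], dummies, df[["Year"]]], axis=1)
print(result)
```

This keeps Title and Year as ordinary columns rather than an index, which is closer to the target layout in the question. If the real data mixes "," and ", " as delimiters, the names would need to be normalized first.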