python - Loop through a dataframe and add values directly to the dataframe
Question
I have a csv with three columns:
df.sample(5)
             company     username                         esg_company
320             CIBC          NaN  Canadian Imperial Bank of Commerce
206   Bank of Baroda          NaN                      Bank of Baroda
820      Halliburton  halliburton                 Halliburton Company
112  Luzhou Lao Jiao          NaN             Lu Zhou Lao Jiao Co.Ltd
144         Rabobank          NaN             Rabobank Nederland N.V.
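Right now I am:
- Looping through each company in the company column
- Getting the Instagram username (chromedriver/selenium)
- Appending it to a list of usernames
- Turning the list into a column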
from selenium import webdriver
import pandas as pd

## defining companies
companies = df['company']

## creating blank list to house company usernames
username_list = []

## looping through each company
for company in companies:
    # do some stuff here with chromedriver and selenium
    # to get the username from instagram
    ig_handle = ...  # username from instagram
    username_list.append(ig_handle)

df['username'] = username_list
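The list is quite long, and I would like to do this differently, because any error means the list length and the dataframe length no longer match. I would like to:
- Loop through the dataframe
- If the username column is blank, add ig_handle directly to the dataframe in the appropriate place (where the company being searched matches the value in the company column)
I'm not sure whether this is possible; any help is appreciated!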
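Answer
Looping row by row is a very bad idea, especially on large datasets: it is slow and will take forever and a day to finish.
It is better to use the apply() method to add a new column whose values come from a function that processes each row.
Another thing to note is that you would normally join datasets on a common ID. That way you can merge() or join() them into a single dataframe before making modifications.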
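As a minimal sketch of the merge idea on the question's own columns, assuming the handles have already been collected into a separate table: the `handles` frame and its `ig_handle` values below are made up for illustration, and `company` is used as the common key because the question's data has no ID column.

```python
import pandas as pd

# Company table shaped like the question's data (usernames partly missing).
df = pd.DataFrame({
    "company": ["CIBC", "Bank of Baroda", "Halliburton"],
    "username": [None, None, "halliburton"],
})

# Hypothetical lookup table of handles that were scraped separately.
handles = pd.DataFrame({
    "company": ["CIBC", "Bank of Baroda"],
    "ig_handle": ["cibc_example", "baroda_example"],
})

# A left merge keeps every company row, even when no handle was found.
merged = df.merge(handles, on="company", how="left")

# Keep existing usernames and fill only the blanks from the lookup.
merged["username"] = merged["username"].fillna(merged["ig_handle"])
merged = merged.drop(columns="ig_handle")
print(merged)
```

Because the match is made on the key rather than on position, a company with no scraped handle simply keeps its missing value instead of shifting every later row.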
I created an example from your sample data that shows both the apply() and join() methods:
import pandas as pd
import io
pd.set_option('display.max_colwidth', 9999)
pd.set_option("display.expand_frame_repr", False)
pd.set_option("display.precision", 5)
## Create test data
data_csv_str = '''id,name,location,description
320,CIBC,null,Canadian Imperial Bank of Commerce
206,Bank of Baroda,null,Bank of Baroda
820,Halliburton,halliburton,Halliburton Company
112,Luzhou Lao Jiao,null,Lu Zhou Lao Jiao Co.Ltd
144,Rabobank,null,Rabobank Nederland N.V.
'''
df_data = pd.read_csv(io.StringIO(data_csv_str), sep=',', index_col=0)
print(df_data)
'''
                name     location                         description
id
320             CIBC          NaN  Canadian Imperial Bank of Commerce
206   Bank of Baroda          NaN                      Bank of Baroda
820      Halliburton  halliburton                 Halliburton Company
112  Luzhou Lao Jiao          NaN             Lu Zhou Lao Jiao Co.Ltd
144         Rabobank          NaN             Rabobank Nederland N.V.
'''
## Create the instagram user data
data_ig_csv_str = '''id,ig_username
320,bubba
206,carla
555,charles
'''
df_ig_usernames = pd.read_csv(io.StringIO(data_ig_csv_str), sep=',', index_col=0)
print(df_ig_usernames)
'''
     ig_username
id
320        bubba
206        carla
555      charles
'''
## Join the two dataframes
df_merged = df_data.join(df_ig_usernames)
print(df_merged)
'''
                name     location                         description  ig_username
id
320             CIBC          NaN  Canadian Imperial Bank of Commerce        bubba
206   Bank of Baroda          NaN                      Bank of Baroda        carla
820      Halliburton  halliburton                 Halliburton Company          NaN
112  Luzhou Lao Jiao          NaN             Lu Zhou Lao Jiao Co.Ltd          NaN
144         Rabobank          NaN             Rabobank Nederland N.V.          NaN
'''
## Create method/function to process and modify data
def myDataProcess(row):
    rowIndex = row.name
    name = row['name']
    abbrev_name = name[0:3]
    ig_name = row['ig_username']
    # A missing ig_username (NaN) is formatted as the string 'nan'
    s = '%s_%s_%s' % (rowIndex, abbrev_name, ig_name)
    return s
## Apply the changes (via method) as new column
df_applied = df_merged.copy()  # copy so df_merged is left unchanged
df_applied['id_nameAbbrev_ig'] = df_applied.apply(myDataProcess, axis=1)
print(df_applied)
'''
                name     location                         description  ig_username  id_nameAbbrev_ig
id
320             CIBC          NaN  Canadian Imperial Bank of Commerce        bubba     320_CIB_bubba
206   Bank of Baroda          NaN                      Bank of Baroda        carla     206_Ban_carla
820      Halliburton  halliburton                 Halliburton Company          NaN       820_Hal_nan
112  Luzhou Lao Jiao          NaN             Lu Zhou Lao Jiao Co.Ltd          NaN       112_Luz_nan
144         Rabobank          NaN             Rabobank Nederland N.V.          NaN       144_Rab_nan
'''
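To tie this back to the original question (fill in a handle only where the username is blank), here is a rough sketch using a boolean mask with `.loc` and `apply()`. The `lookup_handle` function is a hypothetical stand-in for the chromedriver/selenium step, and the handles it returns are invented for the example.

```python
import pandas as pd

def lookup_handle(company):
    """Hypothetical stand-in for the selenium lookup; returns None on failure."""
    fake_results = {"CIBC": "cibc_example", "Rabobank": "rabobank_example"}
    return fake_results.get(company)

df = pd.DataFrame({
    "company": ["CIBC", "Halliburton", "Rabobank", "Luzhou Lao Jiao"],
    "username": [None, "halliburton", None, None],
})

# Rows whose username is still blank.
missing = df["username"].isna()

# Look up only those rows; results keep the original index, so each handle
# lands on its own company's row and a failed lookup just stays missing.
df.loc[missing, "username"] = df.loc[missing, "company"].apply(lookup_handle)
print(df)
```

Since the assignment is aligned on the index, there is no separate list whose length can drift out of sync with the dataframe, and rows that already have a username are never looked up again.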