Looping through a DataFrame and adding values directly to the DataFrame

Problem description

I have a CSV with three columns:

df.sample(5)

    company          username    esg_company
320 CIBC             NaN         Canadian Imperial Bank of Commerce
206 Bank of Baroda   NaN         Bank of Baroda
820 Halliburton      halliburton Halliburton Company
112 Luzhou Lao Jiao  NaN         Lu Zhou Lao Jiao Co.Ltd
144 Rabobank         NaN         Rabobank Nederland N.V.

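Currently I am doing:

The list is quite long, and I would like to do this differently, because any mistake means the list length and the DataFrame length no longer match. What I want is:

I am not sure whether this is possible; any help is appreciated!

Tags: python, pandas

Solution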


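Looping row by row is a very bad idea, especially for large datasets: a Python-level loop over DataFrame rows is slow and can take a very long time to finish.

It is better to use the apply() method to add a new column whose values are computed by a function from the existing data.

Another thing to note is that datasets are usually combined on a common ID. That way you can use merge() or join() to combine them into a single DataFrame before modifying it.

I created an example from your sample data that shows both the apply() and join() methods: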
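As a minimal sketch of combining on a common ID, the snippet below uses merge() with the ID held in an ordinary column rather than in the index. The frame names and the data are hypothetical and only loosely mirror the question's sample.

```python
import pandas as pd

# Hypothetical data: both frames share an "id" column.
companies = pd.DataFrame({
    "id": [320, 206, 820],
    "name": ["CIBC", "Bank of Baroda", "Halliburton"],
})
usernames = pd.DataFrame({
    "id": [320, 206, 555],
    "ig_username": ["bubba", "carla", "charles"],
})

# A left merge keeps every company row; ids without a match get NaN,
# and ids that exist only in `usernames` (555 here) are dropped.
merged = companies.merge(usernames, on="id", how="left")
print(merged)
```

Because the values are matched by ID rather than by position, a missing or extra entry cannot shift the other rows, which is the length-mismatch problem described in the question.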

import pandas as pd
import io
pd.set_option('display.max_colwidth', 9999)
pd.set_option("display.expand_frame_repr", False)
pd.set_option("display.precision", 5)

## Create test data
data_csv_str = '''id,name,location,description
320,CIBC,null,Canadian Imperial Bank of Commerce
206,Bank of Baroda,null,Bank of Baroda
820,Halliburton,halliburton,Halliburton Company
112,Luzhou Lao Jiao,null,Lu Zhou Lao Jiao Co.Ltd
144,Rabobank,null,Rabobank Nederland N.V.
'''
df_data = pd.read_csv(io.StringIO(data_csv_str), sep=',', index_col=0)
print(df_data)
'''
                name     location                         description
id                                                                   
320             CIBC          NaN  Canadian Imperial Bank of Commerce
206   Bank of Baroda          NaN                      Bank of Baroda
820      Halliburton  halliburton                 Halliburton Company
112  Luzhou Lao Jiao          NaN             Lu Zhou Lao Jiao Co.Ltd
144         Rabobank          NaN             Rabobank Nederland N.V.
'''

## Create the instagram user data
data_ig_csv_str = '''id,ig_username
320,bubba
206,carla
555,charles
'''
df_ig_usernames = pd.read_csv(io.StringIO(data_ig_csv_str), sep=',', index_col=0)
print(df_ig_usernames)
'''
    ig_username
id             
320       bubba
206       carla
555     charles
'''

## Join the two dataframes
df_merged = df_data.join(df_ig_usernames)
print(df_merged)
'''
                name     location                         description ig_username
id                                                                               
320             CIBC          NaN  Canadian Imperial Bank of Commerce       bubba
206   Bank of Baroda          NaN                      Bank of Baroda       carla
820      Halliburton  halliburton                 Halliburton Company         NaN
112  Luzhou Lao Jiao          NaN             Lu Zhou Lao Jiao Co.Ltd         NaN
144         Rabobank          NaN             Rabobank Nederland N.V.         NaN
'''

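## Create method/function to process and modify data
def myDataProcess(row):
    # With axis=1, apply() passes each row as a Series; row.name is its index label (the id)
    rowIndex = row.name
    name = row['name']
    # First three characters of the company name
    abbrev_name = name[0:3]
    # A missing username is NaN and is formatted as 'nan' below
    ig_name = row['ig_username']
    s = '%s_%s_%s' % (rowIndex, abbrev_name, ig_name)
    return s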

## Apply the changes (via method) as new column
## Copy first so that df_merged itself is left unmodified
df_applied = df_merged.copy()
df_applied['id_nameAbbrev_ig'] = df_applied.apply(myDataProcess, axis=1)
print(df_applied)
'''
                name     location                         description ig_username id_nameAbbrev_ig
id                                                                                                
320             CIBC          NaN  Canadian Imperial Bank of Commerce       bubba    320_CIB_bubba
206   Bank of Baroda          NaN                      Bank of Baroda       carla    206_Ban_carla
820      Halliburton  halliburton                 Halliburton Company         NaN      820_Hal_nan
112  Luzhou Lao Jiao          NaN             Lu Zhou Lao Jiao Co.Ltd         NaN      112_Luz_nan
144         Rabobank          NaN             Rabobank Nederland N.V.         NaN      144_Rab_nan
'''
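For a simple string combination like this one, the same column can probably also be built column-wise, without calling a Python function per row. The following is only a sketch under assumptions: the small frame stands in for df_merged, and the "unknown" placeholder for missing usernames is a hypothetical choice, not something from the question.

```python
import pandas as pd

# Hypothetical frame shaped like df_merged above (id as the index).
df = pd.DataFrame(
    {
        "name": ["CIBC", "Halliburton"],
        "ig_username": ["bubba", None],
    },
    index=pd.Index([320, 820], name="id"),
)

# Turn the index into a string Series aligned with the frame.
ids = df.index.to_series().astype(str)

# Build the label column-wise; fillna("unknown") is an assumed
# placeholder so that missing usernames do not show up as 'nan'.
df["id_nameAbbrev_ig"] = (
    ids + "_" + df["name"].str[:3] + "_" + df["ig_username"].fillna("unknown")
)
print(df)
```

apply() with axis=1 remains the more flexible option when the per-row logic does not reduce to column operations.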
