Difficulty merging two DataFrames on two columns (ticker and date)

Problem description

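I have looked at many answers to this kind of question. I am trying to merge two files column-wise, and the DataFrames share only the ticker and date columns. If I set how='left', the result shows the data from df_US[US_columns] and the column headers from df_SEDOL_ESG[ESG_columns], but not the data from df_SEDOL_ESG[ESG_columns].

Likewise, if I set how='right', it shows the opposite. I also tried how='outer'; it does not merge the DataFrames but lists them separately.

Below the code I have attached sample DataFrames and their dtypes, because I want to make sure the date columns are in datetime format. Any guidance is appreciated.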

import numpy as np
import pandas as pd

path0 = 'K:/QuantTest/Data/ESG/'
path1 = 'K:/QuantTest/Data/US/'


def US_ESG():    

    df_US = pd.read_csv(path1 + 'df_US_weekly_expectationt.csv', dtype={'ticker':'str'})
    df_US.rename(columns = {'Date': 'date'}, inplace = True)
    df_US['date'] = pd.to_datetime(df_US['date'], format='%m/%d/%Y', errors = 'coerce')

    df_SEDOL_ESG = pd.read_csv(path0 + 'SEDOL_ESGt.csv', dtype = {'ticker':'str'})
    df_SEDOL_ESG.rename(columns = {'Ticker':'ticker'}, inplace=True)
    df_SEDOL_ESG['date'] = pd.to_datetime(df_SEDOL_ESG['date'], format='%m/%d/%Y', errors = 'coerce')

    US_columns = ['ticker', 'date', 'volume', 'closing_price']
    ESG_columns = ['ticker', 'date','AllCategories_Insight','AllCategories_CategoryVolumeTTM']

    df_US_ESG = df_US[US_columns].merge(df_SEDOL_ESG[ESG_columns], how='left', on = ['ticker', 'date'])

    df_US_ESG.to_csv(path0 + 'US_ESGt.csv', index = False)


if __name__ == "__main__":

   US_ESG()  

Tags: python, dataframe

Solution


If you want to merge the two DataFrames, do it like this. The on parameter specifies the join keys (see the pandas documentation).

# before
df_US_ESG = df_US[US_columns].merge(df_SEDOL_ESG[ESG_columns], how='left', on = ['ticker', 'date'])

# after
df_US_ESG = pd.merge(df_US[US_columns], df_SEDOL_ESG[ESG_columns], on = ['ticker', 'date'])
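
One difference between the two calls is worth spelling out: when how is omitted, pd.merge performs an inner join, so only rows whose (ticker, date) pair exists in both frames are kept. The following is a minimal sketch of how the how argument changes the result for the same join keys. The frames left and right and all their values are made up for illustration and are not taken from the asker's files.

```python
import pandas as pd

# Toy frames (hypothetical values) that share the key columns 'ticker' and 'date'.
left = pd.DataFrame({
    "ticker": ["A", "AABA", "AAP"],
    "date": pd.to_datetime(["2018-12-28"] * 3),
    "volume": [100.0, 200.0, 300.0],
})
right = pd.DataFrame({
    "ticker": ["A", "AABA", "ZZZ"],
    "date": pd.to_datetime(["2018-12-28"] * 3),
    "insight": [56.1, 56.4, 60.0],
})

keys = ["ticker", "date"]

# Default how='inner': keeps only key pairs present in both frames.
inner = pd.merge(left, right, on=keys)
# how='left': keeps every row of `left`; unmatched rows get NaN in `insight`.
left_join = pd.merge(left, right, on=keys, how="left")
# how='outer': keeps every key pair from either frame.
outer = pd.merge(left, right, on=keys, how="outer")

print(len(inner), len(left_join), len(outer))
```

If a left join returns the right-hand columns filled entirely with NaN, that usually means no key pair matched at all, which points at the key values rather than at the merge call.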

My reproduction is as follows. Hope this helps.

# df1.csv
ticker,date,volume,closing_price
A,12/28/2018,2445101.5,65.96
AABA,12/28/2018,7113085.5,58.35
AAP,12/28/2018,1066813.625,155.46
AAPL,12/28/2018,43182216,156.23
ABC,12/28/2018,1286497.125,73.96

# df2.csv

OrganizationTvlId,ISIN,Ownership,SEDOL,ticker,Company_Name,InstrumentCountry,Sector,Industry,date,AllCategories_Insight,Materiality_Insight,AllCategories_CategoryVolumeTTM,Materiality_CategoryVolumeTTM
0002c46f-98ff-457e-83e0-47b466746572,US55027E1029,Public,2572109,A,Luminex_Corp.,US,Health_Care,Biotechnology,12/28/2018,56.12375097,58.27797253,4,3
0002c46f-98ff-457e-83e0-47b466746572,US55027E1029,Public,2572109,AABA,Luminex_Corp.,US,Health_Care,Biotechnology,12/28/2018,56.37543414,58.48117502,4,3

# python
df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')
df = pd.merge(df1, df2, on=['ticker', 'date'])
print(df)

# output
ticker  date    volume  closing_price   OrganizationTvlId   ISIN    Ownership   SEDOL   Company_Name    InstrumentCountry   Sector  Industry    AllCategories_Insight   Materiality_Insight AllCategories_CategoryVolumeTTM Materiality_CategoryVolumeTTM
0   A   12/28/2018  2445101.5   65.96   0002c46f-98ff-457e-83e0-47b466746572    US55027E1029    Public  2572109 Luminex_Corp.   US  Health_Care Biotechnology   56.123751   58.277973   4   3
1   AABA    12/28/2018  7113085.5   58.35   0002c46f-98ff-457e-83e0-47b466746572    US55027E1029    Public  2572109 Luminex_Corp.   US  Health_Care Biotechnology   56.375434   58.481175   4   3 
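
If the merge on the real files still leaves the ESG columns empty, one possible cause (an assumption, since the raw files are not shown in the question) is that the key values do not match exactly, for example because of stray whitespace in ticker or dates that did not parse. Below is a small sketch that uses the indicator parameter of merge to see which rows matched. The helper normalize_keys is hypothetical, and the trailing space in 'AABA ' is planted on purpose to simulate a mismatch.

```python
import pandas as pd

# Sample rows modeled on the reproduction above; the trailing space in
# 'AABA ' is introduced deliberately to simulate a key mismatch.
us = pd.DataFrame({
    "ticker": ["A", "AABA ", "AAP"],
    "date": ["12/28/2018", "12/28/2018", "12/28/2018"],
    "volume": [2445101.5, 7113085.5, 1066813.625],
})
esg = pd.DataFrame({
    "ticker": ["A", "AABA"],
    "date": ["12/28/2018", "12/28/2018"],
    "AllCategories_Insight": [56.12375097, 56.37543414],
})

keys = ["ticker", "date"]

def normalize_keys(df):
    """Hypothetical helper: strip whitespace from tickers and parse dates."""
    out = df.copy()
    out["ticker"] = out["ticker"].astype(str).str.strip()
    out["date"] = pd.to_datetime(out["date"], format="%m/%d/%Y", errors="coerce")
    return out

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
before = us.merge(esg, how="left", on=keys, indicator=True)
after = normalize_keys(us).merge(normalize_keys(esg), how="left", on=keys, indicator=True)

print(before["_merge"].value_counts())
print(after["_merge"].value_counts())
print(after.dtypes)
```

Rows marked left_only are the ones with no partner in the other frame. Printing dtypes also confirms whether the date column really is datetime64 in both frames before merging.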
