首页 > 解决方案 > 比较 2 个数据框以在 Pandas 中查找公共数据并提取相邻数据

问题描述

events是以日期为索引的数据框,如下所示:

           co_code
co_stkdate  
2009-03-17  11
2010-02-03  11
2011-02-14  363
2015-01-09  363
2010-10-15  365

residual是以日期为索引的另一个数据框,包含数据框 co_code 中的元素events作为列名。残差看起来像这样(有 700 多个列,但我发布了 3 个以供参考):

              11    363    365  
co_stkdate
1997-07-02  NaN -12.134525  NaN 
1997-07-04  NaN -3.663248   -15.703843 
1997-07-07  NaN -30.649876  3.400623
1997-07-08  NaN 17.924305   -6.188777
1997-07-10  NaN -25.828099  -0.615380 

我想比较两个数据框以分别查找数据框每一列的公共日期,并为residual数据框中具有匹配日期的每一列提取特定行及其相邻行events。由于数据集非常大,我想遍历残差的每一列以根据列名(与数据框匹配events)比较日期。因此,我尝试了以下代码:

 orig={}
    extra={}
    #iterating through events index
    for date in events.index:
      #creating index as a new column
      residual['tmp'] = residual.index
      print(type(residual['tmp']))
      date_list = {date}
      #determining the presence of the date in residual index
      if residual['tmp'].isin([date_list]):
        code=events.loc[date, 'co_code']
        #Comparing the co_code
        if code in residual.columns:
          #checks whether the value w.r.t a particular date and co_code as taken from events is present or not
          if pd.notnull(residual.loc[date,code]):
            #Find and add to dictionary top and bottom of values with their dates
            idx=residual.get_loc(residual.loc[date,code])
            extra=residual.iloc[idx-10:idx+10,code]
            orig.update(extra)
      #dropping the new column  
      del residual['tmp']
      #display orig
      orig

但我不断收到以下错误:

TypeError: <class 'set'> is not convertible to datetime

预期输出:例如,residual在日期 2009-03-17 的上方和下方提取 10 行(从数据框中),对应于 'co_code'=11(在数据框中给出events)。并期望日期 2009-03-17 的输出,对应于 'co_code'=11 为:

co_stkdate  11
2009-02-25  4.467442
2009-02-26  4.921655
2009-02-27  -4.875917
2009-03-02  1.895546
2009-03-03  -3.162370
2009-03-06  85.396542
2009-03-09  43.233098
2009-03-12  11.389193
2009-03-13  -68.633160
2009-03-16  0.329175
2009-03-17  -0.049623
2009-03-18  3.584602
2009-03-19  -3.602577
2009-03-20  -1.532591
2009-03-23  2.766331
2009-03-24  0.487590
2009-03-25  -3.541044
2009-03-26  -5.055355
2009-03-27  0.887624
2009-03-30  2.530087

events我想要与数据帧中的索引和特定列(对应于数据帧中的 co_code events)匹配的所有此类日期和 co_code 的类似输出residuals。我怎样才能消除错误?任何有关执行此操作的最佳方法的指导将不胜感激。

标签: pythonpandasloopsdataframedatetime

解决方案


推荐阅读