Join, merge, or reshape DataFrames based on two conditions

Problem description

I have two DataFrames, df and df1, that I want to merge or join.

import pandas as pd

df = pd.DataFrame(columns=['lt1', 'lt2','lt3','lt4','lt5','lt6'])
df['date'] = pd.date_range('2016-1-1', periods=5, freq='D')
df
   lt1  lt2  lt3  lt4  lt5  lt6       date
0  NaN  NaN  NaN  NaN  NaN  NaN 2016-01-01
1  NaN  NaN  NaN  NaN  NaN  NaN 2016-01-02
2  NaN  NaN  NaN  NaN  NaN  NaN 2016-01-03
3  NaN  NaN  NaN  NaN  NaN  NaN 2016-01-04
4  NaN  NaN  NaN  NaN  NaN  NaN 2016-01-05

df1 = pd.DataFrame({'location': ['lt1', 'lt3', 'lt6', 'lt1', 'lt2', 'lt3'],
                    'date': ['2016-01-1', '2016-01-02', '2016-01-1', '2016-01-03', '2016-01-5', '2016-01-4'],
                    'counts': ['2', '1', '1', '1', '3', '1']})

df1.date = pd.to_datetime(df1.date)
df1
  counts       date location
0      2 2016-01-01      lt1
1      1 2016-01-02      lt3
2      1 2016-01-01      lt6
3      1 2016-01-03      lt1
4      3 2016-01-05      lt2
5      1 2016-01-04      lt3

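I want to place the counts values from df1 into df according to location. The merge is keyed on the date column, but the values to insert come from the df1.counts column and must land in the matching location column of df. The column names of df include every name that appears in df1.location.

Merging on date alone is easy, but this is not a straightforward merge; it is closer to a reshape or a join. Any suggestions on how to get the following df as output?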

df
        date  lt1  lt2  lt3  lt4  lt5  lt6
0 2016-01-01    2    0    0    0    0    1
1 2016-01-02    0    0    1    0    0    0
2 2016-01-03    1    0    0    0    0    0
3 2016-01-04    0    0    1    0    0    0
4 2016-01-05    0    3    0    0    0    0

Tags: pandas, join, merge, python-3.5

Solution


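Here is one approach using pivot_table and combine_first:

# Reshape df1 so that each location becomes a column, indexed by date
m = df1.pivot_table(index='date', columns='location', values='counts', aggfunc='sum')
# Fill df's empty cells from m, then replace the remaining NaN with 0
final = df.set_index('date').combine_first(m).fillna(0).reset_index()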

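Or simply:

# pivot raises an error if a (date, location) pair occurs more than once;
# keyword arguments are used because recent pandas versions no longer accept them positionally
(df.set_index('date')
   .combine_first(df1.pivot(index='date', columns='location', values='counts'))
   .fillna(0).reset_index())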

        date lt1 lt2 lt3  lt4  lt5 lt6
0 2016-01-01   2   0   0    0    0   1
1 2016-01-02   0   0   1    0    0   0
2 2016-01-03   1   0   0    0    0   0
3 2016-01-04   0   0   1    0    0   0
4 2016-01-05   0   3   0    0    0   0
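
For reference, the pivot_table and combine_first approach can be sketched as one self-contained script. This is a minimal sketch under a few assumptions that are not in the original: the empty frame is built with date as its index and a float dtype, counts is converted from strings to integers so that 'sum' adds numerically, and the final astype(int) is only there to show whole numbers.

```python
import pandas as pd

# Assumption: build the empty target frame directly with 'date' as its index
# and a float dtype, instead of adding 'date' as a column afterwards.
locations = ['lt1', 'lt2', 'lt3', 'lt4', 'lt5', 'lt6']
dates = pd.date_range('2016-01-01', periods=5, freq='D', name='date')
df = pd.DataFrame(index=dates, columns=locations, dtype=float)

# Long-format data from the question: one row per (date, location) observation.
df1 = pd.DataFrame({
    'location': ['lt1', 'lt3', 'lt6', 'lt1', 'lt2', 'lt3'],
    'date': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-01',
                            '2016-01-03', '2016-01-05', '2016-01-04']),
    'counts': ['2', '1', '1', '1', '3', '1'],
})

# Assumption: treat counts as integers so that 'sum' adds them numerically.
df1['counts'] = df1['counts'].astype(int)

# Reshape to wide format: dates as the index, one column per location.
m = df1.pivot_table(index='date', columns='location',
                    values='counts', aggfunc='sum')

# Take values from m wherever df is NaN, fill what is still missing with 0,
# and move 'date' back into an ordinary column.
final = df.combine_first(m).fillna(0).astype(int).reset_index()

print(final)
```

final then has one row per date in df and one column per location, with 0 wherever df1 has no entry for that date and location.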
