Updating a DataFrame column and losing the date index

Problem description

I have two dataframes.

df1:

            col2  col3 dept
date                       
2020-05-06    29    21    A
2020-05-07    56    12    B
2020-05-08    82    15    C
2020-05-09    13     9    D
2020-05-10    35    13    E
2020-05-11    53    87    F
2020-05-12    25     9    G
2020-05-13    23    63    H

df2:

            col2 dept
date                 
2020-05-06    64    A
2020-05-07    41    B
2020-05-08    95    C
2020-05-09    58    D
2020-05-10    89    E
2020-05-11    37    F
2020-05-12    24    G
2020-05-13    67    H

I want to update column col2 of df1 with the values of column col2 from df2, so that my output looks like this:

            col2  col3 dept
date                       
2020-05-06    64    21    A
2020-05-07    41    12    B
2020-05-08    95    15    C
2020-05-09    58     9    D
2020-05-10    89    13    E
2020-05-11    37    87    F
2020-05-12    24     9    G
2020-05-13    67    63    H

I wrote some code that looks like this:

df1=df1.set_index('dept')
df1.update(df2.set_index('dept'))
df1=df1.reset_index()

But it resets the index of df1 to integers instead of dates, so the output I get looks like this:

  dept  col2  col3
0    A    64    21
1    B    41    12
2    C    95    15
3    D    58     9
4    E    89    13
5    F    37    87
6    G    24     9
7    H    67    63

My full code is below:

import pandas as pd
import numpy as np
import datetime
from datetime import timedelta
dept=['A','B','C','D','E','F','G','H']
date_today = datetime.date.today()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')
np.random.seed(seed=1111)
data1 = np.random.randint(1, high=100, size=len(days))
data2 = np.random.randint(1, high=100, size=len(days))
df1 = pd.DataFrame({'date': days, 'dept':dept,'col2': data1, 'col3': data2})
df1 = df1.set_index('date')

print(df1)

dept=['A','B','C','D','E','F','G','H']
date_today = datetime.date.today()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')
np.random.seed(seed=1331)
data3 = np.random.randint(1, high=100, size=len(days))

df2 = pd.DataFrame({'date': days, 'dept':dept,'col2': data3})
df2 = df2.set_index('date')

print(df2)

df1=df1.set_index('dept')
df1.update(df2.set_index('dept'))
df1=df1.reset_index()

print(df1)

How can I update df1 from df2 while keeping the date index of df1?

Tags: python, pandas

Solution


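As I understand your example, you are updating df1 from df2 based on the index date and the column dept. Calling set_index('dept') on its own replaces the date index, which is why the dates are lost. Instead, append dept to the existing index and then call update. Note that update modifies df1 in place and returns None, so its result must not be assigned back to df1.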

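# Key both frames on (date, dept) by appending dept to the date index
df1 = df1.set_index('dept', append=True)
# update() works in place and returns None, so do not reassign df1 here
df1.update(df2.set_index('dept', append=True))
# Move only the dept level back to a column; date remains the index
df1 = df1.reset_index('dept')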

Out[35]:
           dept  col2  col3
date
2020-05-06    A    64    21
2020-05-07    B    41    12
2020-05-08    C    95    15
2020-05-09    D    58     9
2020-05-10    E    89    13
2020-05-11    F    37    87
2020-05-12    G    24     9
2020-05-13    H    67    63
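
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the append-to-index approach. It assumes fixed dates and hard-codes the values shown in the question's tables, instead of using date.today() and seeded random data, so the frames are the same on every run. Treat it as an illustration under those assumptions rather than the only way to do this.

```python
import pandas as pd

# Assumption: fixed dates and values copied from the question's tables,
# used here in place of date.today() and np.random so the run is repeatable.
days = pd.date_range("2020-05-06", periods=8, freq="D")
dept = list("ABCDEFGH")

df1 = pd.DataFrame(
    {
        "date": days,
        "dept": dept,
        "col2": [29, 56, 82, 13, 35, 53, 25, 23],
        "col3": [21, 12, 15, 9, 13, 87, 9, 63],
    }
).set_index("date")

df2 = pd.DataFrame(
    {
        "date": days,
        "dept": dept,
        "col2": [64, 41, 95, 58, 89, 37, 24, 67],
    }
).set_index("date")

# Append dept to the existing date index, giving a (date, dept) MultiIndex.
df1 = df1.set_index("dept", append=True)

# update() aligns on the (date, dept) index, modifies df1 in place,
# and returns None, so its result is deliberately not assigned.
df1.update(df2.set_index("dept", append=True))

# Turn only the dept level back into a column; date stays as the index.
df1 = df1.reset_index("dept")

print(df1)
```

Because both frames are keyed on date and dept together, a row in df2 only overwrites the row in df1 that has the same date and the same department. Rows of df1 with no match in df2 are left unchanged.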
