首页 > 解决方案 > 如何优化熊猫中的日期框处理?

问题描述

我有一个以下 DF。


    Date        Time        Open    High    Low     Close
0   2010-01-03  17:00:00    1.4301  1.4304  1.4301  1.4304
1   2010-01-03  17:01:00    1.4303  1.4303  1.4303  1.4303

我需要将每天的价格标准化,因此有必要将每天的价格除以当天的第一个值,所以每天都从 1.0 开始。我已经编写了以下代码,但是它的运行速度非常慢,我该如何改进它?我觉得它太复杂了,有没有优雅的方法?

for year in range(2010, 2021):
    for month in range(1, 13):
        for day in range(1, 31):
            mutdf = dfc.loc[(dfc['Date'].dt.year == year) & (dfc['Date'].dt.month == month) & (dfc['Date'].dt.day == day), 
                            ['Open', 'High', 'Low', 'Close']]
            if mutdf.empty:
                continue
            mutdf['Open'] = mutdf['Open'].divide(mutdf.iloc[0, 0])
            mutdf['High'] = mutdf['High'].divide(mutdf.iloc[0, 1])
            mutdf['Low'] = mutdf['Low'].divide(mutdf.iloc[0, 2])
            mutdf['Close'] = mutdf['Close'].divide(mutdf.iloc[0, 3])
            dfc.loc[(dfc['Date'].dt.year == year) & (dfc['Date'].dt.month == month) & (dfc['Date'].dt.day == day), 
                    ['Open', 'High', 'Low', 'Close']] = mutdf

期望的输出:

    Date        Time        Open    High    Low     Close
0   2010-01-03  17:00:00    1.00000 1.00000 1.00000 1.000000
1   2010-01-03  17:01:00    1.00014 0.99993 1.00014 0.999930
2   2010-01-03  17:02:00    1.00007 0.99993 1.00000 0.999930
3   2010-01-03  17:03:00    1.00007 0.99986 1.00007 0.999860
4   2010-01-03  17:04:00    1.00000 0.99986 0.99979 0.999720
5   2010-01-03  17:06:00    1.00000 0.99979 0.99993 0.999790
6   2010-01-03  17:08:00    0.99993 0.99986 0.99993 0.999790
7   2010-01-03  17:09:00    0.99993 0.99979 0.99979 0.999581
8   2010-01-03  17:10:00    0.99986 0.99979 0.99986 0.999790
9   2010-01-03  17:12:00    1.00007 0.99993 1.00007 0.999930

标签: pandasperformanceoptimizationjupyter-notebook

解决方案


groupby打开Date并除以第一个值:

df["Open"] = df.groupby("Date")["Open"].transform(lambda d: d/d.iat[0])

print (df)

         Date      Time     Open    High     Low   Close
0  2010-01-03  17:00:00  1.00000  1.4304  1.4301  1.4304
1  2010-01-03  17:01:00  1.00014  1.4303  1.4303  1.4303

一口气处理所有列:

col = ['Open', 'High', 'Low', 'Close']

print (df.set_index(["Date","Time"])
         .groupby("Date").apply(lambda d: d[col]/df[col].iloc[0])
         .reset_index())

         Date      Time     Open     High      Low    Close
0  2010-01-03  17:00:00  1.00000  1.00000  1.00000  1.00000
1  2010-01-03  17:01:00  1.00014  0.99993  1.00014  0.99993

推荐阅读