Pandas DateTime index resampling not working

Problem description

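I have a pandas DataFrame, built by the code below. I am trying to resample the data to get a daily count of the tickets column. The code raises no error, but the resampling has no effect: the printed DataFrame is unchanged. This is a sample of a larger dataset. I would like to be able to count by day, week, month, quarter, and so on, but .resample is not giving me that. What am I doing wrong?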

import pandas as pd
df = pd.DataFrame([['2019-07-30T00:00:00','22:15:00','car'],
                    ['2013-10-12T00:00:00','0:10:00','bus'],
                    ['2014-03-31T00:00:00','9:06:00','ship'],
                    ['2014-03-31T00:00:00','8:15:00','ship'],
                    ['2014-03-31T00:00:00','12:06:00','ship'],
                    ['2014-03-31T00:00:00','9:24:00','ship'],
                    ['2013-10-12T00:00:00','9:06:00','ship'],
                    ['2018-03-31T00:00:00','9:06:00','ship']],
                    columns=['date_field','time_field','transportation'])
df['date_field2'] = pd.to_datetime(df['date_field'])
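# parse the time-of-day strings and keep only the time component
df['time_field2'] = pd.to_datetime(df['time_field'], format='%H:%M:%S').dt.time
# pd.datetime was removed from recent pandas releases; pd.Timestamp.combine does the same job
df['date_time_field'] = df.apply(lambda row: pd.Timestamp.combine(row['date_field2'], row['time_field2']), axis=1)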
df.set_index(['date_time_field'],inplace=True)
df.drop(columns=['date_field','time_field','date_field2','time_field2'],inplace=True)
df['tickets']=1
df.sort_index(inplace=True)
df.drop(columns=['transportation'],inplace=True)
df.resample('D').sum()
print('\ndaily resampling:')
print(df)

Tags: python, pandas, time-series

Solution


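I think you forgot to assign the output to a variable. resample(...).sum() returns a new object and leaves df itself unchanged, so the result has to be stored, for example: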

df1 = df.resample('D').sum()
print (df1)
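
To make the point concrete, here is a minimal sketch on a made-up three-row frame (the names small, daily, unchanged_rows and daily_tickets are only for illustration and are not from the question). It shows that calling resample without assignment leaves the original data as it was:

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration
idx = pd.to_datetime(['2014-03-31 08:15', '2014-03-31 09:06', '2014-04-02 12:00'])
small = pd.DataFrame({'tickets': 1}, index=idx)

# The result is computed and then thrown away; `small` is not modified
small.resample('D').sum()
unchanged_rows = len(small)

# Assigning the result keeps the daily totals
daily = small.resample('D').sum()
daily_tickets = daily['tickets'].tolist()  # one entry per calendar day in the range

print(unchanged_rows)
print(daily)
```

Days with no rows still appear in the result, with a sum of 0, which is why the daily output further down is much longer than the input.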

Your code can also be simplified:

# join the date and time strings with a space; pop() removes time_field from df as it is read
df['date_field'] = pd.to_datetime(df['date_field'] + ' ' + df.pop('time_field'))
# build the DatetimeIndex, sort it, and drop the unused column
df = df.set_index(['date_field']).sort_index().drop(columns=['transportation'])
# count the rows that fall on each day
df1 = df.resample('D').size()
print(df1)
date_field
2013-10-12    2
2013-10-13    0
2013-10-14    0
2013-10-15    0
2013-10-16    0
             ..
2019-07-26    0
2019-07-27    0
2019-07-28    0
2019-07-29    0
2019-07-30    1
Freq: D, Length: 2118, dtype: int64
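
For the weekly, monthly and quarterly counts the question asks about, only the frequency string changes. The following is a rough sketch rather than the one right way to do it: it rebuilds the sample data from the question, and as an assumption of mine it combines date and time with pd.to_timedelta and uses the 'MS' and 'QS' (period start) aliases, which label each bin by its first day.

```python
import pandas as pd

rows = [['2019-07-30T00:00:00', '22:15:00'],
        ['2013-10-12T00:00:00', '0:10:00'],
        ['2014-03-31T00:00:00', '9:06:00'],
        ['2014-03-31T00:00:00', '8:15:00'],
        ['2014-03-31T00:00:00', '12:06:00'],
        ['2014-03-31T00:00:00', '9:24:00'],
        ['2013-10-12T00:00:00', '9:06:00'],
        ['2018-03-31T00:00:00', '9:06:00']]
df = pd.DataFrame(rows, columns=['date_field', 'time_field'])

# Add the time of day to the midnight timestamp to get one full timestamp per row
stamps = pd.to_datetime(df['date_field']) + pd.to_timedelta(df['time_field'])
tickets = pd.Series(1, index=pd.DatetimeIndex(stamps), name='tickets').sort_index()

# Same call as the daily count, with a different frequency alias each time
weekly = tickets.resample('W').size()      # weeks ending on Sunday
monthly = tickets.resample('MS').size()    # labelled by the first day of the month
quarterly = tickets.resample('QS').size()  # labelled by the first day of the quarter

print(monthly[monthly > 0])
print(quarterly[quarterly > 0])
```

Filtering with `> 0` just hides the empty periods when printing; the full result keeps them, as the daily output above does.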

I also think that using inplace is not good practice; chaining the methods, as in the simplified code above, avoids it.
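
As a small illustration of that remark (the frame raw and its column names are made up for this sketch), the two styles below give the same result, and the last lines show one pitfall of inplace: the method returns None, so assigning its return value loses the data.

```python
import pandas as pd

# Hypothetical data, used only to compare the two styles
raw = pd.DataFrame({
    'when': pd.to_datetime(['2014-03-31 09:06', '2013-10-12 00:10', '2019-07-30 22:15']),
    'transportation': ['ship', 'bus', 'car'],
    'tickets': [1, 1, 1],
})

# inplace style: three separate statements, each mutating `a`
a = raw.copy()
a.set_index('when', inplace=True)
a.sort_index(inplace=True)
a.drop(columns=['transportation'], inplace=True)

# chained style: one expression, `raw` is left untouched
b = raw.set_index('when').sort_index().drop(columns=['transportation'])

same = a.equals(b)

# Pitfall: with inplace=True the method returns None
lost = raw.copy().sort_index(inplace=True)

print(same)
print(lost)
```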

