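python-3.x - Error in Pandas when calculating a time difference in seconds
Problem description
I am trying to extract the time difference in seconds from a Pandas DataFrame. I read the data from a text file. After grouping the data, I get an error when I apply the diff function.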
import pandas as pd
# load data
# this format loads file when there is a 'tab' delimiter in the text file
data = pd.read_csv(file, sep='\t', lineterminator='\n')
# filter data by desired field, traded venues are XLON_SET1, _BATE, _CHIX, _TRQX, XOFF_SET1 etc
dataFil = data[data['VENUE'] == "XLON_SET1"]
# then we need to group them by time-stamp to be sure, to clean up the time-series. This will cause TIME_STAMP and PRICE to become index instead of columns with data
dataFil = dataFil.groupby(['TIME_STAMP', 'PRICE']).sum()
#dataFil = dataFil.groupby(['TIME_STAMP']).sum()
dataFil['date'] = dataFil.index.get_level_values('TIME_STAMP')
dataFil['PRICE'] = dataFil.index.get_level_values('PRICE')
dataFil.head() #or dataFil
I get the following data:
                               QUANTITY  BID  ASK  MKT_BID  MKT_ASK                     date  PRICE
TIME_STAMP              PRICE
2018-01-22 08:30:01.306 2.769      3409  0.0  0.0      0.0      0.0  2018-01-22 08:30:01.306  2.769
2018-01-22 08:30:04.306 2.769      2691  0.0  0.0      0.0      0.0  2018-01-22 08:30:04.306  2.769
2018-01-22 08:30:11.306 2.769      2000  0.0  0.0      0.0      0.0  2018-01-22 08:30:11.306  2.769
2018-01-22 08:30:51.065 2.769       572  0.0  0.0      0.0      0.0  2018-01-22 08:30:51.065  2.769
2018-01-22 08:31:26.068 2.768       649  0.0  0.0      0.0      0.0  2018-01-22 08:31:26.068  2.768
But when I use the following (after checking this thread: Pandas calculate time difference):
df = dataFil
df.assign(seconds=df.date.diff().dt.seconds)
I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-170-3be32e0aad41> in <module>()
1 df = dataFil
----> 2 df.assign(seconds=df.date.diff().dt.seconds)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in diff(self, periods)
1525 diffed : Series
1526 """
-> 1527 result = algorithms.diff(_values_from_object(self), periods)
1528 return self._constructor(result, index=self.index).__finalize__(self)
1529
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\algorithms.py in diff(arr, n, axis)
1545 out_arr[res_indexer] = result
1546 else:
-> 1547 out_arr[res_indexer] = arr[res_indexer] - arr[lag_indexer]
1548
1549 if is_timedelta:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
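Solution
The traceback shows a str being subtracted from a str, so the date column holds strings rather than timestamps. I think you need to convert the date column to datetimes, ideally when reading the file, using the parse_dates parameter of read_csv:
data = pd.read_csv(file, sep='\t', lineterminator='\n', parse_dates=['TIME_STAMP'])
Or convert the column with to_datetime before calling diff:
df.assign(seconds=pd.to_datetime(df.date).diff().dt.seconds)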
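To see the fix end to end, here is a minimal sketch on a small in-memory sample. The sample rows, the `raw` string, and the names `as_text`, `data_fil` and `result` are assumptions made up for illustration, not the asker's actual file. It also computes `dt.total_seconds()` next to `dt.seconds`, since `dt.seconds` drops the fractional part and only covers the part of the gap shorter than one day, which may or may not be what you want.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample mirroring the columns in the question.
raw = (
    "TIME_STAMP\tVENUE\tPRICE\tQUANTITY\n"
    "2018-01-22 08:30:01.306\tXLON_SET1\t2.769\t3409\n"
    "2018-01-22 08:30:04.306\tXLON_SET1\t2.769\t2691\n"
    "2018-01-22 08:30:11.306\tXLON_SET1\t2.769\t2000\n"
    "2018-01-22 08:31:26.068\tXLON_SET1\t2.768\t649\n"
)

# Without parse_dates, TIME_STAMP is read as text, so diff() cannot subtract it.
as_text = pd.read_csv(io.StringIO(raw), sep="\t")

# With parse_dates, TIME_STAMP is parsed to datetime64 and supports subtraction.
data = pd.read_csv(io.StringIO(raw), sep="\t", parse_dates=["TIME_STAMP"])

# Same filtering and grouping steps as in the question, summing only QUANTITY.
data_fil = data[data["VENUE"] == "XLON_SET1"]
data_fil = data_fil.groupby(["TIME_STAMP", "PRICE"])[["QUANTITY"]].sum()
data_fil["date"] = data_fil.index.get_level_values("TIME_STAMP")

# assign returns a new DataFrame, so keep the result instead of discarding it.
gaps = data_fil["date"].diff()
result = data_fil.assign(
    seconds=gaps.dt.seconds,                # whole seconds within a day
    total_seconds=gaps.dt.total_seconds(),  # full gap, including fractions
)
print(result[["QUANTITY", "seconds", "total_seconds"]])
```

The first row has no predecessor, so both new columns start with NaN. Note that assign does not modify the DataFrame in place; in the original snippet the result would need to be stored in a variable to be used later.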