python - Time series analysis in Python using conditions

Problem description

I have the following data (sample):

    Symbol Sections      iBid     Bid                Date
0    O.U20       O1  99.73167  99.730 2020-06-29 16:32:25
1    O.Z20       O1  99.70250  99.700 2020-06-29 16:32:25
2    O.H21       O1       NaN  99.795 2020-06-29 16:32:25
3    O.M21       O1  99.81167  99.810 2020-06-29 16:32:25
4    O.U21       O2  99.81667  99.815 2020-06-29 16:32:25
5    O.Z21       O2       NaN  99.795 2020-06-29 16:32:25
6    O.H22       O2  99.81000  99.810 2020-06-29 16:32:25
7    O.M22       O2  99.79500  99.795 2020-06-29 16:32:25
16  F3.U26       F3       NaN   1.000 2020-06-29 16:32:25
17  F3.Z26       F3       NaN  -3.000 2020-06-29 16:32:25
18  F3.H27       F3       NaN  -1.000 2020-06-29 16:32:25
19  F6.H26       F6  -1.75000     NaN 2020-06-29 16:32:25
20  F6.M26       F6  -4.50000     NaN 2020-06-29 16:32:25
21  F6.U26       F6  -5.50000     NaN 2020-06-29 16:32:25
22  F9.U20       F9  -8.50000  -9.000 2020-06-29 16:32:25
23   O.U20       O3  99.73167  99.730 2020-06-29 16:32:26
24   O.Z20       O3  99.70250  99.700 2020-06-29 16:32:26
25   O.H21       O3       NaN  99.795 2020-06-29 16:32:26
26   O.M21       O3  99.81167  99.810 2020-06-29 16:32:26
27   O.U21       O4  99.81667  99.815 2020-06-29 16:32:26
28   O.Z21       O4       NaN  99.795 2020-06-29 16:32:26
29   O.H22       O4  99.81000  99.810 2020-06-29 16:32:26
30   O.M22       O4  99.79500  99.795 2020-06-29 16:32:26

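What I want to do is draw a scatter plot, a line chart, or whatever chart suits this kind of analysis, so that I can study the trend over time when a condition is met. For example, I want to see how many times iBid was higher than Bid over time, for each symbol (O, S, F) and each section (O1, F3, etc.).

I know I am expected to show some work, but I am not sure whether such a chart is even possible. So far I have only been able to split the data by Symbol: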

df_O = df[df['Contract'].str.contains('O')]

and filter the results like this:

IbidgreaterBid = big_frame[(big_frame.iBid > big_frame.Bid)]

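Is it possible to get a chart, with the Date column as the x-axis, that lets me analyze when iBid > Bid? (The Date column has thousands of rows that differ only by seconds.)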

Tags: python, python-3.x, matplotlib, seaborn

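Solution

It is not clear what you mean by a chart that can analyze when iBid > Bid. However, I can suggest a way to distinguish the data points based on whether iBid is greater than Bid. In the following example, red points mark the rows where iBid > Bid, and blue points mark everything else. Also, because the timestamps differ only by seconds, I use the mdates date formatter to make the xticks show only H:M:S.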

import math
from datetime import timedelta

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.offsetbox import AnchoredText

# On matplotlib >= 3.6 this style is named 'seaborn-v0_8-whitegrid'.
plt.style.use('seaborn-whitegrid')

# The x-axis column must hold datetimes for the date formatter to work.
df['Date'] = pd.to_datetime(df['Date'])

n_sections = df['Sections'].nunique()
cols = 2
rows = math.ceil(n_sections / cols)  # round up so every section gets an axis

# One subplot per section; squeeze=False keeps ax 2-D even with a single row.
fig, ax = plt.subplots(rows, cols, figsize=(16, 8), sharex=False, sharey=False, squeeze=False)
row = 0  # current subplot row
col = 0  # current subplot column

for name, section in df.groupby('Sections'):
    axis = ax[row][col]
    above = section['iBid'] > section['Bid']  # False wherever either value is NaN
    # Draw every point in blue, then overdraw the iBid > Bid points in red.
    axis.scatter(np.array(section['Date']), section['iBid'], color='blue')
    axis.scatter(np.array(section.loc[above, 'Date']), section.loc[above, 'iBid'], color='red')
    axis.set_xlim([section['Date'].min() - timedelta(seconds=5), section['Date'].max() + timedelta(seconds=5)])
    axis.set_xlabel('Date Time', fontsize=20)
    axis.set_ylabel('iBid', fontsize=20)
    # Label the panel with its section name in the lower-right corner.
    axis.add_artist(AnchoredText(str(name), loc=4, prop=dict(size=20)))

    # Show only hours, minutes and seconds on the x-axis.
    axis.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
    axis.tick_params(axis='both', direction='in', which='major', length=5, width=2, labelsize=16)

    # Fill the grid column by column.
    row += 1
    if row == rows:
        row = 0
        col += 1
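
If what you are after is a count of how often iBid exceeds Bid over time, one possible follow-up is to aggregate the boolean comparison per section and per timestamp. The sketch below is only an illustration under stated assumptions: the toy rows merely mimic the sample in the question, and the `above` and `Root` columns are hypothetical names introduced here, not part of the original data.

```python
import pandas as pd

# Toy rows shaped like the sample in the question (values are illustrative only).
df = pd.DataFrame({
    "Symbol":   ["O.U20", "O.Z20", "O.H21", "F9.U20", "O.U20", "O.Z20"],
    "Sections": ["O1", "O1", "O1", "F9", "O1", "O1"],
    "iBid":     [99.73167, 99.70250, None, -8.5, 99.73167, 99.69],
    "Bid":      [99.730, 99.700, 99.795, -9.0, 99.730, 99.700],
    "Date":     pd.to_datetime(
        ["2020-06-29 16:32:25"] * 4 + ["2020-06-29 16:32:26"] * 2
    ),
})

# True where iBid is strictly above Bid; comparisons against NaN give False.
df["above"] = df["iBid"] > df["Bid"]

# Leading letters of the symbol, e.g. "O.U20" -> "O", "F9.U20" -> "F"
# (hypothetical helper column for grouping by product letter).
df["Root"] = df["Symbol"].str.extract(r"^([A-Z]+)", expand=False)

# Number of rows with iBid > Bid for each timestamp, one column per section.
counts = (
    df.groupby(["Sections", "Date"])["above"]
      .sum()
      .unstack("Sections", fill_value=0)
)

# Running total over time, still one column per section.
running = counts.cumsum()

# Overall count per product letter.
by_root = df.groupby("Root")["above"].sum()
```

Calling `running.plot()` on the result should draw one line per section with the timestamps on the x-axis, which may be closer to the "how many times over time" view than the scatter plots.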
