Comparing a new DataFrame with the previous one when processing DataFrames as a time series?

Problem description

I have this DataFrame, last_bid_vol_price, which is updated over and over in a loop.

import ccxt
import pandas as pd
import numpy as np


binanceus = ccxt.binanceus({
    'enableRateLimit': True,
})
last_bid_vol_price = pd.DataFrame()
while True:
    #organize the LOB data the way I want it
    orderbook = binanceus.fetch_order_book('BTC/USD')
    orderbook_df = pd.DataFrame(orderbook)
    orderbook_df.drop(["timestamp", "datetime", 'nonce'],axis=1,inplace=True)
    #split the asks list into price and volume columns
    asks = orderbook_df.asks.apply(pd.Series)
    #split the bids list the same way and merge it with the asks
    order = orderbook_df.bids.apply(pd.Series).merge(asks, left_index = True, right_index = True)
    #back to df
    orderbook_df = pd.DataFrame(order)
    #rename headers
    orderbook_df.columns = ['bids', 'bids_volume', 'asks', 'asks_volume']

    #extract the bids where volume is over 1 BTC
    bid_vol_price = orderbook_df['bids'].where(orderbook_df['bids_volume'] > 1)
    bid_vol = orderbook_df['bids_volume'].where(orderbook_df['bids_volume'] > 1)
    bid_vol_price.dropna(inplace=True)
    bid_vol.dropna(inplace=True)
    bid_vol_price = pd.concat([bid_vol_price, bid_vol], axis=1)
    bid_vol_price = bid_vol_price.assign(count=0)

    #check if first run
    if last_bid_vol_price.empty:
        last_bid_vol_price = bid_vol_price

    #count how many times the bid has remained on the orderbook
    mask = ((last_bid_vol_price['bids'] == bid_vol_price['bids'])
            and (last_bid_vol_price['bids_volume'] == bid_vol_price['bids_volume']))
    bid_vol_price['count'] = bid_vol_price['count'].mask(mask, bid_vol_price['count'] + 1)

    #update last_bid_vol_price for the next pass by keeping the new volume rows, dropping the
    #rows that no longer exist, and dropping the rows with the previous counts
    last_bid_vol_price = pd.merge(last_bid_vol_price, bid_vol_price, on=['bids', 'bids_volume'], how='right')


    print(last_bid_vol_price)
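For reference, the parsing and filtering steps in the loop above can probably be written more directly. This is a minimal sketch, assuming each entry in the ccxt order book's bids list is a two-element [price, amount] pair (slice off any extra fields if your exchange returns more). The orderbook dict is hard-coded sample data, not a live response:

```python
import pandas as pd

# Hard-coded sample shaped like the "bids" part of a ccxt order book;
# each entry is assumed to be a [price, amount] pair.
orderbook = {
    "bids": [[6738.23, 1.634321], [6735.10, 0.25], [6733.82, 1.607452]],
}

# Build the bid side directly from the list of pairs.
bids = pd.DataFrame(orderbook["bids"], columns=["bids", "bids_volume"])

# Boolean indexing keeps only bids over 1 BTC and preserves their row labels,
# which replaces the where()/dropna()/concat() steps.
bid_vol_price = bids.loc[bids["bids_volume"] > 1].assign(count=0)
print(bid_vol_price)
```

Because the row labels survive the filter, the labels still show each bid's position in the book.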

I'm trying to produce output like this (not this exact data, just the format):

      bids  bids_volume  count
0  6738.23     1.634321      1
1  6733.82     1.607452      1
2  6694.20     9.981800      1

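I want to store bid_vol_price in a variable, last_bid_vol_price, compare it with each new bid_vol_price that comes in, and eventually count the rows whose values repeat, while also dropping rows from last_bid_vol_price that no longer exist in the new bid_vol_price.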

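The first problem I need to solve is comparing the two DataFrames, because the row index and the size of the DataFrame can change at any time. Any help comparing the changes between the two DataFrames would be greatly appreciated.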

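I need to make sure the rows of last_bid_vol_price match those of bid_vol_price on both the bids and bids_volume columns, even though the row index can change and there may be more or fewer rows.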

mask = ((last_bid_vol_price['bids'] == bid_vol_price['bids'])
        and (last_bid_vol_price['bids_volume'] == bid_vol_price['bids_volume']))

I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
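This error comes from Python's and operator: it needs a single True or False, so it calls bool() on the first Series, which pandas refuses to do. Element-wise combinations use & instead, with each comparison wrapped in parentheses because & binds more tightly than ==. A small sketch with made-up prices:

```python
import pandas as pd

old = pd.DataFrame({"bids": [6738.23, 6733.82], "bids_volume": [1.634321, 1.607452]})
new = pd.DataFrame({"bids": [6738.23, 6733.82], "bids_volume": [1.634321, 2.0]})

try:
    # `and` calls bool() on the left-hand Series, which raises ValueError.
    _ = (old["bids"] == new["bids"]) and (old["bids_volume"] == new["bids_volume"])
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# `&` combines the two boolean Series element by element.
mask = (old["bids"] == new["bids"]) & (old["bids_volume"] == new["bids_volume"])
print(mask.tolist())  # [True, False]
```

This only fixes the first error: both frames here share the same default index, and the comparison still requires identical row labels.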

mask = (last_bid_vol_price['bids'] == bid_vol_price['bids'])

With only this comparison, the loop runs until bid_vol_price changes, and then I get this error:

ValueError: Can only compare identically-labeled Series objects
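This second error is about labels, not values: == between two Series lines the rows up by index label, and pandas raises when the two indexes are not identical. Since each new snapshot can shift the labels, comparing by value (for example with Series.isin) sidesteps the problem. A sketch with made-up data whose labels have shifted by one:

```python
import pandas as pd

# Same two prices, but every label has shifted by one, as happens when a new
# bid is inserted higher up in the book.
old_bids = pd.Series([6670.97, 6658.99], index=[15, 29])
new_bids = pd.Series([6670.97, 6658.99], index=[16, 30])

try:
    _ = old_bids == new_bids
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # Can only compare identically-labeled Series objects

# isin compares values only, so labels and lengths no longer matter.
newer_bids = pd.Series([6678.92, 6670.97, 6658.99], index=[8, 16, 30])
in_old = newer_bids.isin(old_bids)
print(in_old.tolist())  # [False, True, True]
```

Note that this matches on price alone.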

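I also can't use a mask that ignores bids_volume. Someone could cancel a bid, and a new bid at the same price could appear on the order book with a different volume. If that happens, the count would be wrong.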
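One way to require both the price and the volume to match, regardless of index or length, is to treat each (bids, bids_volume) pair as a single key. This sketch uses pd.MultiIndex.from_frame (available since pandas 0.24) and MultiIndex.isin, which is a different technique from the merge used in the answer. It assumes exact float equality is acceptable because both snapshots come from the same API; the data is made up:

```python
import pandas as pd

keys = ["bids", "bids_volume"]

old = pd.DataFrame(
    {"bids": [6670.97, 6658.99, 6650.00],
     "bids_volume": [2.558446, 1.020400, 3.052699]},
    index=[15, 29, 42],
)
# New snapshot: labels shifted, a new bid at the top, and the 6650.00 bid
# was replaced by one with a different volume at the same price.
new = pd.DataFrame(
    {"bids": [6678.92, 6670.97, 6658.99, 6650.00],
     "bids_volume": [2.225173, 2.558446, 1.020400, 5.0]},
    index=[8, 16, 30, 43],
)

# Each (price, volume) pair becomes one key; isin compares keys, not row labels.
old_keys = pd.MultiIndex.from_frame(old[keys])
new_keys = pd.MultiIndex.from_frame(new[keys])
mask = pd.Series(new_keys.isin(old_keys), index=new.index)
print(mask.tolist())  # [False, True, True, False]
```

The resulting mask is labeled with the new snapshot's index, so it can be used directly with new.loc[mask] or Series.mask.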

Tags: python, pandas, dataframe

Solution


After 4 days, I figured it out.

import ccxt
import pandas as pd
import numpy as np


binanceus = ccxt.binanceus({
    'enableRateLimit': True,
})
last_bid_vol_price = pd.DataFrame()
while True:
    #organize the LOB data the way I want it
    orderbook = binanceus.fetch_order_book('BTC/USD')
    orderbook_df = pd.DataFrame(orderbook)
    orderbook_df.drop(["timestamp", "datetime", 'nonce'],axis=1,inplace=True)
    #split the asks list into price and volume columns
    asks = orderbook_df.asks.apply(pd.Series)
    #split the bids list the same way and merge it with the asks
    order = orderbook_df.bids.apply(pd.Series).merge(asks, left_index = True, right_index = True)
    #back to df
    orderbook_df = pd.DataFrame(order)
    #rename headers
    orderbook_df.columns = ['bids', 'bids_volume', 'asks', 'asks_volume']

    #extract the bids where volume is over 1 BTC
    bid_vol_price = orderbook_df['bids'].where(orderbook_df['bids_volume'] > 1)
    bid_vol = orderbook_df['bids_volume'].where(orderbook_df['bids_volume'] > 1)
    bid_vol_price.dropna(inplace=True)
    bid_vol.dropna(inplace=True)
    bid_vol_price = pd.concat([bid_vol_price, bid_vol], axis=1)
    bid_vol_price = bid_vol_price.assign(count=0.0)

    #check if first run
    if last_bid_vol_price.empty:
        last_bid_vol_price = bid_vol_price

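The code below solved my problem, and it has the added benefit of keeping each bid's original index, i.e. its position in the order book.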

    #update last_bid_vol_price for the next pass: keep the rows in the new snapshot, drop the
    #rows that no longer exist, and update the counts of rows that continue to exist
    bid_vol_price_m = pd.merge(bid_vol_price, last_bid_vol_price, on=['bids','bids_volume'], how='left', indicator='exist')
    #the merge above produces count_x (from bid_vol_price) and count_y (from last_bid_vol_price)
    bid_vol_price_m['count'] = np.where(bid_vol_price_m.exist =='both', bid_vol_price_m['count_y'] + 1, 0)
    bid_vol_price_m = bid_vol_price_m.drop(['exist', 'count_x', 'count_y'], axis=1)
    #keep the index values and order from bid_vol_price
    bid_vol_price_m.index = bid_vol_price.index
    last_bid_vol_price = bid_vol_price_m
    print(last_bid_vol_price)
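The update step above can be wrapped in a function and checked on two hand-made snapshots. update_counts is a hypothetical helper name and the prices are made up. validate="many_to_one" is an addition not in the original: it makes the merge raise if the previous snapshot contains duplicate (bids, bids_volume) pairs, which would otherwise add rows and break the index assignment.

```python
import numpy as np
import pandas as pd


def update_counts(new: pd.DataFrame, last: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the answer's merge-based update as a function."""
    merged = pd.merge(
        new, last, on=["bids", "bids_volume"], how="left",
        indicator="exist",
        validate="many_to_one",  # added: (bids, bids_volume) must be unique in `last`
    )
    # Rows found in both snapshots get the previous count + 1; new rows start at 0.
    merged["count"] = np.where(merged["exist"] == "both", merged["count_y"] + 1, 0)
    merged = merged.drop(columns=["exist", "count_x", "count_y"])
    # A left merge keeps the rows of `new` in order, so its labels can be restored.
    merged.index = new.index
    return merged


last = pd.DataFrame(
    {"bids": [6670.97, 6658.99, 6650.00],
     "bids_volume": [2.558446, 1.020400, 3.052699],
     "count": [15.0, 15.0, 15.0]},
    index=[15, 29, 42],
)
# New snapshot: a new bid appears at the top and the 6650.00 bid is gone.
new = pd.DataFrame(
    {"bids": [6678.92, 6670.97, 6658.99],
     "bids_volume": [2.225173, 2.558446, 1.020400],
     "count": [0.0, 0.0, 0.0]},
    index=[8, 16, 30],
)

result = update_counts(new, last)
print(result)
```

The bid that disappeared from the book is dropped simply because a left merge only keeps rows from the new snapshot.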

Here is some of the output (successive prints from the loop):

       bids  bids_volume  count
15  6670.97     2.558446   15.0
29  6658.99     1.020400   15.0
42  6650.00     3.052699   15.0
47  6643.85     9.780500   15.0
94  6608.07     2.968100   15.0
       bids  bids_volume  count
15  6670.97     2.558446   16.0
29  6658.99     1.020400   16.0
42  6650.00     3.052699   16.0
47  6643.85     9.780500   16.0
94  6608.07     2.968100   16.0
       bids  bids_volume  count
8   6678.92     2.225173    0.0
16  6670.97     2.558446   17.0
30  6658.99     1.020400   17.0
43  6650.00     3.052699   17.0
48  6643.85     9.780500   17.0
95  6608.07     2.968100   17.0
       bids  bids_volume  count
8   6678.92     2.225173    1.0
16  6670.97     2.558446   18.0
30  6658.99     1.020400   18.0
43  6650.00     3.052699   18.0
48  6643.85     9.780500   18.0
95  6608.07     2.968100   18.0
       bids  bids_volume  count
15  6670.97     2.558446   19.0
29  6658.99     1.020400   19.0
42  6650.00     3.052699   19.0
47  6643.85     9.780500   19.0
94  6608.07     2.968100   19.0

I hope this helps someone.

