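python - Comparing a new DataFrame with the old one when processing DataFrames as a time series?
Problem description
I have this DataFrame, last_bid_vol_price, which is updated repeatedly.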
import ccxt
import pandas as pd
import numpy as np
binanceus = ccxt.binanceus({
    'enableRateLimit': True,
})
last_bid_vol_price = pd.DataFrame()
while True:
    #organize the LOB data the way I want it
    orderbook = binanceus.fetch_order_book('BTC/USD')
    orderbook_df = pd.DataFrame(orderbook)
    orderbook_df.drop(["timestamp", "datetime", 'nonce'], axis=1, inplace=True)
    #split asks list
    asks = orderbook_df.asks.apply(pd.Series)
    #split bids list and merge the two lists
    order = orderbook_df.bids.apply(pd.Series).merge(asks, left_index=True, right_index=True)
    #back to df
    orderbook_df = pd.DataFrame(order)
    #rename headers
    orderbook_df.columns = ['bids', 'bids_volume', 'asks', 'asks_volume']
    #extract the bids where volume is over 1 BTC
    bid_vol_price = orderbook_df['bids'].where(orderbook_df['bids_volume'] > 1)
    bid_vol = orderbook_df['bids_volume'].where(orderbook_df['bids_volume'] > 1)
    bid_vol_price.dropna(inplace=True)
    bid_vol.dropna(inplace=True)
    bid_vol_price = pd.concat([bid_vol_price, bid_vol], axis=1)
    bid_vol_price = bid_vol_price.assign(count=0)
    #check if first run
    if last_bid_vol_price.empty == True:
        last_bid_vol_price = bid_vol_price
    #count how many times the bid has remained on the orderbook
    mask = ((last_bid_vol_price['bids'] == bid_vol_price['bids'])
            and (last_bid_vol_price['bids_volume'] == bid_vol_price['bids_volume']))
    bid_vol_price['count'] = bid_vol_price['count'].mask(mask, bid_vol_price['count'] + 1)
    #update last_bid_vol_price for the next pass by keeping the new volume rows, dropping the
    #rows that no longer exist and dropping the rows with the previous counts
    last_bid_vol_price = pd.merge(last_bid_vol_price, bid_vol_price, on=['bids', 'bids_volume'], how='right')
    print(last_bid_vol_price)
I am looking for output like this. Not the exact data, just the style.
bids bids_volume count
0 6738.23 1.634321 1
1 6733.82 1.607452 1
2 6694.20 9.981800 1
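I want to store bid_vol_price in a variable, last_bid_vol_price, and compare it with the new bid_vol_price data as it comes in. Ultimately I want to count the rows whose values repeat, and also drop the rows in last_bid_vol_price that do not exist in the new bid_vol_price.
The first problem I want to solve is comparing the two DataFrames, because the row index and the size of the DataFrame can change at any time. Any help comparing the two DataFrames for changes would be appreciated.
I need to make sure that rows in the bids and bids_volume columns of last_bid_vol_price match those of bid_vol_price, even though a row's index can change and there may be more or fewer rows.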
With
mask = ((last_bid_vol_price['bids'] == bid_vol_price['bids'])
        and (last_bid_vol_price['bids_volume'] == bid_vol_price['bids_volume']))
I get this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
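This first error comes from Python's and, which tries to reduce each Series to a single True/False value. Below is a minimal sketch of the element-wise & operator that pandas expects instead. The prices and volumes are made up for illustration, and the comparison only works as written because both frames share the same index.

```python
import pandas as pd

# Hypothetical snapshots with identical indexes (illustrative values only)
old = pd.DataFrame({"bids": [6738.23, 6733.82], "bids_volume": [1.63, 1.60]})
new = pd.DataFrame({"bids": [6738.23, 6733.82], "bids_volume": [1.63, 2.10]})

# `&` combines the two boolean Series element by element;
# `and` would raise "The truth value of a Series is ambiguous"
mask = (old["bids"] == new["bids"]) & (old["bids_volume"] == new["bids_volume"])
print(mask.tolist())
```

Each comparison must be wrapped in its own parentheses, because & binds more tightly than ==.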
With
mask = (last_bid_vol_price['bids'] == bid_vol_price['bids'])
it keeps looping until bid_vol_price changes, and then I get this error:
ValueError: Can only compare identically-labeled Series objects
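The mask is also no use to me unless bids_volume matches as well. Someone could cancel a bid, and a new bid at the same price could then appear on the order book with a different volume. If that happened, the count would be wrong.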
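One way to sidestep both the label mismatch and the volume problem is to match rows by value rather than by index. The sketch below uses made-up data and is only one possible approach: it first reproduces the label error, then uses a left merge on both bids and bids_volume with indicator=True to flag which rows of the new snapshot also existed in the old one, whatever the index or frame size.

```python
import pandas as pd

# Hypothetical snapshots: different indexes and a different number of rows
old = pd.DataFrame(
    {"bids": [6738.23, 6733.82, 6694.20], "bids_volume": [1.63, 1.60, 9.98]},
    index=[3, 7, 20],
)
new = pd.DataFrame(
    {"bids": [6740.00, 6738.23, 6733.82, 6694.20],
     "bids_volume": [2.00, 1.63, 5.00, 9.98]},
    index=[1, 4, 8, 21],
)

# Direct comparison fails because the two Series carry different labels
try:
    old["bids"] == new["bids"]
    labels_ok = True
except ValueError:
    labels_ok = False

# Match on the (price, volume) pair instead; "_merge" says where each row was found
matched = new.merge(old, on=["bids", "bids_volume"], how="left", indicator=True)
still_there = (matched["_merge"] == "both").tolist()
print(labels_ok, still_there)
```

The bid at 6733.82 is reported as new here because its volume changed, which is the behaviour wanted for cancelled and re-posted bids.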
Solution
After 4 days, I figured it out.
import ccxt
import pandas as pd
import numpy as np
binanceus = ccxt.binanceus({
    'enableRateLimit': True,
})
last_bid_vol_price = pd.DataFrame()
while True:
    #organize the LOB data the way I want it
    orderbook = binanceus.fetch_order_book('BTC/USD')
    orderbook_df = pd.DataFrame(orderbook)
    orderbook_df.drop(["timestamp", "datetime", 'nonce'], axis=1, inplace=True)
    #split asks list
    asks = orderbook_df.asks.apply(pd.Series)
    #split bids list and merge the two lists
    order = orderbook_df.bids.apply(pd.Series).merge(asks, left_index=True, right_index=True)
    #back to df
    orderbook_df = pd.DataFrame(order)
    #rename headers
    orderbook_df.columns = ['bids', 'bids_volume', 'asks', 'asks_volume']
    #extract the bids where volume is over 1 BTC
    bid_vol_price = orderbook_df['bids'].where(orderbook_df['bids_volume'] > 1)
    bid_vol = orderbook_df['bids_volume'].where(orderbook_df['bids_volume'] > 1)
    bid_vol_price.dropna(inplace=True)
    bid_vol.dropna(inplace=True)
    bid_vol_price = pd.concat([bid_vol_price, bid_vol], axis=1)
    bid_vol_price = bid_vol_price.assign(count=0.0)
    #check if first run
    if last_bid_vol_price.empty == True:
        last_bid_vol_price = bid_vol_price
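The code below, which continues inside the while loop, solved my problem. It has the added benefit of preserving the index, which is the bid's position on the order book.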
    #update last_bid_vol_price for the next pass by keeping the new volume rows, dropping the
    #rows that no longer exist, and updating the counts of rows that continue to exist
    bid_vol_price_m = pd.merge(bid_vol_price, last_bid_vol_price, on=['bids', 'bids_volume'], how='left', indicator='exist')
    #the count_x and count_y columns are produced by the merge above
    bid_vol_price_m['count'] = np.where(bid_vol_price_m.exist == 'both', bid_vol_price_m['count_y'] + 1, 0)
    bid_vol_price_m = bid_vol_price_m.drop(['exist', 'count_x', 'count_y'], axis=1)
    #keep the index values and order from bid_vol_price
    bid_vol_price_m.index = bid_vol_price.index
    last_bid_vol_price = bid_vol_price_m
    print(last_bid_vol_price)
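To try the counting step without calling the exchange, the same merge logic can be wrapped in a function and fed hand-made snapshots. This is only a sketch: update_counts and snapshot are hypothetical helper names, the rows are illustrative, and it assumes each (bids, bids_volume) pair appears at most once per snapshot so the merge keeps the row count unchanged.

```python
import numpy as np
import pandas as pd

def update_counts(current, previous):
    """Carry counts forward for (bids, bids_volume) rows found in both snapshots."""
    merged = pd.merge(current, previous, on=["bids", "bids_volume"],
                      how="left", indicator="exist")
    # count_x comes from `current`, count_y from `previous`
    merged["count"] = np.where(merged["exist"] == "both", merged["count_y"] + 1, 0)
    merged = merged.drop(columns=["exist", "count_x", "count_y"])
    merged.index = current.index  # keep the order-book positions of the new snapshot
    return merged

def snapshot(rows, index):
    """Build a filtered-bids frame with a fresh count column (hypothetical helper)."""
    frame = pd.DataFrame(rows, columns=["bids", "bids_volume"], index=index)
    return frame.assign(count=0.0)

first = snapshot([(6670.97, 2.558446), (6658.99, 1.020400)], index=[15, 29])
second = snapshot([(6678.92, 2.225173), (6670.97, 2.558446), (6658.99, 1.020400)],
                  index=[8, 16, 30])

last = update_counts(first, first)   # first run: compared with itself, as in the loop
last = update_counts(second, last)   # second run: one new bid, two surviving bids
print(last)
```

The new bid starts at 0 while the two surviving bids move to 2, and the index follows the second snapshot.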
Here is some of the output:
15 6670.97 2.558446 15.0
29 6658.99 1.020400 15.0
42 6650.00 3.052699 15.0
47 6643.85 9.780500 15.0
94 6608.07 2.968100 15.0
bids bids_volume count
15 6670.97 2.558446 16.0
29 6658.99 1.020400 16.0
42 6650.00 3.052699 16.0
47 6643.85 9.780500 16.0
94 6608.07 2.968100 16.0
bids bids_volume count
8 6678.92 2.225173 0.0
16 6670.97 2.558446 17.0
30 6658.99 1.020400 17.0
43 6650.00 3.052699 17.0
48 6643.85 9.780500 17.0
95 6608.07 2.968100 17.0
bids bids_volume count
8 6678.92 2.225173 1.0
16 6670.97 2.558446 18.0
30 6658.99 1.020400 18.0
43 6650.00 3.052699 18.0
48 6643.85 9.780500 18.0
95 6608.07 2.968100 18.0
bids bids_volume count
15 6670.97 2.558446 19.0
29 6658.99 1.020400 19.0
42 6650.00 3.052699 19.0
47 6643.85 9.780500 19.0
94 6608.07 2.968100 19.0
I hope this helps someone.