python - The best way to process data in a DataFrame
Problem description
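I am looking for advice on how to improve this program and use pandas more effectively.
I have a set of order data from a market. Each order has a type_id that identifies the item, is either a buy order or a sell order, and has a price and a volume.
I want to process the market data and build a DataFrame that contains, for each type_id, the cost of buying or selling n% of that item's volume on the market.
Here is my working code: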
import pandas as pd
type_ids = {
    0: 'Item A',
    1: 'Item B',
}
market_order_list = [
    {'type_id': 0, 'is_buy_order': False, 'price': 80, 'volume': 22},
    {'type_id': 0, 'is_buy_order': False, 'price': 70, 'volume': 12},
    {'type_id': 0, 'is_buy_order': False, 'price': 60, 'volume': 9},
    {'type_id': 0, 'is_buy_order': True, 'price': 50, 'volume': 3},
    {'type_id': 0, 'is_buy_order': True, 'price': 40, 'volume': 9},
    {'type_id': 0, 'is_buy_order': True, 'price': 30, 'volume': 33},
    {'type_id': 1, 'is_buy_order': False, 'price': 30, 'volume': 28},
    {'type_id': 1, 'is_buy_order': False, 'price': 25, 'volume': 11},
    {'type_id': 1, 'is_buy_order': False, 'price': 20, 'volume': 7},
    {'type_id': 1, 'is_buy_order': True, 'price': 15, 'volume': 8},
    {'type_id': 1, 'is_buy_order': True, 'price': 10, 'volume': 12},
    {'type_id': 1, 'is_buy_order': True, 'price': 5, 'volume': 24}
]
def inner_func(df, tracking):
    if tracking['volume_processed'] == tracking['total_volume_to_process']:
        # We already filled our total volume, no more processing needed
        return
    # We need to process this much more volume
    needed_volume = tracking['total_volume_to_process'] - tracking['volume_processed']
    if df['volume'] >= needed_volume:
        # This order can fully fill us
        tracking['volume_processed'] += needed_volume
        tracking['total_price_paid'] += needed_volume * df['price']
    else:
        # This order can only partially fill us
        tracking['volume_processed'] += df['volume']
        tracking['total_price_paid'] += df['volume'] * df['price']
def outer_func(df_orig, result_list, percent):
    # Determine if this is a list of buy or sell orders and get the type.
    # Use .iloc[0] (positional): each group keeps its original index labels,
    # so a label lookup with [0] only works for the group that contains row 0.
    is_buy = df_orig['is_buy_order'].iloc[0]
    type_id = df_orig['type_id'].iloc[0]
    # Sort price in correct direction for buy/sell, and calculate how much volume is needed
    df = df_orig.sort_values('price', ascending=not is_buy, inplace=False).reset_index(drop=True)
    total_volume_to_process = int(df['volume'].sum() * percent)
    # Make tracking dictionary which will capture results of this set of orders
    tracking = {
        'type_id': type_id,
        'is_buy': is_buy,
        'volume_processed': 0,
        'total_volume_to_process': total_volume_to_process,
        'total_price_paid': 0,
    }
    # Each inner_func call will be just the buy side, or just the sell side, for a single type_id
    df.apply(func=inner_func, axis=1, args=(tracking,))
    # Append the results to our list
    result_list.append(tracking)
result_list = []
# Load the dataframe
df = pd.DataFrame(market_order_list)
g = df.groupby(['type_id', 'is_buy_order']).apply(outer_func, result_list=result_list, percent=0.33)
# Load the result_list into a dataframe and display
result_frame = pd.DataFrame(result_list)
print('=== Result === ')
print(result_frame)
print('\nWhat is the cost of buying 33% of the volume for type_id = 0?')
total_price_paid = result_frame[(result_frame.type_id == 0) & (result_frame.is_buy == True)]['total_price_paid'].item()
print(total_price_paid)
Here is the output:
=== Result ===
   type_id  is_buy  volume_processed  total_volume_to_process  total_price_paid
0        0   False                14                       14               890
1        0    True                14                       14               570
2        1   False                15                       15               340
3        1    True                14                       14               180
What is the cost of buying 33% of the volume for type_id = 0?
570
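As a sanity check, the 570 figure can be reproduced by hand for the buy orders of type_id 0. The snippet below is only an illustrative sketch in plain Python (the names `buy_orders`, `target`, `remaining` and `take` are not part of the program above). It walks the orders from the best price down and takes volume until 33% of the total is filled.

```python
# Buy orders for type_id 0 as (price, volume), highest price first
buy_orders = [(50, 3), (40, 9), (30, 33)]

# 33% of the total volume, truncated the same way as int() in outer_func
target = int(sum(volume for _, volume in buy_orders) * 0.33)  # int(45 * 0.33) = 14

remaining, cost = target, 0
for price, volume in buy_orders:
    take = min(volume, remaining)  # take the whole order, or only what is still needed
    cost += take * price
    remaining -= take

print(target, cost)  # 14 570  (3*50 + 9*40 + 2*30)
```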
Please give me some feedback on my approach and how it could be improved. What is the right way to do this? Thank you.
Solution
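One possible direction, offered as a sketch rather than a definitive answer, is to replace the two nested apply calls and the mutable tracking dictionary with vectorized column operations. Sort each side of each book by best price first, compute how much volume was available ahead of each order with a grouped cumulative sum, then clamp what is taken from each order to the range between zero and the order's own volume. The function name `fill_cost` and the helper columns `sort_price`, `filled` and `paid` are made up for this example; the result uses the column name `is_buy_order` instead of `is_buy`.

```python
import numpy as np
import pandas as pd

market_order_list = [
    {'type_id': 0, 'is_buy_order': False, 'price': 80, 'volume': 22},
    {'type_id': 0, 'is_buy_order': False, 'price': 70, 'volume': 12},
    {'type_id': 0, 'is_buy_order': False, 'price': 60, 'volume': 9},
    {'type_id': 0, 'is_buy_order': True, 'price': 50, 'volume': 3},
    {'type_id': 0, 'is_buy_order': True, 'price': 40, 'volume': 9},
    {'type_id': 0, 'is_buy_order': True, 'price': 30, 'volume': 33},
    {'type_id': 1, 'is_buy_order': False, 'price': 30, 'volume': 28},
    {'type_id': 1, 'is_buy_order': False, 'price': 25, 'volume': 11},
    {'type_id': 1, 'is_buy_order': False, 'price': 20, 'volume': 7},
    {'type_id': 1, 'is_buy_order': True, 'price': 15, 'volume': 8},
    {'type_id': 1, 'is_buy_order': True, 'price': 10, 'volume': 12},
    {'type_id': 1, 'is_buy_order': True, 'price': 5, 'volume': 24},
]


def fill_cost(orders, percent):
    """Cost of filling `percent` of each (type_id, is_buy_order) book, best price first."""
    keys = ['type_id', 'is_buy_order']
    df = orders.copy()

    # Sell orders are walked from the lowest price up, buy orders from the
    # highest price down; negating buy prices lets one ascending sort do both.
    df['sort_price'] = np.where(df['is_buy_order'], -df['price'], df['price'])
    df = df.sort_values(keys + ['sort_price'])

    volume_by_group = df.groupby(keys)['volume']

    # Target volume per group, broadcast to every row (floor matches int() for positives)
    target = np.floor(volume_by_group.transform('sum') * percent).astype(int)

    # Volume offered by better-priced orders that come before this one
    before = volume_by_group.cumsum() - df['volume']

    # Take what is still needed, but never less than 0 or more than the order holds
    still_needed = (target - before).clip(lower=0)
    df['filled'] = np.minimum(still_needed, df['volume'])
    df['paid'] = df['filled'] * df['price']

    return df.groupby(keys, as_index=False).agg(
        volume_processed=('filled', 'sum'),
        total_price_paid=('paid', 'sum'),
    )


result = fill_cost(pd.DataFrame(market_order_list), 0.33)
print(result)
```

On the sample data this gives the same totals as the original program: 890, 570, 340 and 180. A single row can then be selected with an ordinary boolean mask, for example `result[(result.type_id == 0) & result.is_buy_order]`.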