python - 如何根据字符串将具有多种类型信息的数据帧拆分为单独的数据帧?
问题描述
我有以下数据框,其中一个 CSV 文件中有多个数据集。在这种情况下,“现金余额”、“账户订单历史记录”和“股票”(故意留空信息)。我想将现金余额信息创建到一个数据框中,并将帐户订单历史记录创建到另一个数据框中。我的想法是查看第一列的索引,看看它是否等于“现金余额”,然后读取每一行,直到索引 =“账户订单历史”等等,但不确定这是否正确方法。
如何使用python对此进行编码?请帮忙谢谢!
Cash Balance
Date Time Type ID# Commission Amount Balance
11/9/20 9:30am Single 1234 2% $200 $2500
11/9/20 9:40am Single 1234 2% $200 $2500
11/9/20 9:45am Single 2234 2% $200 $2500
Account Order History
Notes Time Spread Side Qty Price Symbol Order
9:30am STOCK BUY 10 $42.87 NIO Filled
9:30am STOCK Sell 10 $43.87 NIO Filled
Equities
解决方案
这是你想要的 ?
import pandas as pd
df = pd.read_csv("new.csv",header=None)
df
0 1 2 3 4 5 6 7
0 Cash Balance NaN NaN NaN NaN NaN NaN NaN
1 Date Time Type ID# Commission Amount Balance NaN
2 11/09/2020 9:30am Single 1234 2% $200 $2,500 NaN
3 11/09/2020 9:40am Single 1234 2% $200 $2,500 NaN
4 11/09/2020 9:45am Single 2234 2% $200 $2,500 NaN
5 Account Order History NaN NaN NaN NaN NaN NaN NaN
6 Notes Time Spread Side Qty Price Symbol Order
7 NaN 9:30am STOCK BUY 10 $42.87 NIO Filled
8 NaN 9:30am STOCK Sell 10 $43.87 NIO Filled
table_names = ["Cash Balance", "Account Order History"]
groups = df[0].isin(table_names).cumsum()
df_combined = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}
cash_balance = df_combined['Cash Balance'].reset_index(drop=True)
cash_balance.columns = cash_balance.iloc[0]
cash_balance.drop(cash_balance.index[0], inplace = True)
cash_balance
Date Time Type ID# Commission Amount Balance NaN
1 11/09/2020 9:30am Single 1234 2% $200 $2,500 NaN
2 11/09/2020 9:40am Single 1234 2% $200 $2,500 NaN
3 11/09/2020 9:45am Single 2234 2% $200 $2,500 NaN
acct_order_hist = df_combined['Account Order History'].reset_index(drop=True)
acct_order_hist.columns = acct_order_hist.iloc[0]
acct_order_hist.drop(acct_order_hist.index[0], inplace = True)
acct_order_hist
Notes Time Spread Side Qty Price Symbol Order
1 NaN 9:30am STOCK BUY 10 $42.87 NIO Filled
2 NaN 9:30am STOCK Sell 10 $43.87 NIO Filled
推荐阅读
- java - Spring Boot @Autowired 在变量还是构造函数上?
- python - 以与主模型相同的方式序列化 Django 简单历史项目
- postgresql - postgresQL 外键插入
- angular - vscode 抱怨未知元素(mat-toolbar)
- powershell - 如何使用 Powershell DSC 为 Windows 服务设置自动启动配置
- java - 应用程序在真实设备中运行但在 Android Studio 模拟器中崩溃
- python-3.x - python中的__main__是什么意思?
- python - 如何读取和写入 pandas/pyspark 数据帧到雪花
- node.js - V8 引擎在实现 ECMAScript 模块时使用 Node.js 的哪些功能
- python - 熊猫滚动申请返回 np.nan