首页 > 解决方案 > 如何根据字符串将具有多种类型信息的数据帧拆分为单独的数据帧?

问题描述

我有以下数据框,其中一个 CSV 文件中有多个数据集。在这种情况下,“现金余额”、“账户订单历史记录”和“股票”(故意留空信息)。我想将现金余额信息创建到一个数据框中,并将帐户订单历史记录创建到另一个数据框中。我的想法是查看第一列的索引,看看它是否等于“现金余额”,然后读取每一行,直到索引 =“账户订单历史”等等,但不确定这是否正确方法。

如何使用python对此进行编码?请帮忙谢谢!

Cash Balance
Date        Time     Type    ID#    Commission  Amount   Balance
11/9/20     9:30am   Single  1234   2%          $200     $2500
11/9/20     9:40am   Single  1234   2%          $200     $2500
11/9/20     9:45am   Single  2234   2%          $200     $2500
Account Order History
Notes                Time    Spread Side        Qty      Price   Symbol Order
                     9:30am  STOCK  BUY         10       $42.87  NIO    Filled
                     9:30am  STOCK  Sell        10       $43.87  NIO    Filled
Equities

标签: pythonpandasdataframe

解决方案


这是你想要的 ?

import pandas as pd
df = pd.read_csv("new.csv",header=None)
df

0   1   2   3   4   5   6   7
0   Cash Balance    NaN NaN NaN NaN NaN NaN NaN
1   Date    Time    Type    ID# Commission  Amount  Balance NaN
2   11/09/2020  9:30am  Single  1234    2%  $200    $2,500  NaN
3   11/09/2020  9:40am  Single  1234    2%  $200    $2,500  NaN
4   11/09/2020  9:45am  Single  2234    2%  $200    $2,500  NaN
5   Account Order History   NaN NaN NaN NaN NaN NaN NaN
6   Notes   Time    Spread  Side    Qty Price   Symbol  Order
7   NaN 9:30am  STOCK   BUY 10  $42.87  NIO Filled
8   NaN 9:30am  STOCK   Sell    10  $43.87  NIO Filled

table_names = ["Cash Balance", "Account Order History"]
groups = df[0].isin(table_names).cumsum()
df_combined = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}


cash_balance = df_combined['Cash Balance'].reset_index(drop=True)
cash_balance.columns = cash_balance.iloc[0]
cash_balance.drop(cash_balance.index[0], inplace = True)

cash_balance

Date    Time    Type    ID# Commission  Amount  Balance NaN
1   11/09/2020  9:30am  Single  1234    2%  $200    $2,500  NaN
2   11/09/2020  9:40am  Single  1234    2%  $200    $2,500  NaN
3   11/09/2020  9:45am  Single  2234    2%  $200    $2,500  NaN

acct_order_hist = df_combined['Account Order History'].reset_index(drop=True)
acct_order_hist.columns = acct_order_hist.iloc[0]
acct_order_hist.drop(acct_order_hist.index[0], inplace = True)
acct_order_hist

Notes   Time    Spread  Side    Qty Price   Symbol  Order
1   NaN 9:30am  STOCK   BUY 10  $42.87  NIO Filled
2   NaN 9:30am  STOCK   Sell    10  $43.87  NIO Filled


推荐阅读