Using groupby in a for loop

Problem description

I have the following DataFrame.

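If df['Time'] and df['OrderID'] are the same, and a df['MessageType'] of 'D' is followed by 'A', I want to delete the row containing 'D' and rename the 'A' value to 'AMEND'. Here is my code: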

import numpy as np
import pandas as pd

Instrument = df['Symbol']
Date = df['Date']
Time = df['Time']
RecordType = df['MessageType']
Price = df['Price']
Volume = df['Quantity']
Qualifiers = df['ExchangeOrderType']
OrderID = df['OrderID']
MatchID = df['MatchID']
Side = df['Side']

for i in range(len(Time)-1):
    if((Time[i] == Time[i+1]) & (RecordType[i] == "D") & (RecordType[i+1] == "A")):
        del Instrument[i]
        del Date[i]
        del Time[i]
        del RecordType[i]
        del Price[i]
        del Volume[i]
        del Qualifiers[i]
        del OrderID[i]
        del Side[i]
        del MatchID[i]  # keep MatchID aligned with the other columns
        RecordType[i+1] = "AMEND" # rename the message type

# creating a new dataframe with updated lists
new_df = pd.DataFrame({'Instrument':Instrument, 'Date':Date, 'Time':Time, 'RecordType':RecordType, 'Price':Price, 'Volume':Volume, 'Qualifiers':Qualifiers, 'OrderID':OrderID, 'MatchID':MatchID, 'Side':Side}).reset_index(drop=True)

new_df['RecordType']=np.where(new_df['RecordType'] =='O', 'CONTROL', new_df['RecordType'])
new_df['RecordType']=np.where(new_df['RecordType'] =='A', 'ENTER', new_df['RecordType'])
new_df['RecordType']=np.where(new_df['RecordType'] =='D', 'DELETE', new_df['RecordType'])

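However, I have many different Symbol and Date values and would like to use groupby in the for loop. I tried grouped = df.groupby(['Symbol', 'Date']) and replaced df with grouped, but it did not work. I also realise that my code is index-sensitive, i.e. the for loop only works if the index starts at zero. I am not sure whether groupby will cause indexing problems.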

Please help.

Thank you.

Tags: python, pandas, dataframe, indexing

Solution


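A good solution is to use np.where() with the conditions you mentioned, together with .shift(-1) to compare each row with the next one. You can add more conditions (for example on the df['Symbol'] column).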

import pandas as pd, numpy as np
df = pd.DataFrame({'Symbol': ['A2M', 'A2M', 'A2M'],
                   'Time': ['14:00:02 678544300', '07:00:02 678544300', '07:00:02 678544300'],
                   'MessageType': ['D', 'D', 'A'],
                   'OrderID': ['72222771064878939976', '72222771064878939976', '72222771064878939976'],
                   'Date': ['2020-01-02', '2020-01-02', '2020-01-02']})
# Relabel a 'D' row as 'AMEND' when the next row is an 'A' with the same Date, Time, Symbol and OrderID.
df['MessageType'] = np.where((df['MessageType'] == 'D') & (df['MessageType'].shift(-1) == 'A') &
                             (df['Date'] == df['Date'].shift(-1)) & (df['Time'] == df['Time'].shift(-1)) &
                             (df['Symbol'] == df['Symbol'].shift(-1)) &
                             (df['OrderID'] == df['OrderID'].shift(-1)), 'AMEND', df['MessageType'])
df

Output:

    Symbol  Time                MessageType  OrderID                    Date
0   A2M     14:00:02 678544300  D            72222771064878939976   2020-01-02
1   A2M     07:00:02 678544300  AMEND        72222771064878939976   2020-01-02
2   A2M     07:00:02 678544300  A            72222771064878939976   2020-01-02
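
Note that the snippet above relabels the 'D' row and leaves the following 'A' row in place, whereas the question asks for the 'D' row to be dropped and the 'A' row to become 'AMEND'. A minimal sketch of that variant follows, with the shift done per Symbol/Date group so that rows from different groups are never compared. The helper name amend_pairs and the sample data are invented for illustration and are not part of the original answer.

```python
import pandas as pd

# Invented sample data: the third row is a 'D' at the end of the A2M group and
# the fourth is an 'A' at the start of the BHP group, so they must NOT be paired.
df = pd.DataFrame({
    'Symbol':      ['A2M', 'A2M', 'A2M', 'BHP', 'BHP', 'BHP'],
    'Date':        ['2020-01-02'] * 6,
    'Time':        ['07:00:02', '07:00:02', '07:00:03', '07:00:03', '07:00:04', '07:00:04'],
    'MessageType': ['D', 'A', 'D', 'A', 'D', 'A'],
    'OrderID':     ['1', '1', '9', '9', '2', '2'],
})


def amend_pairs(frame):
    """Hypothetical helper: drop each 'D' row that is directly followed, within
    the same Symbol/Date group, by an 'A' row with the same Time and OrderID,
    and relabel that 'A' row as 'AMEND'."""
    cols = ['Time', 'OrderID', 'MessageType']
    grouped = frame.groupby(['Symbol', 'Date'], sort=False)[cols]
    # Shifting inside each group means a group's last row sees NaN
    # instead of the first row of the next group.
    nxt = grouped.shift(-1)
    prv = grouped.shift(1)
    is_d = (
        frame['MessageType'].eq('D')
        & nxt['MessageType'].eq('A')
        & frame['Time'].eq(nxt['Time'])
        & frame['OrderID'].eq(nxt['OrderID'])
    )
    is_a = (
        frame['MessageType'].eq('A')
        & prv['MessageType'].eq('D')
        & frame['Time'].eq(prv['Time'])
        & frame['OrderID'].eq(prv['OrderID'])
    )
    out = frame.copy()
    out.loc[is_a, 'MessageType'] = 'AMEND'
    # Boolean masks do not rely on the index starting at zero.
    return out.loc[~is_d].reset_index(drop=True)


result = amend_pairs(df)
print(result)
```

Because the rows are selected with boolean masks rather than positions such as Time[i], this sketch should not depend on the index starting at zero, which was one of the concerns in the question.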

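For all your future posts, please consider the following post: How to make good reproducible pandas examples. Questions should not contain images. As you can see, I was forced to create a sample DataFrame. You can (and should) simply copy and paste the data into your post and then format it, or you can copy the output of df.to_dict() and paste it into your Stack Overflow question. See the linked post.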
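
As a small illustration of the df.to_dict() suggestion, the sketch below shows the round trip: the dictionary is what you would paste into a question, and passing it back to pd.DataFrame rebuilds the same frame. The two-row sample frame is invented for the example.

```python
import pandas as pd

# Invented two-row sample standing in for the real data.
df = pd.DataFrame({
    'Symbol': ['A2M', 'A2M'],
    'MessageType': ['D', 'A'],
    'OrderID': ['72222771064878939976', '72222771064878939976'],
})

# This dictionary is the text to paste into the question.
as_dict = df.to_dict()
print(as_dict)

# A reader can rebuild the frame from the pasted dictionary.
rebuilt = pd.DataFrame(as_dict)
print(rebuilt.equals(df))
```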

