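python - Using groupby in a for loop
Problem description
If df['Time'] and df['OrderID'] are the same, and a df['MessageType'] of 'D' is immediately followed by 'A', I want to delete the row containing 'D' and rename the value 'A' to 'AMEND'. Here is my code: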
import numpy as np
import pandas as pd
Instrument = df['Symbol']
Date = df['Date']
Time = df['Time']
RecordType = df['MessageType']
Price = df['Price']
Volume = df['Quantity']
Qualifiers = df['ExchangeOrderType']
OrderID = df['OrderID']
MatchID = df['MatchID']
Side = df['Side']
for i in range(len(Time)-1):
    if((Time[i] == Time[i+1]) & (RecordType[i] == "D") & (RecordType[i+1] == "A")):
        del Instrument[i]
        del Date[i]
        del Time[i]
        del RecordType[i]
        del Price[i]
        del Volume[i]
        del Qualifiers[i]
        del OrderID[i]
        del Side[i]
        RecordType[i+1] = "AMEND" # rename the message type
# creating a new dataframe with updated lists
new_df = pd.DataFrame({'Instrument':Instrument, 'Date':Date, 'Time':Time, 'RecordType':RecordType, 'Price':Price, 'Volume':Volume, 'Qualifiers':Qualifiers, 'OrderID':OrderID, 'MatchID':MatchID, 'Side':Side}).reset_index(drop=True)
new_df['RecordType']=np.where(new_df['RecordType'] =='O', 'CONTROL', new_df['RecordType'])
new_df['RecordType']=np.where(new_df['RecordType'] =='A', 'ENTER', new_df['RecordType'])
new_df['RecordType']=np.where(new_df['RecordType'] =='D', 'DELETE', new_df['RecordType'])
However, I have many different Symbol and Date values, so I would like to use groupby in the for loop. I tried
grouped = df.groupby(['Symbol', 'Date'])
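and replaced df with grouped, but it did not work. I also realized that my code is index-sensitive: the for loop only works if the index starts at zero. I am not sure whether groupby will cause index problems.
Please help.
Thank you.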
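Solution
A good solution is to use np.where() with the conditions you mentioned, and .shift(-1) to compare each row with the next one. You can add more conditions (for example, a condition on the df['Symbol'] column).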
import pandas as pd, numpy as np
df = pd.DataFrame({'Symbol': ['A2M', 'A2M', 'A2M'],
                   'Time' : ['14:00:02 678544300', '07:00:02 678544300', '07:00:02 678544300'],
                   'MessageType' : ['D', 'D', 'A'],
                   'OrderID' : ['72222771064878939976', '72222771064878939976', '72222771064878939976'],
                   'Date' : ['2020-01-02', '2020-01-02', '2020-01-02']})
df['MessageType'] = np.where((df['MessageType'] == 'D') & (df['MessageType'].shift(-1) == 'A') &
                             (df['Date'] == df['Date'].shift(-1)) & (df['Time'] == df['Time'].shift(-1)) &
                             (df['Symbol'] == df['Symbol'].shift(-1)) &
                             (df['OrderID'] == df['OrderID'].shift(-1)), 'AMEND', df['MessageType'])
df
Output:
  Symbol                Time MessageType               OrderID        Date
0    A2M  14:00:02 678544300           D  72222771064878939976  2020-01-02
1    A2M  07:00:02 678544300       AMEND  72222771064878939976  2020-01-02
2    A2M  07:00:02 678544300           A  72222771064878939976  2020-01-02
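Note that the output above relabels the 'D' row and leaves the 'A' row in place, whereas the question asks to drop the 'D' row and relabel the 'A' row. The following is a minimal sketch of one way to do that, assuming the rows are already in the order in which they should be compared. It uses groupby(...).shift() so that a row is only ever compared with its neighbour inside the same (Symbol, Date) group. The extra 'XYZ' row and the names keys, nxt, prv and result are made up for illustration.

```python
import pandas as pd

# Sample data; the 'XYZ' row is an invented extra to show the group boundary.
df = pd.DataFrame({
    'Symbol': ['A2M', 'A2M', 'A2M', 'XYZ'],
    'Date': ['2020-01-02'] * 4,
    'Time': ['14:00:02 678544300', '07:00:02 678544300',
             '07:00:02 678544300', '07:00:02 678544300'],
    'MessageType': ['D', 'D', 'A', 'A'],
    'OrderID': ['72222771064878939976'] * 4,
})

keys = ['Symbol', 'Date']
cols = ['Time', 'OrderID', 'MessageType']
grouped = df.groupby(keys, sort=False)

# Next / previous row within each group; missing at the group edges,
# so comparisons across different Symbol/Date values are always False.
nxt = grouped[cols].shift(-1)
prv = grouped[cols].shift(1)

same_next = (df['Time'] == nxt['Time']) & (df['OrderID'] == nxt['OrderID'])
same_prev = (df['Time'] == prv['Time']) & (df['OrderID'] == prv['OrderID'])

# 'D' rows immediately followed by a matching 'A', and those 'A' rows.
d_before_a = (df['MessageType'] == 'D') & (nxt['MessageType'] == 'A') & same_next
a_after_d = (df['MessageType'] == 'A') & (prv['MessageType'] == 'D') & same_prev

result = df.copy()
result.loc[a_after_d, 'MessageType'] = 'AMEND'            # rename the 'A' row
result = result[~d_before_a].reset_index(drop=True)       # drop the 'D' row
print(result)
```

Because the masks are aligned on the original index rather than on positions, this approach does not depend on the index starting at zero, and no explicit for loop over the groups is needed.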
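For all your future posts, please consider the following post: How to make good reproducible pandas examples.
You should not include images. As you can see, I was forced to create a sample dataframe myself. You can (and should) simply copy and paste the data into your question and then format it, or you can copy the output of df.to_dict() and paste it into your Stack Overflow question. See the link.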
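As a rough illustration of the df.to_dict() suggestion, the sketch below prints a dictionary literal that can be pasted into a question, and shows how a reader would rebuild the frame from it. The two-row frame and the names as_dict and rebuilt are placeholders, not the asker's data.

```python
import pandas as pd

# Placeholder frame standing in for the real data.
df = pd.DataFrame({'Symbol': ['A2M', 'A2M'], 'MessageType': ['D', 'A']})

# Take a small slice and print it as a plain dict literal to paste into a post.
as_dict = df.head(10).to_dict()
print(as_dict)

# A reader can rebuild the same frame from the pasted literal.
rebuilt = pd.DataFrame(as_dict)
print(rebuilt)
```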