python - Help optimizing this function in Python
Problem description
data =
Symbol Value Day
0 AACG 1.8708 1
1 AACG 1.8500 2
2 AACG 1.8869 3
3 AACG 1.8200 4
4 AACG 1.8578 5
... ... ... ...
3407024 ZYXI 5.25 1
3407025 ZYXI 4.96 2
3407026 ZYXI 4.99 3
3407027 ZYXI 4.99 4
3407028 ZYXI 4.95 5
... ... ... ...
3407250 ZYXI 8.1500 227
3407251 ZYXI 8.2600 228
3407252 ZYXI 8.3900 229
3407253 ZYXI 8.1200 230
3407254 ZYXI 8.0700 231
import pandas as pd
import numpy as np

# For every row, look up the same symbol's Value from i days earlier (i = 1..90)
for index, row in data.iterrows():
    for i in range(1, 91):
        cstr = 'day-' + str(i)
        try:
            val = float(data[np.logical_and(data['Symbol'] == row['Symbol'],
                                            data['Day'] == row['Day'] - i)].Value)
        except (TypeError, ValueError):
            # no matching row for that symbol and day
            val = np.nan
        data.loc[index, cstr] = val
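The function loops over every row in the dataframe.
For each row, it loops 90 times (i).
On each iteration, it adds a column holding one value.
That value is the Value of the row that has the same Symbol as the current row but whose Day equals the current row's Day minus i.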
output =
Symbol Value Day day-1 day-2 day-3 day-4... day-89 day-90
0 AACG 1.8708 1 NaN NaN NaN NaN
1 AACG 1.8500 2 1.8708 NaN NaN NaN
2 AACG 1.8869 3 1.8500 1.8708 NaN NaN
3 AACG 1.8200 4 1.8869 1.8500 1.8708 NaN
4 AACG 1.8578 5 1.8200 1.8869 1.8500 1.8708
5 AACG 1.8709 6 1.8578 1.8200 1.8869 1.8500
6 AACG 1.8700 7 1.8709 1.8578 1.8200 1.8869
7 AACG 1.8800 8 1.8700 1.8709 1.8578 1.8200
8 AACG 1.8000 9 1.8800 1.8700 1.8709 1.8578
9 AACG 1.7900 10 1.8000 1.8800 1.8700 1.8709
Solution
Try using shift and pd.concat:
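N = 5  # number of lag columns
frames = []
# df is the dataframe from the question (called data there)
for symbol, grp in df.groupby('Symbol'):
    # one shifted copy of Value per lag, placed side by side
    lags = pd.concat([grp['Value'].shift(i).rename(f'Day_{i}') for i in range(1, N + 1)], axis=1)
    frames.append(pd.concat([grp, lags], axis=1))
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df_new = pd.concat(frames)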
Or:
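def f(x):
    x = x.copy()  # avoid mutating the group handed over by groupby
    x['Day-0'] = x['Value']
    for i in range(1, N + 1):
        # each column is the previous column shifted down by one row
        x[f'Day-{i}'] = x[f'Day-{i-1}'].shift()
    return x.drop('Day-0', axis=1)

# group_keys=False keeps the original row index in the result
final_df = df.groupby('Symbol', group_keys=False).apply(f)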
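One caveat: shift moves values by rows, not by days, so the shift-based answers match the question's loop only when each symbol's rows are sorted by Day and no day is missing. The sketch below is not from the original answer. It shows a loop-free shift variant next to a variant that looks values up by Day - i, as the question's loop does. The function names add_lags_shift and add_lags_by_day and the parameter n_lags are hypothetical, and the second function assumes every (Symbol, Day) pair is unique.

```python
import numpy as np
import pandas as pd


def add_lags_shift(df, n_lags):
    """Add day-1 .. day-n_lags by shifting Value within each Symbol.

    Assumption: each symbol has one row per day with no gaps, so that
    "i rows back" and "i days back" mean the same thing.
    """
    out = df.sort_values(["Symbol", "Day"])
    grouped = out.groupby("Symbol")["Value"]
    # Build all lag columns first, then attach them in a single join.
    lags = pd.concat(
        {f"day-{i}": grouped.shift(i) for i in range(1, n_lags + 1)}, axis=1
    )
    # sort_index restores the original row order.
    return out.join(lags).sort_index()


def add_lags_by_day(df, n_lags):
    """Add day-1 .. day-n_lags by looking up (Symbol, Day - i) directly.

    Assumption: (Symbol, Day) pairs are unique. Missing days yield NaN.
    """
    values = df.set_index(["Symbol", "Day"])["Value"]
    lags = {}
    for i in range(1, n_lags + 1):
        keys = pd.MultiIndex.from_arrays([df["Symbol"], df["Day"] - i])
        # reindex returns NaN for keys that are absent from the lookup table
        lags[f"day-{i}"] = values.reindex(keys).to_numpy()
    return df.join(pd.DataFrame(lags, index=df.index))
```

With the question's data, either function would be called with n_lags=90. If some symbols skip days, the two functions give different results, and only add_lags_by_day follows the Day - i rule described in the question.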