Retain statement, GROUPBY and IF condition: SAS to Python

Problem description

In short, I am trying to compute a flag variable, RUN, that depends on a BY group (GROUPBY) and a set of IF conditions. Below is the SAS code I am converting to Python:

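data data_v1;
    retain run;            /* keep the value of RUN from one row to the next */
    set data;
    by plant material;     /* DATA must be sorted by plant and material */
    *keep plant material sales_quantity run;
    if first.material then do;
        /* first row of each plant/material group: start a run only on zero sales */
        if sales_quantity = 0 then run = 1; else run = 0;
    end;
    else do;
        if run > 0 then do;
            /* a run is in progress: extend it on zero or negligible sales */
            if sales_quantity = 0 or (sales_quantity < 0.01 * Annual_Sales and sales_quantity <= 9)
            then run = run + 1; else run = 0;
        end;
        else do;
            if sales_quantity = 0 then run = 1; else run = 0;
        end;
    end;
run;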
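The DATA step relies on two SAS features that pandas has no direct equivalent for: RETAIN keeps RUN's value from one row to the next, and BY with first.material marks the first row of each plant/material group. A minimal pure-Python sketch of the same logic, assuming rows are dicts with the hypothetical keys plant, material, sales_quantity and annual_sales and are already contiguous by plant and material (as BY requires), carries one variable across iterations and uses itertools.groupby to detect group starts:

```python
from itertools import groupby
from operator import itemgetter


def compute_run(rows):
    """Translate the DATA step: RETAIN run; BY plant material.

    rows: list of dicts with the (hypothetical) keys 'plant', 'material',
    'sales_quantity' and 'annual_sales', contiguous by plant and material.
    Returns the RUN value of each row, in input order.
    """
    result = []
    run = 0  # the retained variable: survives from one row to the next
    for _, group in groupby(rows, key=itemgetter('plant', 'material')):
        for pos, row in enumerate(group):
            qty = row['sales_quantity']
            if pos == 0:  # first.material is true on the first row of a BY group
                run = 1 if qty == 0 else 0
            elif run > 0:  # a run is in progress
                negligible = qty < 0.01 * row['annual_sales'] and qty <= 9
                run = run + 1 if (qty == 0 or negligible) else 0
            else:
                run = 1 if qty == 0 else 0
            result.append(run)
    return result


rows = [
    {'plant': 'a', 'material': 'x', 'sales_quantity': q, 'annual_sales': 1000}
    for q in [0, 5, 0, 20, 5, 0]
]
print(compute_run(rows))
```

With annual sales of 1000 the threshold is 0.01 * 1000 = 10, so this prints [1, 2, 3, 0, 0, 1]: a sale of 5 extends a streak that is already running but cannot start a new one.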

I created a sample DataFrame in Python and implemented the logic with a row-by-row loop, but I could not get the correct output.

How can I apply a GROUPBY to the FOR loop? Suggestions for a better approach are also welcome.
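One way to apply the grouping to the loop, sketched under the assumption that each plant/material pair occupies a single contiguous block (true for the sample data, and guaranteed in SAS after sorting for BY), is to iterate over df.groupby(['plant', 'mater'], sort=False) and apply the per-row rule separately inside each group. The first row of a group then plays the role of first.material, so neither the flag column nor the i - 1 lookups are needed. run_for_group is a hypothetical helper name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'plant': ['a'] * 4 + ['b'] * 7 + ['c'] * 7 + ['d'] * 6,
    'mater': list('xxxy') + list('xxxyyyy') + list('xxxxyyy') + list('yyxxxx'),
    'salqty': [0, 0, 0, 10, 11, 12, 13, 0, 0, 13, 0, 13, 0, 0, 1, 0, 0, 0, 1, 2, 3, 0, 0, 0],
    'annual_sales': 0.01,
})


def run_for_group(qty, annual):
    """Hypothetical helper: RUN values for one plant/material group, in row order."""
    out = []
    run = 0  # carried from row to row, like the RETAINed SAS variable
    for q, a in zip(qty, annual):
        if not out:  # first row of the group, like first.material
            run = 1 if q == 0 else 0
        elif run > 0:  # a run is in progress: extend it on zero or negligible sales
            run = run + 1 if (q == 0 or (q < 0.01 * a and q <= 9)) else 0
        else:
            run = 1 if q == 0 else 0
        out.append(run)
    return out


df['run'] = 0
# sort=False keeps the groups in the order they appear in the data
for _, g in df.groupby(['plant', 'mater'], sort=False):
    df.loc[g.index, 'run'] = run_for_group(g['salqty'], g['annual_sales'])

print(df[['plant', 'mater', 'salqty', 'run']])
```

Note that groupby merges non-contiguous blocks that share the same key, while the shift-based flag in the question treats them separately; sorting by plant and material first (as SAS requires) makes the two agree.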

import pandas as pd

df = pd.DataFrame()
df['plant'] = ['a','a','a','a','b','b','b','b','b','b','b','c','c','c','c','c','c','c','d','d','d','d','d','d']
df['mater'] = ['x','x','x','y','x','x','x','y','y','y','y','x','x','x','x','y','y','y','y','y','x','x','x','x']
df['salqty'] = [0,0,0,10,11,12,13,0,0,13,0,13,0,0,1,0,0,0,1,2,3,0,0,0]
df['plantmaterial'] = df['plant'].map(str) + df['mater']
df['annual_sales'] = 0.01

# flag = 1 on the first row of each plant/material block (like SAS first.material)
df['flag'] = (df.plantmaterial != df.plantmaterial.shift()).astype(int)
df['run'] = 0

for i in range(len(df)):
    # the sales column is 'salqty' here; df.loc[i, col] is already a scalar, so no .any()
    qty = df.loc[i, 'salqty']
    if df.loc[i, 'flag'] == 1:
        # first row of a group: start a run only on zero sales
        df.loc[i, 'run'] = 1 if qty == 0 else 0
    elif df.loc[i - 1, 'run'] > 0:
        # a run is in progress: extend it on zero or negligible sales
        if qty == 0 or (qty < 0.01 * df.loc[i, 'annual_sales'] and qty <= 9):
            df.loc[i, 'run'] = df.loc[i - 1, 'run'] + 1
        else:
            df.loc[i, 'run'] = 0
    else:
        df.loc[i, 'run'] = 1 if qty == 0 else 0
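If a row-by-row loop is too slow on large data, the same RUN values can be computed without a Python-level loop. This is a swapped-in cumulative-sum technique, not something taken from the question; a sketch assuming the question's column names: every group start and every row that fails the extension test begins a new segment, a run becomes active at the first zero-sales row of a segment, and RUN counts the active rows within the segment. run_vectorized is a hypothetical name.

```python
import pandas as pd


def run_vectorized(df):
    """Compute RUN without a row loop (cumulative-sum sketch).

    Assumes the question's columns plant, mater, salqty and annual_sales,
    with each plant/material pair in one contiguous block.
    """
    qty = df['salqty']
    # Rows that may extend an ongoing run (same test as the SAS code)
    extend = (qty == 0) | ((qty < 0.01 * df['annual_sales']) & (qty <= 9))
    # First row of each plant/material block, like SAS first.material
    keys = df[['plant', 'mater']]
    first = keys.ne(keys.shift()).any(axis=1)
    # A new segment starts at every group start and at every row that ends a run
    segment = (first | ~extend).cumsum()
    # Within a segment, the run is active from the first zero-sales row onward
    active = (qty == 0).astype(int).groupby(segment).cummax()
    # RUN is the number of active rows so far in the segment
    return active.groupby(segment).cumsum()


df = pd.DataFrame({
    'plant': ['a'] * 4 + ['b'] * 7 + ['c'] * 7 + ['d'] * 6,
    'mater': list('xxxy') + list('xxxyyyy') + list('xxxxyyy') + list('yyxxxx'),
    'salqty': [0, 0, 0, 10, 11, 12, 13, 0, 0, 13, 0, 13, 0, 0, 1, 0, 0, 0, 1, 2, 3, 0, 0, 0],
    'annual_sales': 0.01,
})
df['run'] = run_vectorized(df)
print(df)
```

The idea is that inside one segment RUN can only grow or stay at zero, so a cumulative maximum marks where the run starts and a cumulative sum counts it; any row that breaks the run starts a fresh segment.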

Tags: python-3.x, sas
