Fill and move column values based on the values of other columns

Problem description

I have a dataset like this:

number      Shipment Date  service         desc                  amount
182692345   2/12/19        DUTIES & TAXES
                                           IMPORT EXPORT DUTIES  561.01
                                           IMPORT EXPORT TAXES   600.47
1827975839  2/12/19        DUTIES & TAXES
                                           IMPORT EXPORT DUTIES  160.19
3229475633  2/12/19        DUTIES & TAXES
                                           IMPORT EXPORT TAXES   600.47
                                           IMPORT EXPORT DUTIES  561.01
5733894261  29/04/2020     Express
                                           DUTIES TAXES PAID     25
                                           FUEL SURCHARGE        3.28
1826995520  2/12/19        DUTIES & TAXES
                                           IMPORT EXPORT TAXES   600.47
                                           IMPORT EXPORT DUTIES  561.01
2998455062  4/5/20         Express
                                           FUEL SURCHARGE        0.72
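To experiment with the question in code, the table can be rebuilt as a DataFrame. This is a sketch that assumes blank cells are NaN (not empty strings) and that the columns are named exactly as in the table header; the real file may differ on both points.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the table. Blank cells are assumed to be NaN,
# and the column names are assumed to match the table header.
nan = np.nan
rows = [
    (182692345, "2/12/19", "DUTIES & TAXES", nan, nan),
    (nan, nan, nan, "IMPORT EXPORT DUTIES", 561.01),
    (nan, nan, nan, "IMPORT EXPORT TAXES", 600.47),
    (1827975839, "2/12/19", "DUTIES & TAXES", nan, nan),
    (nan, nan, nan, "IMPORT EXPORT DUTIES", 160.19),
    (3229475633, "2/12/19", "DUTIES & TAXES", nan, nan),
    (nan, nan, nan, "IMPORT EXPORT TAXES", 600.47),
    (nan, nan, nan, "IMPORT EXPORT DUTIES", 561.01),
    (5733894261, "29/04/2020", "Express", nan, nan),
    (nan, nan, nan, "DUTIES TAXES PAID", 25),
    (nan, nan, nan, "FUEL SURCHARGE", 3.28),
    (1826995520, "2/12/19", "DUTIES & TAXES", nan, nan),
    (nan, nan, nan, "IMPORT EXPORT TAXES", 600.47),
    (nan, nan, nan, "IMPORT EXPORT DUTIES", 561.01),
    (2998455062, "4/5/20", "Express", nan, nan),
    (nan, nan, nan, "FUEL SURCHARGE", 0.72),
]
df = pd.DataFrame(rows, columns=["number", "Shipment Date", "service", "desc", "amount"])
print(df)
```

Because "number" contains NaN, pandas stores it as float64, which is why such numbers can later print in scientific notation.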

As an image, it looks like this:

[screenshot of the table above]

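What I want is this: for the rows where number and shipment_date are present, check whether the service is "Express". For each such row, I want to pull the "FUEL SURCHARGE" row from the desc column up onto the same row as the number and shipment_date, together with its corresponding amount.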
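The first half of that rule (which detail rows belong to an Express shipment) can be sketched by giving every row the id of the shipment it belongs to. A minimal sketch, using a two-shipment subset of the data and assuming blanks are NaN: a row with a non-null number starts a new shipment, and the header's service is then spread over that shipment's detail rows. The names `shipment`, `service` and `is_fuel` are illustrative only.

```python
import numpy as np
import pandas as pd

nan = np.nan
# A two-shipment subset of the question's data; blanks assumed to be NaN.
df = pd.DataFrame(
    [
        (1827975839, "2/12/19", "DUTIES & TAXES", nan, nan),
        (nan, nan, nan, "IMPORT EXPORT DUTIES", 160.19),
        (5733894261, "29/04/2020", "Express", nan, nan),
        (nan, nan, nan, "DUTIES TAXES PAID", 25),
        (nan, nan, nan, "FUEL SURCHARGE", 3.28),
    ],
    columns=["number", "Shipment Date", "service", "desc", "amount"],
)

# A row with a number starts a new shipment; cumsum turns that boolean
# flag into a running shipment id shared by the rows below it.
shipment = df["number"].notna().cumsum().rename("shipment")

# Spread the header row's service over the whole shipment
# (GroupBy "first" skips NaN, so it picks the header value).
service = df.groupby(shipment)["service"].transform("first")

# Rows that belong to an Express shipment and hold the fuel surcharge.
is_fuel = service.eq("Express") & df["desc"].eq("FUEL SURCHARGE")
print(df[is_fuel])
```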

So, something like this:

number      Shipment Date  service         desc                  amount
182692345   2/12/19        DUTIES & TAXES
                                           IMPORT EXPORT DUTIES  561.01
                                           IMPORT EXPORT TAXES   600.47
1827975839  2/12/19        DUTIES & TAXES
                                           IMPORT EXPORT DUTIES  160.19
3229475633  2/12/19        DUTIES & TAXES
                                           IMPORT EXPORT TAXES   600.47
                                           IMPORT EXPORT DUTIES  561.01
5733894261  29/04/2020     Express         FUEL SURCHARGE        3.28
                                           DUTIES TAXES PAID     25

1826995520  2/12/19        DUTIES & TAXES
                                           IMPORT EXPORT TAXES   600.47
                                           IMPORT EXPORT DUTIES  561.01
2998455062  4/5/20         Express         FUEL SURCHARGE        0.72

As an image, it looks like this:

[screenshot of the desired output above]

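In the end, I only care about the rows whose service is "Express", so ideally we would drop every row whose service is not Express and get the format above for the Express shipments only.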
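One possible way to get this Express-only layout, sketched under the same assumptions (blanks are NaN, column names as in the table). The function name `express_with_fuel_on_header` is hypothetical. Unlike the desired output above, it drops the emptied FUEL SURCHARGE row instead of leaving a blank line, and it uses a three-shipment subset of the data to stay short.

```python
import numpy as np
import pandas as pd

nan = np.nan
# A subset of the question's data: one non-Express and two Express shipments.
df = pd.DataFrame(
    [
        (3229475633, "2/12/19", "DUTIES & TAXES", nan, nan),
        (nan, nan, nan, "IMPORT EXPORT TAXES", 600.47),
        (nan, nan, nan, "IMPORT EXPORT DUTIES", 561.01),
        (5733894261, "29/04/2020", "Express", nan, nan),
        (nan, nan, nan, "DUTIES TAXES PAID", 25),
        (nan, nan, nan, "FUEL SURCHARGE", 3.28),
        (2998455062, "4/5/20", "Express", nan, nan),
        (nan, nan, nan, "FUEL SURCHARGE", 0.72),
    ],
    columns=["number", "Shipment Date", "service", "desc", "amount"],
)


def express_with_fuel_on_header(df: pd.DataFrame) -> pd.DataFrame:
    # Running shipment id: a row with a number starts a new shipment.
    shipment = df["number"].notna().cumsum().rename("shipment")
    # Header service spread over the shipment, used only for filtering.
    service = df.groupby(shipment)["service"].transform("first")

    express = service.eq("Express")
    out = df[express].copy()
    shipment = shipment[express]

    is_header = out["number"].notna()
    is_fuel = out["desc"].eq("FUEL SURCHARGE")

    # One (desc, amount) pair per shipment id, taken from its fuel row.
    fuel = out.loc[is_fuel, ["desc", "amount"]].groupby(shipment[is_fuel]).first()

    # Write that pair onto each shipment's header row ...
    header_ids = shipment[is_header]
    out.loc[is_header, "desc"] = header_ids.map(fuel["desc"])
    out.loc[is_header, "amount"] = header_ids.map(fuel["amount"])

    # ... and drop the detail rows it came from.
    return out[~is_fuel]


result = express_with_fuel_on_header(df)
print(result)
```

Other detail rows (such as DUTIES TAXES PAID) are kept as they are, with blank number, date and service, which mirrors the layout the question asks for.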

I think pandas ffill() and transform will be the main tools, so I'm trying the following:

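df1 = df.copy()

# Forward-fill the key columns so every detail row knows its shipment
df1[['number', 'shipment_date']] = df1[['number', 'shipment_date']].ffill()
df1.desc = df1.desc.fillna('')
df1.amount = df1.amount.fillna('')

# Join all amounts of the same shipment into one string (one value per row)
s = df1.groupby(['number', 'shipment_date']).amount.transform(lambda x: ' '.join(x.astype(str)))

# Keep the joined amounts on the header rows only and blank out the detail rows
df.loc[df.shipment_date.notnull(), 'amount'] = s
df.loc[df.shipment_date.isnull(), 'amount'] = ''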
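For reference, this is how ffill() and groupby().transform() fit together: after forward-filling the key, transform returns one value per original row (aligned to the frame's index), so a per-shipment result can be assigned straight back. A small sketch using a slice of the question's data; `key` and `joined` are illustrative names, and blanks are assumed to be NaN.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    {
        "number": [5733894261, nan, nan, 2998455062, nan],
        "desc": [nan, "DUTIES TAXES PAID", "FUEL SURCHARGE", nan, "FUEL SURCHARGE"],
        "amount": [nan, 25, 3.28, nan, 0.72],
    }
)

# Forward-fill the key so every detail row carries its shipment number.
key = df["number"].ffill().rename("shipment")

# The lambda returns one scalar per group; transform broadcasts it back
# to every row of that group, so the result has the same index as df.
joined = df.groupby(key)["amount"].transform(
    lambda x: " ".join(x.dropna().astype(str))
)
print(joined)
```

Calling dropna() first keeps the header rows' NaN amounts out of the joined string.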

Tags: python, pandas

Solution


Forward-fill the blank cells, keep only the Express rows, and then pull desc and amount up by one row with shift(-1). Does this match what you intended?

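# Carry each shipment's service down to its detail rows
# (fillna(method='ffill') is deprecated since pandas 2.1; ffill() is equivalent)
df['service'] = df['service'].ffill()
# Keep only Express shipments; copy to avoid chained-assignment warnings
df = df[df['service'] == 'Express'].copy()
# Carry the number and date down as well
df[['number', 'Shipment Date']] = df[['number', 'Shipment Date']].ffill()
# Move desc/amount up one row across the whole frame, so each row gets the values of the row below it
df[['desc', 'amount']] = df[['desc', 'amount']].shift(-1)
df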
    number        Shipment Date  service  desc               amount
8   5.733894e+09  29/04/2020     Express  DUTIES TAXES PAID  25.00
9   5.733894e+09  29/04/2020     Express  FUEL SURCHARGE     3.28
10  5.733894e+09  29/04/2020     Express  NaN                NaN
14  2.998455e+09  4/5/20         Express  FUEL SURCHARGE     0.72
15  2.998455e+09  4/5/20         Express  NaN                NaN
