How do I use a for loop to fix a column and put the result in another column with Python pandas?

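Problem description

The code is pasted below. I need to fix the accountant names so that every variant of a name becomes the same (it doesn't matter which variant is chosen, as long as all variants match). I see two options: 1) use a dictionary, or 2) try to fix the names by matching on the first 3 letters of the accountant name.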

import pandas as pd
import numpy as np
data = {'Accountant Name':
            ['Sindman Traub LLP', 'Sindman Traub LLC', 'Sindman Traub PLLC',
             'McCrumb & Assoc.', 'McCrumb & Associates LLC', 'Lee & Mike',
             'Lee & Mike LLC', 'Lee & Mike Inc','Sindman Traub Corp'],
        'Cost':[10, 9, 15, 4, 13, 25, 2, 89, 44]}
df = pd.DataFrame(data)
df['AverageCost'] = np.nan
df['Fixed Accountant Name'] = np.nan
df = df.sort_values(by=['Accountant Name'], ascending=True)

Expected output:

outputdata = {'Accountant Name':['Sindman Traub LLP', 'Sindman Traub LLC', 'Sindman Traub PLLC',
                                 'McCrumb & Assoc.', 'McCrumb & Associates LLC', 'Lee & Mike',
                                 'Lee & Mike LLC', 'Lee & Mike Inc','Sindman Traub Corp'],
              'Cost':[10, 9, 15, 4, 13, 25, 2, 89, 44],
              'Fixed Accountant Name':['Sindman Traub', 'Sindman Traub','Sindman Traub',
                                       'McCrumb and Associates', 'McCrumb and Associates',
                                       'Lee and Mike','Lee and Mike', 'Lee and Mike', 'Sindman Traub'],
              'AverageCost':[19.500000, 19.500000,19.500000,8.500000,8.500000, 38.666667,38.666667,38.666667,19.500000]}
outputdf = pd.DataFrame(outputdata)


Tags: python, pandas, dataframe

Solution


I'm not sure what you're asking, so please post the expected output.

Maybe this?

df['Fixed Accountant Name'] = [x[:3] for x in df['Accountant Name']]
df.groupby('Fixed Accountant Name')['Cost'].mean()
Fixed Accountant Name
Lee    38.666667
McC     8.500000
Sin    19.500000
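
The grouping above returns one mean per 3-letter prefix, but the expected output in the question has a full canonical name and the group average repeated on every row. A minimal sketch of one way to get there is shown below. It combines the two options from the question: the 3-letter prefix is used as the matching key, and a dictionary maps each prefix to a canonical name. The `canonical_names` dictionary and the `prefix` variable are hypothetical names introduced for this illustration, and the sketch assumes that the first three letters are enough to tell the firms apart, which holds for this sample data but may not for a larger list.

```python
import pandas as pd

df = pd.DataFrame({
    'Accountant Name': [
        'Sindman Traub LLP', 'Sindman Traub LLC', 'Sindman Traub PLLC',
        'McCrumb & Assoc.', 'McCrumb & Associates LLC', 'Lee & Mike',
        'Lee & Mike LLC', 'Lee & Mike Inc', 'Sindman Traub Corp'],
    'Cost': [10, 9, 15, 4, 13, 25, 2, 89, 44],
})

# Hypothetical lookup table (an assumption, not part of the original answer):
# 3-letter prefix -> canonical name taken from the question's expected output.
canonical_names = {
    'Sin': 'Sindman Traub',
    'McC': 'McCrumb and Associates',
    'Lee': 'Lee and Mike',
}

# Vectorised equivalent of [x[:3] for x in df['Accountant Name']].
prefix = df['Accountant Name'].str[:3]

# Replace each prefix with its canonical name; a prefix that is missing
# from the dictionary would become NaN.
df['Fixed Accountant Name'] = prefix.map(canonical_names)

# transform('mean') returns a value for every row, so the group average
# lines up with the original rows instead of collapsing to one row per group.
df['AverageCost'] = df.groupby('Fixed Accountant Name')['Cost'].transform('mean')

print(df)
```

Using `transform` rather than `mean()` keeps the result the same length as `df`, so it can be assigned directly as a new column without a merge. Checking `df['Fixed Accountant Name'].isna().any()` afterwards is a quick way to spot names whose prefix is not covered by the dictionary.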
