Create columns automatically based on the values in another column

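Problem description

This is very similar to a question I asked yesterday. The goal is to add a feature that creates columns based on the values that appear in another column. For example, when the script finds a country code in the given file, I want it to create a column named after that code (e.g. "AR Total") and sum the number of units for every row with that country code.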

This is what my script currently outputs:

(Image: current script output)

What I want to see:

(Image: desired output)

My script:

df['Sum of Revenue'] = df['Units Sold'] * df['Dealer Price']
# df['AR Revenue'] = df[]  (left unfinished; commented out so the script runs)
df = df.sort_values(['End Consumer Country', 'Currency Code'])
# Writes the grand total of units into the first row only (by position)
df.loc[df.index[0], 'Unit Total'] = df['Units Sold'].sum()
# Writes the grand total of revenue into the first row only (by position)
df.loc[df.index[0], 'Total Revenue'] = df['Sum of Revenue'].sum()
# Sums the Units Sold of rows whose End Consumer Country is AR
df['AR Total'] = df.loc[df['End Consumer Country'] == 'AR', 'Units Sold'].sum()
# Sums the Units Sold of rows whose End Consumer Country is AU
df['AU Total'] = df.loc[df['End Consumer Country'] == 'AU', 'Units Sold'].sum()
# Sums the Units Sold of rows whose End Consumer Country is NZ
df['NZ Total'] = df.loc[df['End Consumer Country'] == 'NZ', 'Units Sold'].sum()

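However, because I know which countries will appear in this file, I have hard-coded them into my script. How would I write the script so that if it finds another country code, such as GB, it creates a column named "GB Total" and sums the units of every row whose country code is GB?

Any help would be greatly appreciated!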
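One minimal way to avoid hard-coding the country list, assuming the same column names as the script above: loop over the distinct values in `End Consumer Country` and build each column name from the code itself. The small DataFrame here is hypothetical illustration data standing in for the file.

```python
import pandas as pd

# Hypothetical illustration data; the real script would load df from the file
df = pd.DataFrame({
    'End Consumer Country': ['AR', 'AU', 'GB', 'AR', 'GB'],
    'Units Sold': [1, 2, 3, 4, 5],
})

# One "<code> Total" column per country code present in the data,
# so an unexpected code such as GB gets its own column automatically
for code in df['End Consumer Country'].unique():
    mask = df['End Consumer Country'] == code
    df[f'{code} Total'] = df.loc[mask, 'Units Sold'].sum()

print(df)
```

Like the hard-coded lines, each total is a single scalar broadcast to every row of its column.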

Tags: python, pandas

Solution


If you really need this format, here is how I would proceed (starting data below):

# Grand totals, written into the first row only
d = {'Sum of Revenue': 'Total Revenue', 'Units Sold': 'Total Sold'}
for col, newcol in d.items():
    df.loc[df.index[0], newcol] = df[col].sum()

# One "<code> Total" column for every country found in the data
s = df.groupby('End Consumer Country')['Units Sold'].sum().to_frame().T.add_suffix(' Total')
# Give the one-row frame the label of df's first row so concat aligns it there
s.index = [df.index[0]]

df = pd.concat([df, s], axis=1, sort=False)

Output df:

  End Consumer Country  Sum of Revenue  Units Sold  Total Revenue  Total Sold  AR Total  AU Total  NZ Total  US Total
a                   AR       13.486216           1     124.007334        28.0       3.0       7.0      11.0       7.0
b                   AR       25.984073           2            NaN         NaN       NaN       NaN       NaN       NaN
c                   AU       21.697871           3            NaN         NaN       NaN       NaN       NaN       NaN
d                   AU       10.962232           4            NaN         NaN       NaN       NaN       NaN       NaN
e                   NZ       16.528398           5            NaN         NaN       NaN       NaN       NaN       NaN
f                   NZ       29.908619           6            NaN         NaN       NaN       NaN       NaN       NaN
g                   US        5.439925           7            NaN         NaN       NaN       NaN       NaN       NaN

As you can see, pandas fills in a lot of NaN values, because we only assigned values to the first row and a DataFrame must be rectangular.
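A small sketch of where those NaN values come from, using a hypothetical three-row frame: assigning through `.loc` to a single row of a column that does not exist yet creates the whole column, and every other row in it is filled with NaN.

```python
import pandas as pd

# Hypothetical three-row frame
df = pd.DataFrame({'Units Sold': [1, 2, 3]}, index=list('abc'))

# Creating a new column through a single-row .loc assignment:
# only row 'a' receives a value, the remaining rows become NaN
df.loc[df.index[0], 'Total Sold'] = df['Units Sold'].sum()

print(df)
print(df['Total Sold'].isna().sum())  # number of rows left without a value
```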


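It is much simpler to build a separate DataFrame that summarizes the grand totals and the per-country totals. If that works for you, everything reduces to a single .pivot_table: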

df.pivot_table(index='End Consumer Country',
               values=['Sum of Revenue', 'Units Sold'],
               margins=True,
               aggfunc='sum').T.add_suffix(' Total')

Output:

End Consumer Country   AR Total   AU Total   NZ Total  US Total   All Total
Sum of Revenue        39.470289  32.660103  46.437018  5.439925  124.007334
Units Sold             3.000000   7.000000  11.000000  7.000000   28.000000

The same information, with much simpler code.
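To tie this back to the GB case in the question: because `pivot_table` groups by whatever codes are present, a country that was not anticipated simply becomes another column. A sketch with hypothetical data (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical data containing a country (GB) the script did not anticipate
df = pd.DataFrame({
    'End Consumer Country': ['AR', 'AU', 'GB', 'GB'],
    'Sum of Revenue': [10.0, 20.0, 5.0, 7.5],
    'Units Sold': [1, 2, 3, 4],
})

# Same call as above; margins=True adds the "All" grand-total column
summary = df.pivot_table(index='End Consumer Country',
                         values=['Sum of Revenue', 'Units Sold'],
                         margins=True,
                         aggfunc='sum').T.add_suffix(' Total')

print(summary)
# A single total can be looked up by row and column label
print(summary.loc['Units Sold', 'GB Total'])
```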


Sample data:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'End Consumer Country': ['AR', 'AR', 'AU', 'AU', 'NZ', 'NZ', 'US'],
                   'Sum of Revenue': np.random.normal(20,6,7),
                   'Units Sold': np.arange(1,8,1)},
                   index = list('abcdefg'))

  End Consumer Country  Sum of Revenue  Units Sold
a                   AR       13.486216           1
b                   AR       25.984073           2
c                   AU       21.697871           3
d                   AU       10.962232           4
e                   NZ       16.528398           5
f                   NZ       29.908619           6
g                   US        5.439925           7
