Working with value counts in Python pandas

Problem description

I have 2 dataframes, "transactions" and "offsets".

offsets:

    Contact Account Name    
0   TODD HOWARD 
1   TODD HOWARD 
2   JEFF COX
3   JEFF COX    
4   TODD HOWARD 
5   JEFF COX    
6   MIKE BALDWIN    

transactions:

    Contact Account Name    
0   TODD HOWARD 
1   TODD HOWARD     
2   JEFF COX    
3   JEFF COX    
4   TODD HOWARD     
5   JEFF COX    
6   TODD HOWARD     
7   MIKE BALDWIN    
8   MIKE BALDWIN
9   JEFF COX    
10  JC WHITE    
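To reproduce the question, the two frames can be rebuilt as below. This is a sketch that assumes each frame holds only the single `Contact Account Name` column shown and that the trailing spaces in the printout are not part of the names.

```python
import pandas as pd

# Rebuild the two example frames from the question (one column each)
offsets = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'MIKE BALDWIN']})
transactions = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'TODD HOWARD', 'MIKE BALDWIN',
    'MIKE BALDWIN', 'JEFF COX', 'JC WHITE']})

print(offsets.shape, transactions.shape)  # (7, 1) (11, 1)
```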

What I want to do: 1) count each unique value. For that I used:

df1 = offsets.groupby('Contact Account Name').size()
df2 = transactions.groupby('Contact Account Name').size()

and I got:

df1:

Contact Account Name
TODD HOWARD               3
JEFF COX                  3
MIKE BALDWIN              1

df2:

Contact Account Name
JC WHITE                  1
TODD HOWARD               4
JEFF COX                  4
MIKE BALDWIN              2
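The two count Series can be checked directly on the example data. Note that `groupby` sorts the group keys by default (`sort=True`), so the real printout lists the names alphabetically rather than in the order shown above; the counts themselves are the same.

```python
import pandas as pd

offsets = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'MIKE BALDWIN']})
transactions = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'TODD HOWARD', 'MIKE BALDWIN',
    'MIKE BALDWIN', 'JEFF COX', 'JC WHITE']})

# size() counts the rows in each group and returns a Series
# indexed by name, with the names sorted alphabetically
df1 = offsets.groupby('Contact Account Name').size()
df2 = transactions.groupby('Contact Account Name').size()
print(df1)
print(df2)
```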

2) I want to merge the two dataframes. I tried merge, but it didn't work.
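The question doesn't show the merge call that failed, so the cause is unknown. One way to make `merge` work is to turn both count Series into frames with `reset_index`, so they share an ordinary `Contact Account Name` column to join on. The count column names `transactions` and `offsets` are illustrative choices, not part of the original post.

```python
import pandas as pd

offsets = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'MIKE BALDWIN']})
transactions = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'TODD HOWARD', 'MIKE BALDWIN',
    'MIKE BALDWIN', 'JEFF COX', 'JC WHITE']})

df1 = offsets.groupby('Contact Account Name').size()
df2 = transactions.groupby('Contact Account Name').size()

# Give each count a column name, then merge on the name column;
# how='left' keeps names that appear only in transactions
counts = pd.merge(
    df2.reset_index(name='transactions'),
    df1.reset_index(name='offsets'),
    on='Contact Account Name',
    how='left',
)
# JC WHITE has no offsets, so his 'offsets' value is NaN here
print(counts)
```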

3) I want to create another dataframe that gives offsets as a percentage of total transactions.

The result I'd like to end up with:

Contact Account Name      Offset Percentage
TODD HOWARD               75
JEFF COX                  75
MIKE BALDWIN              50
JC WHITE                  100

Thanks in advance!

Tags: python, pandas

Solution


The output of the aggregation is a Series, so you can divide with div, multiply by 100 with mul, and finally call reset_index:

df = df1.div(df2, fill_value=1).mul(100).reset_index(name='Offset Percentage')
print (df)
  Contact Account Name  Offset Percentage
0             JC WHITE              100.0
1             JEFF COX               75.0
2         MIKE BALDWIN               50.0
3          TODD HOWARD               75.0
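The `fill_value=1` argument is what produces 100 for JC WHITE: he has no offsets, so his missing count in `df1` is filled with 1 and 1/1 gives 100, matching the requested table. If JC WHITE should instead show 0% (no offsets out of one transaction), `fill_value=0` gives that. In this sketch the two Series are hard-coded from the counts shown in the question.

```python
import pandas as pd

# Offset and transaction counts per name, as produced by the groupby above
df1 = pd.Series({'JEFF COX': 3, 'MIKE BALDWIN': 1, 'TODD HOWARD': 3})
df2 = pd.Series({'JC WHITE': 1, 'JEFF COX': 4, 'MIKE BALDWIN': 2, 'TODD HOWARD': 4})

# div aligns both Series on their index labels; JC WHITE is missing from df1,
# so fill_value decides what his offset count becomes before dividing
as_requested = df1.div(df2, fill_value=1).mul(100)  # 1/1 -> 100 for JC WHITE
literal = df1.div(df2, fill_value=0).mul(100)       # 0/1 -> 0 for JC WHITE
print(as_requested)
print(literal)
```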

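A similar solution with value_counts (rename_axis sets the index name, which value_counts leaves unset in older pandas versions):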

df1 = offsets['Contact Account Name'].value_counts()
df2 = transactions['Contact Account Name'].value_counts()

df = (df1.div(df2, fill_value=1)
         .mul(100)
         .rename_axis('Contact Account Name')
         .reset_index(name='Offset Percentage'))
print (df)
  Contact Account Name  Offset Percentage
0             JC WHITE              100.0
1             JEFF COX               75.0
2         MIKE BALDWIN               50.0
3          TODD HOWARD               75.0
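`value_counts` orders its result by frequency rather than by name, but `div` aligns on the index labels, so the order does not change the percentages. A quick check on the example data, comparing both counting methods after sorting by name:

```python
import pandas as pd

offsets = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'MIKE BALDWIN']})
transactions = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'TODD HOWARD', 'MIKE BALDWIN',
    'MIKE BALDWIN', 'JEFF COX', 'JC WHITE']})
col = 'Contact Account Name'

# Percentages from groupby().size() counts
pct_groupby = (offsets.groupby(col).size()
               .div(transactions.groupby(col).size(), fill_value=1)
               .mul(100))
# Same computation from value_counts(), whose result is ordered by frequency
pct_counts = (offsets[col].value_counts()
              .div(transactions[col].value_counts(), fill_value=1)
              .mul(100))

# Sorting by name makes the two results directly comparable
same = pct_groupby.sort_index().tolist() == pct_counts.sort_index().tolist()
print(same)  # True
```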

If you need to join the two Series together first, use concat:

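import pandas as pd

# outer-join the counts side by side: 'Offset Percentage' starts out holding the
# transaction counts, 'b' the offset counts (NaN for JC WHITE, who has no offsets)
df = pd.concat([df2, df1], axis=1, keys=('Offset Percentage', 'b'))
df['Offset Percentage'] = df.b.div(df['Offset Percentage'], fill_value=1).mul(100)
df = df.drop(columns='b').rename_axis('Contact Account Name').reset_index()
print(df)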
  Contact Account Name  Offset Percentage
0             JC WHITE              100.0
1             JEFF COX               75.0
2         MIKE BALDWIN               50.0
3          TODD HOWARD               75.0
