首页 > 解决方案 > 熊猫:增长的简单分析(比较)和 Fillna

问题描述

以下是我每个月提供的基本数据。我得到了很多与部门相关的文件,工作变得非常单调和重复。

Month,year,sales,  
January,2017,34400,  
February,2017,35530,  
March,2017,34920,  
April,2017,35950,  
May,2017,36230,  
June,2017,36820,  
July,2017,34590,  
August,2017,36500,  
September,2017,36600,  
October,2017,37140,  
November,2017,36790,  
December,2017,43500,  
January,2018,34900,  
February,2018,37700,  
March,2018,37900,  
April,2018,38100,  
May,2018,37800,  
June,2018,38500,  
July,2018,39400,  
August,2018,39700,  
September,2018,39980,  
October,2018,40600,  
November,2018,39100,  
December,2018,46600,  
January,2019,42500,  

我尝试使用某些函数,如 value_count(遗憾的是,只给出摘要)来实现此输出。并且失败了。(见下面的输出。)

我需要自动填充第 3 列和第 4 列(使用 fillna=True/False)

他们只允许 Apache OpenOffice 完成我们的工作,因此没有 Excel。但我们有 IT 安装 Python 的许可。

链接中的解决方案对我没有帮助,因为它们按两列分组。我的输出中的列是相互依赖的。

import pandas as pd
df = pd.read_csv("Test_1.csv", "a")
df['comparative_position'] = df['sales'].diff().fillna=True
df.loc[df['comparative_position'] > 0.0, 'comparative_position'] = "Profit" 
df.loc[df['comparative_position'] < 0.0, 'comparative_position'] = "Loss" 

Month,Year,Sales,comparative_position,Months_in_P(or)L,Highest_in_12Months  
January,2016,34400,NaN,NaN,NaN  
February,2016,35530,Profit,1,NaN  
March,2016,34920,Loss,1,NaN  
April,2016,35950,Profit,1,NaN  
May,2016,36230,Profit,2,NaN  
June,2016,36820,Profit,3,NaN  
July,2016,34590,Loss,1,NaN  
August,2016,36500,Profit,1,NaN  
September,2016,36600,Profit,2,NaN  
October,2016,37140,Profit,3,NaN  
November,2016,36790,Loss,1,NaN  
December,2016,43500,Profit,1,43500  
January,2017,34900,Loss,1,43500  
February,2017,37700,Profit,1,43500  
March,2017,37900,Profit,2,43500  
April,2017,38100,Profit,3,43500  
May,2017,37800,Loss,1,43500  
June,2017,38500,Profit,1,43500  
July,2017,39400,Profit,2,43500  
August,2017,39700,Profit,3,43500  
September,2017,39980,Profit,4,43500  
October,2017,40600,Profit,5,43500  
November,2017,39100,Loss,1,43500  
December,2017,46600,Profit,1,46600  
January,2018,42500,Loss,1,46600 

标签: pythonpandasdataframecomparison

解决方案


AFAIU 这应该适合你:

# Get difference from previous as True / False
df['P/L'] = df.sales > df.sales.shift()
# Add column counting 'streaks' of P or L
df['streak'] = df['P/L'].groupby(df['P/L'].ne(df['P/L'].shift()).cumsum()).cumcount()
# map True/False to string of Profit/Loss
df['P/L'] = df['P/L'].map({True:'Profit', False:'Loss'})
# max of last n months where n is 12, as in your example, you can change it to any int
df['12_max'] = df.sales.rolling(12).max()

输出:

    Month      year     sales       P/L  streak  12_max
0   January     2017    34400       False   0   NaN
1   February    2017    35530       True    0   NaN
2   March       2017    34920       False   0   NaN
3   April       2017    35950       True    0   NaN
4   May         2017    36230       True    1   NaN
5   June        2017    36820       True    2   NaN
6   July        2017    34590       False   0   NaN
7   August      2017    36500       True    0   NaN
8   September   2017    36600       True    1   NaN
9   October     2017    37140       True    2   NaN
10  November    2017    36790       False   0   NaN
11  December    2017    43500       True    0   43500.0
12  January     2018    34900       False   0   43500.0
13  February    2018    37700       True    0   43500.0
14  March       2018    37900       True    1   43500.0
15  April       2018    38100       True    2   43500.0
16  May         2018    37800       False   0   43500.0
17  June        2018    38500       True    0   43500.0
18  July        2018    39400       True    1   43500.0
19  August      2018    39700       True    2   43500.0
20  September   2018    39980       True    3   43500.0
21  October     2018    40600       True    4   43500.0
22  November    2018    39100       False   0   43500.0
23  December    2018    46600       True    0   46600.0
24  January     2019    42500       False   0   46600.0

推荐阅读