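python - Pandas: simple growth analysis (comparison) and fillna
Problem description
Below is the basic data I am given every month. I receive many such files for different departments, and the work has become very monotonous and repetitive.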
Month,year,sales,
January,2017,34400,
February,2017,35530,
March,2017,34920,
April,2017,35950,
May,2017,36230,
June,2017,36820,
July,2017,34590,
August,2017,36500,
September,2017,36600,
October,2017,37140,
November,2017,36790,
December,2017,43500,
January,2018,34900,
February,2018,37700,
March,2018,37900,
April,2018,38100,
May,2018,37800,
June,2018,38500,
July,2018,39400,
August,2018,39700,
September,2018,39980,
October,2018,40600,
November,2018,39100,
December,2018,46600,
January,2019,42500,
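I tried to produce the output shown below using functions such as value_counts (which, unfortunately, only gives a summary), and failed.
I need the extra columns to be filled in automatically (using fillna=True/False):
- The third column simply says whether the month is a profit or a loss compared with the previous month (for example, if April is greater than March, it is a profit).
- The fourth column shows the length of the current run of profit or loss, e.g. 2 or 5 consecutive months of profit (or loss). (I do mean consecutive, because such runs earn the team certain awards/recognition.)
- The fifth column is the maximum sales achieved in the last "n" months.
We are only allowed to use Apache OpenOffice for our work, so there is no Excel. But we do have permission from IT to install Python.
The solutions in this link did not help me because they group by two columns. The columns in my output depend on each other.
My attempt so far: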
import pandas as pd
df = pd.read_csv("Test_1.csv", "a")
df['comparative_position'] = df['sales'].diff().fillna=True
df.loc[df['comparative_position'] > 0.0, 'comparative_position'] = "Profit"
df.loc[df['comparative_position'] < 0.0, 'comparative_position'] = "Loss"
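A likely reason the attempt above fails: `.fillna=True` is parsed as an assignment rather than a method call, so the new column ends up holding `True` instead of the month-to-month differences, and the extra positional argument `"a"` passed to `read_csv` appears to be taken as the separator in older pandas versions. A minimal sketch of the comparison column alone follows. The in-memory sample (without the trailing commas) and the use of `numpy.sign` are illustrative assumptions, not part of the original attempt.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory sample standing in for Test_1.csv
csv_text = """Month,year,sales
January,2017,34400
February,2017,35530
March,2017,34920
April,2017,35950
"""
df = pd.read_csv(io.StringIO(csv_text))  # default separator is a comma

# Month-to-month change; the first row has no predecessor, so it is NaN
change = df["sales"].diff()

# np.sign gives 1.0 for a rise and -1.0 for a fall; values missing from the
# dict (NaN for the first row, 0.0 for a flat month) map to NaN
df["comparative_position"] = np.sign(change).map({1.0: "Profit", -1.0: "Loss"})

print(df)
```

Leaving the first row as NaN matches the desired output, where January has no previous month to compare against.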
Desired output:
Month,Year,Sales,comparative_position,Months_in_P(or)L,Highest_in_12Months
January,2016,34400,NaN,NaN,NaN
February,2016,35530,Profit,1,NaN
March,2016,34920,Loss,1,NaN
April,2016,35950,Profit,1,NaN
May,2016,36230,Profit,2,NaN
June,2016,36820,Profit,3,NaN
July,2016,34590,Loss,1,NaN
August,2016,36500,Profit,1,NaN
September,2016,36600,Profit,2,NaN
October,2016,37140,Profit,3,NaN
November,2016,36790,Loss,1,NaN
December,2016,43500,Profit,1,43500
January,2017,34900,Loss,1,43500
February,2017,37700,Profit,1,43500
March,2017,37900,Profit,2,43500
April,2017,38100,Profit,3,43500
May,2017,37800,Loss,1,43500
June,2017,38500,Profit,1,43500
July,2017,39400,Profit,2,43500
August,2017,39700,Profit,3,43500
September,2017,39980,Profit,4,43500
October,2017,40600,Profit,5,43500
November,2017,39100,Loss,1,43500
December,2017,46600,Profit,1,46600
January,2018,42500,Loss,1,46600
Solution
AFAIU, this should work for you:
# Get difference from previous as True / False
df['P/L'] = df.sales > df.sales.shift()
# Add column counting 'streaks' of P or L
df['streak'] = df['P/L'].groupby(df['P/L'].ne(df['P/L'].shift()).cumsum()).cumcount()
# map True/False to string of Profit/Loss
df['P/L'] = df['P/L'].map({True:'Profit', False:'Loss'})
# max of last n months where n is 12, as in your example, you can change it to any int
df['12_max'] = df.sales.rolling(12).max()
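The streak line packs several steps into one expression. As a rough illustration, here is the same idea unrolled on a small hand-made boolean series. The names `up`, `starts` and `run_id` are made up for this sketch and do not appear in the answer.

```python
import pandas as pd

# Toy "is this month higher than the previous one?" flags
up = pd.Series([False, True, False, True, True, True, False])

# True wherever the flag differs from the previous row (the first row
# always counts as a change, because it is compared against NaN)
starts = up.ne(up.shift())

# Running count of run starts: every row in the same run shares one id
run_id = starts.cumsum()

# Position of each row inside its run, counted from 0
streak = up.groupby(run_id).cumcount()

print(pd.DataFrame({"up": up, "run_id": run_id, "streak": streak}))
```

Because `cumcount` starts at 0, adding 1 to the result would give the 1-based run lengths used in the question's desired output.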
Output for the sample data (the P/L column is shown as booleans, i.e. before the final map step):
Month year sales P/L streak 12_max
0 January 2017 34400 False 0 NaN
1 February 2017 35530 True 0 NaN
2 March 2017 34920 False 0 NaN
3 April 2017 35950 True 0 NaN
4 May 2017 36230 True 1 NaN
5 June 2017 36820 True 2 NaN
6 July 2017 34590 False 0 NaN
7 August 2017 36500 True 0 NaN
8 September 2017 36600 True 1 NaN
9 October 2017 37140 True 2 NaN
10 November 2017 36790 False 0 NaN
11 December 2017 43500 True 0 43500.0
12 January 2018 34900 False 0 43500.0
13 February 2018 37700 True 0 43500.0
14 March 2018 37900 True 1 43500.0
15 April 2018 38100 True 2 43500.0
16 May 2018 37800 False 0 43500.0
17 June 2018 38500 True 0 43500.0
18 July 2018 39400 True 1 43500.0
19 August 2018 39700 True 2 43500.0
20 September 2018 39980 True 3 43500.0
21 October 2018 40600 True 4 43500.0
22 November 2018 39100 False 0 43500.0
23 December 2018 46600 True 0 46600.0
24 January 2019 42500 False 0 46600.0
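The output above differs from the layout requested in the question in two small ways: the streak is counted from 0, and the first row is labelled rather than left empty. One possible way to get closer to the requested layout is sketched below. The function name `add_growth_columns`, the `window` parameter, and the choice to leave flat months (no change) as NaN are assumptions made for this sketch, not part of the answer.

```python
import numpy as np
import pandas as pd


def add_growth_columns(df, window=12):
    """Return a copy of df with profit/loss, streak and rolling-max columns."""
    out = df.copy()

    # 1.0 for a rise, -1.0 for a fall, 0.0 for no change, NaN for the first row
    direction = np.sign(out["sales"].diff())
    out["comparative_position"] = direction.map({1.0: "Profit", -1.0: "Loss"})

    # Same run-id trick as in the answer, but counted from 1
    run_id = direction.ne(direction.shift()).cumsum()
    streak = direction.groupby(run_id).cumcount() + 1

    # Keep the streak only where there was an actual rise or fall
    out["Months_in_P(or)L"] = streak.where(direction.isin([1.0, -1.0]))

    # Highest sales over the last `window` rows; NaN until the window is full
    out[f"Highest_in_{window}Months"] = out["sales"].rolling(window).max()
    return out


# Usage with the first six months of the sample data and a shorter window
sample = pd.DataFrame(
    {
        "Month": ["January", "February", "March", "April", "May", "June"],
        "year": [2017] * 6,
        "sales": [34400, 35530, 34920, 35950, 36230, 36820],
    }
)
print(add_growth_columns(sample, window=3))
```

If partial windows should also report a maximum, `rolling(window, min_periods=1)` is one option, though that departs from the NaN-filled rows shown in the question.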