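python - How to place a column comparison between two columns and write it to CSV
Problem description
I have a CSV file containing monthly float data; the values are randomly generated.
I want to compare each pair of months and place the % value between them. With my current code, the comparison is placed at the end instead.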
```python
import pandas as pd
import numpy as np

df = pd.read_csv('Services.csv').fillna(0)
df.drop(df.columns[:1], axis=1, inplace=True)
a = len(df.columns)
for col in range(a - 1):
    current = (df.iloc[:, col])
    col += 1
    per = ("per", col)
    previous = (df.iloc[:, col])
    per = []
    for a, b in zip(current, previous):
        try:
            x = repr(round((a - b) / a * 100.0))
            per.append(x + "%")
        except ZeroDivisionError:
            per.append(0)
    # One fixed column name: it is appended after the last month and overwritten on every pass
    df['Variation %'] = per
print(df)
```
---
What I am getting is:
```
            June      July  September    August  Variation %
0   0.000000e+00  0.000000   0.000000  0.000000            0
1  -8.840000e-04  0.137259   1.215444  1.378786       15627%
2  -2.500000e-02  0.697000   0.438000  0.834000        2888%
3   0.000000e+00  0.162507   0.000000  0.000000            0
4   0.000000e+00  0.000000   0.000000  0.000000            0
5   2.000000e-04  0.000855   0.000000  0.000000        -328%
6  -6.661338e-16  0.000000   8.878000  6.405564         100%
7   1.216297e+01  0.029005   0.000000  0.025500         100%
```
What I am looking for is a format like the one below: each pair of consecutive months is compared, and the result is placed in the column between them.
Example (originally attached as an image):
```
July   Variation%   August   Variation%   September   Variation%   October
0.32   98%          17.58    98%          17.58       98%          17.58
```
Solution
```python
import pandas as pd

df = pd.DataFrame({'July': [0.137259, 0.697, 0.162507, 0, 0],
                   'August': [1.378786, 0.834, 0, 0, 0],
                   'September': [16.39, 0.068103, 0.0255, 0, 0],
                   'October': [5.16466, 17.58, 0, 0, 0.580464]})
print(df)
```
```
       July    August  September    October
0  0.137259  1.378786  16.390000   5.164660
1  0.697000  0.834000   0.068103  17.580000
2  0.162507  0.000000   0.025500   0.000000
3  0.000000  0.000000   0.000000   0.000000
4  0.000000  0.000000   0.000000   0.580464
```
```python
def get_variation(current, previous):
    # Is exactly one of the two values non-positive (treated as zero)?
    one_val_zero = ((current > 0) & ~(previous > 0)) | (~(current > 0) & (previous > 0))
    # Are both values non-positive (treated as zero)?
    both_val_zero = ~((current > 0) | (previous > 0))
    # Percentage change relative to the current (later) month
    variation = pd.Series((current - previous) / current) * 100
    variation[one_val_zero] = 100
    variation[both_val_zero] = 0
    variation = variation.astype(int)
    return variation
```
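Note that `get_variation` tests `> 0`, so a negative value (the question's June column contains values such as -0.025) is treated as if it were zero. If your data can go negative, a variant that tests `== 0` instead might look like the sketch below. The name `get_variation_signed` is hypothetical; the formula (change relative to the current month, truncated to `int`) is kept the same as in `get_variation`.

```python
import pandas as pd


def get_variation_signed(current: pd.Series, previous: pd.Series) -> pd.Series:
    """Hypothetical variant of get_variation that treats only exact zeros as zero."""
    cur_zero = current == 0
    prev_zero = previous == 0
    # Percentage change relative to the current (later) month, as in get_variation.
    variation = (current - previous) / current * 100
    variation = variation.mask(cur_zero | prev_zero, 100)  # at least one value is zero
    variation = variation.mask(cur_zero & prev_zero, 0)    # both values are zero
    return variation.astype(int)  # truncates toward zero, like the original


previous = pd.Series([0.697, 0.0, 0.5, -0.025])
current = pd.Series([0.834, 0.0, 0.0, 0.697])
print(get_variation_signed(current, previous).tolist())  # [16, 0, 100, 103]
```

With the original `get_variation`, the last pair would come out as 100, because -0.025 fails the `> 0` test.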
```python
new_df = df.copy()  # work on a copy (or build a separate DataFrame for the variations)
variation_col_names = []
for col in range(df.shape[1] - 1):
    # e.g. 'Jul_Aug' for the July -> August comparison
    variation_col_name = df.columns[col][:3] + '_' + df.columns[col + 1][:3]
    variation_col_names.append(variation_col_name)
    variation_col = get_variation(df.iloc[:, col + 1], df.iloc[:, col])
    # Original column `col` now sits at position 2*col in new_df, so insert right after it
    new_df.insert((2 * col) + 1, variation_col_name, variation_col, allow_duplicates=True)
print(new_df)
```
```
       July  Jul_Aug    August  Aug_Sep  September  Sep_Oct    October
0  0.137259       90  1.378786       91  16.390000     -217   5.164660
1  0.697000       16  0.834000    -1124   0.068103       99  17.580000
2  0.162507      100  0.000000      100   0.025500      100   0.000000
3  0.000000        0  0.000000        0   0.000000        0   0.000000
4  0.000000        0  0.000000        0   0.000000      100   0.580464
```
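Calling `insert` inside a loop works, but the same layout can also be built in one pass: shift the frame one column to the right with `DataFrame.shift(1, axis=1)` so each column lines up with its predecessor, compute every variation at once, then interleave the pieces with `pd.concat`. This is a sketch under a few assumptions: `interleave_variations` is a hypothetical name, all columns are numeric, it reuses the `(current - previous) / current * 100` formula and the `Jul_Aug`-style names, and it tests `== 0` rather than `> 0` (identical for the non-negative sample data).

```python
import pandas as pd


def interleave_variations(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: put a variation column between each pair of adjacent columns."""
    previous = df.shift(1, axis=1)          # column j now holds column j-1 (column 0 becomes NaN)
    variation = (df - previous) / df * 100  # change relative to the later month
    cur_zero, prev_zero = df == 0, previous == 0
    variation = variation.mask(cur_zero | prev_zero, 100)  # at least one value is zero
    variation = variation.mask(cur_zero & prev_zero, 0)    # both values are zero

    parts = [df.iloc[:, [0]]]  # first month, kept as a one-column DataFrame
    for j in range(1, df.shape[1]):
        name = f"{df.columns[j - 1][:3]}_{df.columns[j][:3]}"
        parts.append(variation.iloc[:, j].astype(int).rename(name))
        parts.append(df.iloc[:, [j]])
    return pd.concat(parts, axis=1)


df = pd.DataFrame({'July': [0.137259, 0.697, 0.162507, 0, 0],
                   'August': [1.378786, 0.834, 0, 0, 0],
                   'September': [16.39, 0.068103, 0.0255, 0, 0],
                   'October': [5.16466, 17.58, 0, 0, 0.580464]})
print(interleave_variations(df))
```

On this sample data it yields the same columns and values as `new_df` above; the trade-off is one vectorized pass plus a single `concat` instead of repeated `insert` calls.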
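```python
# Display-only formatting: appends '%' to the variation columns when rendered (e.g. in Jupyter)
styled_df = new_df.style.format("{}%", subset=variation_col_names)
styled_df
```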
```
       July  Jul_Aug    August  Aug_Sep  September  Sep_Oct    October
0  0.137259      90%  1.378786      91%  16.390000    -217%   5.164660
1  0.697000      16%  0.834000   -1124%   0.068103      99%  17.580000
2  0.162507     100%  0.000000     100%   0.025500     100%   0.000000
3  0.000000       0%  0.000000       0%   0.000000       0%   0.000000
4  0.000000       0%  0.000000       0%   0.000000     100%   0.580464
```
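The question title also asks about writing the result to CSV. `Styler.format` only changes how the table is rendered; `new_df` itself still holds plain integers, so `new_df.to_csv(...)` would write `90` rather than `90%`. A minimal sketch, assuming the `%` sign should end up in the file: convert the variation columns to strings before calling `to_csv`. The small stand-in frame and the file name `Services_variation.csv` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for new_df / variation_col_names from the answer above.
new_df = pd.DataFrame({'July': [0.137259, 0.697],
                       'Jul_Aug': [90, 16],
                       'August': [1.378786, 0.834]})
variation_col_names = ['Jul_Aug']

out = new_df.copy()
for name in variation_col_names:
    # Turn each variation value into a string such as "90%" so the sign reaches the CSV.
    out[name] = out[name].astype(str) + '%'

buffer = io.StringIO()  # or a path, e.g. out.to_csv('Services_variation.csv', index=False)
out.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Keeping the conversion on a copy leaves `new_df` numeric, so it can still be used for further calculations.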