How to place a comparison column between two columns and write to CSV

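Problem description

I have a CSV file containing float data for each month; the data is generated randomly.

I want to compare two adjacent months and put the % value in a column between them. With my current comparison code, the result is placed at the end instead.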

```python
import pandas as pd
import numpy as np

df = pd.read_csv('Services.csv').fillna(0)
df.drop(df.columns[:1], axis=1, inplace=True)
a = len(df.columns)

for col in range(a - 1):
    current = df.iloc[:, col]
    col += 1
    per = ("per", col)
    previous = df.iloc[:, col]
    per = []
    for a, b in zip(current, previous):
        try:
            x = repr(round((a - b) / a * 100.0))
            per.append(x + "%")
        except ZeroDivisionError:
            per.append(0)
    df['Variation %'] = per
    print(df)
```

---

What I am getting is:

```
           June      July  September    August Variation %
0  0.000000e+00  0.000000   0.000000  0.000000           0
1 -8.840000e-04  0.137259   1.215444  1.378786      15627%
2 -2.500000e-02  0.697000   0.438000  0.834000       2888%
3  0.000000e+00  0.162507   0.000000  0.000000           0
4  0.000000e+00  0.000000   0.000000  0.000000           0
5  2.000000e-04  0.000855   0.000000  0.000000       -328%
6 -6.661338e-16  0.000000   8.878000  6.405564        100%
7  1.216297e+01  0.029005   0.000000  0.025500        100%
```

What I am looking for is a format like the one below: each pair of adjacent months should be compared and the result placed in the column between them.
Example (also attached as an image):

```
July  Variation%  August  Variation%  September  Variation%  October
0.32         98%   17.58         98%      17.58         98%    17.58
```


Tags: python, python-3.x, pandas, numpy, csv

Solution


```python
import numpy as np
import pandas as pd

# Sample data. No global inf-as-NaN option is set here: get_variation (below)
# explicitly overwrites every row whose denominator is zero.
df = pd.DataFrame({'July': [0.137259, 0.697, 0.162507, 0, 0],
                   'August': [1.378786, 0.834, 0, 0, 0],
                   'September': [16.39, 0.068103, 0.0255, 0, 0],
                   'October': [5.16466, 17.58, 0, 0, 0.580464]})

print(df)
```

```
       July    August  September    October
0  0.137259  1.378786  16.390000   5.164660
1  0.697000  0.834000   0.068103  17.580000
2  0.162507  0.000000   0.025500   0.000000
3  0.000000  0.000000   0.000000   0.000000
4  0.000000  0.000000   0.000000   0.580464
```
```python
def get_variation(current, previous):
    """Percentage change from `previous` to `current`, relative to `current`."""
    cur_pos = current > 0
    prev_pos = previous > 0
    one_val_zero = cur_pos ^ prev_pos       # exactly one of the two values is positive
    both_val_zero = ~(cur_pos | prev_pos)   # neither value is positive (values <= 0 count as zero)

    variation = (current - previous) / current * 100

    # Overwrite the rows where the division is undefined or not meaningful.
    variation[one_val_zero] = 100
    variation[both_val_zero] = 0

    return variation.astype(int)            # truncates toward zero
```
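As a quick sanity check of the formula, the first row's July-to-August value can be worked by hand. Note that `get_variation` divides by the later month (`current`), as the question's code does; the more common percent-change convention divides by the earlier month and gives a very different number. The variable names below are illustrative only.

```python
# First-row values taken from the sample DataFrame.
previous = 0.137259   # July
current = 1.378786    # August

# Convention used in this answer: change relative to the later month.
relative_to_current = (current - previous) / current * 100

# Common alternative (not what this answer uses): change relative to the earlier month.
relative_to_previous = (current - previous) / previous * 100

print(int(relative_to_current))   # 90, the Jul_Aug value for row 0
print(int(relative_to_previous))  # 904
```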



```python
# Work on a copy so df stays untouched; alternatively, build a separate
# DataFrame that holds only the variation columns.
new_df = df.copy()
variation_col_names = []

for col in range(df.shape[1] - 1):
    # Name the new column after the two months it compares, e.g. 'Jul_Aug'.
    variation_col_name = df.columns[col][:3] + '_' + df.columns[col + 1][:3]
    variation_col_names.append(variation_col_name)

    variation_col = get_variation(df.iloc[:, col + 1], df.iloc[:, col])

    # Each earlier insertion shifts the remaining columns right by one,
    # so the slot between month `col` and month `col + 1` is 2*col + 1.
    new_df.insert(2 * col + 1, variation_col_name, variation_col, allow_duplicates=True)

print(new_df)
```

```
       July  Jul_Aug    August  Aug_Sep  September  Sep_Oct    October
0  0.137259       90  1.378786       91  16.390000     -217   5.164660
1  0.697000       16  0.834000    -1124   0.068103       99  17.580000
2  0.162507      100  0.000000      100   0.025500      100   0.000000
3  0.000000        0  0.000000        0   0.000000        0   0.000000
4  0.000000        0  0.000000        0   0.000000      100   0.580464
```
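The question's code always assigns to the single label `'Variation %'`, so each iteration overwrites the same trailing column. `DataFrame.insert` takes an explicit position instead. The sketch below isolates just the position arithmetic, using a hypothetical three-month frame with placeholder values and a constant filler in place of the real variation.

```python
import pandas as pd

# Hypothetical frame; the values are placeholders.
months = pd.DataFrame({"July": [1.0], "August": [2.0], "September": [4.0]})
result = months.copy()

for i in range(months.shape[1] - 1):
    label = f"{months.columns[i][:3]}_{months.columns[i + 1][:3]}"
    # After i earlier insertions, original column i sits at position 2*i,
    # so the new column goes directly after it, at 2*i + 1.
    result.insert(2 * i + 1, label, 0)

print(list(result.columns))
# ['July', 'Jul_Aug', 'August', 'Aug_Sep', 'September']
```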
    
```python
# Display only: append "%" to the variation columns; the underlying data stays numeric.
styled_df = new_df.style.format("{}%", subset=variation_col_names)
styled_df
```

```
       July   Jul_Aug    August   Aug_Sep  September   Sep_Oct    October
0  0.137259       90%  1.378786       91%  16.390000     -217%   5.164660
1  0.697000       16%  0.834000    -1124%   0.068103       99%  17.580000
2  0.162507      100%  0.000000      100%   0.025500      100%   0.000000
3  0.000000        0%  0.000000        0%   0.000000        0%   0.000000
4  0.000000        0%  0.000000        0%   0.000000      100%   0.580464
```
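The question also asks for the result to be written to a CSV file. Styler formatting only changes how the table is displayed, so a plain `to_csv` call would write the bare integers. One possible approach, sketched here under the assumption that the `%` sign should appear in the file, is to convert the variation columns to strings before writing. The frame, the `percent_cols` list, and the in-memory buffer are stand-ins for `new_df`, `variation_col_names`, and a real output path.

```python
import io
import pandas as pd

# Hypothetical result frame shaped like the interleaved table.
table = pd.DataFrame({"July": [0.137259, 0.697],
                      "Jul_Aug": [90, 16],
                      "August": [1.378786, 0.834]})
percent_cols = ["Jul_Aug"]

out = table.copy()
for name in percent_cols:
    # Bake the "%" into the data itself; these columns become strings.
    out[name] = out[name].astype(str) + "%"

buffer = io.StringIO()           # stands in for a file path such as "out.csv"
out.to_csv(buffer, index=False)  # index=False omits the row-number column
csv_text = buffer.getvalue()
print(csv_text)
```

If the file is meant to be read back for further calculation, it may be simpler to keep the columns numeric and write the table without the `%` suffix.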
