Pandas apply/map does not work on some rows when using a custom function

Problem description

I have the following DataFrame, where CMO_($/MWH) is a financial value that must be rounded using the old rounding method (e.g. 754.275 must be rounded to 754.28):

    INIT_DATE  INIT_HOUR   CMO_($/MWH)
0   2020-12-01  00:00:00    754.275
1   2020-12-01  01:00:00    728.130
2   2020-12-01  02:00:00    722.575
3   2020-12-01  03:00:00    722.045
4   2020-12-01  04:00:00    721.950
5   2020-12-01  05:00:00    721.035
6   2020-12-01  06:00:00    722.100
7   2020-12-01  07:00:00    739.925
8   2020-12-01  08:00:00    771.390
9   2020-12-01  09:00:00    797.415
10  2020-12-01  10:00:00    796.585
11  2020-12-01  11:00:00    791.875
12  2020-12-01  12:00:00    782.225
13  2020-12-01  13:00:00    783.540
14  2020-12-01  14:00:00    790.980
15  2020-12-01  15:00:00    815.555
16  2020-12-01  16:00:00    824.760
17  2020-12-01  17:00:00    782.265
18  2020-12-01  18:00:00    779.970
19  2020-12-01  19:00:00    784.640
20  2020-12-01  20:00:00    785.380
21  2020-12-01  21:00:00    785.840
22  2020-12-01  22:00:00    779.775
23  2020-12-01  23:00:00    763.775

Since Python 3+ uses the banker's rounding rule, I implemented a custom rounding function that applies the old rounding method:

from decimal import Decimal, ROUND_HALF_UP

def _old_round(n):
    """
        Since Python 3, round() follows the banker's rounding rule;
        this function uses the old rounding method (round half up).

        Args:
            n (float): A number with 3 digits after the decimal point

        Returns:
            (float)  : The number rounded to 2 decimal places using
                        the old rounding method
        Example:
            >>> _old_round(782.225)
            782.23
        Additional:
            For more information about rounding in Python 3, see:
                https://stackoverflow.com/questions/10825926/python-3-x-rounding-behavior

    """
    # * Multiply our number by 100 since we want to round at the 2nd decimal place
    new_n = n * 100
    # * Round using the decimal lib, but with the old method
    n_dec = Decimal(new_n).quantize(Decimal('1'), rounding=ROUND_HALF_UP)
    # * Roll back to the original format
    return float(n_dec / 100)
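
For context, the difference between the two rounding rules can be seen in a short sketch. It uses only the built-in round() and the standard decimal module; the helper name quantize_2dp is hypothetical and the inputs are passed as strings so that the tie is exact.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Built-in round() sends exact ties to the nearest even integer (banker's rounding).
builtin_results = [round(x) for x in (0.5, 1.5, 2.5, 3.5)]
print(builtin_results)  # [0, 2, 2, 4]


def quantize_2dp(text, mode):
    """Round a decimal string to 2 places with the given decimal rounding mode."""
    return Decimal(text).quantize(Decimal("0.01"), rounding=mode)


# The same exact tie, rounded under both rules.
half_even = quantize_2dp("782.225", ROUND_HALF_EVEN)
half_up = quantize_2dp("782.225", ROUND_HALF_UP)
print(half_even, half_up)  # 782.22 782.23
```

With an exact tie such as 782.225, only ROUND_HALF_UP produces the 782.23 that the financial rule requires.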

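The function works correctly when I use plain numbers (see the example in the function's docstring above), but when I use pandas apply or map to run the calculation on the CMO_($/MWH) column, some values are not rounded correctly (index 12). Does anyone know why? Here is the DataFrame after the calculation: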

df['ROUNDED_CMO'] = df['CMO_($/MWH)'].apply(_old_round)
df

Result:

    INIT_DATE  INIT_HOUR  CMO_($/MWH)   ROUNDED_CMO
0   2020-12-01  00:00:00    754.275     754.28
1   2020-12-01  01:00:00    728.130     728.13
2   2020-12-01  02:00:00    722.575     722.58
3   2020-12-01  03:00:00    722.045     722.05
4   2020-12-01  04:00:00    721.950     721.95
5   2020-12-01  05:00:00    721.035     721.04
6   2020-12-01  06:00:00    722.100     722.10
7   2020-12-01  07:00:00    739.925     739.93
8   2020-12-01  08:00:00    771.390     771.39
9   2020-12-01  09:00:00    797.415     797.42
10  2020-12-01  10:00:00    796.585     796.59
11  2020-12-01  11:00:00    791.875     791.88
12  2020-12-01  12:00:00    782.225     782.22 <
13  2020-12-01  13:00:00    783.540     783.54
14  2020-12-01  14:00:00    790.980     790.98
15  2020-12-01  15:00:00    815.555     815.56
16  2020-12-01  16:00:00    824.760     824.76
17  2020-12-01  17:00:00    782.265     782.27
18  2020-12-01  18:00:00    779.970     779.97
19  2020-12-01  19:00:00    784.640     784.64
20  2020-12-01  20:00:00    785.380     785.38
21  2020-12-01  21:00:00    785.840     785.84
22  2020-12-01  22:00:00    779.775     779.78
23  2020-12-01  23:00:00    763.775     763.78

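The value at index 12 should be 782.23 rather than 782.22. The function works for all the other data, but for some reason not for this specific index. I am using a Miniconda environment (Windows 10) with pandas 1.1.3, Python 3.6.12, and jupyter-notebook 6.1.4 (which is where I run the code).

Obs: I have already tried math.ceil, numpy.ceil, the decimal package's rounding options, and the solutions from Question 1, Question 2, and Question 3; none of them worked properly. I have also tried the following (index 12 is still not rounded correctly):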

df['ROUNDED_CMO'] = df['CMO_($/MWH)'].map(lambda x : _old_round(x))
df['ROUNDED_CMO'] = df['CMO_($/MWH)'].apply(lambda x : _old_round(x))

Tags: python, python-3.x, pandas, dataframe

Solution


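Your code works fine for me, so perhaps there is a version problem with your packages. Let me show you:

  • My pandas version: '1.0.5'.
  • My decimal version: '1.70'

First, I create your DataFrame: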

import pandas as pd
from decimal import Decimal, ROUND_HALF_UP

df = pd.DataFrame({
    "CMO_($/MWH)" :
        [
        754.275,
        728.130,
        722.575,
        722.045,
        721.950,
        721.035,
        722.100,
        739.925,
        771.390,
        797.415,
        796.585,
        791.875,
        782.225,
        783.540,
        790.980,
        815.555,
        824.760,
        782.265,
        779.970,
        784.640,
        785.380,
        785.840,
        779.775,
        763.775
        ]
})

Then I apply your function:

df['ROUNDED_CMO'] = df['CMO_($/MWH)'].map(lambda x : _old_round(x))

The output is:

CMO_($/MWH) ROUNDED_CMO
0   754.275 754.28
1   728.130 728.13
2   722.575 722.58
3   722.045 722.05
4   721.950 721.95
5   721.035 721.04
6   722.100 722.10
7   739.925 739.93
8   771.390 771.39
9   797.415 797.42
10  796.585 796.59
11  791.875 791.88
12  782.225 782.23
13  783.540 783.54
14  790.980 790.98
15  815.555 815.56
16  824.760 824.76
17  782.265 782.27
18  779.970 779.97
19  784.640 784.64
20  785.380 785.38
21  785.840 785.84
22  779.775 779.78
23  763.775 763.78
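
If index 12 still comes out as 782.22 on your side, one thing worth checking is the exact value stored in the column. This is only a possible explanation, since the question does not show how the DataFrame was loaded: a float that prints as 782.225 may actually sit one step below it, for example after arithmetic or after parsing a file. The sketch below uses only the standard decimal module. The names old_round_original and old_round_tolerant are hypothetical, the value 782.2249999999999 is an assumed example, and the guard of 6 decimal places assumes that the real data never carries more than 3 meaningful decimals.

```python
from decimal import Decimal, ROUND_HALF_UP


def exact_value(x):
    """Return the exact decimal expansion of the binary float x."""
    return Decimal(x)


def old_round_original(n):
    """Same logic as _old_round in the question: scale by 100, round half up."""
    n_dec = Decimal(n * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return float(n_dec / 100)


def old_round_tolerant(n, guard=6):
    """Hypothetical variant that drops float noise beyond `guard` decimals first."""
    # Formatting to `guard` places turns 782.2249999999999 into '782.225000'.
    cleaned = Decimal(f"{n:.{guard}f}")
    return float(cleaned.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


literal = 782.225                 # the value typed by hand in the answer
slightly_low = 782.2249999999999  # assumed: a neighbouring float just below the tie

# Inspect what is really stored; the two floats differ far beyond the third decimal.
print(exact_value(literal))
print(exact_value(slightly_low))

print(old_round_original(literal), old_round_original(slightly_low))
print(old_round_tolerant(literal), old_round_tolerant(slightly_low))
```

Running Decimal(df['CMO_($/MWH)'][12]) on the real column would show which of the two cases applies. If the stored value is below the tie, the original function is rounding it correctly as given, and a tolerant variant like the one sketched here can be passed to map or apply in the same way as _old_round.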
