Reproducing Excel's average and rounding in Python

Question

I have read several questions about rounding in Python, but I cannot reproduce my Excel results.

I have a row of numbers and NAs.

Representation printed via print(reprex.to_dict()):

{'Column 1': {0: 0}, 'Column 2': {0: 0}, 'Column 3': {0: 95}, 'Column 4': {0: 2}, 'Column 5': {0: 2}, 'Column 6': {0: 0}, 'Column 7': {0: 83}, 'Column 8': {0: 95}, 'Column 9': {0: 100}, 'Column 10': {0: 90}, 'Column 11': {0: 7}, 'Column 12': {0: 0}, 'Column 13': {0: 98}, 'Column 14': {0: 97}, 'Column 15': {0: 14}, 'Column 16': {0: 1}, 'Column 17': {0: 0}, 'Column 18': {0: 3}, 'Column 19': {0: 7}, 'Column 20': {0: 9}, 'Column 21': {0: 5}, 'Column 22': {0: 6}, 'Column 23': {0: 10}, 'Column 24': {0: 4}, 'Column 25': {0: 7}, 'Column 26': {0: 5}, 'Column 27': {0: 13}, 'Column 28': {0: 3}, 'Column 29': {0: 5}, 'Column 30': {0: 0}, 'Column 31': {0: 97}, 'Column 32': {0: 96}, 'Column 33': {0: 97}, 'Column 34': {0: 98}, 'Column 35': {0: 97}, 'Column 36': {0: 100}, 'Column 37': {0: 97}, 'Column 38': {0: 97}, 'Column 39': {0: 97}, 'Column 40': {0: 91}, 'Column 41': {0: 97}, 'Column 42': {0: 5}, 'Column 43': {0: 10}, 'Column 44': {0: nan}, 'Column 45': {0: 10}, 'Column 46': {0: 7}, 'Column 47': {0: 8}, 'Column 48': {0: 6}, 'Column 49': {0: 14}, 'Column 50': {0: 22}, 'Column 51': {0: 17}, 'Column 52': {0: 8}, 'Column 53': {0: 21}, 'Column 54': {0: 19}, 'Column 55': {0: 20}, 'Column 56': {0: 18}, 'Column 57': {0: 15}, 'Column 58': {0: 19}}

Excel's AVERAGE() function gives me 35.85964912, which rounds to 36.

I have checked several times that I am subsetting the columns correctly.

When I run

cols = df.iloc[: , 133:191]
df['score'] = cols.mean(axis = 1)

Python gives me 37.905660.

Consequently, rounding the way Excel does gives me 38, using

df = df.round({'Overall_mean_procedure_PB_score': 0})

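A difference of 2 is large. Maybe the NA is causing the problem here.

How can I perform this kind of calculation so that it matches Excel?

Tags: python, pandas, average, rounding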

Answer

Reproducing the data:

import pandas as pd
import numpy as np
data = {
    "Column 1": {0: 0},
    "Column 2": {0: 0},
    "Column 3": {0: 95},
    "Column 4": {0: 2},
    "Column 5": {0: 2},
    "Column 6": {0: 0},
    "Column 7": {0: 83},
    "Column 8": {0: 95},
    "Column 9": {0: 100},
    "Column 10": {0: 90},
    "Column 11": {0: 7},
    "Column 12": {0: 0},
    "Column 13": {0: 98},
    "Column 14": {0: 97},
    "Column 15": {0: 14},
    "Column 16": {0: 1},
    "Column 17": {0: 0},
    "Column 18": {0: 3},
    "Column 19": {0: 7},
    "Column 20": {0: 9},
    "Column 21": {0: 5},
    "Column 22": {0: 6},
    "Column 23": {0: 10},
    "Column 24": {0: 4},
    "Column 25": {0: 7},
    "Column 26": {0: 5},
    "Column 27": {0: 13},
    "Column 28": {0: 3},
    "Column 29": {0: 5},
    "Column 30": {0: 0},
    "Column 31": {0: 97},
    "Column 32": {0: 96},
    "Column 33": {0: 97},
    "Column 34": {0: 98},
    "Column 35": {0: 97},
    "Column 36": {0: 100},
    "Column 37": {0: 97},
    "Column 38": {0: 97},
    "Column 39": {0: 97},
    "Column 40": {0: 91},
    "Column 41": {0: 97},
    "Column 42": {0: 5},
    "Column 43": {0: 10},
    "Column 44": {0: np.nan},
    "Column 45": {0: 10},
    "Column 46": {0: 7},
    "Column 47": {0: 8},
    "Column 48": {0: 6},
    "Column 49": {0: 14},
    "Column 50": {0: 22},
    "Column 51": {0: 17},
    "Column 52": {0: 8},
    "Column 53": {0: 21},
    "Column 54": {0: 19},
    "Column 55": {0: 20},
    "Column 56": {0: 18},
    "Column 57": {0: 15},
    "Column 58": {0: 19},
}

df = pd.DataFrame(data).T

df.mean()

Output:

0    35.859649
dtype: float64
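
To confirm how the missing value is treated, it can help to compare the possible conventions on a tiny Series. This is a minimal sketch with made-up numbers (the name sample is hypothetical, not from the question): pandas skips NaN by default, which matches how Excel's AVERAGE ignores empty cells in a range, whereas counting the gap as 0 lowers the mean.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: three scores and one missing value.
sample = pd.Series([10, 20, np.nan, 30])

skipped = sample.mean()                 # NaN ignored: (10 + 20 + 30) / 3
propagated = sample.mean(skipna=False)  # NaN propagates into the result
as_zero = sample.fillna(0).mean()       # gap counted as 0: 60 / 4

print(skipped, propagated, as_zero)
```

If the Python mean still differs from Excel, comparing cols.count(axis=1) and cols.sum(axis=1) with Excel's COUNT and SUM over the same range shows whether both tools are averaging the same cells.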

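This works as expected: df.mean() skips the NaN and returns the same 35.859649 that Excel reports. I suspect the NA values in your real data are the problem; convert them to a proper NaN (np.nan) before averaging.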
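
The conversion could look roughly like the sketch below. It assumes the NAs arrived as the text "NA" (so the columns are not numeric), and the frame raw and the helper excel_round are hypothetical names introduced here for illustration. The helper is included because Excel's ROUND rounds halves away from zero, while Python's round() and pandas' round() send halves to the nearest even number, so a mean of exactly 2.5 would round differently in the two tools.

```python
from decimal import Decimal, ROUND_HALF_UP

import pandas as pd


def excel_round(value, digits=0):
    """Round half away from zero, the convention Excel's ROUND uses."""
    quantum = Decimal(1).scaleb(-digits)  # digits=0 -> 1, digits=2 -> 0.01
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return float(rounded)


# Hypothetical data: "NA" stored as text makes "Column 2" an object column.
raw = pd.DataFrame(
    {"Column 1": [1, 2], "Column 2": ["NA", 3], "Column 3": [4, 4]}
)

# Coerce every column to numbers; anything unparseable becomes NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")

# Row-wise mean; NaN is skipped by default, as Excel skips blank cells.
means = numeric.mean(axis=1)

# Apply Excel-style rounding to each mean.
scores = means.map(excel_round)

print(means.tolist())   # row 0 averages 1 and 4 only
print(scores.tolist())
print(round(2.5))       # built-in round sends the tie to the even neighbour
```

Going through str(value) before building the Decimal keeps the helper from rounding on the binary representation of the float, which is a design choice for this sketch rather than the only option.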

