首页 > 解决方案 > 根据 Pandas Dataframe 中的数据计算平均消耗

问题描述

我有一个数据框,我需要计算每个引擎的平均消耗量。

    iterables = [['A123B'], ['2021-03-04 10:10:17', '2021-03-04 11:18:51', '2021-03-04 12:50:24', 
                             '2021-03-04 13:02:02', '2021-03-04 14:37:23']]
    control_id = [1, 2, 3, 4, 5]
    index = pd.MultiIndex.from_product(iterables, names=["ENGINE_ID", "TIME"])
    steps = [354815, 355160, 355428, 357850, 358314]
    quantity = [156.32, 85.49, 100.00, 157.02, 134.00]
    full = [1, 0, 0, 1, 0]
    dict = {'CONTROL_ID':control_id, 'STEPS':steps, 'QUANTITY':quantity, 'FULL':full}
    df = pd.DataFrame(dict, index=index)
ENGINE_ID 时间 CONTROL_ID 脚步 数量 满的
A123B 2021-03-04 10:10:17 1 354815 156.32 1
2021-03-04 11:18:51 2 355160 85.49 0
2021-03-04 12:50:24 3 355428 100.00 0
2021-03-04 13:02:02 4 357850 157.02 1
2021-03-04 14:37:23 5 358314 134.00 0

目标是计算发动机已满的步骤之间的差异,除以数量之和。与上表一样,考虑到 CONTROL_ID = 5,步骤之间的差异是(357850 - 354815) = 3035和数量(85.49 + 100.00 + 157.02) = 342.51,平均消耗量是3035/342.51 = 8.86。在此示例中,预期结果将如下表所示。我有一个包含多个引擎和步骤的数据框。

ENGINE_ID 时间 CONTROL_ID 脚步 数量 满的 平均
A123B 2021-03-04 10:10:17 1 354815 156.32 1 0
2021-03-04 11:18:51 2 355160 85.49 0 0
2021-03-04 12:50:24 3 355428 100.00 0 0
2021-03-04 13:02:02 4 357850 157.02 1 8.86
2021-03-04 14:37:23 5 358314 134.00 0 0

如何计算并插入整个数据框的 AVERAGE 列?我在此处和 Pandas 文档中查找了类似的示例,但我没有找到从哪里开始。

谢谢!

标签: pythonpandas

解决方案


让我们尝试这样的事情:

import pandas as pd
import numpy as np

iterables = [['A123B'], ['2021-03-04 10:10:17', '2021-03-04 11:18:51',
                         '2021-03-04 12:50:24', '2021-03-04 13:02:02',
                         '2021-03-04 14:37:23']]
control_id = [1, 2, 3, 4, 5]
index = pd.MultiIndex.from_product(iterables, names=["ENGINE_ID", "TIME"])
steps = [354815, 355160, 355428, 357850, 358314]
quantity = [156.32, 85.49, 100.00, 157.02, 134.00]
full = [1, 0, 0, 1, 0]
d = {'CONTROL_ID': control_id, 'STEPS': steps, 'QUANTITY': quantity, 'FULL': full}
df = pd.DataFrame(d, index=index)

# Boolean Index for where FULL == 1
full_m = df.FULL.eq(1)
# Get Values Needed For Average For Each Group Between Fulls
sums = df.assign(
    # Difference Between This and Previous FULL == 1 Rows
    STEP_DIFF=df.loc[full_m, 'STEPS'] - df.loc[full_m, 'STEPS'].shift()
).groupby(
    # Create Groups Starting With Row After FULL == 1 ending with next FULL == 1
    df.FULL.shift().cumsum().fillna(0)
)[['STEP_DIFF', 'QUANTITY']].transform('sum')

# Place in the Averages or 0s
df['AVERAGE'] = np.where(full_m, sums.STEP_DIFF / sums.QUANTITY, 0)

# For Display
print(df.to_string())

输出:

                               CONTROL_ID 步数全平均
ENGINE_ID 时间                                                             
A123B 2021-03-04 10:10:17 1 354815 156.32 1 0.000000
          2021-03-04 11:18:51 2 355160 85.49 0 0.000000
          2021-03-04 12:50:24 3 355428 100.00 0 0.000000
          2021-03-04 13:02:02 4 357850 157.02 1 8.861055
          2021-03-04 14:37:23 5 358314 134.00 0 0.000000

推荐阅读