Including a groupby statement in a function - Python

Problem description

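The following function counts the number of points that fall within each segment of a circle. It works as expected when the counts are derived for a single point in time. However, when I try to derive the counts at different time points using a groupby call, it still combines all the counts into a single output.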

import pandas as pd
import numpy as np

df = pd.DataFrame({   
        'Time' : ['19:50:10.1','19:50:10.1','19:50:10.1','19:50:10.1','19:50:10.2','19:50:10.2','19:50:10.2','19:50:10.2'],             
        'id' : ['A','B','C','D','A','B','C','D'],                 
        'x' : [1,8,0,-5,1,-1,-6,0],
        'y' : [-5,2,-5,2,5,-5,-2,2],
        'X2' : [0,0,0,0,0,0,0,0],
        'Y2' : [0,0,0,0,0,0,0,0],   
        'Angle' : [0,0,0,0,0,0,0,0],                 
    })

def checkPoint(x, y, rotation_angle, refX, refY, radius = 10):

    section_angle_start = [(i + rotation_angle - 45) for i in [0, 90, 180, 270, 360]]

    Angle = np.arctan2(x-refX, y-refY) * 180 / np.pi
    Angle = Angle % 360

    # adjust range
    if Angle > section_angle_start[-1]:
        Angle -= 360
    elif Angle < section_angle_start[0]:
        Angle += 360

    for i in range(4):
        if section_angle_start[i] < Angle < section_angle_start[i+1]:
            break
    else:
        i = 0

    return i+1  

tmp = []
result = []

Below is my attempt to apply the checkPoint function to each Time group.

for group in df.groupby('Time'):

    for i, row in df.iterrows():
    
        seg = checkPoint(row.x, row.y, row.Angle, row.X2, row.Y2)

        tmp.append(seg)
    
    result.append([tmp.count(i) for i in [1,2,3,4]])

df = pd.DataFrame(result, columns = ['1','2','3','4'])

Out:

   1  2  3  4
0  2  1  3  2
1  4  2  6  4

Intended:

   1  2  3  4
0  0  1  2  1
1  2  0  1  1

Tags: python, pandas

Solution


Your inner loop runs over your entire DataFrame on every pass of the outer loop, which produces the duplicated counts you are observing.
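To see why, it may help to look at what iterating over a groupby object yields: each item is a (key, sub-DataFrame) tuple, so the inner loop needs to walk the second element of that tuple rather than df itself. A minimal sketch, using a cut-down stand-in for the frame from the question (only the Time and id columns are kept, and group_sizes is a name introduced here for illustration):

```python
import pandas as pd

# Cut-down stand-in for the question's DataFrame (assumption: two columns suffice here).
df = pd.DataFrame({
    'Time': ['19:50:10.1', '19:50:10.1', '19:50:10.2', '19:50:10.2'],
    'id': ['A', 'B', 'A', 'B'],
})

group_sizes = {}
for key, group in df.groupby('Time'):
    # `key` is the Time value; `group` holds only the rows for that Time.
    group_sizes[key] = len(group)
    print(key, '->', len(group), 'rows in this group,', len(df), 'rows in df')
```

Each group here has fewer rows than df, which is why looping over df inside the outer loop counts every point once per time point.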

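As @Kenan suggested, you can restrict the inner loop to the current group. Note that tmp also has to be reset for each group, otherwise the counts accumulate from one time point to the next:

result = []

for group in df.groupby('Time'):

    tmp = []  # start a fresh list for every time point

    for i, row in group[1].iterrows():

        seg = checkPoint(row.x, row.y, row.Angle, row.X2, row.Y2)

        tmp.append(seg)

    result.append([tmp.count(i) for i in [1,2,3,4]])

df_result = pd.DataFrame(result, columns = ['1','2','3','4'])
print(df_result)

which gives

   1  2  3  4
0  0  1  2  1
1  2  0  1  1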

Alternatively, you can use a groupby-apply construct to avoid the explicit loop over the groups:

def result(g):
    tmp = []
    for i, row in g.iterrows():
        seg = checkPoint(row.x, row.y, row.Angle, row.X2, row.Y2)
        tmp.append(seg)
    return pd.Series([tmp.count(i) for i in [1,2,3,4]], index=[1,2,3,4])

print(df.groupby('Time').apply(result))

This gives you:

            1  2  3  4
Time                  
19:50:10.1  0  1  2  1
19:50:10.2  2  0  1  1
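If iterating row by row becomes slow on larger frames, the same counts could probably be obtained without iterrows. The following is only a sketch, not the approach from the answer above: it recomputes the segment with NumPy for all rows at once and tabulates with pd.crosstab. The names angle, shifted, segment and counts are introduced here for illustration. It also differs from checkPoint in one respect: a point lying exactly on a segment boundary is assigned to the segment that starts there, whereas checkPoint falls back to segment 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['19:50:10.1'] * 4 + ['19:50:10.2'] * 4,
    'id': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'x': [1, 8, 0, -5, 1, -1, -6, 0],
    'y': [-5, 2, -5, 2, 5, -5, -2, 2],
    'X2': [0] * 8,
    'Y2': [0] * 8,
    'Angle': [0] * 8,
})

# Angle of each point relative to its reference point, in degrees,
# using the same argument order as checkPoint.
angle = np.degrees(np.arctan2(df['x'] - df['X2'], df['y'] - df['Y2']))

# Shift so that segment 1 starts at 0 degrees, then split into 90-degree sectors.
shifted = (angle - df['Angle'] + 45) % 360
# The extra "% 4" guards against a float result of exactly 360 after the modulo.
segment = (shifted // 90).astype(int) % 4 + 1

# One row per Time, one column per segment; missing segments are filled with 0.
counts = pd.crosstab(df['Time'], segment).reindex(columns=[1, 2, 3, 4], fill_value=0)
print(counts)
```

For the sample data this yields the same per-time counts as the groupby-apply version (0, 1, 2, 1 and 2, 0, 1, 1).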
