How to find the mean of a list of elements embedded in a Pandas dataframe column

Problem description

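I'm cleaning a dataframe, and one particular column contains values that are lists. I'm trying to find the mean of each of these lists and update the existing column with an int, while keeping the index. I can convert the values to a list successfully and efficiently, but in the process I lose the index values. The code I wrote below uses too much memory to run. Is there simpler code that would work?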

Data: https://docs.google.com/spreadsheets/d/1Od7AhXn9OwLO-SryT--erqOQl_NNAGNuY4QPSJBbI18/edit?usp=sharing

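def Average(lst):
    # Mean of one list whose elements may be numbers stored as strings
    sum1 = 0
    average = 0
    if len(lst) == 1:
        for obj in lst:
            average = int(obj)

    if len(lst) > 1:
        for year in lst:
            sum1 += int(year)
        average = sum1 / len(lst)

    return average

# hello is the column (Series) holding the lists; Series.apply calls
# Average once per row and keeps the original index
hello = hello.apply(Average)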
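A possible vectorized alternative (not taken from the thread) is to put every list element on its own row with `Series.explode`, convert to numbers, and average back per original index label. The sketch below uses a small hypothetical Series named `hello` whose lists hold numbers as strings; it assumes pandas 0.25 or later (for `explode`) and a unique index.

```python
import pandas as pd

# Hypothetical sample: lists of numbers stored as strings, non-default index,
# and one empty list
hello = pd.Series([['2001', '2003'], ['1999'], ['2010', '2012', '2014'], []],
                  index=[10, 20, 30, 40])

# explode() gives each list element its own row, repeating the original index
# label (an empty list becomes a single NaN row); groupby(level=0) averages the
# elements back per label, and sort=False keeps the original row order.
means = (
    hello.explode()
         .pipe(pd.to_numeric)
         .groupby(level=0, sort=False)
         .mean()
)
print(means)
```

Because the grouping is by index label, duplicate labels in the original index would be merged into one mean, so this only fits a unique index. An empty list ends up as NaN.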

This is the loop I used to convert the values to a list:

df_list1 = []

for x in hello:
    sum1 = 0
    average = 0
    if len(x) == 1:
        for obj in x:
            df_list1.append(int(obj))

    if len(x) > 1:
        for year in x:
            sum1 += int(year)
        # compute the mean once, after the whole list has been summed
        average = sum1 / len(x)
        df_list1.append(int(average))
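The index is lost only because the results are collected in a plain Python list. A minimal sketch of one way to keep it, assuming the loop produces exactly one value per row (so an empty list needs a placeholder), is to wrap the list in a Series that reuses the original index. `hello` and `row_mean` here are hypothetical names used for illustration.

```python
import pandas as pd

# Hypothetical input with a non-default index
hello = pd.Series([['3', '5'], ['7'], []], index=['a', 'b', 'c'])

def row_mean(values):
    """Truncated integer mean, like int(average) in the loop; None for an empty list."""
    if len(values) == 0:
        return None  # placeholder so every row still yields one value
    return int(sum(int(v) for v in values) / len(values))

df_list1 = [row_mean(x) for x in hello]

# Reattach the original index; this only lines up because each row produced one value
result = pd.Series(df_list1, index=hello.index)
print(result)
```

Calling `hello.apply(row_mean)` instead would return a Series with the same index in one step, without the intermediate list.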

Tags: python pandas data-munging

Solution


Use apply with np.mean:

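import numpy as np
import pandas as pd

df = pd.DataFrame(data={'listcol': [np.random.randint(1, 10, 5) for _ in range(3)]}, index=['a', 'b', 'c'])

# Replace missing values (NaN) with empty lists. fillna() does not accept a list
# as the fill value, so do it with apply. np.mean returns NaN (with a
# RuntimeWarning) on an empty list.
df['listcol'] = df['listcol'].apply(lambda x: x if isinstance(x, (list, np.ndarray)) else [])

# Option 1: use this if all elements in the lists are numeric
df['listcol'] = df['listcol'].apply(lambda x: np.mean(x))

# Option 2: use this instead if the lists hold numbers stored as strings
# df['listcol'] = df['listcol'].apply(lambda x: np.mean([int(i) for i in x]))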

Output (the values differ on each run because the input is random):

>>> df
   listcol
a      5.0
b      5.2
c      4.4
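The question asks for an int rather than a float mean. One possible way to get there, sketched on hypothetical mean values (NaN standing in for a row whose list was empty), is to truncate like `int()` does and cast to pandas' nullable `Int64` dtype, which can hold a missing value where plain `int64` cannot.

```python
import numpy as np
import pandas as pd

# Hypothetical means, e.g. the result of the apply above; NaN marks an empty list
means = pd.Series([5.0, 5.2, 4.4, np.nan], index=['a', 'b', 'c', 'd'])

# np.trunc drops the fractional part (like int() for these values) and keeps
# the index; 'Int64' stores the missing value as <NA> instead of failing
as_int = np.trunc(means).astype('Int64')
print(as_int)
```

Truncation matches the `int(average)` used in the question's loop; swap `np.trunc` for `.round()` if rounding to the nearest integer is preferred.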
