Computing the mean and standard deviation of lists in pandas DataFrame rows

Problem description

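I am trying to compute the mean and standard deviation of pandas DataFrame columns whose cells contain lists of floats. I don't think I should have to extract each list to do the calculation, so I tried to do it inside the DataFrame. Surprisingly, I could not find anything on this specific topic.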

Here is a toy example that illustrates my problem:

import pandas as pd

l = pd.DataFrame({'D': [[4, 5, 6, 6, 6], [6, 8, 8, 3]], 'R': [[3, 5, 6, 4, 6], [6, 9, 9, 3]]})

l1 = l.apply(pd.to_numeric).mean()
l2 = l.apply(pd.to_numeric).std()

I get the following error:

Traceback (most recent call last):
  File "pandas/_libs/lib.pyx", line 1892, in pandas._libs.lib.maybe_convert_numeric
TypeError: Invalid object type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pierre/Desktop/Project_inv/pr.py", line 8, in <module>
    l1 = l.apply(pd.to_numeric).mean()
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 6487, in apply
    return op.get_result()
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/apply.py", line 151, in get_result
    return self.apply_standard()
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/apply.py", line 257, in apply_standard
    self.apply_series_generator()
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/apply.py", line 286, in apply_series_generator
    results[i] = self.f(v)
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/tools/numeric.py", line 135, in to_numeric
    coerce_numeric=coerce_numeric)
  File "pandas/_libs/lib.pyx", line 1925, in pandas._libs.lib.maybe_convert_numeric
TypeError: ('Invalid object type at position 0', 'occurred at index D')

I am not sure what went wrong. Could someone give me a hint on how to solve this?

Tags: python, pandas, dataframe

Solution


First, I think working with lists in pandas is not a good idea.
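
To make the cause of the error concrete, here is a minimal sketch that rebuilds the DataFrame from the question and inspects it. The names `first_cell` and `failed` are introduced only for this illustration. Every cell holds a whole Python list, so the columns have dtype object, and `pd.to_numeric` cannot convert them because it expects scalar values.

```python
import pandas as pd

l = pd.DataFrame({'D': [[4, 5, 6, 6, 6], [6, 8, 8, 3]],
                  'R': [[3, 5, 6, 4, 6], [6, 9, 9, 3]]})

# Both columns have dtype object because each cell is a Python list
print(l.dtypes)

# A single cell is a whole list, not a number
first_cell = l.loc[0, 'D']
print(type(first_cell))

# pd.to_numeric works on scalars, so a Series of lists is rejected
try:
    pd.to_numeric(l['D'])
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print('conversion failed:', exc)
```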

But if you really need it, you can process the DataFrame element-wise with DataFrame.applymap:

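import numpy as np

# Mean of the list in each cell (DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
l1 = l.applymap(lambda x: np.mean(x))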
print (l1)
      D     R
0  5.40  4.80
1  6.25  6.75

# np.std defaults to ddof=0, i.e. the population standard deviation
l2 = l.applymap(lambda x: np.std(x))
print (l2)
          D         R
0  0.800000  1.166190
1  2.046338  2.487469
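
As a version-tolerant alternative to the element-wise approach, the following sketch maps each column with `Series.map`, which avoids depending on either `applymap` or `DataFrame.map`. The names `means` and `sample_std` are chosen here for illustration, and passing `ddof=1` to `np.std` is an assumption about which estimator is wanted (the sample standard deviation instead of the population one).

```python
import numpy as np
import pandas as pd

l = pd.DataFrame({'D': [[4, 5, 6, 6, 6], [6, 8, 8, 3]],
                  'R': [[3, 5, 6, 4, 6], [6, 9, 9, 3]]})

# Visit each column, then each cell of that column, and reduce the list to its mean
means = l.apply(lambda col: col.map(lambda x: np.mean(x)))

# Same traversal, but ddof=1 divides by n - 1 (sample standard deviation)
sample_std = l.apply(lambda col: col.map(lambda x: np.std(x, ddof=1)))

print(means)
print(sample_std)
```

With `ddof=1`, the results match what pandas' own `std` returns by default, whereas the plain `np.std(x)` call matches `std(ddof=0)`.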

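So I suggest flattening the lists first, for example with explode (available as Series.explode and DataFrame.explode in pandas 0.25+), and then processing the result: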

# explode repeats the original row label for every list element
df = pd.concat([l['D'].explode(), l['R'].explode()], axis=1).astype(int)
print (df)
   D  R
0  4  3
0  5  5
0  6  6
0  6  4
0  6  6
1  6  6
1  8  9
1  8  9
1  3  3

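# Group on the repeated index to get one mean per original row
# (df.mean(level=0) is equivalent on pandas < 2.0; the level argument was removed in 2.0)
l1 = df.groupby(level=0).mean()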
print (l1)
      D     R
0  5.40  4.80
1  6.25  6.75

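# pandas' std defaults to ddof=1, i.e. the sample standard deviation
# (df.std(level=0) is equivalent on pandas < 2.0)
l2 = df.groupby(level=0).std()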
print (l2)
          D         R
0  0.894427  1.303840
1  2.362908  2.872281

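# ddof=0 reproduces the np.std results from the element-wise approach
# (df.std(level=0, ddof=0) is equivalent on pandas < 2.0)
l21 = df.groupby(level=0).std(ddof=0)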
print (l21)
          D         R
0  0.800000  1.166190
1  2.046338  2.487469
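
If both statistics are needed at once, one possible sketch is to aggregate the flattened frame in a single `groupby` call. The names `stats` and `pop_std` are introduced here for illustration only. The sketch also checks that `ddof=0` agrees with `np.std` on the original lists, which explains why the two standard deviation tables above differ.

```python
import numpy as np
import pandas as pd

l = pd.DataFrame({'D': [[4, 5, 6, 6, 6], [6, 8, 8, 3]],
                  'R': [[3, 5, 6, 4, 6], [6, 9, 9, 3]]})

# Flatten the lists; the original row label is repeated for each element
df = pd.concat([l['D'].explode(), l['R'].explode()], axis=1).astype(int)

# One row per original row label; columns become (column, statistic) pairs
stats = df.groupby(level=0).agg(['mean', 'std'])
print(stats)

# Population standard deviation (divide by n), comparable to np.std
pop_std = df.groupby(level=0).std(ddof=0)
print(np.isclose(pop_std.loc[0, 'D'], np.std(l.loc[0, 'D'])))
```

The `std` column in `stats` uses the pandas default `ddof=1`, so it divides by n - 1; with only four or five values per list, the gap to the `ddof=0` result is noticeable.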
