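python - Create a column that is the mean of multiple columns in a pandas DataFrame
Problem description
I have looked at several potential solutions, but none of them seems to work.
Basically, I want to create a new column in my DataFrame that is the mean of several other columns. The mean should exclude NaN values, but it should still be computed for rows that contain NaN values.
I have a DataFrame that looks like this (the real columns are Q222 to Q229):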
ID Q1 Q2 Q3 Q4 Q5
1 4 NaN NaN NaN NaN
2 5 7 8 NaN NaN
3 7 1 2 NaN NaN
4 2 2 3 4 1
5 1 3 NaN NaN NaN
I want to create a column that is the mean of Q1, Q2, Q3, Q4 and Q5, i.e.:
ID Q1 Q2 Q3 Q4 Q5 avg_age
1 4 NaN NaN NaN NaN 4
2 5 7 8 NaN NaN 5.5
3 7 1 2 NaN NaN 3.5
4 2 2 3 4 1 2
5 1 3 NaN NaN NaN 2
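(Ignore the actual values.)
However, every approach I have tried returns NaN in the avg_age column, which makes me think that pandas discards the whole row when it skips NaN values. I don't want that to happen; I want the mean to be returned with the NaN values ignored.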
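For reference, here is a small sketch built from the sample table above, assuming the Q columns hold real numbers. It suggests that `mean(axis=1)` already skips NaN cell by cell (`skipna` defaults to `True`) rather than discarding the row:

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; all Q columns are numeric.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "Q1": [4, 5, 7, 2, 1],
        "Q2": [np.nan, 7, 1, 2, 3],
        "Q3": [np.nan, 8, 2, 3, np.nan],
        "Q4": [np.nan, np.nan, np.nan, 4, np.nan],
        "Q5": [np.nan, np.nan, np.nan, 1, np.nan],
    }
)

question_cols = ["Q1", "Q2", "Q3", "Q4", "Q5"]

# skipna defaults to True, so each row is averaged over its non-NaN cells only.
df["avg_age"] = df[question_cols].mean(axis=1)
print(df)
```

If numeric data like this yields a value for every row, an all-NaN result on the real data probably comes from something other than NaN handling, such as the dtype of the columns.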
Here is what I have tried so far:
1.
avg_age = s.loc[: , "Q222":"Q229"]
avg_age = avg_age.mean(axis=1)
s = pd.concat([s, avg_age], axis=1)
2.
s['avg_age'] = s[['Q222', 'Q223', 'Q224', 'Q225', 'Q226', 'Q227', 'Q228', 'Q229']].mean(axis=1)
3.
avg_age = ['Q222', 'Q223', 'Q224', 'Q225', 'Q226', 'Q227', 'Q228', 'Q229']
s.loc[:, 'avg_age'] = s[avg_age].mean(axis=1)
I'm not sure whether the way I originally coded the values is the problem, so here is my code for reference:
# Recode the age variable inputs
s['Q222'] = s['Q222'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])
s['Q223'] = s['Q223'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])
s['Q224'] = s['Q224'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])
s['Q225'] = s['Q225'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])
s['Q226'] = s['Q226'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])
s['Q227'] = s['Q227'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])
s['Q228'] = s['Q228'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])
s['Q229'] = s['Q229'].replace(['18-24', '25-34','35-44', '45-54','55-64', '65-74', '75 or older', "Don't know"],
['2','3','4','5', '6', '7', '8', np.NaN])
s['Q222'] = s['Q222'].replace(['0-4', '05-11', '12-15', '16-17'], '1')
s['Q223'] = s['Q223'].replace(['0-4', '05-11', '12-15', '16-17'], '1')
s['Q224'] = s['Q224'].replace(['0-4', '05-11', '12-15', '16-17'], '1')
s['Q225'] = s['Q225'].replace(['0-4', '05-11', '12-15', '16-17'], '1')
s['Q226'] = s['Q226'].replace(['0-4', '05-11', '12-15', '16-17'], '1')
s['Q227'] = s['Q227'].replace(['0-4', '05-11', '12-15', '16-17'], '1')
s['Q228'] = s['Q228'].replace(['0-4', '05-11', '12-15', '16-17'], '1')
s['Q229'] = s['Q229'].replace(['0-4', '05-11', '12-15', '16-17'], '1')
Thanks in advance to anyone who can help!
Solution
You can use a list comprehension to select the columns to average, then call mean() with skipna=True:
df['ave_age'] = df[[col for col in df.columns if 'Q' in col]].mean(axis=1, skipna=True)
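The recoding in the question replaces the age ranges with strings ('2', '3', ...), so the columns may hold text rather than numbers. That is one plausible reason the row means come back as NaN. The following is a minimal sketch under that assumption. The small frame is made up to stand in for `s`, and converting with `pd.to_numeric` is a suggested extra step, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey frame `s`: age codes stored as strings.
s = pd.DataFrame(
    {
        "Q222": ["2", "3", "1"],
        "Q223": ["4", np.nan, "1"],
        "Q224": [np.nan, np.nan, "8"],
    }
)

age_cols = ["Q222", "Q223", "Q224"]

# Convert the string codes to numbers; anything unparseable becomes NaN.
for col in age_cols:
    s[col] = pd.to_numeric(s[col], errors="coerce")

# Row-wise mean over the numeric columns, skipping NaN cells.
s["avg_age"] = s[age_cols].mean(axis=1, skipna=True)
print(s)
```

Replacing the ranges with integers instead of strings in the original `replace` calls would likely avoid the conversion step altogether.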