python - Pandas: replace NaN values with a list in specific columns
Question
I have a DataFrame with two rows:
df = pd.DataFrame({'group' : ['c'] * 2,
'num_column': range(2),
'num_col_2': range(2),
'seq_col': [[1,2,3,4,5]] * 2,
'seq_col_2': [[1,2,3,4,5]] * 2,
'grp_count': [2]*2})
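After appending 8 rows in which only the group column is set, the other columns hold NaN and the frame looks like this:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
df = pd.concat([df, pd.DataFrame({'group': group}, index=[0] * size)])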
group grp_count num_col_2 num_column seq_col seq_col_2
0 c 2.0 0.0 0.0 [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
1 c 2.0 1.0 1.0 [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
0 c NaN NaN NaN NaN NaN
0 c NaN NaN NaN NaN NaN
0 c NaN NaN NaN NaN NaN
0 c NaN NaN NaN NaN NaN
0 c NaN NaN NaN NaN NaN
0 c NaN NaN NaN NaN NaN
0 c NaN NaN NaN NaN NaN
0 c NaN NaN NaN NaN NaN
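What I want
Replace the NaN values in the sequence columns (seq_col, seq_col_2, seq_col_3, etc.) with my own lists.
Notes:
- This data has only 2 sequence columns, but there could be more.
- Lists that already exist in a column must not be replaced; only NaN should be.
So far I have not found a way to replace NaN with list values taken from a user-supplied dictionary.
Pseudocode: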
for each key, value in dict,
for each column in df
if column matches key in dict
# here matches means the 'seq_col_n' key of dict matched the df
# column named 'seq_col_n'
replace NaN with value in seq_col_n (which is a list of numbers)
I tried the code below. It works for the first column I pass, but not for the second one, which is strange.
df.loc[df['seq_col'].isnull(),['seq_col']] = df.loc[df['seq_col'].isnull(),'seq_col'].apply(lambda m: fill_values['seq_col'])
The line above works, but running the same thing again on seq_col_2 gives strange results.
Expected output, given this input parameter:
my_dict = {'seq_col': [1,2,3], 'seq_col_2': [6,7,8]}
# after executing the code from pseudo code given, it should look like
group grp_count num_col_2 num_column seq_col seq_col_2
0 c 2.0 0.0 0.0 [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
1 c 2.0 1.0 1.0 [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
0 c NaN NaN NaN [1,2,3] [6,7,8]
0 c NaN NaN NaN [1,2,3] [6,7,8]
0 c NaN NaN NaN [1,2,3] [6,7,8]
0 c NaN NaN NaN [1,2,3] [6,7,8]
0 c NaN NaN NaN [1,2,3] [6,7,8]
0 c NaN NaN NaN [1,2,3] [6,7,8]
0 c NaN NaN NaN [1,2,3] [6,7,8]
0 c NaN NaN NaN [1,2,3] [6,7,8]
Answer
With input arrays, you can use pd.DataFrame.loc with pd.Series.isnull:
import pandas as pd, numpy as np
df = pd.DataFrame({'group' : ['c'] * 2,
'num_column': range(2),
'num_col_2': range(2),
'seq_col': [[1,2,3,4,5]] * 2,
'seq_col_2': [[1,2,3,4,5]] * 2,
'grp_count': [2]*2})
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
df = pd.concat([df, pd.DataFrame({'group': ['c']*8}, index=[0] * 8)])
L1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
L2 = np.array([10, 11, 12, 13, 14, 15, 16, 17])
df.loc[df['seq_col'].isnull(), 'seq_col'] = L1
df.loc[df['seq_col_2'].isnull(), 'seq_col_2'] = L2
print(df[['seq_col', 'seq_col_2']])
seq_col seq_col_2
0 [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
1 [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
0 0 10
0 1 11
0 2 12
0 3 13
0 4 14
0 5 15
0 6 16
0 7 17
If you need list values in the series, you can convert explicitly to a Series before assigning:
df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))
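The dictionary-driven version asked for in the question could be sketched as follows. This is a minimal sketch, not the answer author's code: the helper names is_missing and fill_nan_with_lists are hypothetical, and the approach rebuilds each column positionally with a plain Python loop rather than assigning through .loc, on the assumption that avoiding label alignment is the safer route when the index contains repeated labels (every appended row here has index 0).

```python
import pandas as pd

# Rebuild the frame from the question: 2 complete rows plus 8 rows
# in which only 'group' is set.
df = pd.DataFrame({
    'group': ['c'] * 2,
    'num_column': range(2),
    'num_col_2': range(2),
    'seq_col': [[1, 2, 3, 4, 5]] * 2,
    'seq_col_2': [[1, 2, 3, 4, 5]] * 2,
    'grp_count': [2] * 2,
})
df = pd.concat([df, pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8)])

# User-supplied mapping: column name -> list used to fill NaN cells.
fill_values = {'seq_col': [1, 2, 3], 'seq_col_2': [6, 7, 8]}


def is_missing(cell):
    """Return True for a scalar NaN/None; lists and tuples are never missing."""
    return not isinstance(cell, (list, tuple)) and pd.isna(cell)


def fill_nan_with_lists(frame, fill_values):
    """Return a copy of frame where NaN cells in the named columns hold the given lists."""
    out = frame.copy()
    for col, value in fill_values.items():
        if col not in out.columns:
            continue  # ignore dictionary keys that match no column
        # Build the new column by position; each filled cell gets its own
        # copy of the list so later in-place edits do not leak between rows.
        out[col] = [list(value) if is_missing(cell) else cell
                    for cell in out[col]]
    return out


result = fill_nan_with_lists(df, fill_values)
print(result[['seq_col', 'seq_col_2']])
```

Existing lists are left alone because is_missing only reports scalar NaN or None, and columns not named in the dictionary (such as grp_count) keep their NaN values.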