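python - ValueError when trying to replace invalid values with np.nan
Problem description
I am trying to create a DataFrame from a space-delimited dataset. Some values in the third column are missing and are marked as Missing_x. I am trying to replace those values with np.nan, but I get a ValueError.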
from datetime import datetime
import pandas as pd
import numpy as np
data = ["1/3/2012 16:00:00 Missing_1",
"1/4/2012 16:00:00 27.47",
"1/5/2012 16:00:00 27.728",
"1/6/2012 16:00:00 28.19",
"1/9/2012 16:00:00 28.1",
"1/10/2012 16:00:00 28.15",
"12/13/2012 16:00:00 27.52",
"12/14/2012 16:00:00 Missing_19",
"12/17/2012 16:00:00 27.215",
"12/18/2012 16:00:00 27.63",
"12/19/2012 16:00:00 27.73",
"12/20/2012 16:00:00 Missing_20",
"12/21/2012 16:00:00 27.49",
"12/24/2012 13:00:00 27.25",
"12/26/2012 16:00:00 27.2",
"12/27/2012 16:00:00 27.09",
"12/28/2012 16:00:00 26.9",
"12/31/2012 16:00:00 26.77"]
date_list = []
mrc_list = []
for i in data:
    data = i.split('\t')
    days_of_data = datetime.strptime(data[0], '%m/%d/%Y %H:%M:%S')
    date_list.append(days_of_data)
    try:
        mrc_list.append(float(data[1]))
    except:
        mrc_list.append(np.nan)
        pass
mrc_df = pd.Series(mrc_list, index=date_list)
mrc_df.index.name = 'Date'
print(mrc_df)
Here is the error:
Traceback (most recent call last):
  File "/home/onur/Documents/code-signal/mercury.py", line 37, in <module>
    days = datetime.strptime(data_list[0], '%m/%d/%Y %H:%M:%S')
  File "/home/onur/anaconda3/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/home/onur/anaconda3/lib/python3.7/_strptime.py", line 362, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: Missing_1
I understand the error. I just don't understand why my workaround doesn't work.
Solution
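You get the error because, if you print out data[0], you will see that the line was not split the way you expected. In fact, it was not split at all: the fields are separated by spaces, not tabs, so i.split('\t') returns the whole line as a single element. strptime then receives the entire line, including the trailing value, and raises the ValueError before the try block is ever reached, which is why the except clause never runs. Here is how to fix the code: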
from datetime import datetime
import pandas as pd
import numpy as np
data = ["1/3/2012 16:00:00 Missing_1",
"1/4/2012 16:00:00 27.47",
"1/5/2012 16:00:00 27.728",
"1/6/2012 16:00:00 28.19",
"1/9/2012 16:00:00 28.1",
"1/10/2012 16:00:00 28.15",
"12/13/2012 16:00:00 27.52",
"12/14/2012 16:00:00 Missing_19",
"12/17/2012 16:00:00 27.215",
"12/18/2012 16:00:00 27.63",
"12/19/2012 16:00:00 27.73",
"12/20/2012 16:00:00 Missing_20",
"12/21/2012 16:00:00 27.49",
"12/24/2012 13:00:00 27.25",
"12/26/2012 16:00:00 27.2",
"12/27/2012 16:00:00 27.09",
"12/28/2012 16:00:00 26.9",
"12/31/2012 16:00:00 26.77"]
# Standardize the formatting: one space between date and time,
# four spaces between the timestamp and the value.
data = [i.split() for i in data]
data = ["{} {}    {}".format(i[0].strip(), i[1].strip(), i[2].strip()) for i in data]
date_list = []
mrc_list = []
for i in data:
    # Split on four spaces instead of a tab (special character).
    fields = i.split('    ')
    days_of_data = datetime.strptime(fields[0], '%m/%d/%Y %H:%M:%S')
    date_list.append(days_of_data)
    try:
        mrc_list.append(float(fields[1]))
    except ValueError:
        # "Missing_x" placeholders cannot be converted to float.
        mrc_list.append(np.nan)
mrc_df = pd.Series(mrc_list, index=date_list)
mrc_df.index.name = 'Date'
print(mrc_df)
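As a quick check of the diagnosis, the following minimal sketch reproduces the failure on a single line. It assumes the fields are separated by spaces rather than tabs (the sample line and the variable names are illustrative, not taken from the original script). It shows that splitting on a tab leaves the line in one piece and that the ValueError comes from strptime, not from float.

```python
from datetime import datetime

# Assumed sample line: fields separated by spaces, with no tab character.
line = "1/3/2012 16:00:00    Missing_1"

# Splitting on a tab finds nothing to split on, so the whole line comes back.
parts = line.split('\t')
print(len(parts))

# strptime sees the trailing "Missing_1" and raises ValueError itself,
# so a try/except placed only around float() never gets a chance to run.
try:
    datetime.strptime(parts[0], '%m/%d/%Y %H:%M:%S')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Printing the intermediate result of a split like this is usually the fastest way to confirm what the delimiter really is.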
A more compact way to achieve this is:
from datetime import datetime
import pandas as pd
import numpy as np
from io import StringIO
data = ["1/3/2012 16:00:00 Missing_1",
"1/4/2012 16:00:00 27.47",
"1/5/2012 16:00:00 27.728",
"1/6/2012 16:00:00 28.19",
"1/9/2012 16:00:00 28.1",
"1/10/2012 16:00:00 28.15",
"12/13/2012 16:00:00 27.52",
"12/14/2012 16:00:00 Missing_19",
"12/17/2012 16:00:00 27.215",
"12/18/2012 16:00:00 27.63",
"12/19/2012 16:00:00 27.73",
"12/20/2012 16:00:00 Missing_20",
"12/21/2012 16:00:00 27.49",
"12/24/2012 13:00:00 27.25",
"12/26/2012 16:00:00 27.2",
"12/27/2012 16:00:00 27.09",
"12/28/2012 16:00:00 26.9",
"12/31/2012 16:00:00 26.77"]
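# Normalize whitespace: one space inside the timestamp and four spaces before
# the value, so a separator of two or more whitespace characters yields two fields.
data = [i.split() for i in data]
data = ["{} {}    {}".format(i[0].strip(), i[1].strip(), i[2].strip()) for i in data]
data = ["Date    Val"] + data
mrc_df = pd.read_csv(StringIO("\n".join(data)), sep=r"\s\s+", engine='python')
# Non-numeric placeholders such as "Missing_1" are coerced to NaN.
mrc_df['Val'] = pd.to_numeric(mrc_df['Val'], errors='coerce')
mrc_df['Date'] = pd.to_datetime(mrc_df['Date'], format='%m/%d/%Y %H:%M:%S')
# set_index already names the index 'Date'.
mrc_df.set_index('Date', inplace=True)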
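If the amount of whitespace between the fields is not reliable, one alternative is to split each line once from the right, which separates the value from the timestamp no matter how many spaces or tabs sit between them. This is only a sketch: parse_line is a hypothetical helper name, and float("nan") stands in for np.nan so the example needs nothing beyond the standard library.

```python
from datetime import datetime
import math

def parse_line(line):
    """Hypothetical helper: return (timestamp, value) for one input line."""
    # Split once, from the right, on any run of whitespace.
    stamp_text, value_text = line.rsplit(maxsplit=1)
    stamp = datetime.strptime(stamp_text, '%m/%d/%Y %H:%M:%S')
    try:
        value = float(value_text)
    except ValueError:
        # Placeholders such as "Missing_1" become NaN.
        value = float("nan")
    return stamp, value

sample = ["1/3/2012 16:00:00 Missing_1",
          "1/4/2012 16:00:00\t27.47",
          "1/5/2012 16:00:00    27.728"]
parsed = [parse_line(s) for s in sample]
for stamp, value in parsed:
    print(stamp, value)
```

The resulting pairs can be unzipped into the index and the values of a pd.Series in the same way as date_list and mrc_list.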