pandas - pandas outputs incorrect data
Problem description
I have a CSV file that looks like this:
start_date,end_date,pollster,sponsor,sample_size,population,party,subject,tracking,text,approve,disapprove,url
2020-02-02,2020-02-04,YouGov,Economist,1500,a,all,Trump,FALSE,Do you approve or disapprove of Donald Trump’s handling of the coronavirus outbreak?,42,29,https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf
2020-02-02,2020-02-04,YouGov,Economist,376,a,R,Trump,FALSE,Do you approve or disapprove of Donald Trump’s handling of the coronavirus outbreak?,75,6,https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf
2020-02-02,2020-02-04,YouGov,Economist,523,a,D,Trump,TRUE,Do you approve or disapprove of Donald Trump’s handling of the coronavirus outbreak?,21,51,https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf
2020-02-02,2020-02-04,YouGov,Economist,599,a,I,Trump,,Do you approve or disapprove of Donald Trump’s handling of the coronavirus outbreak?,39,25,https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf
2020-02-07,2020-02-09,Morning Consult,"",2200,a,all,Trump,TURE,Do you approve or disapprove of the job each of the following is doing in handling the spread of coronavirus in the United States? President Donald Trump,57,22,https://morningconsult.com/wp-content/uploads/2020/02/200214_crosstabs_CORONAVIRUS_Adults_v4_JB.pdf
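I am interested in the "tracking" column, which holds the values "TRUE", "FALSE", or NaN.
For some reason, when I read the file with pandas, all the "tracking" column values are loaded as False: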
import pandas as pd

data = pd.read_csv("covid_approval_polls.csv")
data.head()
start_date end_date pollster sponsor sample_size population party subject tracking text approve disapprove url
0 2020-02-02 2020-02-04 YouGov Economist 1500.0 a all Trump False Do you approve or disapprove of Donald Trump’s... 42.0 29.0 https://d25d2506sfb94s.cloudfront.net/cumulus_...
1 2020-02-02 2020-02-04 YouGov Economist 376.0 a R Trump False Do you approve or disapprove of Donald Trump’s... 75.0 6.0 https://d25d2506sfb94s.cloudfront.net/cumulus_...
2 2020-02-02 2020-02-04 YouGov Economist 523.0 a D Trump False Do you approve or disapprove of Donald Trump’s... 21.0 51.0 https://d25d2506sfb94s.cloudfront.net/cumulus_...
3 2020-02-02 2020-02-04 YouGov Economist 599.0 a I Trump False Do you approve or disapprove of Donald Trump’s... 39.0 25.0 https://d25d2506sfb94s.cloudfront.net/cumulus_...
4 2020-02-07 2020-02-09 Morning Consult NaN 2200.0 a all Trump False Do you approve or disapprove of the job each o... 57.0 22.0 https://morningconsult.com/wp-content/uploads/...
When I look up the unique values of that column with:
data.tracking.unique()
I get the correct output:
array([False, True, nan], dtype=object)
But when I run:
print(data[data["tracking"] == "FALSE"])
I get:
Empty DataFrame
Columns: [start_date, end_date, pollster, sponsor, sample_size, population, party, subject, tracking, text, approve, disapprove, url]
Index: []
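I'm pretty sure I'm missing something here, but I can't tell what is causing the problem. I want to select the rows where the "tracking" column has the value "FALSE".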
Solution
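By default, read_csv converts TRUE and FALSE to booleans, so the column holds the boolean False rather than the string "FALSE", and the comparison matches nothing. To force the column to be read as strings, use the dtype parameter: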
data = pd.read_csv("covid_approval_polls.csv", dtype={"tracking": str})
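The difference can be checked on a small in-memory sample. The sketch below is only an illustration: the CSV text and the variable names are made up for the example and are not the asker's file. It compares the default parsing with the dtype approach, and also shows a possible alternative of keeping the parsed booleans and comparing against False instead of "FALSE".

```python
import io

import pandas as pd

# Hypothetical sample that mirrors the "tracking" column from the question.
CSV_TEXT = """pollster,tracking,approve
YouGov,FALSE,42
YouGov,TRUE,21
YouGov,,39
Morning Consult,FALSE,57
"""

# Default parsing: TRUE/FALSE become booleans, so comparing the column
# against the string "FALSE" selects no rows.
parsed = pd.read_csv(io.StringIO(CSV_TEXT))
no_match = parsed[parsed["tracking"] == "FALSE"]

# Option 1: read the column as text via dtype, then compare against the string.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"tracking": str})
false_rows_text = as_text[as_text["tracking"] == "FALSE"]

# Option 2: keep the default parsing and compare against the boolean False.
false_rows_bool = parsed[parsed["tracking"] == False]  # noqa: E712

print(len(no_match), len(false_rows_text), len(false_rows_bool))
```

Which option fits better depends on what happens to the column afterwards: text keeps the values exactly as written in the file, while booleans are easier to use in further filtering.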