Changing values in a CSV file when loading it with pandas (trying to understand code from a blog)

Problem description

I want to understand the code below.

First, this code is taken from a blog post about Google BERT that I am currently reading:

https://medium.com/swlh/a-simple-guide-on-using-bert-for-text-classification-bbf041ac8d04

The dataset can be downloaded from the blog post linked above.

import pandas as pd
train_df = pd.read_csv('data/train.csv', header=None) 
test_df = pd.read_csv("data/test.csv", header=None)
train_df[0] = (train_df[0] == 2).astype(int)  # This is the part I don't understand. I thought "(train_df[0] == 2)" would find all the values equal to 2, but since the code doesn't say what they should be converted to, how can the values change from 2 to 0?
train_df.head()
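The snippet above reads files from data/, which may not be at hand. Here is a minimal, self-contained sketch that reproduces the same line on a small in-memory CSV. The three sample rows and the 1/2 label values are hypothetical, chosen only to mirror the shape of the blog's data (label in column 0, text in column 1).

```python
import io

import pandas as pd

# Hypothetical sample data: column 0 is a label (1 or 2), column 1 is review text.
csv_text = """1,"Terrible service, would not return."
2,"Great food and friendly staff."
1,"Waited an hour for a cold meal."
"""

# header=None tells pandas the file has no header row,
# so the columns are labelled with the integers 0 and 1.
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df[0].tolist())  # [1, 2, 1]

# df[0] == 2 builds a boolean Series (True where the label is 2);
# .astype(int) turns True/False into 1/0, and the result replaces column 0.
df[0] = (df[0] == 2).astype(int)
print(df[0].tolist())  # [0, 1, 0]
```

Note that nothing is "searched and replaced": the whole column is rebuilt from the comparison, which is why no target value has to be specified.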

Current result:

    0   1
0   1   Unfortunately, the frustration of being Dr. Go...
1   0   Been going to Dr. Goldberg for over 10 years. ...
2   1   I don't know what Dr. Goldberg was like before...
3   1   I'm writing this review to give you a heads up...
4   0   All the food is great here. But the best thing...

I just want to understand how this code works and why it works, so I don't have an expected result.

Tags: python

Solution


>>> t_df[0]
0    1
1    2
2    1
3    1
4    2
Name: 0, dtype: int64
>>> t_df[0] == 2
0    False
1     True
2    False
3    False
4     True
Name: 0, dtype: bool
>>> (t_df[0] == 2).astype(int)
0    0
1    1
2    0
3    0
4    1
Name: 0, dtype: int64

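The code compares every value in column 0 with 2 (== 2), which yields a boolean Series, and then converts those bool values (False, True) to int values (0, 1) with .astype(int). As a result, every 2 becomes 1 and every other value (here, 1) becomes 0, and that new column is assigned back to column 0.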
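To make the two steps explicit, here is a small sketch using the same values as the example above. For comparison it also shows an explicit Series.map with a dictionary; that is an alternative written for illustration, not what the blog uses. The two differ only when a label other than 1 or 2 appears: the comparison maps it to 0, while map produces NaN.

```python
import pandas as pd

labels = pd.Series([1, 2, 1, 1, 2], name=0)

# Step 1: element-wise comparison gives a boolean Series.
is_two = labels == 2
print(is_two.tolist())  # [False, True, False, False, True]

# Step 2: cast booleans to integers: False -> 0, True -> 1.
as_int = is_two.astype(int)
print(as_int.tolist())  # [0, 1, 0, 0, 1]

# Alternative (not from the blog): an explicit value mapping.
mapped = labels.map({1: 0, 2: 1})
print(mapped.tolist())  # [0, 1, 0, 0, 1]

# With an unexpected label (3), the two approaches diverge.
other = pd.Series([1, 2, 3])
cmp_other = (other == 2).astype(int).tolist()  # 3 -> 0
map_other = other.map({1: 0, 2: 1}).tolist()   # 3 -> NaN (column becomes float)
print(cmp_other, map_other)
```

The comparison approach is compact and guarantees a 0/1 column, but it silently treats any unexpected value as 0; the map version makes such values visible as NaN.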

