python - Pandas read_csv only first comma
Problem description
I have a CSV file that looks like this:
Date,String
2010-12-31,'This, is, an example string'
2011-12-31,"This is an, example string"
2012-12-31,This is an example, string
I am trying to use pandas, because I believe it is one of the most widely used libraries for this kind of situation. Is there a way to create a DataFrame with the read_csv function that takes into account only the first comma on each line, regardless of whether the string after it is wrapped in "", in '' or in nothing at all?
If not, what's the most efficient alternative?
Thanks so much in advance for any help,
Solution
You can cheat by passing a regex as the sep argument of read_csv. The regex I used is "^([^,]+),", which matches the first field plus the first comma. I also passed the engine argument to avoid a pandas warning (the default C engine does not support a regex sep, so pandas would fall back to the python engine and warn about it), and the usecols argument to keep only the columns we want. Without usecols you also get an "Unnamed: 0" column: because the pattern is anchored at the start of the line and contains a capturing group, splitting on it yields an empty first field, then the captured first field, then the rest of the line.
See the read_csv docs for more information on each argument.
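To see where the extra column comes from, you can call re.split with the same pattern on the header line and on one data row. This is a minimal sketch that uses only the standard re module; it assumes (as far as I understand pandas' python engine) that a multi-character sep is applied to each line through the compiled regex's split() method.

```python
import re

# Same pattern the answer passes as sep; the group captures the first field.
pattern = re.compile(r'^([^,]+),')

# Assumption: pandas' python engine splits each line like this for a regex sep.
header = pattern.split('Date,String')
row = pattern.split("2010-12-31,'This, is, an example string'")

print(header)  # ['', 'Date', 'String']
print(row)     # ['', '2010-12-31', "'This, is, an example string'"]
```

Both results start with an empty string (the text before the anchored match), which is why the header gets a nameless first column that usecols then drops.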
test.csv
Date,String
2010-12-31,'This, is, an example string'
2011-12-31,"This is an, example string"
2012-12-31,This is an example, string
Then:
import pandas as pd
print(pd.read_csv('test.csv', sep='^([^,]+),', engine='python', usecols=['Date', 'String']))
Output:
         Date                         String
0  2010-12-31  'This, is, an example string'
1  2011-12-31   "This is an, example string"
2  2012-12-31     This is an example, string
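This will not work if the CSV file has more than two "actual" columns, since everything after the first comma ends up in the second column. Also note that the quote characters are kept as part of the String values: the regex split does not apply CSV quoting rules.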
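If you would rather not rely on a regex separator, one alternative is to skip read_csv's field parsing and split each line yourself with str.split(',', 1), which splits at most once. This is a sketch, not a benchmarked "most efficient" solution: read_split_on_first_comma is a hypothetical helper name, and it assumes a header line and no quoted fields that span multiple lines. Like the regex trick, it puts everything after the first comma into a single second column and keeps any quote characters.

```python
import io

import pandas as pd


def read_split_on_first_comma(f):
    """Build a DataFrame by splitting every line only on its first comma.

    Hypothetical helper: assumes a header line and no embedded newlines.
    """
    lines = [line.rstrip('\r\n') for line in f]
    lines = [line for line in lines if line]  # skip blank lines
    header = lines[0].split(',', 1)           # e.g. ['Date', 'String']
    rows = [line.split(',', 1) for line in lines[1:]]
    return pd.DataFrame(rows, columns=header)


raw = """Date,String
2010-12-31,'This, is, an example string'
2011-12-31,"This is an, example string"
2012-12-31,This is an example, string
"""

# io.StringIO stands in for an open file here.
df = read_split_on_first_comma(io.StringIO(raw))
print(df)
```

For a file on disk, the same helper can be called as `with open('test.csv') as f: df = read_split_on_first_comma(f)`.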