Pandas read_csv only first comma


I have a csv database that looks like this:

2010-12-31,'This, is, an example string'
2011-12-31,"This is an, example string"
2012-12-31,This is an example, string

I am trying to use pandas, because I believe it is one of the most widely used libraries for this kind of situation. Is there a way to create a DataFrame that splits only on the first comma using the read_csv function (regardless of whether the string after it is wrapped in "", in '', or in nothing at all)?

If not, what's the most efficient alternative to do so?

Thanks so much in advance for any help,

Tags: python, string, pandas, csv, multiple-columns


You can cheat by passing a regex as the sep argument of read_csv. The regex I used is '^([^,]+),' (the trailing comma is part of the pattern), which matches everything up to and including the first comma and captures the part before it. I also set engine='python' to avoid a pandas warning (the default C engine does not support a regex sep) and passed usecols to make sure we only get the columns we want (without it we also get an extra "unnamed" column, most likely because the pattern matches at the very start of each line, so the split leaves an empty field in front of the date).

You can get more information about each argument in the read_csv docs.
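To see what the pattern does to a single line, here is a small sketch using plain re.split. It assumes the python engine splits each line in roughly the same way, which is my reading of its behaviour rather than something stated above.

```python
import re

# Same pattern as the sep argument: a capturing group holding everything
# before the first comma, followed by that comma.
pattern = re.compile(r'^([^,]+),')

line = "2010-12-31,'This, is, an example string'"

# re.split keeps the text of capturing groups in its result, and because the
# match starts at position 0 the piece before it is an empty string.
fields = pattern.split(line)
print(fields)
# ['', '2010-12-31', "'This, is, an example string'"]
```

The empty first element is the likely source of the extra column that usecols filters out.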


Date,String
2010-12-31,'This, is, an example string'
2011-12-31,"This is an, example string"
2012-12-31,This is an example, string


import pandas as pd
print(pd.read_csv('test.csv', sep='^([^,]+),', engine='python', usecols=['Date', 'String']))


         Date                         String
0  2010-12-31  'This, is, an example string'
1  2011-12-31   "This is an, example string"
2  2012-12-31     This is an example, string

Note that this will not work if the CSV file has more than two "actual" columns.
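If the regex route does not fit, one possible alternative is to split each line yourself and build the DataFrame from the pieces. The sketch below is only an illustration: it uses an in-memory stand-in for the file, assumes there is no header row and that every line contains at least one comma, and the column names are the ones from the output above.

```python
import io

import pandas as pd

# Stand-in for the file on disk: the sample lines from the question (no header row).
raw = io.StringIO(
    "2010-12-31,'This, is, an example string'\n"
    '2011-12-31,"This is an, example string"\n'
    "2012-12-31,This is an example, string\n"
)

# str.split with maxsplit=1 cuts each line at the first comma only,
# so any later commas stay inside the second field.
rows = [line.rstrip("\n").split(",", 1) for line in raw if line.strip()]

df = pd.DataFrame(rows, columns=["Date", "String"])
print(df)
```

As with the regex version, the quote characters are kept as part of the string. For a file with n columns where only the last one may contain commas, the same idea should carry over by using split(",", n - 1).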
