CSV pandas reader separator not working

Problem description

I have a CSV file that is comma-separated.

df = pd.read_csv('data/data_notebook-1_crime.csv', sep= ',')
print(df.head)

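Unfortunately, when I print the result, all the values appear to be in the first column, as shown in the image.

[Image: DataFrame header]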

CSV file: https://data.montgomerycountymd.gov/api/views/icn6-v9z3/rows.csv?accessType=DOWNLOAD

Tags: pandas, csv

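Solution

You need to call df.head() here instead of df.head. Without the parentheses, df.head is only a reference to the bound method, so print shows the method's repr, which embeds the repr of the entire DataFrame. The separator is working; the data is parsed into 30 columns in both cases.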

Output of df.head:

<bound method NDFrame.head of         Incident ID Offence Code  CR Number    Dispatch Date / Time  ...   Latitude  Longitude Police District Number             Location
0         201087097         5707   16033232                     NaN  ...  39.078911 -77.080827                     4D  (39.0789, -77.0808)
1         201215730         5311  180058531  11/22/2018 04:58:01 AM  ...  38.973022 -76.996799                     8D   (38.973, -76.9968)
2         201229073         3562  190009928  03/03/2019 04:59:49 AM  ...  38.956840 -77.111362                     2D  (38.9568, -77.1114)
3         201233523         1114  190015440  04/03/2019 11:53:15 AM  ...  39.020392 -77.012776                     3D  (39.0204, -77.0128)
4         201087102         3562   16033238                     NaN  ...  38.991701 -77.024096                     3D  (38.9917, -77.0241)

[225681 rows x 30 columns]

Output of df.head():

   Incident ID Offence Code  CR Number    Dispatch Date / Time  ...   Latitude  Longitude Police District Number             Location
0    201087097         5707   16033232                     NaN  ...  39.078911 -77.080827                     4D  (39.0789, -77.0808)
1    201215730         5311  180058531  11/22/2018 04:58:01 AM  ...  38.973022 -76.996799                     8D   (38.973, -76.9968)
2    201229073         3562  190009928  03/03/2019 04:59:49 AM  ...  38.956840 -77.111362                     2D  (38.9568, -77.1114)
3    201233523         1114  190015440  04/03/2019 11:53:15 AM  ...  39.020392 -77.012776                     3D  (39.0204, -77.0128)
4    201087102         3562   16033238                     NaN  ...  38.991701 -77.024096                     3D  (38.9917, -77.0241)

[5 rows x 30 columns]
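The difference between the two spellings can be reproduced without downloading the full file. The following is a minimal sketch, assuming a small made-up CSV string (`csv_text` is hypothetical sample data, not rows from the real dataset):

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real crime CSV.
csv_text = "Incident ID,Offence Code,Victims\n1,5707,1\n2,5311,2\n3,3562,1\n"
df = pd.read_csv(io.StringIO(csv_text), sep=",")

method_ref = df.head     # no parentheses: a bound method, nothing is called
first_rows = df.head(2)  # parentheses: the method runs and returns a DataFrame

print(callable(method_ref))       # the attribute is a callable method
print(type(first_rows).__name__)  # the call result is a DataFrame
print(first_rows.shape)           # 2 rows, 3 columns
```

Printing `method_ref` directly gives a string starting with `<bound method`, followed by the repr of the whole frame, which matches the first output shown above.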

To check whether all columns were parsed correctly, we can use the info function:

import pandas as pd

df = pd.read_csv("./Crime.csv", sep=",")

# info() prints the summary itself and returns None, so no print() is needed
df.info()

We can see that all columns are set up as expected:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225681 entries, 0 to 225680
Data columns (total 30 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   Incident ID             225681 non-null  int64  
 1   Offence Code            225681 non-null  object 
 2   CR Number               225681 non-null  int64  
 3   Dispatch Date / Time    157045 non-null  object 
 4   NIBRS Code              225681 non-null  object 
 5   Victims                 225681 non-null  int64  
 6   Crime Name1             225540 non-null  object 
 7   Crime Name2             225540 non-null  object 
 8   Crime Name3             225540 non-null  object 
 9   Police District Name    225681 non-null  object 
 10  Block Address           205179 non-null  object 
 11  City                    224624 non-null  object 
 12  State                   225681 non-null  object 
 13  Zip Code                222494 non-null  float64
 14  Agency                  225681 non-null  object 
 15  Place                   225681 non-null  object 
 16  Sector                  225622 non-null  object 
 17  Beat                    225622 non-null  object 
 18  PRA                     225640 non-null  object 
 19  Address Number          205253 non-null  float64
 20  Street Prefix           9949 non-null    object 
 21  Street Name             225681 non-null  object 
 22  Street Suffix           4243 non-null    object 
 23  Street Type             225367 non-null  object 
 24  Start_Date_Time         225681 non-null  object 
 25  End_Date_Time           109034 non-null  object 
 26  Latitude                225681 non-null  float64
 27  Longitude               225681 non-null  float64
 28  Police District Number  225681 non-null  object 
 29  Location                225681 non-null  object 
dtypes: float64(4), int64(3), object(23)
memory usage: 51.7+ MB
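The same check can be done programmatically by looking at the column count. This is only a sketch under the assumption of a small made-up CSV string (`csv_text` is hypothetical); it shows what a genuinely wrong separator would look like, for comparison with the 30 columns reported above:

```python
import io

import pandas as pd

# Hypothetical sample data; the real file has 30 columns.
csv_text = "Incident ID,Offence Code,Victims\n1,5707,1\n2,5311,2\n"

# Correct separator: each comma-separated field becomes its own column.
good = pd.read_csv(io.StringIO(csv_text), sep=",")

# Wrong separator: no ';' is found, so each line lands in a single column.
bad = pd.read_csv(io.StringIO(csv_text), sep=";")

print(good.shape[1], list(good.columns))
print(bad.shape[1], list(bad.columns))
```

If the separator were really failing, the frame would have one column whose name is the entire header line, as in `bad` here.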

So it seems you are either using different data or doing something different from what your question shows.

