Accessing GitHub data in Jupyter Books

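Problem description

I get a tokenizing error when I try to read a csv file from GitHub in Jupyter Books. I have looked at several answers, but none of them seem to help. Any help would be appreciated. Thanks.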

url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
insert_df = pd.read_csv(url, header=0, sep=',', quotechar='"')
insert_df.head()

Error:

---------------------------------------------------------------------------

ParserError                               Traceback (most recent call last)

<ipython-input-21-21c294baaa45> in <module>()
      1 url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
----> 2 insert_df = pd.read_csv(url, header=0, sep=',', quotechar='"')
      3 insert_df.head()

3 frames

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in read(self, nrows)
   2155     def read(self, nrows=None):
   2156         try:
-> 2157             data = self._reader.read(nrows)
   2158         except StopIteration:
   2159             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 79, saw 2

Tags: python, pandas, csv, github

Solution


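The `/blob/` URL returns GitHub's HTML page for the file rather than the CSV itself, so `read_csv` ends up tokenizing HTML markup and fails. Two options: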

First: read the page as HTML.

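url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
# read_html returns a list of DataFrames, one per table found in the page
insert_df = pd.read_html(url)
# the first table holds the CSV contents
insert_df[0].head(2)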

Second: read the raw file. Note the "raw" in the URL.

url="https://raw.githubusercontent.com/Kallikrates/bde_at2/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
insert_df_raw = pd.read_csv(url, header=0, sep=',', quotechar='"')
insert_df_raw.head(2)
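The difference between the two URLs above is mechanical, so it can be sketched as a small helper. This is a minimal sketch, not part of pandas or any GitHub API: the name `github_blob_to_raw` is hypothetical, and it assumes the page URL follows the `github.com/<owner>/<repo>/blob/<ref>/<path>` layout seen in the question.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Convert a github.com "blob" page URL to its raw.githubusercontent.com form.

    Hypothetical helper for illustration; URLs that do not match the
    expected pattern are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected path layout: <owner>/<repo>/blob/<ref>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    # Drop the "blob" segment and keep everything else in order
    owner, repo, _, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, parts.query, ""))


blob_url = (
    "https://github.com/Kallikrates/bde_at2/blob/"
    "3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
)
raw_url = github_blob_to_raw(blob_url)
print(raw_url)
```

The resulting `raw_url` can then be passed to `pd.read_csv(raw_url)` exactly as in the second option.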


