python - Pandas IndexError: list index out of range when processing multi-level files
Problem description
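I have a function that loads JSON files and builds a pandas DataFrame. Most files have only one level, but some of them contain more data, two or three levels, as shown in the sample files below. I want to handle these files correctly, but I always get `IndexError: list index out of range`. How can I fix this?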
```python
import json
import os
import re

import pandas as pd


def fetchFromJson(path):
    json_files = [pos_json for pos_json in os.listdir(path) if pos_json.endswith('.json')]
    data = pd.DataFrame(columns=['id', 'cod', 'tema'])
    # we need both the json and an index number so use enumerate()
    for index, js in enumerate(json_files):
        with open(os.path.join(path, js)) as json_file:
            json_text = json.load(json_file)
        id = re.search(r'(\d+)\.json', js).group(1)  # escaped dot matches a literal '.'
        cod = json_text['dados'][0]['codTema']
        tema = json_text['dados'][0]['tema']
        # push a list of data into a pandas DataFrame at row given by 'index'
        data.loc[index] = [id, cod, tema]
    return data
```
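Note that `json_text['dados'][0]` only raises `IndexError` when `dados` is an empty list; a list with two or three entries simply returns its first entry. So the error most likely comes from a file whose `dados` list is empty (an assumption, not checked against the actual files). Below is a minimal sketch, assuming the goal is one row per entry of `dados`: it loops over every entry, so empty lists contribute no rows, and it builds the DataFrame once from a list of rows instead of assigning with `data.loc[index]`. The name `fetch_all_from_json` is hypothetical.

```python
import json
import os
import re

import pandas as pd


def fetch_all_from_json(path):
    """Hypothetical helper: one row per entry in each file's 'dados' list."""
    rows = []
    for name in sorted(os.listdir(path)):
        if not name.endswith('.json'):
            continue
        match = re.search(r'(\d+)\.json$', name)
        if match is None:
            # Skip files whose name does not end in digits + '.json'
            continue
        file_id = match.group(1)
        with open(os.path.join(path, name), encoding='utf-8') as json_file:
            json_text = json.load(json_file)
        # .get() tolerates a missing 'dados' key; an empty list adds no rows
        for entry in json_text.get('dados', []):
            rows.append({'id': file_id,
                         'cod': entry['codTema'],
                         'tema': entry['tema']})
    # Build the DataFrame once; explicit columns keep the schema even if rows is empty
    return pd.DataFrame(rows, columns=['id', 'cod', 'tema'])
```

If only the first theme per file is wanted, the original `[0]` lookup can be kept behind a guard such as `if json_text['dados']:`, skipping files where the list is empty.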
One-level file:
```json
{"dados": [{"codTema": 46, "tema": "Educacao", "relevancia": 0}], "links": [{"rel": "self", "href": "https://dadosabertos.camara.leg.br/api/v2/proposicoes/101424/temas"}]}
```
Two-level file:
```json
{"dados": [{"codTema": 64, "tema": "Agricultura, Pecuaria, Pesca e Extrativismo", "relevancia": 0}, {"codTema": 58, "tema": "Trabalho e Emprego", "relevancia": 0}], "links": [{"rel": "self", "href": "https://dadosabertos.camara.leg.br/api/v2/proposicoes/101425/temas"}]}
```
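As an alternative, `pandas.json_normalize` with `record_path='dados'` flattens the `dados` list into one row per entry. A minimal sketch on the two-level sample (with the `links` part omitted); the `id` value `'101425'` is hard-coded here for illustration only, whereas the question's code derives it from the file name.

```python
import json

import pandas as pd

# Two-level sample from the question ('links' omitted for brevity)
raw = ('{"dados": [{"codTema": 64, "tema": "Agricultura, Pecuaria, Pesca e Extrativismo", '
       '"relevancia": 0}, {"codTema": 58, "tema": "Trabalho e Emprego", "relevancia": 0}]}')
json_text = json.loads(raw)

# One row per element of json_text['dados'] (columns: codTema, tema, relevancia)
df = pd.json_normalize(json_text, record_path='dados')
# Hypothetical id; the question extracts it from the file name instead
df['id'] = '101425'
df = df.rename(columns={'codTema': 'cod'})[['id', 'cod', 'tema']]
print(df)
```

Inside the question's loop, one way to use this is to collect the per-file frames in a list and call `pd.concat` once at the end; files with an empty `dados` list would need to be skipped first, since the resulting frame has no `codTema` or `tema` columns to select.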