How do I read JSON data with pandas?

Problem description

I am trying to fit a CNN to the HuffPost news dataset, https://www.kaggle.com/rmisra/news-category-dataset. The dataset I am using is in JSON format. My data looks like this:

[
  {
    "category": "CRIME",
    "headline": "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV",
    "authors": "Melissa Jeltsen",
    "link": "https://www.huffingtonpost.com/entry/texas-amanda-painter-mass-shooting_us_5b081ab4e4b0802d69caad89",
    "short_description": "She left her husband. He killed their children. Just another day in America.",
    "date": "2018-05-26"
  },
  {
    "category": "ENTERTAINMENT",
    "headline": "Will Smith Joins Diplo And Nicky Jam For The 2018 World Cup's Official Song",
    "authors": "Andy McDonald",
    "link": "https://www.huffingtonpost.com/entry/will-smith-joins-diplo-and-nicky-jam-for-the-official-2018-world-cup-song_us_5b09726fe4b0fdb2aa541201",
    "short_description": "Of course it has a song.",
    "date": "2018-05-26"
  }
]

This is the code I am trying. It comes from https://www.kaggle.com/kredy10/simple-lstm-for-text-classification

import pandas as pd
import json

df = pd.read_json('News_Category_Dataset_v2.json', lines=True)

But I get these errors on the line that reads the data:

Traceback (most recent call last):
  File "./Hpnews.py", line 37, in <module>
    df = pd.read_json('News_Category_Dataset_v2.json', lines=True)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 214, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 608, in read_json
    result = json_reader.read()
  File "C:\Users\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 729, in read
    obj = self._get_object_parser(self._combine_lines(data.split("\n")))
  File "C:\Users\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 753, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()

Tags: json, pandas

Solution


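You must use the "orient" parameter. Note that the sample shown in the question is a single JSON array rather than one JSON object per line, so lines=True most likely has to be dropped as well: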

pd.read_json(..., orient="records")

See the pandas documentation for read_json.
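
To make the difference concrete, here is a minimal sketch contrasting the two layouts. It assumes the data is fed from in-memory strings through io.StringIO instead of the real News_Category_Dataset_v2.json file, and the shortened headlines ("Headline A", "Headline B") are placeholders, not values from the dataset.

```python
import io

import pandas as pd

# Layout 1: one JSON array holding all records (the shape shown in the question).
# Placeholder values; only the structure matters here.
array_json = """[
  {"category": "CRIME", "headline": "Headline A", "date": "2018-05-26"},
  {"category": "ENTERTAINMENT", "headline": "Headline B", "date": "2018-05-26"}
]"""

# A single array of objects is read with orient="records" and WITHOUT lines=True.
df_array = pd.read_json(io.StringIO(array_json), orient="records")

# Layout 2: JSON Lines, i.e. one complete JSON object per line, no enclosing array.
lines_json = (
    '{"category": "CRIME", "headline": "Headline A", "date": "2018-05-26"}\n'
    '{"category": "ENTERTAINMENT", "headline": "Headline B", "date": "2018-05-26"}\n'
)

# Only this layout should be read with lines=True.
df_lines = pd.read_json(io.StringIO(lines_json), orient="records", lines=True)

print(df_array[["category", "headline"]])
print(df_lines[["category", "headline"]])
```

If the file on disk really is one object per line, keeping lines=True is the right choice; if it is a single array as in the sample above, remove it. Opening the file in a text editor and checking whether it starts with "[" is a quick way to tell which case applies.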

