python - How to extract values from JSON-like text
Problem description
I want to extract values from JSON-like text in a DataFrame that looks like this:
df.head()
budget genres homepage id keywords original_language original_title overview popularity production_companies ... runtime spoken_languages status tagline title vote_average vote_count movie cast crew
0 237000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.avatarmovie.com/ 19995 [{"id": 1463, "name": "culture clash"}, {"id":... en Avatar In the 22nd century, a paraplegic Marine is di... 150.437577 [{"name": "Ingenious Film Partners", "id": 289... ... 162.0 [{"iso_639_1": "en", "name": "English"}, {"iso... Released Enter the World of Pandora. Avatar 7.2 11800 Avatar [{"cast_id": 242, "character": "Jake Sully", "... [{"credit_id": "52fe48009251416c750aca23", "de...
1 300000000 [{"id": 12, "name": "Adventure"}, {"id": 14, "... http://disney.go.com/disneypictures/pirates/ 285 [{"id": 270, "name": "ocean"}, {"id": 726, "na... en Pirates of the Caribbean: At World's End Captain Barbossa, long believed to be dead, ha... 139.082615 [{"name": "Walt Disney Pictures", "id": 2}, {"... ... 169.0 [{"iso_639_1": "en", "name": "English"}] Released At the end of the world, the adventure begins. Pirates of the Caribbean: At World's End 6.9 4500 Pirates of the Caribbean: At World's End [{"cast_id": 4, "character": "Captain Jack Spa... [{"credit_id": "52fe4232c3a36847f800b579", "de...
2 245000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.sonypictures.com/movies/spectre/ 206647 [{"id": 470, "name": "spy"}, {"id": 818, "name... en Spectre A cryptic message from Bond’s past sends him o...
I tried:
# Parse the stringified features into their corresponding python objects
from ast import literal_eval
features = ['cast', 'crew', 'keywords', 'genres', 'original_language']
for feature in features:
    df[feature] = df[feature].apply(literal_eval)
...which raised:
ValueError: malformed node or string: <_ast.Name object at 0x7f5c5a523358>
Any help would be appreciated.
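Solution
I think the problem is caused by bad values in the data. One possible solution is to write a custom function that wraps literal_eval in a try-except statement: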
import pandas as pd
df = pd.DataFrame({'genres':['[{"id": 28, "name": "Action"}]',
                             '[{"id": 28, "name": "Action"}, {"id": 12, "n]']})
print(df)
genres
0 [{"id": 28, "name": "Action"}]
1 [{"id": 28, "name": "Action"}, {"id": 12, "n]
from ast import literal_eval
def literal_eval_cust(x):
    # Fall back to an empty dict when the string cannot be parsed
    try:
        return literal_eval(x)
    except Exception:
        return {}
features = ['genres']
for feature in features:
    df[feature] = df[feature].apply(literal_eval_cust)
print(df)
genres
0 [{'id': 28, 'name': 'Action'}]
1 {}
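Judging by the sample output in the question, one likely source of the error is the original_language column: its values are bare words such as en, which literal_eval reads as a Python name rather than a literal, and that matches the _ast.Name in the message. The sketch below illustrates this and shows one alternative way to parse the JSON columns and pull out the name values. It uses only the standard library, and parse_json_cell and extract_names are hypothetical helper names, not part of the original answer.

```python
import json
from ast import literal_eval

# A bare word such as "en" parses as a Python name, not a literal,
# so literal_eval rejects it with ValueError.
try:
    literal_eval("en")
    bare_word_error = None
except ValueError as exc:
    bare_word_error = exc


def parse_json_cell(x):
    """Parse a JSON string; return an empty list for invalid or non-string input."""
    try:
        return json.loads(x)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return []


def extract_names(items):
    """Collect the 'name' field from a list of dicts, ignoring anything else."""
    if not isinstance(items, list):
        return []
    return [d["name"] for d in items if isinstance(d, dict) and "name" in d]


# Stand-in for one column of the DataFrame (second value is truncated on purpose)
genres = [
    '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
    '[{"id": 28, "name": "Action"}, {"id": 12, "n]',
]

parsed = [parse_json_cell(g) for g in genres]
genre_names = [extract_names(p) for p in parsed]
print(genre_names)  # [['Action', 'Adventure'], []]
```

With a DataFrame, the same helpers could be applied per column, for example df['genres'].apply(parse_json_cell).apply(extract_names), while leaving plain-text columns such as original_language out of the features list. Returning an empty list rather than an empty dict on failure keeps every cell the same type, which tends to make later processing simpler.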