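python - Reading multiple YAML files into a pandas DataFrame
Problem description
I realize similar questions have already been answered here (e.g., Reading csv zipped files in python, How can I parse a YAML file in Python, Retrieving data from a yaml file based on a Python list). However, I hope this question is different.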
I know how to load a single YAML file into a pandas DataFrame:
import yaml
import pandas as pd

# Parse one YAML file and flatten its nested keys into columns.
with open(r'1000851.yaml') as file:
    df = pd.json_normalize(yaml.safe_load(file))
df.head()
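I want to read several YAML files from a directory into pandas DataFrames and concatenate them into one big DataFrame, but I haven't been able to figure out how: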
import glob

import pandas as pd
import yaml

path = r'../input/cricsheet-a-retrosheet-for-cricket/all' # use your path
all_files = glob.glob(path + "/*.yaml")

li = []
for filename in all_files:
    df = pd.json_normalize(yaml.load(filename, Loader=yaml.FullLoader))
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<timed exec> in <module>
/opt/conda/lib/python3.7/site-packages/pandas/io/json/_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
268
269 if record_path is None:
--> 270 if any([isinstance(x, dict) for x in y.values()] for y in data):
271 # naive normalization, this is idempotent for flat records
272 # and potentially will inflate the data considerably for
/opt/conda/lib/python3.7/site-packages/pandas/io/json/_normalize.py in <genexpr>(.0)
268
269 if record_path is None:
--> 270 if any([isinstance(x, dict) for x in y.values()] for y in data):
271 # naive normalization, this is idempotent for flat records
272 # and potentially will inflate the data considerably for
AttributeError: 'str' object has no attribute 'values'
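Is there a way to do this, and to read the files efficiently?
Solution
The first part of your code and the second part you added differ.
The first part reads the YAML file correctly, but the second part is broken: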
for filename in all_files:
    # `filename` here is just a string containing the name of the file.
    df = pd.json_normalize(yaml.load(filename, Loader=yaml.FullLoader))
    li.append(df)
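The problem is that you never read the file. Currently you pass only the file name, not the file contents: yaml.load(filename, ...) parses the filename string itself as YAML, which yields a plain string (the path) rather than a dict. pd.json_normalize then iterates over the characters of that string and fails with 'str' object has no attribute 'values'. Open each file and parse its contents instead: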
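To see the difference concretely, here is a small sketch comparing the two calls. The path "data/1000851.yaml" is hypothetical and never opened; PyYAML simply parses it as a YAML scalar. The in-memory io.StringIO stream stands in for an open file.

```python
import io

import yaml

# Passing a path string makes PyYAML parse the path itself as a YAML scalar.
parsed = yaml.safe_load("data/1000851.yaml")  # hypothetical path, not opened
print(type(parsed).__name__, parsed)  # prints: str data/1000851.yaml

# Passing a stream parses the text it contains (an in-memory stand-in for a file).
from_stream = yaml.safe_load(io.StringIO("info:\n  city: Bristol\n"))
print(from_stream)  # prints: {'info': {'city': 'Bristol'}}
```

Only the second result is a dict that pd.json_normalize can flatten into columns.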
li = []
# Only loading 3 files:
for filename in all_files[:3]:
    with open(filename, 'r') as fh:
        df = pd.json_normalize(yaml.safe_load(fh.read()))
    li.append(df)
len(li)
3
pd.concat(li)
output:
innings meta.data_version meta.created meta.revision info.city info.competition ... info.player_of_match info.teams info.toss.decision info.toss.winner info.umpires info.venue
0 [{'1st innings': {'team': 'Glamorgan', 'delive... 0.9 2020-09-01 1 Bristol Vitality Blast ... [AG Salter] [Glamorgan, Gloucestershire] field Gloucestershire [JH Evans, ID Blackwell] County Ground
0 [{'1st innings': {'team': 'Pune Warriors', 'de... 0.9 2013-05-19 1 Pune IPL ... [LJ Wright] [Pune Warriors, Delhi Daredevils] bat Pune Warriors [NJ Llong, SJA Taufel] Subrata Roy Sahara Stadium
0 [{'1st innings': {'team': 'Botswana', 'deliver... 0.9 2020-08-29 1 Gaborone NaN ... [A Rangaswamy] [Botswana, St Helena] bat Botswana [R D'Mello, C Thorburn] Botswana Cricket Association Oval 1
[3 rows x 18 columns]
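On the efficiency part of the question, a minimal sketch of a reusable loader follows. It assumes every file in the directory holds a single YAML document that maps to a dict, as the Cricsheet files above do; the function name load_yaml_dir is hypothetical. It uses PyYAML's libyaml-backed CSafeLoader when PyYAML was built with libyaml, falling back to the pure-Python SafeLoader otherwise, since YAML parsing is typically the slow step. It also collects the per-file frames in a list and calls pd.concat once at the end, rather than growing a DataFrame inside the loop.

```python
import glob
import os

import pandas as pd
import yaml

# Prefer the C (libyaml) loader when PyYAML was compiled with it; it parses faster.
Loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)


def load_yaml_dir(path, pattern="*.yaml"):
    """Hypothetical helper: load every YAML file matching `pattern` in `path`."""
    frames = []
    # Sort so the row order is deterministic (glob order is not guaranteed).
    for filename in sorted(glob.glob(os.path.join(path, pattern))):
        with open(filename, "r", encoding="utf-8") as fh:
            # Pass the open file (a stream), not the filename string.
            record = yaml.load(fh, Loader=Loader)
        frames.append(pd.json_normalize(record))
    if not frames:
        return pd.DataFrame()
    # Concatenate once at the end; ignore_index gives a clean 0..n-1 index.
    return pd.concat(frames, axis=0, ignore_index=True)
```

Usage, with the directory from the question: frame = load_yaml_dir(r'../input/cricsheet-a-retrosheet-for-cricket/all').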