python - Splitting list of nested json to multiple columns
Problem description
This is something of an extension of a previous question I asked, but with a different scope and approach.
I have a dataframe with a column where each row holds a list of dictionaries:
0 [{"date":"0 1 0" firstBoxerRating:[null null] ...
1 [{"date":"2 2 1" firstBoxerRating:[null null] ...
2 [{"date":"2013-10-05" firstBoxerRating:[null n...
This is a short sample of some of the information in a given row:
[{"date":"2 2 1" firstBoxerRating:[null null] firstBoxerWeight:201.75 judges:[{"id":404749 name:"David Hudson" scorecard:[]} {"id":477070 name:"Mark Philips" scorecard:[]} {"id":404277 name:"Oren Shellenberger" scorecard:[]}] links:{"bio":1346666 bout:"558867/1346666" event:558867 other:[]} location:"Vanderbilt University Memorial Gymnasium Nashville" metadata:" time: 2:54\n | <span>referee:</span> <a href=\"/en/referee/403887\">Anthony Bryant</a><span> | </span><a href=\"/en/judge/404749\">David Hudson</a> | <a href=\"/en/judge/477070\">Mark Philips</a>
I would like to create a clean dataframe in which each dictionary key becomes a column and each value becomes the entry in that column's row.
So here is an example of my desired output using the short sample as the input data:
date firstBoxerRating firstBoxerWeight judges id.......
2 2 1 [null null] 201.75 404749.....
I do not believe this question is a duplicate of this one.
I have tried every solution in that question. My data also contains lists of nested dictionaries and, if anything, resembles JSON.
For example, this solution:
pd.DataFrame.from_dict({(i, j): df[i][j]
                        for i in df.keys()
                        for j in df[i].keys()},
                       orient='index')
produces exactly the same output I already have.
I have also tried unpacking the dicts in the column:
df[0].apply(pd.Series)
However, this again produces the same output.
Solution
I managed to solve this using regex and str.extract.
I extract the text between two strings and assign that text to its corresponding column.
Example:
df[0].str.extract(r'date(?P<date>.*?)firstBoxerRating(?P<firstBoxerRating>.*?)firstBoxerWeight(?P<firstBoxerWeight>.*?)judges(?P<JudgeID>.*?)links(?P<Links>.*?)location(?P<location>.*?)metadata(?P<metadata>.*)')
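For illustration, here is a minimal, self-contained sketch of the same idea. The sample row is a hypothetical, shortened version of the data shown in the question, and the cleanup step (stripping quotes, colons and spaces) is an assumption about what a "clean" value should look like, not part of the original answer. In this sketch the last group is greedy (`.*`), since a lazy group at the very end of a pattern can match the empty string.

```python
import pandas as pd

# Hypothetical, shortened sample row modelled on the data in the question.
raw = (
    '[{"date":"2 2 1" firstBoxerRating:[null null] firstBoxerWeight:201.75 '
    'judges:[{"id":404749 name:"David Hudson" scorecard:[]}] '
    'links:{"bio":1346666 bout:"558867/1346666" event:558867 other:[]} '
    'location:"Vanderbilt University Memorial Gymnasium Nashville" '
    'metadata:" time: 2:54"}]'
)
df = pd.DataFrame({0: [raw]})

# Each named group captures the text between two consecutive key names.
# The final group is greedy so that it consumes the rest of the line.
pattern = (
    r'date(?P<date>.*?)'
    r'firstBoxerRating(?P<firstBoxerRating>.*?)'
    r'firstBoxerWeight(?P<firstBoxerWeight>.*?)'
    r'judges(?P<JudgeID>.*?)'
    r'links(?P<Links>.*?)'
    r'location(?P<location>.*?)'
    r'metadata(?P<metadata>.*)'
)

# With named groups, str.extract returns a DataFrame whose columns
# are the group names.
extracted = df[0].str.extract(pattern)

# Assumed cleanup: drop the leftover quotes, colons and spaces
# around each captured value.
cleaned = extracted.apply(lambda col: col.str.strip(' ":'))

print(cleaned[['date', 'firstBoxerWeight', 'location']])
```

Note that the pattern relies on the keys always appearing in this order, and that `.` does not match newline characters by default, so rows containing real line breaks may need `flags=re.DOTALL`. The last column still carries the trailing `}]` from the raw string and would need its own cleanup.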