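python - Expanding a DataFrame that contains several nested JSON fields
Problem description
I have data obtained from web scraping that I want to load into a DataFrame. It looks like this: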
data = [{'StrategicResearchPriorities': {'data': [{'strategicAreaId': 0,
'strategicAreaValue': 'Population',
'strategicGoalId': 1,
'strategicGoalValue': 'Social'}]},
'ScienceAndResearchPriorities': {'data': [{'scienceAndResearchPriorityId': 'Health',
'scienceAndResearchPriorityValue': 'Health',
'practicalResearchChallengeId': 'XXX.',
'practicalResearchChallengeValue': 'YYY'}]},
'IndustrialTransformationPriorities': None,
'FieldsOfResearch': '{"data":[{"guid":1557,"value":"200102 - Communication Technology and Digital Media Studies","code":200102,"percentage":"45"},{"guid":1499,"value":"180119 - Law and Society","code":180119,"percentage":"30"},{"guid":1381,"value":"160104 - Social and Cultural Anthropology","code":160104,"percentage":"15"},{"guid":1444,"value":"160808 - Sociology and Social Studies of Science and Technology","code":160808,"percentage":"10"}]}',
'Title': 'X and Y',
'AdminOrganisationStateName': 'A',
'AdminOrganisation': 'B',
'ProjectCode': '0000001',
'ChiefInvestigators': [{'FamilyName': 'Surname1',
'FirstName': 'Name1',
'SecondName': None,
'Title': 'Mr',
'PersonOrdinal': 1},
{'FamilyName': 'Surname2',
'FirstName': 'Name2',
'SecondName': 'SecondName2',
'Title': 'Ms',
'PersonOrdinal': 3},
],
'OrganisationParticipantSummary': '{"data":[{"id":11111,"guid":"af4","name":"Institute","number":1,"roleName":"Administering Organisation","roleId":1,"inKind":true},{"id":22222,"guid":"af6","name":"University","number":2,"roleName":"Other","roleId":3,"inKind":true}]}',
'Summary': 'Some text',
'AnnouncedDate': '1900-06-10T14:46:54.57',
'AllocatedNumbersCalendarYears': [1,
2,
1,
5,],
'UnnamedAwardSummary': {}},
{'StrategicResearchPriorities': {'data': [{'strategicAreaId': 4,
'strategicAreaValue': 'Productivity',
'strategicGoalId': 11,
'strategicGoalValue': 'Economy'}]},
'ScienceAndResearchPriorities': {'data': [{'scienceAndResearchPriorityId': 'Manufacturing',
'scienceAndResearchPriorityValue': 'Manufacturing',
'practicalResearchChallengeId': 'Technologies.',
'practicalResearchChallengeValue': 'Modern technologies.'}]},
'IndustrialTransformationPriorities': None,
'FieldsOfResearch': '{"data":[{"guid":222,"value":"010101 - Subject1","code":"020202","percentage":"50"},{"guid":555,"value":"020201 - Subject10","code":"020201","percentage":"50"}]}',
'Title': 'A and B and C',
'AdminOrganisationStateName': 'Org',
'AdminOrganisation': 'Institute',
'ProjectCode': 'XX100000',
'ChiefInvestigators': [{'FamilyName': 'Surname3',
'FirstName': 'Name3',
'SecondName': None,
'Title': 'Dr',
'PersonOrdinal': 1},
{'FamilyName': 'Surname4',
'FirstName': 'Name4',
'SecondName': 'SecondName4',
'Title': 'Prof',
'PersonOrdinal': 15}],
'OrganisationParticipantSummary': '{"data":[{"id":10002,"guid":"ab3","name":"University","number":1,"roleName":"Owner","roleId":1,"inKind":true},{"id":50000,"guid":"2a7","name":"University2","number":2,"roleName":"Other","roleId":3,"inKind":true}]}',
'Summary': 'Some text 2.',
'AnnouncedDate': '1800-06-12T15:26:55.003',
'AllocatedNumbersCalendarYears': [5,
1,
3,
2,
9,
20,
10],
'UnnamedAwardSummary': {}},
]
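I want to unpack all of the different cells into one big DataFrame. I tried
json_normalize(data)
but some cells are read as plain strings. The problem is that fields such as 'StrategicResearchPriorities' contain another list under the 'data' key, and I cannot get at it.
PS: Sorry for the long array, but I thought it best to show all of it. It has actually been trimmed a lot.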
Solution
Take a look at pandas.DataFrame.from_records(), and also at pandas.DataFrame.from_dict().
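The answer above only names the constructors, so here is a minimal sketch of one way to combine them with the json_normalize call from the question. It makes several assumptions: `sample` is a trimmed stand-in for the `data` list, `unwrap` is a hypothetical helper (not part of pandas), and 'ProjectCode' is used as the key that links each child table back to its project. The JSON-encoded string fields are decoded with json.loads first, because json_normalize does not parse strings.

```python
import json

import pandas as pd

# Trimmed stand-in for the `data` list in the question (same shape, fewer fields).
sample = [
    {
        "StrategicResearchPriorities": {
            "data": [{"strategicAreaId": 0, "strategicAreaValue": "Population"}]
        },
        "IndustrialTransformationPriorities": None,
        "FieldsOfResearch": (
            '{"data":[{"guid":1557,"code":200102,"percentage":"45"},'
            '{"guid":1499,"code":180119,"percentage":"30"}]}'
        ),
        "Title": "X and Y",
        "ProjectCode": "0000001",
        "ChiefInvestigators": [
            {"FamilyName": "Surname1", "FirstName": "Name1", "PersonOrdinal": 1},
            {"FamilyName": "Surname2", "FirstName": "Name2", "PersonOrdinal": 3},
        ],
    },
]


def unwrap(value):
    """Hypothetical helper: decode JSON strings and strip a {'data': [...]} wrapper."""
    if isinstance(value, str) and value.lstrip().startswith("{"):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            return value  # ordinary text that merely starts with "{"
    if isinstance(value, dict) and set(value) == {"data"}:
        return value["data"]
    return value


# Clean every field of every record before handing the data to pandas.
records = [{key: unwrap(val) for key, val in rec.items()} for rec in sample]

# One row per project; nested lists stay as Python lists inside their cells.
projects = pd.DataFrame.from_records(records)

# One row per investigator, with the project code repeated as a key column.
investigators = pd.json_normalize(
    records, record_path="ChiefInvestigators", meta=["ProjectCode"]
)

# One row per field of research, linked to its project the same way.
fields = pd.json_normalize(
    records, record_path="FieldsOfResearch", meta=["ProjectCode"]
)
```

The child tables can then be joined back to `projects` on 'ProjectCode' if a single wide frame is really needed. Two caveats: a record whose `record_path` field is None will make json_normalize raise, so such records need filtering or a default empty list first, and adding 'Title' to `meta` for the investigators table would clash with each investigator's own 'Title' unless a `meta_prefix` is passed.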