Expanding a DataFrame with several nested JSON fields

Problem description

I have data obtained from web scraping that I want in a DataFrame. It looks like this:

data = [{'StrategicResearchPriorities': {'data': [{'strategicAreaId': 0,
     'strategicAreaValue': 'Population',
     'strategicGoalId': 1,
     'strategicGoalValue': 'Social'}]},
  'ScienceAndResearchPriorities': {'data': [{'scienceAndResearchPriorityId': 'Health',
     'scienceAndResearchPriorityValue': 'Health',
     'practicalResearchChallengeId': 'XXX.',
     'practicalResearchChallengeValue': 'YYY'}]},
  'IndustrialTransformationPriorities': None,
  'FieldsOfResearch': '{"data":[{"guid":1557,"value":"200102 - Communication Technology and Digital Media Studies","code":200102,"percentage":"45"},{"guid":1499,"value":"180119 - Law and Society","code":180119,"percentage":"30"},{"guid":1381,"value":"160104 - Social and Cultural Anthropology","code":160104,"percentage":"15"},{"guid":1444,"value":"160808 - Sociology and Social Studies of Science and Technology","code":160808,"percentage":"10"}]}',
  'Title': 'X and Y',
  'AdminOrganisationStateName': 'A',
  'AdminOrganisation': 'B',
  'ProjectCode': '0000001',
  'ChiefInvestigators': [{'FamilyName': 'Surname1',
    'FirstName': 'Name1',
    'SecondName': None,
    'Title': 'Mr',
    'PersonOrdinal': 1},
   {'FamilyName': 'Surname2',
    'FirstName': 'Name2',
    'SecondName': 'SecondName2',
    'Title': 'Ms',
    'PersonOrdinal': 3},
   ],
  'OrganisationParticipantSummary': '{"data":[{"id":11111,"guid":"af4","name":"Institute","number":1,"roleName":"Administering Organisation","roleId":1,"inKind":true},{"id":22222,"guid":"af6","name":"University","number":2,"roleName":"Other","roleId":3,"inKind":true}]}',
  'Summary': 'Some text',
  'AnnouncedDate': '1900-06-10T14:46:54.57',
  'AllocatedNumbersCalendarYears': [1,
   2,
   1,
   5,],
  'UnnamedAwardSummary': {}},
 {'StrategicResearchPriorities': {'data': [{'strategicAreaId': 4,
     'strategicAreaValue': 'Productivity',
     'strategicGoalId': 11,
     'strategicGoalValue': 'Economy'}]},
  'ScienceAndResearchPriorities': {'data': [{'scienceAndResearchPriorityId': 'Manufacturing',
     'scienceAndResearchPriorityValue': 'Manufacturing',
     'practicalResearchChallengeId': 'Technologies.',
     'practicalResearchChallengeValue': 'Modern technologies.'}]},
  'IndustrialTransformationPriorities': None,
  'FieldsOfResearch': '{"data":[{"guid":222,"value":"010101 - Subject1","code":"020202","percentage":"50"},{"guid":555,"value":"020201 - Subject10","code":"020201","percentage":"50"}]}',
  'Title': 'A and B and C',
  'AdminOrganisationStateName': 'Org',
  'AdminOrganisation': 'Institute',
  'ProjectCode': 'XX100000',
  'ChiefInvestigators': [{'FamilyName': 'Surname3',
    'FirstName': 'Name3',
    'SecondName': None,
    'Title': 'Dr',
    'PersonOrdinal': 1},
   {'FamilyName': 'Surname4',
    'FirstName': 'Name4',
    'SecondName': 'SecondName4',
    'Title': 'Prof',
    'PersonOrdinal': 15}],
  'OrganisationParticipantSummary': '{"data":[{"id":10002,"guid":"ab3","name":"University","number":1,"roleName":"Owner","roleId":1,"inKind":true},{"id":50000,"guid":"2a7","name":"University2","number":2,"roleName":"Other","roleId":3,"inKind":true}]}',
  'Summary': 'Some text 2.',
  'AnnouncedDate': '1800-06-12T15:26:55.003',
  'AllocatedNumbersCalendarYears': [5,
   1,
   3,
   2,
   9,
   20,
   10],
  'UnnamedAwardSummary': {}},
 ]

I want to unpack all the different cells into one big DataFrame. I tried

json_normalize(data)

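but the cells are read as strings. The problem is that fields such as 'StrategicResearchPriorities' hold another list under the 'data' key, and I cannot get at it.

PS: Sorry for the very long array, but I thought it best to show all of it. It has actually been trimmed a lot.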

Tags: python, json, pandas

Solution
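One possible approach, sketched under assumptions rather than offered as a definitive answer: decode the fields that arrive as JSON strings with json.loads, strip the {'data': [...]} wrapper so that every nested field is a plain list, and then call pd.json_normalize once per nested field with record_path and meta. The helper names (clean_record, build_tables), the three field lists, and the trimmed sample record are hypothetical and only mirror the structure shown in the question. The sketch also assumes that 'ProjectCode' uniquely identifies a record.

```python
import json

import pandas as pd

# Assumption: these field groupings are read off the sample in the question.
JSON_STRING_FIELDS = ["FieldsOfResearch", "OrganisationParticipantSummary"]
WRAPPED_FIELDS = [
    "StrategicResearchPriorities",
    "ScienceAndResearchPriorities",
    "IndustrialTransformationPriorities",
    "FieldsOfResearch",
    "OrganisationParticipantSummary",
]
LIST_FIELDS = WRAPPED_FIELDS + ["ChiefInvestigators"]


def clean_record(record):
    """Return a copy in which every field in LIST_FIELDS is a plain list."""
    out = dict(record)
    # Step 1: decode fields that were scraped as JSON text.
    for field in JSON_STRING_FIELDS:
        if isinstance(out.get(field), str):
            out[field] = json.loads(out[field])
    # Step 2: unwrap {'data': [...]}; None, {} or a missing key become [].
    for field in WRAPPED_FIELDS:
        value = out.get(field)
        out[field] = (value.get("data") or []) if isinstance(value, dict) else []
    # Step 3: make sure the already-unwrapped list fields exist as lists.
    out["ChiefInvestigators"] = out.get("ChiefInvestigators") or []
    return out


def build_tables(records, key="ProjectCode"):
    """Build one flat table per nested field, each carrying the project key."""
    cleaned = [clean_record(r) for r in records]
    tables = {"projects": pd.DataFrame(cleaned).drop(columns=LIST_FIELDS)}
    for field in LIST_FIELDS:
        tables[field] = pd.json_normalize(
            cleaned,
            record_path=field,          # the list to expand into rows
            meta=[key],                 # repeat the project key on every row
            record_prefix=f"{field}.",  # avoids clashes such as 'Title'
        )
    return tables


# Trimmed, hypothetical sample with the same shape as the question's data.
sample = [
    {
        "ProjectCode": "0000001",
        "Title": "X and Y",
        "StrategicResearchPriorities": {
            "data": [{"strategicAreaId": 0, "strategicAreaValue": "Population"}]
        },
        "IndustrialTransformationPriorities": None,
        "FieldsOfResearch": '{"data":[{"guid":1557,"code":200102,"percentage":"45"},'
                            '{"guid":1499,"code":180119,"percentage":"30"}]}',
        "ChiefInvestigators": [
            {"FamilyName": "Surname1", "Title": "Mr", "PersonOrdinal": 1},
            {"FamilyName": "Surname2", "Title": "Ms", "PersonOrdinal": 3},
        ],
        "UnnamedAwardSummary": {},
    }
]

tables = build_tables(sample)

# Join a child table back onto the scalar columns when a wide frame is needed.
wide = tables["projects"].merge(
    tables["ChiefInvestigators"], on="ProjectCode", how="left"
)
```

With the full list from the question, build_tables(data) should work the same way. Keeping one table per nested field is a deliberate choice here: merging every child table into a single frame multiplies the rows per project (investigators times fields of research times participants), so it is usually easier to join only the tables a given analysis needs.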


