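Extract nested elements from a column and store them in new columns

Problem description

I have some data that I want to expand into new columns. The data looks like this: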

    id  d
0   403 {'cases': 1, 'suspects': 22, 'negative': 0, 's', ...}
1   402 {'cases': 0, 'suspects': 18, 'negative': 0, 's', ...}
2   401 {'cases': 0, 'suspects': 31, 'negative': 0, 's', ...}

I am trying to spread the nested column d out into new columns. I can get part of the data out of d like this:

rows = []
for i, row in myDF.iterrows():
    for stat in row['d']['stats']:
        new_row = {
            **row.to_dict(),
            **stat,
        }
        rows.append(new_row)

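But this doesn't get me everything. How can I extract the objects so that I get new columns holding the corresponding cases values?

Expected output (the column names don't have to match exactly):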

cases  suspects  negative  diag_casesELISA_sex_F  diag_suspects_sex_M  diag_suspects_sex_F  diag_suspectsPCR_sex_F  diag_suspectsPCR_sex_M
    1        22         0                      1                   11                   10                       1                      NA
    0        18         0                     NA                    9                    9                      NA                      NA
    0        31         0                     NA                   12                   18                      NA                       1

Data:

import pandas as pd

myDF = pd.DataFrame.from_dict({'id': {0: '403', 1: '402', 2: '401'}, 'd': {0: {'cases': 1, 'suspects': 22, 'negative': 0, 'stats': [{'diag': 'casesELISA', 'sex': 'F', 'cases': 1}, {'diag': 'suspects', 'sex': 'M', 'cases': 11}, {'diag': 'suspects', 'sex': 'F', 'cases': 10}, {'diag': 'suspectsPCR', 'sex': 'F', 'cases': 1}]}, 1: {'cases': 0, 'suspects': 18, 'negative': 0, 'stats': [{'diag': 'suspects', 'sex': 'M', 'cases': 9}, {'diag': 'suspects', 'sex': 'F', 'cases': 9}]}, 2: {'cases': 0, 'suspects': 31, 'negative': 0, 'stats': [{'diag': 'suspects', 'sex': 'M', 'cases': 12}, {'diag': 'suspects', 'sex': 'F', 'cases': 18}, {'diag': 'suspectsPCR', 'sex': 'M', 'cases': 1}]}}})

Tags: python, pandas, dictionary

Solution


You can write a custom function for this and apply it with pd.Series.apply.

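import pandas as pd


def transform_dict(d):
    """Flatten one dict from column d into a pd.Series."""
    new = {}
    for k, v in d.items():
        if isinstance(v, list):
            # 'stats' is a list of dicts: build the column name from every
            # key/value pair except 'cases', and use 'cases' as the value
            for _dict in v:
                col = "_".join(
                    [f"{key}_{val}" for key, val in _dict.items() if key != "cases"]
                )
                new[col] = _dict["cases"]
        else:
            # scalar entries (cases, suspects, negative) are copied as-is
            new[k] = v
    return pd.Series(new)


out = myDF["d"].apply(transform_dict)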

#out
   cases  suspects  negative  ...  diag_suspects_sex_F  diag_suspectsPCR_sex_F  diag_suspectsPCR_sex_M
0    1.0      22.0       0.0  ...                 10.0                     1.0                     NaN
1    0.0      18.0       0.0  ...                  9.0                     NaN                     NaN
2    0.0      31.0       0.0  ...                 18.0                     NaN                     1.0
#out.columns
Index(
    [
        "cases",
        "suspects",
        "negative",
        "diag_casesELISA_sex_F",
        "diag_suspects_sex_M",
        "diag_suspects_sex_F",
        "diag_suspectsPCR_sex_F",
        "diag_suspectsPCR_sex_M",
    ],
    dtype="object",
)
# out.values
array([[ 1., 22.,  0.,  1., 11., 10.,  1., nan],
       [ 0., 18.,  0., nan,  9.,  9., nan, nan],
       [ 0., 31.,  0., nan, 12., 18., nan,  1.]])

Explanation:

transform_dict(myDF['d'][0])

cases                      1
suspects                  22
negative                   0
diag_casesELISA_sex_F      1
diag_suspects_sex_M       11
diag_suspects_sex_F       10
diag_suspectsPCR_sex_F     1
dtype: int64

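Each dict in column d is converted into a Series. apply then stacks these Series as rows of a DataFrame whose columns are the union of all their labels, so a row that lacks a given label gets NaN.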
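A minimal sketch of that alignment step, using two hypothetical toy dicts rather than the question's data: when the function passed to Series.apply returns a Series, pandas expands the results into a DataFrame aligned on the returned labels. This is why, for example, diag_suspectsPCR_sex_M is NaN in the first two rows of out.

```python
import pandas as pd

# Two hypothetical rows whose dicts share only the key "a"
s = pd.Series([{"a": 1, "b": 2}, {"a": 3, "c": 4}])

# pd.Series turns each dict into a Series; because the function returns a
# Series, apply expands the results into a DataFrame aligned on the labels,
# filling labels missing from a row with NaN
wide = s.apply(pd.Series)
print(wide)
```

Applied directly to the question's data, plain myDF["d"].apply(pd.Series) would stop at this level and leave stats as a column of lists, which is why transform_dict flattens that list itself.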


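Another route, not taken in the answer above, is to finish the long-format approach from the question: build one row per stats entry, pivot those rows so each diag/sex pair becomes its own column, and join the top-level counts back on. The sketch below uses DataFrame.pivot_table and DataFrame.join; the data is the question's, rebuilt inline so the block runs on its own, and the names long_df, stats_wide and top are illustrative only.

```python
import pandas as pd

# The question's data, written as one row per id
myDF = pd.DataFrame({
    "id": ["403", "402", "401"],
    "d": [
        {"cases": 1, "suspects": 22, "negative": 0, "stats": [
            {"diag": "casesELISA", "sex": "F", "cases": 1},
            {"diag": "suspects", "sex": "M", "cases": 11},
            {"diag": "suspects", "sex": "F", "cases": 10},
            {"diag": "suspectsPCR", "sex": "F", "cases": 1}]},
        {"cases": 0, "suspects": 18, "negative": 0, "stats": [
            {"diag": "suspects", "sex": "M", "cases": 9},
            {"diag": "suspects", "sex": "F", "cases": 9}]},
        {"cases": 0, "suspects": 31, "negative": 0, "stats": [
            {"diag": "suspects", "sex": "M", "cases": 12},
            {"diag": "suspects", "sex": "F", "cases": 18},
            {"diag": "suspectsPCR", "sex": "M", "cases": 1}]},
    ],
})

# Long format: one row per entry of each 'stats' list, tagged with its id
long_df = pd.DataFrame(
    [{"id": rid, **stat} for rid, d in zip(myDF["id"], myDF["d"]) for stat in d["stats"]]
)
# Column names such as "diag_suspects_sex_M", mirroring transform_dict
long_df["col"] = "diag_" + long_df["diag"] + "_sex_" + long_df["sex"]

# Wide format: one row per id, one column per diag/sex pair (NaN if absent)
stats_wide = long_df.pivot_table(index="id", columns="col", values="cases", aggfunc="sum")

# Top-level scalar counts (cases, suspects, negative), indexed by id
top = pd.DataFrame(
    [{k: v for k, v in d.items() if k != "stats"} for d in myDF["d"]],
    index=myDF["id"],
)

# Left join keeps the original id order
out = top.join(stats_wide)
print(out)
```

With aggfunc="sum", a diag/sex pair that appeared twice for the same id would be added up; DataFrame.pivot would instead raise an error on such duplicates.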