How to split a regularly formatted string column into multiple columns in Pandas

Problem description

Suppose I have a DataFrame like this:

id  s             scene
1   a   kitchen: 0.297, living: 0.515, degree: A
2   b   kitchen: 0.401, study: 0.005, degree: A
3   c   study: 0.913, degree: B
4   d   living: 0.515, degree: B
5   e   others: 0.1, degree: C

How can I use Pandas to turn it into a new DataFrame like the one below?

So far I have tried df[['id', 's', 'kitchen', 'living', 'study', 'others', 'degree']] = df['scene'].str.split(',', expand=True), but it does not produce the result I want, which is:

id   s   kitchen living  study   others    degree
1    a    0.297   0.515   0       0           A 
2    b    0.401   0       0.005   0           A
3    c    0       0       0.913   0           B
4    d    0       0.515   0       0           B
5    e    0       0       0       0.1         C
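A quick way to see why that attempt cannot work (a sketch that assumes the sample data above is typed in by hand): str.split(',', expand=True) splits purely by position, so every cell still holds a whole "key: value" pair (with a leading space after the first one), and the result has only as many columns as the longest row has pieces, here three. That cannot line up with seven target column names, and rows with different keys put different keys in the same column.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "s": ["a", "b", "c", "d", "e"],
    "scene": [
        "kitchen: 0.297, living: 0.515, degree: A",
        "kitchen: 0.401, study: 0.005, degree: A",
        "study: 0.913, degree: B",
        "living: 0.515, degree: B",
        "others: 0.1, degree: C",
    ],
})

# Splitting on ',' is positional: each piece is still a "key: value" string,
# and shorter rows are padded with missing values on the right
parts = df["scene"].str.split(",", expand=True)
print(parts)
print(parts.shape)  # 5 rows, 3 columns (the longest row has 3 pieces)
```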

Tags: python, pandas

Solution


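You can parse each string into a dict (split on ', ' into pairs, then each pair on ': ') and build a new DataFrame from those dicts, filling missing keys with 0: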

In [763]: dff = pd.DataFrame(
              dict(y.split(': ') for y in x.split(', ')) for x in df.scene).fillna(0)

In [764]: dff
Out[764]:
  degree kitchen living others  study
0      A   0.297  0.515      0      0
1      A   0.401      0      0  0.005
2      B       0      0      0  0.913
3      B       0  0.515      0      0
4      C       0      0    0.1      0

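Two details are worth knowing about the frame above: the parsed values are still strings (for example '0.297'), and the column order of a frame built from dicts depends on the pandas version. A minimal sketch that extends the same dict-per-row idea to match the target layout, assuming the sample data is typed in by hand and that the score columns are exactly kitchen, living, study and others as in the question: reindex pins the column order, and pd.to_numeric converts the scores to floats before missing keys are filled with 0.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "s": ["a", "b", "c", "d", "e"],
    "scene": [
        "kitchen: 0.297, living: 0.515, degree: A",
        "kitchen: 0.401, study: 0.005, degree: A",
        "study: 0.913, degree: B",
        "living: 0.515, degree: B",
        "others: 0.1, degree: C",
    ],
})

# One dict per row: "k: v, k: v" -> {"k": "v", ...}
records = [dict(item.split(": ") for item in text.split(", ")) for text in df["scene"]]
dff = pd.DataFrame(records, index=df.index)

# Score columns taken from the question's desired output (an assumption)
score_cols = ["kitchen", "living", "study", "others"]

# reindex fixes the column order; a key that never occurs would become all-NaN
dff = dff.reindex(columns=score_cols + ["degree"])

# Scores are strings after split: convert to float, then missing keys -> 0
dff[score_cols] = dff[score_cols].apply(pd.to_numeric).fillna(0)
print(dff)
```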
Then join it back onto the original DataFrame (both frames share the same row index):

In [766]: df.join(dff)
Out[766]:
   id  s                                     scene degree kitchen living  \
0   1  a  kitchen: 0.297, living: 0.515, degree: A      A   0.297  0.515
1   2  b   kitchen: 0.401, study: 0.005, degree: A      A   0.401      0
2   3  c                   study: 0.913, degree: B      B       0      0
3   4  d                  living: 0.515, degree: B      B       0  0.515
4   5  e                    others: 0.1, degree: C      C       0      0

  others  study
0      0      0
1      0  0.005
2      0  0.913
3      0      0
4    0.1      0

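If the goal is exactly the frame from the question (scene dropped, columns in the requested order), one option is sketched below. Note that it swaps in a different parsing technique from the answer above: Series.str.extractall with a regex, followed by pivot. It assumes every key is a run of word characters and no value contains a comma; the score column list is again taken from the desired output.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "s": ["a", "b", "c", "d", "e"],
    "scene": [
        "kitchen: 0.297, living: 0.515, degree: A",
        "kitchen: 0.401, study: 0.005, degree: A",
        "study: 0.913, degree: B",
        "living: 0.515, degree: B",
        "others: 0.1, degree: C",
    ],
})

# extractall returns one row per regex match, indexed by (original row, match number)
pairs = df["scene"].str.extractall(r"(?P<key>\w+): (?P<value>[^,]+)")

# Drop the match-number level, then turn each key into its own column
wide = pairs.droplevel("match").pivot(columns="key", values="value")

score_cols = ["kitchen", "living", "study", "others"]
wide = wide.reindex(columns=score_cols + ["degree"])
wide[score_cols] = wide[score_cols].apply(pd.to_numeric).fillna(0)

# Join on the shared row index, drop the raw string, order columns as requested
result = df.join(wide).drop(columns="scene")[["id", "s"] + score_cols + ["degree"]]
print(result)
```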