Lists embedded in a column in Pandas

Problem description

My data is structured as follows:

id                          text                                                created_at                  public_metrics
0   1372226235380666368     Senator Dianne Feinstein’s husband, Richard Bl...   2021-03-17T16:40:03.000Z    {'retweet_count': 13, 'reply_count': 11, 'like...               
1   1372224061225459713     Police said the suspect said he had a “sexual ...   2021-03-17T16:31:25.000Z    {'retweet_count': 20, 'reply_count': 20, 'like...               
2   1372223437381468166     Organizations that track hate groups and viole...   2021-03-17T16:28:56.000Z    {'retweet_count': 92, 'reply_count': 40, 'like...               
3   1372219606560075776     Breaking News: Biologists grew mice embryos ha...   2021-03-17T16:13:43.000Z    {'retweet_count': 74, 'reply_count': 49, 'like...               
4   1372217785082916873     RT @NickAtNews: Latest on Atlanta:\n• Gunman t...   2021-03-17T16:06:29.000Z    {'retweet_count': 14, 'reply_count': 0, 'like_...               
5   1372216261132845057     Today's Great Read:\n\nA grand estate in Engla...   2021-03-17T16:00:25.000Z    {'retweet_count': 12, 'reply_count': 10, 'like...               

I need to split the public_metrics column into 4 separate columns by extracting the values from the dictionary stored in each row.

I have looked at this: "Parsing list of dictionaries in Pandas column", but could not adapt that code to this setup.

Tags: python, json, pandas

Solution


Setup:

>>> import pandas as pd
>>> df = pd.DataFrame(columns=['id', 'text', 'public_metrics'], data=[[0, 'foo', {'retweet_count': 13, 'reply_count': 11}], [1, 'bar', {'retweet_count': 20, 'reply_count': 20}]])
>>> df
   id text                            public_metrics
0   0  foo  {'retweet_count': 13, 'reply_count': 11}
1   1  bar  {'retweet_count': 20, 'reply_count': 20}

Solution:

>>> pd.concat([df.drop('public_metrics', axis=1), pd.DataFrame(df['public_metrics'].tolist())], axis=1)
   id text  retweet_count  reply_count
0   0  foo             13           11
1   1  bar             20           20
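
The one-liner above can also be written step by step, which makes it easier to see what each part does. The version below is a minimal sketch using the same toy data. The names `metrics` and `result` are illustrative and not part of the original answer. Passing `index=df.index` is an added precaution: it keeps the expanded columns aligned with their rows if `df` has been filtered or re-indexed, a case the toy data does not exercise.

```python
import pandas as pd

# Same toy frame as in the setup above.
df = pd.DataFrame(
    columns=['id', 'text', 'public_metrics'],
    data=[
        [0, 'foo', {'retweet_count': 13, 'reply_count': 11}],
        [1, 'bar', {'retweet_count': 20, 'reply_count': 20}],
    ],
)

# Turn the column of dicts into a list of dicts, then into a DataFrame:
# each dict key becomes a column. Reusing df.index keeps the rows aligned
# even when df does not have a default 0..n-1 index.
metrics = pd.DataFrame(df['public_metrics'].tolist(), index=df.index)

# Drop the original dict column and attach the expanded columns by index.
result = df.drop(columns='public_metrics').join(metrics)

print(result)
```

With the real data, the same steps should yield one new column per key found in the public_metrics dictionaries.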
