python - How do I extract/split list columns in a DataFrame into separate columns?
Question
I have a DataFrame with several columns, like this:
Age G GS
INDEX1 [27, 25, 22, 30, 30] [76, 79, 80, 76, 77] [76, 79, 80, 76, 77]
INDEX2 [24, 23, 21, 32, 34] [77, 76, 81, 75, 77] [77, 76, 81, 75, 77]
How can I split every list into its own separate columns? Ideally, once I'm done, my DataFrame would look like this:
Age Age1 Age2 Age3 Age4 G G1 G2 G3 G4
INDEX1 27 25 22 30 30 76 79 80 76 77 ...
...
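If it helps, I built this DataFrame from a dictionary. I have searched Stack Overflow and tried several similar solutions, but none of them seem to work. The following converts the column correctly, but for some reason it creates two NaN columns. If anyone knows how to do this across the whole DataFrame, I can drop the extra NaN columns afterwards: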
df1 = pd.DataFrame(converted['Age'].values.tolist())
df1
0 1 2 3 4 5 6
0 27 25 22 30 30.0 NaN NaN
1 31 29 33 27 33.0 NaN NaN
2 22 21 26 21 33.0 NaN NaN
3 29 24 31 33 27.0 NaN NaN
4 30 21 31 31 32.0 NaN NaN
... ... ... ... ... ... ... ...
1727 28 27 28 20 26.0 NaN NaN
1728 20 29 27 24 20.0 NaN NaN
1729 30 31 34 25 26.0 NaN NaN
1730 31 26 34 21 21.0 NaN NaN
1731 22 24 20 28 25.0 NaN NaN
I tried some other solutions as well, but I get an error on the Age column. It may be related to hidden values, but I'm not sure.
df2 = pd.DataFrame()
for col in converted.columns:
    # names of new columns
    feature_columns = [ "{col}_feature1".format(col=col), "{col}_feature2".format(col=col), "{col}_feature3".format(col=col)
                      , "{col}_feature4".format(col=col)
                      , "{col}_feature5".format(col=col)]
    # split current column
    df2[ feature_columns ] = df[ col ].apply(lambda s: pd.Series({ feature_columns[0]: s[0],
                                                                   feature_columns[1]: s[1],
                                                                   feature_columns[2]: s[2],
                                                                   feature_columns[3]: s[3],
                                                                   feature_columns[4]: s[4]} ) )
print (df2)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'Age'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-180-53ed0043f9d8> in <module>
7 , "{col}_feature5".format(col=col)]
8 # split current column
----> 9 df2[ feature_columns ] = df[ col ].apply(lambda s: pd.Series({ feature_columns[0]: s[0],
10 feature_columns[1]: s[1],
11 feature_columns[2]: s[2],
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2978 if self.columns.nlevels > 1:
2979 return self._getitem_multilevel(key)
-> 2980 indexer = self.columns.get_loc(key)
2981 if is_integer(indexer):
2982 indexer = [indexer]
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
377 except ValueError:
378 raise KeyError(key)
--> 379 return super().get_loc(key, method=method, tolerance=tolerance)
380
381 @Appender(_index_shared_docs["get_indexer"])
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'Age'
Edit: I tried the solution listed here: Pandas split column of lists into multiple columns
It did not work for me. Thanks for any suggestions!
Answer
Use:
new_df = pd.concat([pd.DataFrame(col.tolist(), index = df.index).add_prefix(i)
for i, col in df.items()], axis = 1)
print(new_df)
Age0 Age1 Age2 Age3 Age4 G0 G1 G2 G3 G4 GS0 GS1 GS2 GS3 \
INDEX1 27 25 22 30 30 76 79 80 76 77 76 79 80 76
INDEX2 24 23 21 32 34 77 76 81 75 77 77 76 81 75
GS4
INDEX1 77
INDEX2 77
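For a self-contained check, the sketch below rebuilds the two-row sample frame from the question (the values are copied from it; the construction from a dict is an assumption about how the data is stored) and applies the same pd.concat expression.

```python
import pandas as pd

# Sample data copied from the question; each cell holds a Python list.
df = pd.DataFrame(
    {
        "Age": [[27, 25, 22, 30, 30], [24, 23, 21, 32, 34]],
        "G": [[76, 79, 80, 76, 77], [77, 76, 81, 75, 77]],
        "GS": [[76, 79, 80, 76, 77], [77, 76, 81, 75, 77]],
    },
    index=["INDEX1", "INDEX2"],
)

# Expand each list column into its own frame, prefix the positional
# column labels (0..4) with the original column name, then join side by side.
new_df = pd.concat(
    [
        pd.DataFrame(col.tolist(), index=df.index).add_prefix(name)
        for name, col in df.items()
    ],
    axis=1,
)
print(new_df.columns.tolist())
```

Passing index=df.index to each intermediate frame keeps the row labels aligned, so pd.concat does not need to realign anything.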
It may be better to set the index only once:
new_df = pd.concat([pd.DataFrame(col.tolist()).add_prefix(i)
for i, col in df.items()], axis = 1)
new_df.index = df.index
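The two extra NaN columns in the question's output are consistent with the lists not all having the same length: pd.DataFrame pads shorter rows with NaN up to the longest list. This is a guess from the output shown, not something the question confirms. A minimal sketch for checking it, using made-up data in which one row has seven elements:

```python
import pandas as pd

# Hypothetical data: the second list is longer than the expected five elements.
converted = pd.DataFrame(
    {"Age": [[27, 25, 22, 30, 30], [31, 29, 33, 27, 33, 40, 41]]}
)

# Length of the list stored in each row.
lengths = converted["Age"].map(len)
print(lengths.value_counts())

# Rows whose list length differs from the expected width.
expected = 5
print(converted[lengths != expected])

# One option (an assumption, it discards the extra values): keep only the
# first `expected` elements of each list before expanding.
trimmed = pd.DataFrame(
    [row[:expected] for row in converted["Age"]],
    index=converted.index,
).add_prefix("Age")
print(trimmed)
```

Whether truncating is appropriate depends on what the extra values mean; inspecting the offending rows first is probably the safer step.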