Take a pandas DataFrame column value and paste it into the next row

Problem description

I am very new to pandas DataFrames in Python. I am writing code for a CSV file structured as shown below:

Id, Title, Body, Tags, Date
1, First question, My first question, robot Python, 2015
2, Second question, My second question, C++ Python, 2015
3, Third question, My third question, Selenium, 2016
4, Fourth question, My fourth question, Java C++, 2016

I have imported this CSV into my Python code using the pandas library.

I am trying to get a DataFrame like the following:

Id, Title, Body, Tags, Date
1, First question, My first question, robot, 2015
2, First question, My first question, Python, 2015
3, Second question, My second question, C++, 2015
4, Second question, My second question, Python, 2015
.......

Please let me know if there is a suitable way to achieve this.

Tags: python, pandas, dataframe

Solution


You can do it like this:

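import pandas as pd

df = df.drop(columns=["Id"])
rows = []
for _, row in df.iterrows():
    # Emit one output row per whitespace-separated tag.
    for tag in row["Tags"].split():
        aux = row.copy()  # copy, so every output row keeps its own tag
        aux["Tags"] = tag
        rows.append(aux)
# DataFrame.append was removed in pandas 2.0, so build df2 in one step,
# and assign the result of reset_index (it does not modify in place).
df2 = pd.DataFrame(rows, columns=df.columns).reset_index(drop=True)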

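Here df is your DataFrame and df2 is the updated one. You iterate over each row of df and split its "Tags" value into as many tags as it contains (in your example the maximum is 2, but I assume you can have more). Each individual tag then gets its own row in the new DataFrame df2. (I dropped the Id column and reset the index, because the rows would otherwise keep their original index values.)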

             Title                Body      Tags  Date
0   First question   My first question     robot  2015
1   First question   My first question    Python  2015
2  Second question  My second question       C++  2015
3  Second question  My second question    Python  2015
4   Third question   My third question  Selenium  2016
5  Fourth question  My fourth question      Java  2016
6  Fourth question  My fourth question       C++  2016
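
As an alternative to the explicit loop, the same result can probably be reached with pandas' built-in `Series.str.split` and `DataFrame.explode` (available since pandas 0.25). The following is only a sketch under a few assumptions: the sample CSV from the question is inlined as a string, `skipinitialspace=True` is used so the blanks after the commas do not end up in the column names, and the helper name `split_tags` is made up for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question. In real code this would be a file path.
CSV_TEXT = """Id, Title, Body, Tags, Date
1, First question, My first question, robot Python, 2015
2, Second question, My second question, C++ Python, 2015
3, Third question, My third question, Selenium, 2016
4, Fourth question, My fourth question, Java C++, 2016
"""

# skipinitialspace=True strips the blank that follows each comma,
# so the columns are named "Tags" rather than " Tags".
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)


def split_tags(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with one row per whitespace-separated tag.

    Hypothetical helper for illustration; it assumes columns named
    "Id" and "Tags" as in the question.
    """
    out = frame.drop(columns=["Id"])
    # Turn "robot Python" into ["robot", "Python"] ...
    out["Tags"] = out["Tags"].str.split()
    # ... then give every list element its own row and renumber the index.
    return out.explode("Tags").reset_index(drop=True)


df2 = split_tags(df)
print(df2)
```

This avoids the Python-level loop, which tends to matter on larger files. If a renumbered Id column is wanted, as in the output the question asks for, something like `df2.insert(0, "Id", range(1, len(df2) + 1))` should add it back.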
