Unable to understand the behavior of a lambda in pandas

Question

I have a dataframe:

df2.head(5)
Out[78]: 
    User        Date                   movie
0  User1  2019-07-02  [Bridge to Terabithia]
1  User1  2019-07-04              [Defiance]
2  User1  2019-07-05                 [Click]
3  User1  2019-07-07              [Big Stan]
4  User1  2019-07-14    [Death at a Funeral]

The elements of the movie column are lists. Now I'm trying to apply a lambda function like this:

df2['movie'] = df2['movie'].apply(lambda x : x[0])

df2.head(5)
Out[79]: 
    User        Date               movie
0  User1  2019-07-02 Bridge to Terabithia
1  User1  2019-07-04                 NaN
2  User1  2019-07-05                 NaN
3  User1  2019-07-07                 NaN
4  User1  2019-07-14                 NaN

whereas the desired output is:

    User        Date                 movie
0  User1  2019-07-02  Bridge to Terabithia
1  User1  2019-07-04              Defiance
2  User1  2019-07-05                 Click
3  User1  2019-07-07              Big Stan
4  User1  2019-07-14    Death at a Funeral

I can't understand why it gives me this output. What is going on?

Tags: python pandas dataframe lambda

Answer


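Next time, please include a fully reproducible example (including the code that creates the dataframe); it saves time for everyone reviewing the question.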
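One quick way to produce such an example, assuming the frame is small, is to print the first rows as a Python literal with `DataFrame.to_dict("list")` and let others paste it back into `pd.DataFrame`. The two-row frame below is only a stand-in built from the values shown in the question:

```python
import pandas as pd

# Stand-in for the asker's frame (values copied from the question)
df2 = pd.DataFrame({
    "User": ["User1"] * 2,
    "Date": ["2019-07-02", "2019-07-04"],
    "movie": [["Bridge to Terabithia"], ["Defiance"]],
})

# orient="list" gives a {column: [values, ...]} mapping that is valid Python source
snippet = df2.head(5).to_dict("list")
print(snippet)

# Anyone can rebuild the same frame from the printed literal
rebuilt = pd.DataFrame(snippet)
# True if the round trip preserved the list-valued cells
print(rebuilt["movie"].tolist() == df2["movie"].tolist())
```

Pasting the printed dict into the question is usually enough for reviewers to reproduce the problem exactly.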

Your code works fine for me:

import pandas as pd

# data
df2 = pd.DataFrame({'User': ['User1'] * 5,
                    'Date': ['2019-07-02',
                             '2019-07-04',
                             '2019-07-05',
                             '2019-07-07',
                             '2019-07-14'],
                    'movie': [
                        ['Bridge to Terabithia'],
                        ['Defiance'],
                        ['Click'],
                        ['Big Stan'],
                        ['Death at a Funeral']
                    ]})

print(df2.head(5))
print()

df2['movie'] = df2['movie'].apply(lambda x : x[0])
print(df2.head(5))

This produces:

         Date   User                   movie
0  2019-07-02  User1  [Bridge to Terabithia]
1  2019-07-04  User1              [Defiance]
2  2019-07-05  User1                 [Click]
3  2019-07-07  User1              [Big Stan]
4  2019-07-14  User1    [Death at a Funeral]

         Date   User                 movie
0  2019-07-02  User1  Bridge to Terabithia
1  2019-07-04  User1              Defiance
2  2019-07-05  User1                 Click
3  2019-07-07  User1              Big Stan
4  2019-07-14  User1    Death at a Funeral
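Since the code itself is fine, the NaN values probably come from somewhere else in the asker's session. One possible cause (an assumption, because the question doesn't show how `df2` was built) is index alignment: when a Series is assigned to a column, pandas matches rows by index label, and labels with no match become NaN. The output in the question, with only row 0 filled, is exactly what that looks like when the right-hand side contains only label 0. The sketch below checks the cell types and reproduces that symptom; the `movie_first` column and the `.head(1)` step are hypothetical:

```python
import pandas as pd

df2 = pd.DataFrame({
    "User": ["User1"] * 3,
    "movie": [["Bridge to Terabithia"], ["Defiance"], ["Click"]],
})

# 1) Check what the cells actually are; x[0] assumes every cell is a list
print(df2["movie"].map(type).value_counts())

# 2) Hypothetical cause: the right-hand side only has index label 0
#    (e.g. it was computed from a filtered or re-indexed copy of the frame)
partial = df2["movie"].head(1).apply(lambda x: x[0])

# Assignment aligns on the index, so rows 1 and 2 get NaN
df2["movie_first"] = partial
print(df2)
```

If the type check shows anything other than `list`, or the assigned Series has a different index than `df2`, that is the place to look.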

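Personally, when I want to debug an .apply that uses a lambda function, I first write a regular function instead, where I can set breakpoints and inspect what is happening. Once it works correctly, I replace it with the lambda. So this is what I would do in your case: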

def extract_first(x):
    # here you can put breakpoints, print stuff, etc.
    return x[0]

df2['movie'] = df2['movie'].apply(extract_first)
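A slightly more defensive version of that helper, offered as a sketch: the name `extract_first` comes from the answer above, but the checks inside it are assumptions about what could be wrong with the data. It reports any cell that isn't a non-empty list instead of failing silently. pandas also has a built-in for this case: `Series.str.get(0)` returns the first element of each list and NaN where there is none.

```python
import pandas as pd

def extract_first(x):
    # Report unexpected cells (non-lists, empty lists) instead of failing inside .apply
    if not isinstance(x, list) or len(x) == 0:
        print(f"unexpected cell: {x!r} ({type(x).__name__})")
        return None
    return x[0]

# Sample data with two deliberately bad cells: an empty list and a missing value
df2 = pd.DataFrame({
    "movie": pd.Series([["Bridge to Terabithia"], ["Defiance"], [], None]),
})

df2["first_apply"] = df2["movie"].apply(extract_first)

# Built-in alternative: .str.get works on list cells, giving NaN when there is no element
df2["first_str"] = df2["movie"].str.get(0)
print(df2)
```

The explicit function is handy while investigating; once the data is known to be clean, `.str.get(0)` or the original lambda is shorter.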
