Creating new DataFrame columns using lambda functions

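Question

I am trying to create new columns based on conditions applied to other columns.
(The DataFrame has already been aggregated by user.)
Here is a sample of the DataFrame: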

event_names                          country  
["deleteobject", "getobject"]         ["us"]
["getobject"]                         ["ca"]
["deleteobject", "putobject"]         ["ch"]

I want to create 3 new columns:
Was data deleted?
Was data downloaded?
Did the events come from one of my whitelisted countries?
WHITELISTED_COUNTRIES = ["us", "sg"]

Like this:

event_names                      country  was_data_deleted?   was_data_downloaded?  whitelisted_country?
["deleteobject","getobject"]      ["us"]         True                 True                 True             
["getobject"]                     ["ca"]         False                True                 False
["deleteobject","putobject"]      ["ch"]         True                 False                False

Here is what I have tried so far:

result_df['was_data_deleted'] = result_df['event_name'].apply(lambda x:True if any("delete" in x for i in x) else False)

result_df['was_data_downloaded'] = result_df['event_name'].apply(lambda x:True if "getObject" in i for i in x else False)

result_df['strange_countries'] = result_df['country'].apply(lambda x:False if any(x in WHITELISTED_COUNTRIES for x in result_df['country']) else False)

I get the error "SyntaxError: invalid syntax".

Any ideas? Thanks!

Tags: python, pandas, dataframe, lambda, list-comprehension

Solution


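# Each cell holds a list, so `in` tests for an exact element match.
df['was_data_deleted'] = df['event_names'].apply(lambda x: 'deleteobject' in x)
df['was_data_downloaded'] = df['event_names'].apply(lambda x: 'getobject' in x)
# Only the first country in each list is checked against the whitelist.
df['whitelisted_country'] = df['country'].apply(lambda x: x[0] in WHITELISTED_COUNTRIES)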

print(df)

Prints:

                 event_names country  was_data_deleted  was_data_downloaded  whitelisted_country
0  [deleteobject, getobject]  [us]    True              True                 True               
1  [getobject]                [ca]    False             True                 False              
2  [deleteobject, putobject]  [ch]    True              False                False              
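
As for the SyntaxError in the question: in the second attempted line, the generator expression `"getObject" in i for i in x` is not wrapped in a call such as `any(...)`, so Python cannot parse it. The first line also tests `"delete" in x` (the whole list) rather than `"delete" in i`, and the third returns False in both branches. A minimal self-contained sketch of an `any()`-based variant is below. It assumes the sample data from the question, that the whitelist is `["us", "sg"]`, and that a row counts as whitelisted if any of its countries is on the list. The helper name `contains_event` is hypothetical.

```python
import pandas as pd

# Assumed whitelist values; adjust to your own list.
WHITELISTED_COUNTRIES = ["us", "sg"]

# Sample data mirroring the question: each cell holds a list of strings.
df = pd.DataFrame({
    "event_names": [
        ["deleteobject", "getobject"],
        ["getobject"],
        ["deleteobject", "putobject"],
    ],
    "country": [["us"], ["ca"], ["ch"]],
})


def contains_event(events, keyword):
    """Return True if any event name contains `keyword` (case-insensitive)."""
    return any(keyword in event.lower() for event in events)


# Substring match, so "delete" also matches names such as "deleteobject".
df["was_data_deleted"] = df["event_names"].apply(
    lambda events: contains_event(events, "delete")
)
df["was_data_downloaded"] = df["event_names"].apply(
    lambda events: contains_event(events, "getobject")
)
# Assumption: a row is whitelisted if ANY of its countries is on the list.
df["whitelisted_country"] = df["country"].apply(
    lambda countries: any(c in WHITELISTED_COUNTRIES for c in countries)
)

print(df)
```

Unlike `x[0] in WHITELISTED_COUNTRIES`, this variant also handles rows with several countries or an empty list. Replace `any` with `all` if every country in the row must be whitelisted, keeping in mind that `all` returns True for an empty list.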
