Return True only where the first numeric entry is found, and False in all other cases

Problem description

The dataframe I am working with looks like the table below:

COLUMN-A    COLUMN-B            COLUMN-C            COLUMN-D
2005-12-23  2.78229429977895    2.59054751268432
2005-12-28  2.77990953370726    2.59625529291923
2005-12-29  2.77770141742004    2.60175855794512
2005-12-30  2.77565686568447    2.60706465870293
2006-01-03  2.78676377607689    2.61845788272621
2006-01-04  2.79415905904631    2.62804815466004
2006-01-05  2.79233986786484    2.63311058575101
2006-01-06  2.79065543181717    2.63799172343874
2006-01-09  2.7876513234596     2.64200075091549
2006-01-10  2.78342529650764    2.64516894228885
2006-01-11  2.77951230901599    2.64822370776439
2006-01-12  2.77877806345801    2.65256358425937
2006-01-13  2.78965376857357    2.66232574953289
2006-01-16  2.81417572440332    2.67871384606613
2006-01-17  2.83688123723998    2.69451541833616
2006-01-18  2.84923078073203    2.70556518000894
2006-01-19  2.854887762274      2.71343113557577
2006-01-20  2.86012570781281    2.72101563266667
2006-01-23  2.8620867671879     2.72693465617535
2006-01-24  2.85668033821582    2.72915676427006
2006-01-25  2.85311883059988    2.7319963852241
2006-01-27  2.84982113851717    2.73473442527192
2006-01-30  2.84098994077245    2.73458665290639
2006-01-31  2.83281290615161    2.73444416615124
2006-02-01  2.82235268854652    2.73291291585375
2006-02-02  2.79821544736977    2.72446373657389    2.31735945722146
2006-02-03  2.7903180053127     2.72328924609567    2.32165937425023
2006-02-06  2.78300555917914    2.72215675685381    2.32590335299919
2006-02-07  2.77912366526979    2.72245848891773    2.33053900014161
2006-02-08  2.77552931914827    2.72274943166327    2.33511466419111

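I am trying to write logic that returns True at the position of the first numeric entry in COLUMN-D and False in all other cases.

Here is the logic I wrote, which raises this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Code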

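import pandas as pd

def has_trail_started(df, df_key):
    # True where the value is non-null and the previous row is null
    return (~pd.isnull(df[df_key])) & (pd.isnull(df[df_key].shift()))

# This line raises the ValueError: `if` and `and` need one bool, not a Series
if (has_trail_started(data, 'COLUMN-D') and data['has_changed_status']):
    pass  # Logic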

Could I get some help fixing this?

Tags: python, pandas, dataframe

Solution


Your function returns a Series, which cannot be interpreted as a single bool in an if statement. However, you can add the "trail start info" to the df, like this:

def has_trail_started(df, df_key):
    df["has_trail_started"] = (~pd.isnull(df[df_key])) & (pd.isnull(df[df_key].shift()))

has_trail_started(data, 'COLUMN-D')

The df then looks like this:

      COLUMN-A  COLUMN-B  COLUMN-C  COLUMN-D  has_trail_started
0   2005-12-23  2.782294  2.590548       NaN              False
1   2005-12-28  2.779910  2.596255       NaN              False
2   2005-12-29  2.777701  2.601759       NaN              False
3   2005-12-30  2.775657  2.607065       NaN              False
4   2006-01-03  2.786764  2.618458       NaN              False
5   2006-01-04  2.794159  2.628048       NaN              False
6   2006-01-05  2.792340  2.633111       NaN              False
7   2006-01-06  2.790655  2.637992       NaN              False
8   2006-01-09  2.787651  2.642001       NaN              False
9   2006-01-10  2.783425  2.645169       NaN              False
10  2006-01-11  2.779512  2.648224       NaN              False
11  2006-01-12  2.778778  2.652564       NaN              False
12  2006-01-13  2.789654  2.662326       NaN              False
13  2006-01-16  2.814176  2.678714       NaN              False
14  2006-01-17  2.836881  2.694515       NaN              False
15  2006-01-18  2.849231  2.705565       NaN              False
16  2006-01-19  2.854888  2.713431       NaN              False
17  2006-01-20  2.860126  2.721016       NaN              False
18  2006-01-23  2.862087  2.726935       NaN              False
19  2006-01-24  2.856680  2.729157       NaN              False
20  2006-01-25  2.853119  2.731996       NaN              False
21  2006-01-27  2.849821  2.734734       NaN              False
22  2006-01-30  2.840990  2.734587       NaN              False
23  2006-01-31  2.832813  2.734444       NaN              False
24  2006-02-01  2.822353  2.732913       NaN              False
25  2006-02-02  2.798215  2.724464  2.317359               True
26  2006-02-03  2.790318  2.723289  2.321659              False
27  2006-02-06  2.783006  2.722157  2.325903              False
28  2006-02-07  2.779124  2.722458  2.330539              False
29  2006-02-08  2.775529  2.722749  2.335115              False
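Note that this mask flags every row where a value follows a missing one, so a column with a later gap would get more than one True. As a minimal sketch, assuming strictly the first numeric entry is wanted, `Series.first_valid_index()` could be used instead. The helper name `first_entry_mask` and the sample values below are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd


def first_entry_mask(df, df_key):
    """Return a boolean Series that is True only at the first non-null row."""
    # Hypothetical helper, not part of the original answer.
    # Assumes the index labels are unique.
    mask = pd.Series(False, index=df.index)
    first = df[df_key].first_valid_index()  # None if the column is all NaN
    if first is not None:
        mask.loc[first] = True
    return mask


# Illustrative data with a gap after the first run of values.
sample = pd.DataFrame({"COLUMN-D": [np.nan, np.nan, 2.31, 2.32, np.nan, 2.33]})

# Same rule as has_trail_started: non-null now, null in the previous row.
transitions = sample["COLUMN-D"].notna() & sample["COLUMN-D"].shift().isna()
strict_first = first_entry_mask(sample, "COLUMN-D")

print(transitions.tolist())   # True at rows 2 and 5
print(strict_first.tolist())  # True at row 2 only
```

For the data shown in the question the two masks agree, since COLUMN-D has no gaps once it starts.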

Now you can apply some logic based on this new bool, like this:

data["extra_logic"] = data["has_trail_started"].apply(lambda x: "yay" if x else "boo")

This adds a new column whose values are a function of the has_trail_started flag.
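The condition in the question also involves `has_changed_status`. One possible approach is to combine the two boolean columns row by row with `&` rather than `and`, and to use `numpy.where` in place of `apply`. The sketch below assumes `has_changed_status` is a boolean column; its values here, and the column name `both`, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the has_changed_status values are assumed, not from the post.
data = pd.DataFrame({
    "COLUMN-D": [np.nan, np.nan, 2.31, 2.32],
    "has_changed_status": [False, True, True, False],
})

data["has_trail_started"] = data["COLUMN-D"].notna() & data["COLUMN-D"].shift().isna()

# `and` asks a Series for a single truth value, which raises ValueError.
try:
    if data["has_trail_started"] and data["has_changed_status"]:
        pass
except ValueError as exc:
    print("ValueError:", exc)

# `&` compares row by row and yields another boolean Series.
data["both"] = data["has_trail_started"] & data["has_changed_status"]

# Vectorized equivalent of the apply/lambda shown above.
data["extra_logic"] = np.where(data["has_trail_started"], "yay", "boo")

# Reduce to one bool only when a scalar answer is really wanted.
if data["both"].any():
    print(data.loc[data["both"]])
```

Whether `.any()` or `.all()` is the right reduction depends on what the surrounding logic is meant to do, which the question does not specify.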

