How can you find when a value changes throughout every row in a data frame?

Problem description

I am trying to label accounts as new, current, lost, or returning, but I'm having trouble with the logic. The row index is the account, the columns are the years, and the values are 1s and 0s indicating whether the account is active that year. This is what I have come up with so far. I'm not sure whether this approach can ever work or whether I'm close, and I don't know what the logic for returning customers would look like. df2 is the original data frame and df3 = df2.shift(periods=1,axis=1).
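A small sketch of what that shift produces, assuming a hypothetical `df2` whose year columns run oldest to newest (the question does not say which order the columns are in). `shift(periods=1, axis=1)` moves every value one column to the right, so each column of `df3` holds the previous year's value. The first column has nothing to its left and is filled with `NaN`, which makes it a float column; passing `fill_value=0` instead treats the year before the first as inactive and keeps the integers.

```python
import pandas as pd

# Hypothetical activity table: rows are accounts, columns are years in
# ascending order, 1 = active and 0 = inactive.
df2 = pd.DataFrame(
    {"2015": [0, 0, 1], "2016": [0, 1, 1], "2017": [1, 0, 1]},
    index=pd.Index(["alex", "joe", "boss"], name="account"),
)

# Each column of df3 holds the previous year's value; the first column
# has nothing to its left, so it is filled with NaN (and becomes float64).
df3 = df2.shift(periods=1, axis=1)
print(df3)

# With fill_value=0 the year before the first is treated as inactive,
# so no NaN appears and the columns stay integers.
df3_filled = df2.shift(periods=1, axis=1, fill_value=0)
print(df3_filled)
```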

def differences():
    if df2 != df3 & df2 == 1:
        return "New"
    elif df2 != df3 & df2 ==0:
        return "Lost"
    elif df2 == df3 & df2 ==0:
        return ""
    else:
        return "Continuing"
differences() 


When I run this code, I get the following error:

couldn't find matching opcode for 'and_bdl'
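The error most likely comes from operator precedence. In Python, `&` binds more tightly than `!=` and `==`, so `df2 != df3 & df2 == 1` is parsed as the chained comparison `df2 != (df3 & df2) == 1`. That evaluates `df3 & df2` first, combining a float frame (the shifted `df3` contains `NaN`) with an integer frame; the opcode name in the message appears to describe exactly that double/long combination. Even with correct parentheses, `if` on a whole DataFrame raises `ValueError`, because the truth value of a frame is ambiguous. A minimal vectorized sketch of the same New/Lost/Continuing logic, assuming hypothetical data with year columns in ascending order, parenthesizes each comparison and picks a label per cell with `numpy.select`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: accounts as rows, years as columns in ascending order.
df2 = pd.DataFrame(
    {"2015": [0, 0, 1], "2016": [1, 0, 1], "2017": [0, 1, 1]},
    index=pd.Index(["alex", "joe", "boss"], name="account"),
)

# Previous year's status; the year before the first is assumed inactive (0),
# which also avoids NaN and keeps both frames integer-typed.
df3 = df2.shift(periods=1, axis=1, fill_value=0)

# Parenthesize each comparison: & binds tighter than == and !=.
conditions = [
    ((df2 != df3) & (df2 == 1)).to_numpy(),  # became active this year
    ((df2 != df3) & (df2 == 0)).to_numpy(),  # became inactive this year
    ((df2 == df3) & (df2 == 1)).to_numpy(),  # active in both years
]
choices = ["New", "Lost", "Continuing"]

# Cells matching no condition (inactive in both years) get "".
labels = pd.DataFrame(
    np.select(conditions, choices, default=""),
    index=df2.index,
    columns=df2.columns,
)
print(labels)
```

This still cannot tell a new account from a returning one: both show up as "New". Separating them requires knowing whether the account was active in any earlier year, not just the previous one.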

Tags: python-3.x, pandas, jupyter-notebook

Solution


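The following logic may work for your case.

Edit: Based on your comment, I changed the code so that it checks every column except the last one (the earliest year, which has no previous year to compare against).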

import pandas as pd
from io import StringIO

# Sample data: one row per account, years newest-first, 1 = active, 0 = inactive
data = """account  2019  2018  2017  2016  2015
alex 1 0 0 0 0
joe  0 0 1 0 0
boss  1 1 1 1 1
smith 1 1 0 1 0"""
df = pd.read_csv(StringIO(data), sep=r'\s+', index_col='account')
df
#Out[46]: 
#         2019  2018  2017  2016  2015
#account                              
#alex        1     0     0     0     0
#joe         0     0     1     0     0
#boss        1     1     1     1     1
#smith       1     1     0     1     0

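# Label one account's status for each year (called once per row).
# Columns run newest to oldest, so x.iloc[i+1] is the previous year
# and x.iloc[i+1:] covers all earlier years.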
def account_status(x):
    status = []
    n = x.size
    for i in range(n-1):
        if x.iloc[i] == 1: 
            # if all rest are '0'
            if x.iloc[i+1:].eq(0).all():
                status.extend(['new'] + [None]*(n-i-2))
                break            
            # if the previous year is '0'
            elif x.iloc[i+1] == 0:
                status.append('returning')
            else:
                status.append('continuing')
        else:
            # at least one '1' in previous years
            if x.iloc[i+1:].eq(1).any():
                status.append('lost')
            else:
                status.extend([None] * (n-i-1))
                break
    return status    

s = df.apply(account_status, axis=1).apply(pd.Series)
s.columns = df.columns[:-1]
s
#Out[57]: 
#               2019        2018        2017        2016
#account                                                
#alex            new        None        None        None
#joe            lost        lost         new        None
#boss     continuing  continuing  continuing  continuing
#smith    continuing   returning        lost         new
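If the row-by-row loop becomes slow on a large frame, the same labels can probably be computed with whole-frame operations instead. The sketch below is an alternative formulation, not the answer's code: it reorders the columns oldest-first, uses `shift` to get the previous year, takes a cumulative maximum of the shifted frame to record whether the account was ever active before, and then picks a label with `numpy.select`. On the sample data it should reproduce the table above.

```python
from io import StringIO

import numpy as np
import pandas as pd

data = """account  2019  2018  2017  2016  2015
alex 1 0 0 0 0
joe  0 0 1 0 0
boss  1 1 1 1 1
smith 1 1 0 1 0"""
df = pd.read_csv(StringIO(data), sep=r"\s+", index_col="account")

# Work oldest-first so that shifting one column right gives the previous year.
asc = df[df.columns[::-1]]
prev = asc.shift(periods=1, axis=1, fill_value=0)  # previous year's status
ever = prev.cummax(axis=1)  # 1 if active in any earlier year

curr = asc.to_numpy()
prev_a = prev.to_numpy()
ever_a = ever.to_numpy()
conditions = [
    (curr == 1) & (ever_a == 0),                  # first active year
    (curr == 1) & (prev_a == 1),                  # active two years running
    (curr == 1) & (prev_a == 0) & (ever_a == 1),  # active again after a gap
    (curr == 0) & (ever_a == 1),                  # inactive after being active
]
choices = ["new", "continuing", "returning", "lost"]

# Cells matching no condition (never active so far) get None.
status = pd.DataFrame(
    np.select(conditions, choices, default=None),
    index=asc.index,
    columns=asc.columns,
)
# Keep every year except the earliest, in the original newest-first order.
s = status[df.columns[:-1]]
print(s)
```

The cumulative maximum of the shifted frame is what lets a single pass separate "new" from "returning": it plays the same role as the `x.iloc[i+1:]` checks in `account_status`.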
