Comparing values on x-axis & referring to values on row before in Dataframes

Problem description

# I have a dataframe that looks like this:
df = pandas.DataFrame({"R1": [8,2,3], "R2": [-21,-24,4], "R3": [-9,46,6],"R4": [16,-14,-1],"R5": [-3,36,76]})

Input

I want to compare every value within a row against the other values in that row, and then apply a function depending on the result (for example, if value 1 in row x is bigger than value 2 in row x). I am trying to apply something like this:

if value1 in row1 > value2 in row1:
    return based_on_previous_value(value1)  # trying to put results in a new dataframe
else:
    return previous_row(value1)  # trying to put results in a new dataframe

def based_on_previous_value(x):
    x in row_before + 1

def previous_row(x):
    x in row_before

--> This code doesn't work (I am just trying to show what I want to do in code)

# results put in a new dataframe
df_new = pandas.DataFrame({"R1": [8,10,11], "R2": [-21,-21,-19], "R3": [-9,-5,-2],"R4": [16,17,17],"R5": [-3,0,4]})

Output

--> "R1" in 2nd row: 2 > -24, 2 > -14 --> value("R1" in first row) + 2 = 10
--> "R2" in 2nd row: -24 < all the other 4 values --> value("R2" in first row) + 0 = -21
--> "R3" in 2nd row: 46 > all the other 4 values --> value("R3" in first row) + 4 = -5

Tags: python, pandas, dataframe

Solution


Here's some code that solves your problem. I have included both the expected output and the produced one, with a comparison to assert equality. The code uses a helper function to build an intermediate dataframe holding the change needed for each row (skipping the first row!), then applies those changes to the initial dataframe row by row.

import pandas as pd

df = pd.DataFrame({"R1": [8,2,3], "R2": [-21,-24,4], "R3": [-9,46,6],"R4": [16,-14,-1],"R5": [-3,36,76]})
expected_df = pd.DataFrame({"R1": [8,10,11], "R2": [-21,-21,-19], "R3": [-9,-5,-2],"R4": [16,17,17],"R5": [-3,0,4]})

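def reevaluate(series):
    # For each value, count how many values in the same row are strictly smaller.
    return series.apply(lambda x: sum(series<x))

# Compute the per-row increments for every row except the first.
df_changes = df.iloc[1:,:].apply(reevaluate, axis=1)
df_changes.reset_index(drop=True, inplace=True)

# Each new row is the previous produced row plus that row's increments.
produced_df = df.copy()
for row in df_changes.index:
    produced_df.iloc[row+1, :] = produced_df.iloc[row, :] + df_changes.iloc[row, :]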

print(expected_df.equals(produced_df))

True
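
The row-by-row loop can probably be avoided. The sketch below is one possible vectorized variant, not part of the original answer. It assumes that "number of strictly smaller values in the same row" can be taken from `DataFrame.rank(axis=1, method="min") - 1`, and that the running total per column is then a cumulative sum. The names `changes` and `vectorized_df` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"R1": [8, 2, 3], "R2": [-21, -24, 4], "R3": [-9, 46, 6],
                   "R4": [16, -14, -1], "R5": [-3, 36, 76]})

# rank(method="min") assigns each value 1 + (number of strictly smaller values
# in its row), so subtracting 1 gives the per-cell increment.
changes = df.rank(axis=1, method="min").astype("int64") - 1

# The first row is the starting point, so it keeps its original values
# instead of increments.
changes.iloc[0] = df.iloc[0]

# A cumulative sum down each column adds every row's increment to the
# previously produced row.
vectorized_df = changes.cumsum()
print(vectorized_df)
```

With the sample data this should yield the same values as `expected_df`. Unlike the loop, it never reads a previously written row, which may matter on larger dataframes.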
