How to extract a value from a Pandas data frame from a reference in the frame, then "walk up" the frame to another specified value?

Problem description

I have the following toy data set:

import pandas as pd
from io import StringIO  # Python 3; the Python 2 "StringIO" module no longer exists

# read the data
df = pd.read_csv(StringIO("""
    Date         Return
    1/28/2009   -0.825148
    1/29/2009   -0.859997
    1/30/2009   0.000000
    2/2/2009    -0.909546
    2/3/2009    0.000000
    2/4/2009    -0.899110
    2/5/2009    -0.866104
    2/6/2009    0.000000
    2/9/2009    -0.830099
    2/10/2009   -0.885111
    2/11/2009   -0.878320
    2/12/2009   -0.881853
    2/13/2009   -0.884432
    2/17/2009   -0.947781
    2/18/2009   -0.966414
    2/19/2009   -1.016344
    2/20/2009   -1.029667
    2/23/2009   -1.087432
    2/24/2009   -1.050808
    2/25/2009   -1.089594
    2/26/2009   -1.121556
    2/27/2009   -1.105873
    3/2/2009    -1.205019
    3/3/2009    -1.191488
    3/4/2009    -1.059311
    3/5/2009    -1.135962
    3/6/2009    -1.147031
    3/9/2009    -1.117328
    3/10/2009   -1.009050"""), sep=r"\s+").reset_index()

My goals are to:

a) find the most negative value in the "Return" column

b) find the date on which this value occurred

c) then "walk up" the "Return" column to find the first instance of a specific value (in this case, 0.000000)

d) find the date associated with the value returned in step "c"

The results I'm looking for are:

a) -1.205019

b) March 2, 2009

c) 0.000000

d) February 6, 2009

I can find "a" with the following code:

max_dd = df['Return'].min()

To get "b", I tried to use the following code:

df.loc[df['Return'] == max_dd, 'Date']

But this raises the following error:

KeyError: 'Date'

Note: I can get "b" to work in this toy example, but the actual data throws the error message. Here is the actual code used to import the data from the CSV file:

df = pd.read_csv(FILE_NAME, parse_dates=True).reset_index()

df.set_index('Date', inplace=True)  # <-- this is causing the problem
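
A likely reason for the KeyError: once set_index('Date') has run, "Date" is the index rather than a column, so selecting it as a column label in .loc fails. The sketch below reproduces this on a small inline CSV. The CSV text is a hypothetical stand-in for the real file, which is not shown here, and the variable names are illustrative.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the real CSV file (not shown in the question).
raw = """Date,Return
2009-02-06,0.000000
2009-02-27,-1.105873
2009-03-02,-1.205019
"""
df = pd.read_csv(StringIO(raw), parse_dates=["Date"])
df.set_index("Date", inplace=True)

# 'Date' is now the index, so it is no longer among the columns.
print(list(df.columns))

# Selecting 'Date' as a column therefore raises KeyError.
try:
    df.loc[df["Return"] == df["Return"].min(), "Date"]
except KeyError as err:
    print("KeyError:", err)

# With a Date index, idxmin() returns the date of the minimum directly.
min_date = df["Return"].idxmin()

# Alternatively, apply the boolean mask to the index itself.
same_date = df.index[df["Return"] == df["Return"].min()][0]
```

Either read the date from the index as above, or skip the set_index call so that "Date" stays a regular column.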

Tags: python, pandas

Solution


Filter the dataframe for rows that come before the row holding the minimum of Return (by index position) and whose Return equals zero, then take the last of those rows.

df.loc[(df.index < df.Return.idxmin()) & (df['Return'] == 0), "Date"].tail(1)
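
The four steps (a) to (d) can be sketched end to end as follows. This is a minimal example under some assumptions: it uses a shortened copy of the toy data, assumes the rows are in chronological order, and all variable names are illustrative. The last part shows one possible variant for the case where "Date" has been made the index.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the question's toy data (two zero rows before the minimum).
raw = """Date Return
2/2/2009 -0.909546
2/3/2009 0.000000
2/4/2009 -0.899110
2/6/2009 0.000000
2/9/2009 -0.830099
2/27/2009 -1.105873
3/2/2009 -1.205019
3/3/2009 -1.191488
"""
df = pd.read_csv(StringIO(raw), sep=r"\s+")
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# (a) most negative return, (b) the date on which it occurred
min_return = df["Return"].min()
min_label = df["Return"].idxmin()
min_date = df.loc[min_label, "Date"]

# (c)/(d) among rows before the minimum, keep those equal to 0;
# the last one is the first zero met when walking up from the minimum.
earlier_zeros = df[(df.index < min_label) & (df["Return"] == 0)]
zero_date = earlier_zeros["Date"].iloc[-1]

# Variant for a frame indexed by Date: compare dates instead of positions
# and read the result from the index rather than from a 'Date' column.
by_date = df.set_index("Date")
mask = (by_date.index < by_date["Return"].idxmin()) & (by_date["Return"] == 0).to_numpy()
zero_date_from_index = by_date.index[mask][-1]

print(min_return, min_date.date(), zero_date.date())
```

On this data the minimum falls on March 2, 2009 and the nearest earlier zero on February 6, 2009, which matches the expected results in the question. Comparing with == 0 works here because the zeros are exact; for computed values a tolerance check may be safer.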
