首页 > 解决方案 > Offset pandas dates step by step

问题描述

I am trying to modify my pandas dataframe, but still I haven't gotten any results. I have a dataframe of this form:

fc.iloc[:,:2][:10]
Out[309]: 
             DATE  TOTAL_REVENUE_TTM
TIC                                 
AMZN.O 2018-01-01       3.769385e+11
AMZN.O 2018-01-01       3.775364e+11
AMZN.O 2018-01-01       3.772219e+11
AMZN.O 2018-01-01       3.766596e+11
AMZN.O 2018-01-01       3.757391e+11
AMZN.O 2018-02-01       3.931331e+11
AMZN.O 2018-02-01       3.935390e+11
AMZN.O 2018-02-01       3.932100e+11
AMZN.O 2018-02-01       3.926973e+11
AMZN.O 2018-02-01       3.917282e+11

I need to properly modify the column DATE, which does not reflect the real date. The hard thing is that I need to modify it following a sort of pattern. Here I printed only the 10 columns, but I have many observation, that follow the same rule.

To summarize, these are the outputs of size 5 of an LSTM network (I printed just two outputs) and the printed date is only the last date of the output. I need to recover all the other dates, knowing that they are all shifted back in time by 12 months. Practically speaking I should offset the date in 4th row by 1 year, the date in 3rd row by 2 year and so on. This should be applied for every output (say, every 5 row of the dataframe), and for every company in the dataframe.

The way the code works should be also generalized, if for example I would change the size of the output, so that I could have 3 dates repeating instead of 5, and also if I change the offsetting back in time.

I was trying to write a function that takes those 2 parameters and returns the entire dataframe with the dates properly shifted, but I am having a hard time find the way to reproduce the pattern correctly.

Here there is a piece of code for my function that should take every piece of dataframe referring to a company, elaborate it and then concatenate again to form the new dataframe:

def modify_forecasts_dates(fc,offset,output_size):


    #extract the company names
    companies = fc.index.get_level_values(0).unique()
    cmp_to_concat = []

    for tick in companies:

        cmp = fc.loc[tick]      
        cmp.insert(0, 'ROW_NUM', range(1,len(cmp)+1))
        cmp['BOOL'] = np.where((np.mod(cmp['ROW_NUM'],output_size)==0), 'OK', 'MODIFY')   



        #cmp.drop(labels=['ROW_NUM','BOOL'],inplace=True)
        cmp_to_concat.append(cmp)

    final_df = pd.concat(cmp_to_concat)

    return final_df

I firstly tried to insert a count row number, to identify which are the row that do not need to be changed, namely every last row of a piece of output. The problem still remains because then I don't know how to update all observation before the last output row.

Is there an easy way to take the dataframe by a step of 5 row (or equivalently by a step of output size) and properly offset all the preceding dates, obtaining t, t-offset, t-2*offset and so on?

I know it is a bit complicated to udnerstand what I need to do, but I am really struggling. I could do it by hand, but it's preferable to automatize the process.

标签: pythonpandasdatedatetime

解决方案


我假设您的数据将出现在一组num_year行中(在这种情况下num_year=5)。此外,我发现拥有非唯一索引是不安全的,这使得更新列更加困难。所以我选择重置索引:

# sample data
df = pd.DataFrame({'tic':['a']*10+['b']*20,
                   'date': sorted(list(pd.date_range('2018-01-01', '2018-01-06', freq='d')
                           )* 5 )}).set_index('tic')

# make sure companies are grouped together, 
# skip if they are already
df = df.reset_index().sort_values(['tic','date'])

num_year = 5

# offset pattern
delta = pd.Series([pd.DateOffset(years=i) 
                      for i in range(num_year-1,-1,-1)] * (len(df)//num_year))

# update date
df.date = df.date - delta

df.head(10)

输出:

+----+------+------------+
|    | tic  |    date    |
+----+------+------------+
| 0  | a    | 2014-01-01 |
| 1  | a    | 2015-01-01 |
| 2  | a    | 2016-01-01 |
| 3  | a    | 2017-01-01 |
| 4  | a    | 2018-01-01 |
| 5  | a    | 2014-01-02 |
| 6  | a    | 2015-01-02 |
| 7  | a    | 2016-01-02 |
| 8  | a    | 2017-01-02 |
| 9  | a    | 2018-01-02 |
+----+------+------------+

推荐阅读