Show the 5 largest values for each month of a DataFrame

Problem description

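I am working with a DataFrame that has many columns (505), and I want to select only the 5 largest values for each month. The original post linked to an image of the DataFrame.

Here is a sample: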

Dates                1    2           3           4    5           6
2002-07-31  -31.710916  NaN   -5.208684  -29.773404  NaN   -7.308558
2002-08-31  -44.941351  NaN    3.665286  -23.987135  NaN    3.134669
2002-09-30  -36.725548  NaN    4.114474  -19.536571  NaN   -0.986986
2002-10-31  -25.377286  NaN   -0.486158   -5.887594  NaN   -0.787117
2002-11-30   19.766328  NaN   -5.298877  -10.672174  NaN  -21.057946
2002-12-31    1.996514  NaN   -7.570497   -9.257122  NaN  -19.630112
2003-01-31   -0.366083  NaN  -14.124492   -5.434475  NaN   -8.053424
2003-02-28  -17.869297  NaN  -20.075997    1.009837  NaN  -11.616974

How can I do this? I have already tried df.max(axis=1), but I also want the 4 next-largest values after the maximum. Thanks for your help.

Tags: python, pandas, dataframe, time-series

Solution


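I assume you want the 5 largest values in each row (that is, for each date), since that is how I read your question. The code below selects only the 2 largest values per row from the sample input, because each row has just 4 non-NaN columns.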

import io
import re
import pandas as pd


# First read in the data you supplied. 
data=io.StringIO(re.sub(" +","\t",
"""Dates         1        2       3           4       5     6
2002-07-31  -31.710916  NaN  -5.208684  -29.773404  NaN -7.308558
2002-08-31  -44.941351  NaN   3.665286  -23.987135  NaN 3.134669
2002-09-30  -36.725548  NaN   4.114474  -19.536571  NaN -0.986986
2002-10-31  -25.377286  NaN  -0.486158  -5.887594   NaN -0.787117
2002-11-30  19.766328   NaN  -5.298877  -10.672174  NaN -21.057946
2002-12-31  1.996514    NaN  -7.570497  -9.257122   NaN -19.630112
2003-01-31  -0.366083   NaN -14.124492  -5.434475   NaN -8.053424
2003-02-28  -17.869297  NaN -20.075997  1.009837    NaN -11.616974"""))
df = pd.read_csv(data,sep="\t")

# Then we preprocess the data, so it is in a long format instead of a wide
df = df.melt(id_vars='Dates',var_name='Column_name',value_name='Value')

# Finally extract the top 2 values for each date, but first set the index so the output knows what column the input came from
print(df.set_index('Column_name').groupby('Dates')['Value'].apply(lambda grp: grp.nlargest(2)))

The output is:

Dates       Column_name
2002-07-31  3              -5.208684
            6              -7.308558
2002-08-31  3               3.665286
            6               3.134669
2002-09-30  3               4.114474
            6              -0.986986
2002-10-31  3              -0.486158
            6              -0.787117
2002-11-30  1              19.766328
            3              -5.298877
2002-12-31  1               1.996514
            3              -7.570497
2003-01-31  1              -0.366083
            4              -5.434475
2003-02-28  4               1.009837
            6             -11.616974
Name: Value, dtype: float64

It is hard to give a more fitting answer unless you state more explicitly what output you want.
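If the goal really is the 5 largest values per date on the full 505-column frame, the same melt-based idea could be wrapped in a small helper. This is only a minimal sketch under assumptions: the function name top_n_per_date and its parameters are hypothetical, the date column is assumed to be called Dates as in the sample, and it sorts and takes the first n rows per date instead of calling nlargest.

```python
import numpy as np
import pandas as pd


def top_n_per_date(df, n=5, date_col="Dates"):
    """Return the n largest non-NaN values for each date, in long format.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Wide -> long: one row per (date, original column) pair.
    long_df = df.melt(id_vars=date_col, var_name="Column_name", value_name="Value")
    # NaN entries cannot be among the largest values, so drop them up front.
    long_df = long_df.dropna(subset=["Value"])
    # Sort so that, within each date, the largest values come first.
    long_df = long_df.sort_values([date_col, "Value"], ascending=[True, False])
    # Keep the first n rows of every date group.
    return long_df.groupby(date_col).head(n).reset_index(drop=True)


# Two rows of the sample data from the question, used as a quick check.
sample = pd.DataFrame({
    "Dates": ["2002-07-31", "2002-08-31"],
    "1": [-31.710916, -44.941351],
    "2": [np.nan, np.nan],
    "3": [-5.208684, 3.665286],
    "4": [-29.773404, -23.987135],
    "5": [np.nan, np.nan],
    "6": [-7.308558, 3.134669],
})

top2 = top_n_per_date(sample, n=2)
print(top2)
```

On the real data you would call it with n=5. A date with fewer than 5 non-NaN columns simply returns fewer rows.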

