首页 > 解决方案 > 使用 Python 查找在给定日期范围内具有最小值的列名列表

问题描述

给定如下数据集和从2013-05-01到的日期范围2013-05-15

import pandas as pd
import numpy as np

np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])

出去:

            A_values  B_values  C_values  D_values  E_values    target
2013-02-26  6.059783  7.333694  1.389472  3.126731  9.972433  1.281624
2013-02-27  1.789931  7.529254  6.621605  7.843101  0.968944  0.585713
2013-02-28  9.623960  6.165574  0.866300  5.612724  6.165247  9.638430
2013-03-01  5.743043  3.711608  4.521452  2.018502  5.693051  1.950960
2013-03-02  5.837040  4.763135  5.178144  8.230986  7.322250  0.690563
             ...       ...       ...       ...       ...       ...
2013-05-22  8.795736  6.316478  0.427136  3.864723  6.803052  5.554824
2013-05-23  7.959282  1.839659  2.225667  5.140268  2.756443  8.440801
2013-05-24  5.412016  9.081583  7.212742  5.890717  0.430184  0.968086
2013-05-25  1.088414  9.016004  5.384490  1.610210  5.461017  0.672555
2013-05-26  4.930571  2.338785  9.823048  6.893207  6.312104  5.273122

首先我过滤列df.filter(regex='_values$'),然后我希望返回一个列名列表,其最小值落在给定的日期范围内(2013-05-01, 2013-05-15),即。如果 columnA_values的最小值在此范围内的任何一天,A_values则将包含在返回的结果列表中。

我如何在 Pandas 或 Numpy 中实现这一点?谢谢。

标签: python-3.xpandasnumpy

解决方案


用于DataFrame.idxmin每列的最小日期时间,然后过滤索引Series.between_time

s = df.filter(regex='_values$').idxmin()

out = s[s.between('2013-05-01','2013-05-15')].index.tolist()
print (out)
['D_values']

编辑:

df1 = df.filter(regex='_values$')
s1 = df1.idxmin()
s2 = df1.idxmax()

#removed tolist
out1 = s1[s1.between('2013-03-16','2013-03-31')].index
out2 = s2[s2.between('2013-05-01','2013-05-15')].index

out = out1.intersection(out2).tolist()

推荐阅读