How to generate column values based on a range

Problem description

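I want to generate a range of values from Start_time(s) and End_time(s) as they appear in the data frame, for example (0.1, 2.5), so that I can use it to extract the matching range of values (time, in seconds) from the second data frame below: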

   Words    Start_time(in sec)  End_time(in secs)   Time_per_words
0   let         0.1                 2.5                2.6
1   me          2.5                 2.6                5.1
2   tell        2.6                 2.9                5.5
3   you         2.9                 3.0                5.9
4   about       3.0                 3.2                6.2
5    4          10.7                11.0               21.7

Instead of computing each range manually:

df = amp[amp['Time'].between(0.1, 2.5)]
df = df.sort_values('Amplitudes', ascending=False)[:5]
df.head()

This data frame is amp.head():

        Time    Amplitudes
1220673 5.36    0.000155
1220674 1.36    0.000936
1220675 0.18    0.001319
1220676 2.36    0.001513
1220677 0.45    0.001666
1220678 1.06    0.001476
1220679 0.17    0.000820
1220680 55.36   0.000409
1220681 55.36   0.000227
1220682 0.09    0.000847
1220683 0.46    0.001333
1220684 1.26    0.001595
1220685 0.30    0.001481
1220686 55.36   0.001312
1220687 55.36   0.002050

Expected output:

    Words    Start_time(in sec)  End_time(in secs)   Total_Time_words  Amplitude
0    let            0.1               2.5                 2.6            0.23
1    me             2.5               2.6                 5.1            0.12
2    tell           2.6               2.9                 5.5            0.09
3    you            2.9               3.0                 5.9            1.20
4    about          3.0               3.2                 6.2            0.67

Tags: python, pandas, dataframe, range

Solution


Use cut to bin the times by the start/end intervals, then aggregate the mean per bin and add it to the original:

import numpy as np
import pandas as pd

# Bin edges: the first start time followed by every end time
bins = np.insert(df['End_time(in secs)'].values, 0, df['Start_time(in sec)'].iat[0])
print(bins)
[ 0.1  2.5  2.6  2.9  3.   3.2 11. ]

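# Label each bin with its end time so the result can be joined back on that column
b = pd.cut(amp['Time'], bins=bins, labels=df['End_time(in secs)'])
# Mean amplitude per bin; convert the categorical index back to float for the join
s = amp.groupby(b, observed=False)['Amplitudes'].mean().rename(index=float)
df = df.join(s, on='End_time(in secs)')
print(df)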
   Words  Start_time(in sec) End_time(in secs)  Time_per_words  Amplitudes
0    let                 0.1               2.5             2.6    0.001349
1     me                 2.5               2.6             5.1         NaN
2   tell                 2.6               2.9             5.5         NaN
3    you                 2.9                 3             5.9         NaN
4  about                 3.0               3.2             6.2         NaN
5      4                10.7                11            21.7    0.000155
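To see why the last row picks up a value here, it can help to inspect the bins themselves. The following is a minimal sketch, assuming the sample data from the question; the names words, edges and mean_per_bin are only illustrative and not from the original answer. Calling pd.cut without labels returns the right-closed interval each time falls into, which shows that the gap between 3.2 and 10.7 is folded into the last bin.

```python
import numpy as np
import pandas as pd

# Word timings copied from the question (seconds); "words" is an illustrative name.
words = pd.DataFrame({
    "Words": ["let", "me", "tell", "you", "about", "4"],
    "Start_time(in sec)": [0.1, 2.5, 2.6, 2.9, 3.0, 10.7],
    "End_time(in secs)": [2.5, 2.6, 2.9, 3.0, 3.2, 11.0],
})

# Amplitude samples copied from the amp excerpt in the question.
amp = pd.DataFrame({
    "Time": [5.36, 1.36, 0.18, 2.36, 0.45, 1.06, 0.17, 55.36,
             55.36, 0.09, 0.46, 1.26, 0.30, 55.36, 55.36],
    "Amplitudes": [0.000155, 0.000936, 0.001319, 0.001513, 0.001666,
                   0.001476, 0.000820, 0.000409, 0.000227, 0.000847,
                   0.001333, 0.001595, 0.001481, 0.001312, 0.002050],
})

# Same edges as in the answer: the first start time, then every end time.
edges = np.insert(words["End_time(in secs)"].to_numpy(), 0,
                  words["Start_time(in sec)"].iat[0])

# Without labels, pd.cut returns the right-closed interval for each time;
# times outside all edges (0.09 and 55.36 here) become NaN.
amp["bin"] = pd.cut(amp["Time"], bins=edges)

# Mean amplitude per bin; observed=False keeps empty bins, which show as NaN.
mean_per_bin = amp.groupby("bin", observed=False)["Amplitudes"].mean()
print(mean_per_bin)
```

In this sample the time 5.36 lands in the bin ending at 11.0, even though the last word only spans 10.7 to 11.0. That is the reason the cut approach is best suited to intervals that follow each other without gaps.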

If the intervals are not contiguous, unlike the first 5 rows (where each end time equals the next start time):

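# For each row, average the amplitudes whose Time lies in [start, end] (both ends inclusive)
d = {e: amp.loc[amp['Time'].between(s, e), 'Amplitudes'].mean()
     for s, e in df[['Start_time(in sec)', 'End_time(in secs)']].to_numpy()}

# Look up each row's mean by its end time
df['Amplitudes'] = df['End_time(in secs)'].map(d)
print(df)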
   Words  Start_time(in sec)  End_time(in secs)  Time_per_words  Amplitudes
0    let                 0.1                2.5             2.6    0.001349
1     me                 2.5                2.6             5.1         NaN
2   tell                 2.6                2.9             5.5         NaN
3    you                 2.9                3.0             5.9         NaN
4  about                 3.0                3.2             6.2         NaN
5      4                10.7               11.0            21.7         NaN
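One possible variation on the lookup above is to build the new column row by row instead of through a dictionary keyed by end time, so two rows that happened to share an end time would not overwrite each other. This is only a sketch under assumptions: the helper mean_amplitude and the name words are hypothetical, and the data is a shortened copy of the question's samples.

```python
import pandas as pd

# Shortened copy of the question's word timings (seconds); note the gap before the last row.
words = pd.DataFrame({
    "Words": ["let", "me", "4"],
    "Start_time(in sec)": [0.1, 2.5, 10.7],
    "End_time(in secs)": [2.5, 2.6, 11.0],
})

# A few amplitude samples taken from the question's amp excerpt.
amp = pd.DataFrame({
    "Time": [5.36, 1.36, 0.18, 2.36, 55.36, 0.09],
    "Amplitudes": [0.000155, 0.000936, 0.001319, 0.001513, 0.000409, 0.000847],
})


def mean_amplitude(samples, start, end):
    """Mean amplitude of samples whose Time lies in [start, end]; NaN if none match."""
    in_range = samples["Time"].between(start, end)  # inclusive on both ends
    return samples.loc[in_range, "Amplitudes"].mean()


# One value per row, in row order, so no lookup key is needed.
words["Amplitudes"] = [
    mean_amplitude(amp, start, end)
    for start, end in zip(words["Start_time(in sec)"], words["End_time(in secs)"])
]
print(words)
```

Swapping .mean() for another aggregate (for example .max()) inside the helper would change what is reported per word without touching the rest of the code.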
