Calculating a sliding window with a step size in Python

Problem description

I have this data, loaded with pandas:

import pandas as pd

SNP = pd.read_csv("C:/Users/sia/Desktop/SNP.txt", delimiter=r"\s+", header=0)

ID   Chr  Position  p
M1   1    4762      0.40
M2   1    77143     0.62
M3   1    130756    0.22
M4   1    227358    0.50
M5   1    265131    0.60
M6   1    568128    0.64
M7   2    2000      0.32
M8   2    18000     0.36
M9   2    60300     0.64
M10  2    71118     0.50
M11  2    71595     0.28
M12  2    200000    0.10

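In Python, how can I get the sum of the p values over a sliding window of 100000 with a step of 50000 along the Position column, separately for each Chr, as a new data frame like this: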

Chr  start   end     sum.p.slide
1    0       100000  1.02
1    50000   150000  0.84
1    100000  200000  0.22
1    150000  250000  0.50
1    200000  300000  1.10
1    250000  350000  0.60
1    300000  400000  NaN
1    350000  450000  NaN
1    400000  500000  NaN
1    450000  550000  NaN
1    500000  600000  0.64
2    0       100000  2.10
2    50000   150000  1.42
2    100000  200000  0.10

Tags: python, python-3.x, pandas, python-2.7, numpy

Solution


I'm sure there is a better way to do this, but here you go.

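import numpy as np
import pandas as pd

df = SNP.copy()  # the frame loaded in the question

# Two sets of non-overlapping 100000-wide bins; the second is shifted by the 50000 step,
# so together they give every window (0, 100000], (50000, 150000], (100000, 200000], ...
df['range1'] = pd.cut(df.Position, list(range(0, df.Position.max() + 100000, 100000)))
df['range2'] = pd.cut(df.Position, list(range(50000, df.Position.max() + 50000, 100000)))

# observed=False keeps empty windows (their sum is 0), so they still show up in the result
a = df[['range1', 'Chr', 'p']].groupby(['Chr', 'range1'], observed=False).agg({'p': 'sum'})
b = df[['range2', 'Chr', 'p']].groupby(['Chr', 'range2'], observed=False).agg({'p': 'sum'})

# Align both window sets on (Chr, window) and add them up; empty windows sum to 0,
# which is turned into NaN (this assumes no window genuinely sums to 0)
out = pd.concat([a, b], axis=1).sum(axis=1).replace(0.0, np.nan).reset_index()

# The window level loses its name in the concat and becomes 'level_1'; split each Interval
out['start'] = out.level_1.apply(lambda x: x.left)
out['end'] = out.level_1.apply(lambda x: x.right)
out = out.drop(columns=['level_1'])

out.columns = ['Chr', 'sum.p.slide', 'start', 'end']
print(out[['Chr', 'start', 'end', 'sum.p.slide']])

Output

    Chr start   end     sum.p.slide
0   1   0       100000  1.02
1   1   50000   150000  0.84
2   1   100000  200000  0.22
3   1   150000  250000  0.50
4   1   200000  300000  1.10
5   1   250000  350000  0.60
6   1   300000  400000  NaN
7   1   350000  450000  NaN
8   1   400000  500000  NaN
9   1   450000  550000  NaN
10  1   500000  600000  0.64
11  2   0       100000  2.10
12  2   50000   150000  1.42
13  2   100000  200000  0.10
14  2   150000  250000  0.10
15  2   200000  300000  NaN
16  2   250000  350000  NaN
17  2   300000  400000  NaN
18  2   350000  450000  NaN
19  2   400000  500000  NaN
20  2   450000  550000  NaN
21  2   500000  600000  NaN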
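The two `pd.cut` passes (`range1` and `range2`) work because the step is exactly half the window. The same offset-binning idea extends to any step that divides the window evenly: make `window // step` passes, each shifted by one more step. The sketch below is an illustration, not the answer's code; the function name `offset_bin_sums` is hypothetical. Unlike the answer, it groups with `observed=True` and reinserts empty windows as NaN by reindexing onto a full grid of starts, so a window whose p values genuinely sum to 0 is not turned into NaN. Its grid runs up to the last start not beyond the largest Position, so it yields one more window, 550000 to 650000, than the answer's output.

```python
import numpy as np
import pandas as pd

# Data copied from the question's table (instead of reading SNP.txt)
SNP = pd.DataFrame({
    "ID": [f"M{i}" for i in range(1, 13)],
    "Chr": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "Position": [4762, 77143, 130756, 227358, 265131, 568128,
                 2000, 18000, 60300, 71118, 71595, 200000],
    "p": [0.40, 0.62, 0.22, 0.50, 0.60, 0.64,
          0.32, 0.36, 0.64, 0.50, 0.28, 0.10],
})


def offset_bin_sums(df, window=100000, step=50000):
    """Hypothetical helper: sliding-window sums of p per Chr via window // step pd.cut passes.

    Assumes step divides window. Each pass uses non-overlapping, right-closed
    bins shifted by one more step, so together the passes cover every window
    (start, start + window] with start = 0, step, 2 * step, ...
    """
    if window % step:
        raise ValueError("this sketch assumes step divides window")
    top = int(df["Position"].max()) + window
    pieces = []
    for offset in range(0, window, step):
        edges = list(range(offset, top + 1, window))
        win = pd.cut(df["Position"], edges).rename("win")
        # observed=True keeps only windows that contain at least one SNP
        s = df.groupby(["Chr", win], observed=True)["p"].sum().reset_index()
        s["start"] = [int(iv.left) for iv in s["win"]]
        pieces.append(s[["Chr", "start", "p"]])
    found = pd.concat(pieces, ignore_index=True).set_index(["Chr", "start"])["p"]

    # Full grid of window starts per Chr; windows without SNPs become NaN
    starts = range(0, top - window + 1, step)
    grid = pd.MultiIndex.from_product(
        [sorted(df["Chr"].unique()), starts], names=["Chr", "start"])
    out = found.reindex(grid).rename("sum.p.slide").reset_index()
    out.insert(2, "end", out["start"] + window)
    return out


print(offset_bin_sums(SNP))
```

For example, `offset_bin_sums(SNP, window=100000, step=25000)` would make four shifted passes instead of two.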
