首页 > 解决方案 > Subtracting two columns named in certain pattern

问题描述

I have a data frame with a large number of columns. In this question, however, I brought a mini-version of the df.

df = pd.DataFrame({'WT_IGL_x':[1,2,3,2,1,1,3,4,1,2], 'LA_WHN_x':[1,0,1,0,1,1,0,0,1,0], 'LA_WHN_y':[2,1,2,3,3,4,1,1,2,1], 'WT_IGL_y':[2,1,2,3,3,4,1,1,2,1], 'GT_OPP_IGL_x':[1,2,3,2,1,1,3,4,1,2], 'GT_OPP_IGL_y':[1,2,3,2,1,1,3,4,1,2]})

I want to subtract two columns that has same prefix and the middle part. In this case the 1st/4th columns and 2nd/3rd columns and add a column with the difference and name them by adding a suffix '_diff'

so my desired output will be:

df = pd.DataFrame({'WT_IGL_x':[1,2,3,2,1,1,3,4,1,2], 'LA_WHN_x':[1,0,1,0,1,1,0,0,1,0], 'LA_WHN_y':[2,1,2,3,3,4,1,1,2,1], 'WT_IGL_y':[2,1,2,3,3,4,1,1,2,1], 'WT_IGL_diff': [-1, 1, 1, -1, -2, -3, 2, 3, -1, 1], 'LA_WHN_diff': [-1, -1, -1, -3, -2, -3, -1, -1, -1, -1], GT_OPP_IGL_diff: [0,0,0,0,0,0,0,0,0,0]})

subtracting two columns is easy but

  1. matching two columns with the patterns
  2. adding columns with the differences
  3. name the column automatically with suffix '_diff'

are hard.

Thank you for your help.

标签: pandasdataframe

解决方案


Idea is create MultiIndex by split of last _ by str.rsplit, then select by second level with DataFrame.xs and subtract, e.g. by DataFrame.sub, then add DataFrame.add_suffix and last use DataFrame.join:

df1 = df.copy()
df1.columns = df1.columns.str.rsplit('_', n=1, expand=True)
df1 = df1.xs('x', axis=1, level=1).sub(df1.xs('y', axis=1, level=1)).add_suffix('_diff')
df1 = df.join(df1)
print (df1)
   WT_IGL_x  LA_WHN_x  LA_WHN_y  WT_IGL_y  LA_WHN_diff  WT_IGL_diff
0         1         1         2         2           -1           -1
1         2         0         1         1           -1            1
2         3         1         2         2           -1            1
3         2         0         3         3           -3           -1
4         1         1         3         3           -2           -2
5         1         1         4         4           -3           -3
6         3         0         1         1           -1            2
7         4         0         1         1           -1            3
8         1         1         2         2           -1           -1
9         2         0         1         1           -1            1

推荐阅读