首页 > 解决方案 > How to use dplyr:mutate to mulitply pairs of columns specified by parts of the variable name

问题描述

I have the following example:

df <- data.frame(
id = c(1,2,3),
  fix_01.2012 = c(2,5,7),
  fix_02.2012 = c(5,1,7),
  fix_03.2012 = c(6,1,5),
  fox_01.2012 = c(0.4, 0.5, 0.7),
  fox_02.2012 = c(0.6, 0.5, 0.8),
  fox_03.2012 = c(0.7, 0.5, 0.9)
  )

  id fix_01.2012 fix_02.2012 fix_03.2012 fox_01.2012 fox_02.2012 fox_03.2012
1  1           2           5           6         0.4         0.6         0.7
2  2           5           1           1         0.5         0.5         0.5
3  3           7           7           5         0.7         0.8         0.9

The table below is what I want to get. I want to create a new column for each date (e.g. "01.2012"):

res_date = fix_date * fox_date

As I have many dates / pairs of dates, I guess this needs to be done by looping through the names.

 id fix_01.2012 fix_02.2012 fix_03.2012 fox_01.2012 fox_02.2012 fox_03.2012 res_01.2012 res_02.2012 res_03.2012
1  1           2           5           6         0.4         0.6         0.7         0.8         3.0         4.2
2  2           5           1           1         0.5         0.5         0.5         2.5         0.5         0.5
3  3           7           7           5         0.7         0.8         0.9         4.9         5.6         4.5

Anyone can help? Thanks very much in advance!

标签: rnamesdplyr

解决方案


Here is an idea that uses split.default to split the data frame based on similar column names (based on your conditions). We then loop over that list and multiply the columns. In this case, we use Reduce (rather than i[1]*i[2]) to multiply in order to account for more than two columns

do.call(cbind, 
   lapply(split.default(df[-1], gsub('.*_', '', names(df[-1]))), function(i) Reduce(`*`, i)))

#     01.2012 02.2012 03.2012
#[1,]     0.8     3.0     4.2
#[2,]     2.5     0.5     0.5
#[3,]     4.9     5.6     4.5

Bind them back to the original with cbind.data.frame()


推荐阅读