Renaming multiple variables according to a specific pattern

Question

I have a dataset with hundreds of variables that looks roughly like this:

dt <- data.frame(id= c(1,1,1,2,2,2,3,3,3), time=c(1,2,1,2,1,2,1,2,1), dp_eu_ = rnorm(9), EU_top = rnorm(9), fr_dp_us_ = rnorm(9), us = rnorm(9), c= rnorm(9), dp_eu_fit= rnorm(9))
dt
# id time     dp_eu_      EU_top   fr_dp_us_          us            c  dp_eu_fit
# 1  1    1 -1.1184009 -1.07430118  0.61398523 -0.68343624 -0.050577369  0.2849573
# 2  1    2  0.4347047 -0.53454071 -0.30716538 -1.02328242  0.626537910  0.7790181
# 3  1    1  0.2318315 -0.05854228  0.05169733 -0.22130149 -0.224553878  1.5612293
# 4  2    2  1.2640080  2.07899296 -0.95918953 -0.35961156  0.839223862  0.5001897
# 5  2    1 -0.4374764 -0.25284854 -0.46251901  0.08630344  1.749488237  0.7155184
# 6  2    2  0.5042690  0.13322671  1.00881113  0.43807458 -0.007357072  0.5086272
# 7  3    1  0.3672216  1.92995242  0.48708183  0.58206127  0.112447259 -0.4707959
# 8  3    2 -1.5431709  0.53362731  1.17361087 -1.00932195 -0.125171990  0.8641184
# 9  3    1 -1.4577268  0.23413541 -0.32399489 -0.91040641  1.995611848  1.3348043

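I want to rename my variables according to the following rule: if a variable name contains dp_, then the eu or us that follows should be upper case (EU and US respectively). Otherwise the name should stay unchanged.

I know how to rename variables one by one, but since I have hundreds of them, this needs to be done systematically.

The final dataset should look like this: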

f.dt <- data.frame(id= c(1,1,1,2,2,2,3,3,3), time=c(1,2,1,2,1,2,1,2,1), dp_EU_ = rnorm(9), EU_top = rnorm(9), fr_dp_US_ = rnorm(9), us = rnorm(9), c= rnorm(9), dp_EU_fit= rnorm(9))
f.dt
# id time     dp_EU_      EU_top      fr_dp_US_     us            c      dp_EU_fit
# 1  1    1 -1.1184009 -1.07430118  0.61398523 -0.68343624 -0.050577369  0.2849573
# 2  1    2  0.4347047 -0.53454071 -0.30716538 -1.02328242  0.626537910  0.7790181
# 3  1    1  0.2318315 -0.05854228  0.05169733 -0.22130149 -0.224553878  1.5612293
# 4  2    2  1.2640080  2.07899296 -0.95918953 -0.35961156  0.839223862  0.5001897
# 5  2    1 -0.4374764 -0.25284854 -0.46251901  0.08630344  1.749488237  0.7155184
# 6  2    2  0.5042690  0.13322671  1.00881113  0.43807458 -0.007357072  0.5086272
# 7  3    1  0.3672216  1.92995242  0.48708183  0.58206127  0.112447259 -0.4707959
# 8  3    2 -1.5431709  0.53362731  1.17361087 -1.00932195 -0.125171990  0.8641184
# 9  3    1 -1.4577268  0.23413541 -0.32399489 -0.91040641  1.995611848  1.3348043

Thanks a lot for your help.

Regards

Tags: r

Answer


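We can use rename_at. Select the columns whose names contain 'dp_' with contains() (starts_with('dp') would skip fr_dp_us_), then use sub to match 'eu|us' as a capture group. In the replacement, refer to that group with a backreference (\\1) and convert it to upper case with \\U, which requires perl = TRUE.

library(dplyr)
dt %>%
   rename_at(vars(contains('dp_')),
            ~sub("_(eu|us)", "_\\U\\1", ., perl = TRUE))
#  id time     dp_EU_     EU_top   fr_dp_US_          us           c   dp_EU_fit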
#1  1    1  0.4978505 -1.6866933  0.82158108  2.16895597 -0.04287046 -0.50232345
#2  1    2 -1.9666172  0.8377870  0.68864025  1.20796200  1.36860228 -0.33320738
#3  1    1  0.7013559  0.1533731  0.55391765 -1.12310858 -0.22577099 -1.01857538
#4  2    2 -0.4727914 -1.1381369 -0.06191171 -0.40288484  1.51647060 -1.07179123
#5  2    1 -1.0678237  1.2538149 -0.30596266 -0.46665535 -1.54875280  0.30352864
#6  2    2 -0.2179749  0.4264642 -0.38047100  0.77996512  0.58461375  0.44820978
#7  3    1 -1.0260044 -0.2950715 -0.69470698 -0.08336907  0.12385424  0.05300423
#8  3    2 -0.7288912  0.8951257 -0.20791728  0.25331851  0.21594157  0.92226747
#9  3    1 -0.6250393  0.8781335 -1.26539635 -0.02854676  0.37963948  2.05008469
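
In dplyr 1.0.0 and later, rename_at() is superseded by rename_with(). The following is a minimal sketch of the same rename in that style, assuming dplyr >= 1.0.0 is installed. The data frame dt2 is a hypothetical stand-in with the same column names as dt and arbitrary values.

```r
library(dplyr)

# Hypothetical stand-in with the same column names as dt (values are arbitrary)
dt2 <- data.frame(id = 1:3, time = 1:3, dp_eu_ = 0, EU_top = 0,
                  fr_dp_us_ = 0, us = 0, c = 0, dp_eu_fit = 0)

# Rename only the columns whose names contain "dp_";
# \\U upper-cases the captured "eu"/"us" (needs perl = TRUE)
res <- dt2 %>%
  rename_with(~ sub("_(eu|us)", "_\\U\\1", .x, perl = TRUE),
              .cols = contains("dp_"))

names(res)
# expected names: id, time, dp_EU_, EU_top, fr_dp_US_, us, c, dp_EU_fit
```

Columns such as us and EU_top are left alone because they are never selected by contains("dp_").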

Or use sub from base R. The pattern is left unanchored (no ^) so that names such as fr_dp_us_ are also matched:

names(dt) <-  sub("(dp)_(eu|us)", "\\1_\\U\\2", names(dt), perl = TRUE)
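
Before overwriting names(dt), it may help to try the pattern on a plain character vector. This is only a sketch; the helper name upcase_region is hypothetical and not part of any package.

```r
# Hypothetical helper: upper-case "eu"/"us" when it directly follows "dp_"
upcase_region <- function(x) {
  sub("(dp)_(eu|us)", "\\1_\\U\\2", x, perl = TRUE)
}

old <- c("id", "time", "dp_eu_", "EU_top", "fr_dp_us_", "us", "c", "dp_eu_fit")
new <- upcase_region(old)

# Show old and new names side by side for a quick visual check
data.frame(old, new)
```

Once the mapping looks right, names(dt) <- upcase_region(names(dt)) applies it to the real data.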
