Replicating the order of a given column across two different data frames

Problem description

I have two data frames:

> x
               tags freq.Freq
1            #solar         1
2      #solarpanels         2
3             #wind         3
4    #ClimateChange         4
5           #energy         5
6  #renewableenergy         6
7         #windfarm         7
8           #Suncor         8
9            #Solar         9
10        #WindTree        10
11       #renewable        11
12      #climatecri        12
13      #renewables        13

> y
               tags freq.Freq
1        #renewable       740
2    #ClimateChange       722
3             #wind       638
4           #energy       541
5         #WindTree       525
6       #climatecri       518
7            #solar       359
8  #renewableenergy       326
9            #Solar       296
10      #renewables       245
11     #solarpanels      1029
12        #windfarm       291
13          #Suncor       282

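The y$freq.Freq column is wrong. I want to replace it with the corresponding values of that column from x, matched by tag. For example, #renewable in x has x$freq.Freq equal to 11, #ClimateChange has x$freq.Freq equal to 4, and so on. The second data frame should then be: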

> y
               tags freq.Freq
1        #renewable       11
2    #ClimateChange       4
3             #wind       3
4           #energy       5
5         #WindTree       10
6       #climatecri       12
7            #solar       1
8  #renewableenergy       6
9            #Solar       9
10      #renewables       13
11     #solarpanels       2
12        #windfarm       7
13          #Suncor       8

How can I get the correct version of y? I tried x[order(y$tags),] but did not get the correct result.

Tags: r, dataframe

Solution


We can use match to match the 'tags' columns of the two datasets and pull the corresponding 'freq.Freq' values from the 'x' dataset:

x$freq.Freq[match(y$tags, x$tags)]
#[1] 11  4  3  5 10 12  1  6  9 13  2  7  8
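To actually correct y, the looked-up vector has to be assigned back to the column. The following is a minimal sketch of that step, assuming a four-row subset of the question's data (rebuilt inline so the block runs on its own) and assuming that every tag in y should be present in x; the `idx` variable and the `stopifnot` guard are illustrative additions, not part of the original answer.

```r
# Four-row subset of the question's data, rebuilt here for a standalone example
x <- data.frame(tags = c("solar", "solarpanels", "wind", "ClimateChange"),
                freq.Freq = 1:4,
                stringsAsFactors = FALSE)
y <- data.frame(tags = c("ClimateChange", "wind", "solar", "solarpanels"),
                freq.Freq = c(722L, 638L, 359L, 1029L),
                stringsAsFactors = FALSE)

# For each tag in y, match() gives its row position in x (NA if it is absent)
idx <- match(y$tags, x$tags)

# Assumption: a tag missing from x is an error rather than something to keep as NA
stopifnot(!anyNA(idx))

# Overwrite the wrong column with the values looked up in x
y$freq.Freq <- x$freq.Freq[idx]
y
```

If unmatched tags are expected, dropping the `stopifnot` line leaves NA in the corresponding rows of y$freq.Freq instead of stopping.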

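Another option is factor. Note that this returns each tag's position in x$tags, which equals x$freq.Freq here only because that column is 1:13: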

as.integer(factor(y$tags, levels = x$tags))
#[1] 11  4  3  5 10 12  1  6  9 13  2  7  8
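The factor codes are positions in x$tags rather than values of x$freq.Freq. A small sketch with made-up data (the `x2` and `y2` data frames below are hypothetical, not from the question) shows the difference when the frequency column is not simply 1:n, and how indexing with the positions recovers the values:

```r
# Hypothetical data where freq.Freq does not coincide with the row position
x2 <- data.frame(tags = c("solar", "wind", "energy"),
                 freq.Freq = c(10L, 20L, 30L),
                 stringsAsFactors = FALSE)
y2 <- data.frame(tags = c("energy", "solar", "wind"),
                 stringsAsFactors = FALSE)

# Integer codes of the factor are the positions of y2$tags within x2$tags
pos <- as.integer(factor(y2$tags, levels = x2$tags))
pos
#[1] 3 1 2

# Indexing the value column by those positions yields the actual frequencies
vals <- x2$freq.Freq[pos]
vals
#[1] 30 10 20
```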

Or with mutate from dplyr:

library(dplyr)
y %>% 
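    # index x$freq.Freq by the matched position, so this also works when the values are not 1:n
    mutate(freq.Freq = x$freq.Freq[match(tags, x$tags)])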

Data

x <- structure(list(tags = c("solar", "solarpanels", "wind", "ClimateChange", 
"energy", "renewableenergy", "windfarm", "Suncor", "Solar", "WindTree", 
"renewable", "climatecri", "renewables"), freq.Freq = 1:13), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13"
))

y <- structure(list(tags = c("renewable", "ClimateChange", "wind", 
"energy", "WindTree", "climatecri", "solar", "renewableenergy", 
"Solar", "renewables", "solarpanels", "windfarm", "Suncor"), 
    freq.Freq = c(740L, 722L, 638L, 541L, 525L, 518L, 359L, 326L, 
    296L, 245L, 1029L, 291L, 282L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13"
))
