Assigning row numbers within groups using dplyr and data.table

Question

I need to create a new column that contains the row number within each group.

Some data to work with:

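> library(ggplot2)     # provides the diamonds dataset
> library(dplyr)
> library(data.table)
> library(magrittr)    # provides the %T>% tee pipe
> set.seed(222)
> dt <- diamonds %>% 
  select(cut, color, price) %>% 
  rename(riding=cut,party=color,votes=price) %>% 
  group_by(riding) %>% sample_n(3) %>% 
  distinct(riding,party,.keep_all = TRUE) %>%
  arrange(riding, desc(votes) ) %>% data.table %T>% print 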

       riding party votes
        <ord> <ord> <int>
1:      Fair     H  3658
2:      Fair     G  2808
3:      Good     E  2542
4:      Good     D   684
5: Very Good     G  7974
6: Very Good     F  1637
7: Very Good     D   447
8:   Premium     H  5458
9:   Premium     F  2469
10:   Premium     D  1892
11:     Ideal     F 10786
12:     Ideal     E  4832
13:     Ideal     G   757

So the desired output should look like this:

       riding party votes place
        <ord> <ord> <int> <int>
1:      Fair     H  3658     1
2:      Fair     G  2808     2
3:      Good     E  2542     1
4:      Good     D   684     2
5: Very Good     G  7974     1
6: Very Good     F  1637     2
7: Very Good     D   447     3
8:   Premium     H  5458     1
9:   Premium     F  2469     2
10:   Premium     D  1892     3
11:     Ideal     F 10786     1
12:     Ideal     E  4832     2
13:     Ideal     G   757     3

Please show me how to do this using dplyr, data.table, or both.

I thought the following would work, but it does not. Does anyone know why? It gives the global row number instead. Can I use .I together with by?

> dt2[ order(votes), place:=.I, by=riding][]     # does not work
       riding party votes place
        <ord> <ord> <int> <int>
1:      Fair     H  3658     1
2:      Fair     G  2808     2
3:      Good     E  2542     3
4:      Good     D   684     4
5: Very Good     G  7974     5
6: Very Good     F  1637     6
7: Very Good     D   447     7
8:   Premium     H  5458     8
9:   Premium     F  2469     9
10:   Premium     D  1892    10
11:     Ideal     F 10786    11
12:     Ideal     E  4832    12
13:     Ideal     G   757    13

Tags: r, dplyr, data.table

Answer


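With dplyr, I would suggest using group_by() on riding and then creating the new variable as a sequence from 1 to n(), the number of rows in the current group.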

library(ggplot2)     # provides the diamonds dataset
library(dplyr)
library(data.table)
# Rebuild the example data from the question
set.seed(222)
dt <- diamonds %>% 
  select(cut, color, price) %>% 
  rename(riding=cut,party=color,votes=price) %>% 
  group_by(riding) %>% sample_n(3) %>% 
  distinct(riding,party,.keep_all = TRUE) %>%
  arrange(riding, desc(votes) ) %>% data.table %>% print 

# Number the rows within each riding; n() is the size of the current group
dt %>% group_by(riding) %>% mutate(place=1:n())

Output:

# A tibble: 13 x 4
# Groups:   riding [5]
   riding    party votes place
   <ord>     <ord> <int> <int>
 1 Fair      H      3658     1
 2 Fair      G      2808     2
 3 Good      E      2542     1
 4 Good      D       684     2
 5 Very Good G      7974     1
 6 Very Good F      1637     2
 7 Very Good D       447     3
 8 Premium   H      5458     1
 9 Premium   F      2469     2
10 Premium   D      1892     3
11 Ideal     F     10786     1
12 Ideal     E      4832     2
13 Ideal     G       757     3
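
The question also asks for a data.table version and why .I did not work. As a rough sketch, not taken from the answer above: in data.table, .I holds each row's position in the whole table even when by is used, which is why it yields a global index, whereas .N is the size of the current group, so seq_len(.N) restarts at 1 for every riding. The toy table below is typed in by hand from the first three ridings of the question's printed data, and the column names global and place_by_votes are made up for illustration.

```r
library(data.table)

# Toy data copied by hand from the question's printed table (first three ridings)
dt <- data.table(
  riding = c("Fair", "Fair", "Good", "Good", "Very Good", "Very Good", "Very Good"),
  party  = c("H", "G", "E", "D", "G", "F", "D"),
  votes  = c(3658L, 2808L, 2542L, 684L, 7974L, 1637L, 447L)
)

# .I is the row position in the whole table, so grouping does not reset it
dt[, global := .I, by = riding]

# seq_len(.N) counts from 1 to the group size, restarting for each riding
dt[, place := seq_len(.N), by = riding]

# If the table were not pre-sorted, order by descending votes in i first;
# the numbering then follows the vote ranking inside each riding
dt[order(-votes), place_by_votes := seq_len(.N), by = riding]

print(dt)
```

Because the example rows are already sorted by descending votes within each riding, place and place_by_votes should agree here; on unsorted data only the second form reflects the vote ranking.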
