How to transform a tibble from a table of dates when x occurred into a table of dates with categorical data for x

Problem description

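I have a dataset showing the year each country joined the World Trade Organization (WTO) and its predecessor, the General Agreement on Tariffs and Trade (GATT). An important point is that the WTO was created in 1995 as an extension of GATT (created in 1947), and some GATT members (e.g. Angola below) did not join the WTO immediately in 1995 but waited until 1996 or later, depending on the country. Some countries were never GATT members but joined the WTO after it was created (e.g. Afghanistan below).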

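I want to take my data in the format of the first tibble below and reshape it so that it lists every year for each country, with a categorical variable showing whether the country was a member of GATT, of the WTO, or of neither. My actual dataset is much larger than this example, with dates from 1948 to 2017 and many more countries, so doing this by hand would be painful.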

For this example, restricting the dates to 1992 through 1996 and looking only at the first 6 countries, I basically want to go from this:

library(dplyr)  # provides %>% and re-exports as_tibble()
library(tidyr)  # provides gather()

df <- data.frame(Country = c("Afghanistan", "Albania", "Angola", "Antigua and Barbuda", "Argentina", "Armenia"), 
                 Year_joined_WTO = c(2016, 2000, 1996, 1995, 1995, 2003),
                 Year_joined_GATT = c(NA, NA, 1994, 1987, 1967, NA))
df <- as_tibble(df)

> df
# A tibble: 6 x 3
  Country             Year_joined_WTO Year_joined_GATT
  <fct>                         <dbl>            <dbl>
1 Afghanistan                    2016               NA
2 Albania                        2000               NA
3 Angola                         1996             1994
4 Antigua and Barbuda            1995             1987
5 Argentina                      1995             1967
6 Armenia                        2003               NA

to this:

df_intended <- data.frame(Country = c("Afghanistan", "Afghanistan","Afghanistan","Afghanistan","Afghanistan", "Albania", "Albania","Albania","Albania","Albania","Angola", "Angola","Angola","Angola","Angola","Antigua and Barbuda","Antigua and Barbuda","Antigua and Barbuda","Antigua and Barbuda","Antigua and Barbuda", "Argentina", "Argentina","Argentina","Argentina","Argentina","Armenia","Armenia","Armenia","Armenia","Armenia"), 
                 Year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1996,1992, 1993, 1994, 1995, 1996,1992, 1993, 1994, 1995, 1996,1992, 1993, 1994, 1995, 1996,1992, 1993, 1994, 1995, 1996),
                 Member_WTO_GATT = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "GATT", "GATT", "WTO", "GATT","GATT","GATT", "WTO", "WTO", "GATT","GATT","GATT", "WTO", "WTO", NA, NA, NA, NA, NA))
df_intended <- as_tibble(df_intended)

print(df_intended, n = 30)

# A tibble: 30 x 3
   Country              Year Member_WTO_GATT
   <fct>               <dbl> <fct>          
 1 Afghanistan          1992 NA             
 2 Afghanistan          1993 NA             
 3 Afghanistan          1994 NA             
 4 Afghanistan          1995 NA             
 5 Afghanistan          1996 NA             
 6 Albania              1992 NA             
 7 Albania              1993 NA             
 8 Albania              1994 NA             
 9 Albania              1995 NA             
10 Albania              1996 NA             
11 Angola               1992 NA             
12 Angola               1993 NA             
13 Angola               1994 GATT           
14 Angola               1995 GATT           
15 Angola               1996 WTO            
16 Antigua and Barbuda  1992 GATT           
17 Antigua and Barbuda  1993 GATT           
18 Antigua and Barbuda  1994 GATT           
19 Antigua and Barbuda  1995 WTO            
20 Antigua and Barbuda  1996 WTO            
21 Argentina            1992 GATT           
22 Argentina            1993 GATT           
23 Argentina            1994 GATT           
24 Argentina            1995 WTO            
25 Argentina            1996 WTO            
26 Armenia              1992 NA             
27 Armenia              1993 NA             
28 Armenia              1994 NA             
29 Armenia              1995 NA             
30 Armenia              1996 NA  

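I have tried gathering the years into one column, but my problem is how to show every year for each country in a single column and mark the country as a member in the years after it joined.

My feeble attempt: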

df2 <- df %>% 
  group_by(Country) %>% 
  gather(Year_joined_WTO, Year_joined_GATT, key = member_wto_gatt, value = Year)

> df2
# A tibble: 12 x 3
# Groups:   Country [6]
   Country             member_wto_gatt   Year
   <fct>               <chr>            <dbl>
 1 Afghanistan         Year_joined_WTO   2016
 2 Albania             Year_joined_WTO   2000
 3 Angola              Year_joined_WTO   1996
 4 Antigua and Barbuda Year_joined_WTO   1995
 5 Argentina           Year_joined_WTO   1995
 6 Armenia             Year_joined_WTO   2003
 7 Afghanistan         Year_joined_GATT    NA
 8 Albania             Year_joined_GATT    NA
 9 Angola              Year_joined_GATT  1994
10 Antigua and Barbuda Year_joined_GATT  1987
11 Argentina           Year_joined_GATT  1967
12 Armenia             Year_joined_GATT    NA

I have also tried some joins and merges with a list of all the dates I want (for example

years <- data.frame(Year = c(1992:1996))
years <- as_tibble(years)

> df3 <- right_join(df2, years)
Joining, by = "Year"
Warning message:
Factor `Country` contains implicit NA, consider using `forcats::fct_explicit_na` 

> df3
# A tibble: 6 x 3
# Groups:   Country [7]
  Country             member_wto_gatt   Year
  <fct>               <chr>            <dbl>
1 NA                  NA                1992
2 NA                  NA                1993
3 Angola              Year_joined_GATT  1994
4 Antigua and Barbuda Year_joined_WTO   1995
5 Argentina           Year_joined_WTO   1995
6 Angola              Year_joined_WTO   1996

) but they were completely unsuccessful, and I cannot find any similar examples showing how to do this. Any help would be greatly appreciated.

Tags: r, dataframe, dplyr

Solution


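You could try using gather, complete, and fill. gather converts the data to long format, sub shortens the column names to "WTO" and "GATT", and then, after group_by Country, complete adds the missing years and fill replaces the NA values with the most recent non-NA value.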

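library(dplyr)
library(tidyr)

df %>%
  # long format: one row per country and agreement
  gather(key, Value, -Country) %>%
  # "Year_joined_WTO" -> "WTO", "Year_joined_GATT" -> "GATT"
  mutate(key = sub("Year_joined_", "", key)) %>%
  group_by(Country) %>%
  # add a row for every year from 1992 to 1996 that a country lacks
  complete(Value = seq(1992, 1996)) %>%
  # carry the last non-NA key down into the following rows
  fill(key) 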

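For your real data, you can use seq(min(Value), max(Value)) instead of hard-coding the years (with na.rm = TRUE in min and max, since Value contains NA here), or, if you already know which years each country should have, you can use those numbers directly.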
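
Because fill relies on row order, another route is to compare every year with the two join years directly. The following is only a sketch under stated assumptions, not part of the original answer: it swaps in tidyr::crossing and dplyr::case_when, assumes a country counts as "WTO" from its WTO join year onward and as "GATT" from its GATT join year until then, and uses the hypothetical names years and result.

```r
library(dplyr)
library(tidyr)

# Example data from the question (Country kept as character)
df <- tibble(
  Country = c("Afghanistan", "Albania", "Angola",
              "Antigua and Barbuda", "Argentina", "Armenia"),
  Year_joined_WTO  = c(2016, 2000, 1996, 1995, 1995, 2003),
  Year_joined_GATT = c(NA, NA, 1994, 1987, 1967, NA)
)

# Assumption: the window of interest; for the full data this could be 1948:2017
years <- 1992:1996

result <- df %>%
  # one row per country and year
  crossing(Year = years) %>%
  mutate(Member_WTO_GATT = case_when(
    # WTO membership takes precedence once the country has joined
    !is.na(Year_joined_WTO)  & Year >= Year_joined_WTO  ~ "WTO",
    # otherwise GATT, if the country had joined GATT by that year
    !is.na(Year_joined_GATT) & Year >= Year_joined_GATT ~ "GATT",
    # neither
    TRUE ~ NA_character_
  )) %>%
  select(Country, Year, Member_WTO_GATT)

print(result, n = 30)
```

Join years that fall before the window (such as 1987 for Antigua and Barbuda) are handled by the comparison itself, so no extra rows need to be created or removed afterwards.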

