How do I look up values that are indirectly linked to my data frame?

Problem description

I'd like some ideas or solutions for the following example. I have two data frames, deals.df and turnover.df. The toy example below shows what they look like:

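# deals.df cannot refer to its own columns while it is being created,
# so the deal years are stored in a separate vector first
deal_years <- c(2013, 2014, 2015, 2013)

deals.df <- data.frame(DealID = c(100101, 100102, 100103, 100104),
                       CompanyName = c('ABC', 'ABC', 'DEF', 'HIJ'),
                       DealYear = deal_years,
                       DealYearL1 = deal_years - 1,  # year before the deal
                       DealYearP1 = deal_years + 1,  # year after the deal
                       # '?' marks the values that should be filled in
                       DealYearL1Turnover = c('?', '?', '?', '?'),
                       DealYearTurnover = c('?', '?', '?', '?'),
                       DealYearP1Turnover = c('?', '?', '?', '?'))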

turnover.df <- data.frame(CompanyName=c('ABC', 'DEF', 'HIJ'),
                          Turnover2011= c(100, 150, 180),
                          Turnover2012=c(110, 160, 200),
                          Turnover2013= c(125, 175, 210),
                          Turnover2014= c(135,180,230),
                          Turnover2015= c(145, 200, 235),
                          Turnover2016= c(160, 220, 250))

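Note: DealYearL1 means "deal year minus 1" and DealYearP1 means "deal year plus 1". What I want to do is fill in the variables that currently hold "?". The logic is as follows: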

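For each DealID in deals.df, find the turnover in turnover.df for the matching CompanyName in the year before the deal, the year of the deal, and the year after the deal (i.e. the variables with "?" values). For example, the first row would look like this: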

#Ideal output in deals.df

DealID   CompanyName   DealYear   DealYearL1   DealYearP1   DealYearL1Turnover   DealYearTurnover     DealYearP1Turnover
100101   ABC          2013         2012         2014           110                     125                135
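The lookup logic above maps directly onto base R indexing, without reshaping. This is a minimal sketch, not the accepted answer's method: the helper lookup_turnover() is hypothetical. It finds each company's row in turnover.df with match(), finds the year's column by building the name "Turnover<year>", and then picks one cell per deal with a two-column index matrix. It assumes the wide columns are named exactly Turnover followed by a four-digit year.

```r
# Toy data from the question (the '?' placeholder columns are omitted)
deals.df <- data.frame(DealID = c(100101, 100102, 100103, 100104),
                       CompanyName = c("ABC", "ABC", "DEF", "HIJ"),
                       DealYear = c(2013, 2014, 2015, 2013))
turnover.df <- data.frame(CompanyName = c("ABC", "DEF", "HIJ"),
                          Turnover2011 = c(100, 150, 180),
                          Turnover2012 = c(110, 160, 200),
                          Turnover2013 = c(125, 175, 210),
                          Turnover2014 = c(135, 180, 230),
                          Turnover2015 = c(145, 200, 235),
                          Turnover2016 = c(160, 220, 250))

# Hypothetical helper: turnover for each (company, year) pair.
# Assumes year columns are named "Turnover<year>"; an unknown company or
# year produces NA because match() returns NA for it.
lookup_turnover <- function(companies, years, turnover) {
  values <- as.matrix(turnover[, -1])                  # numeric year columns only
  rows <- match(companies, turnover$CompanyName)       # row of each company
  cols <- match(paste0("Turnover", years), colnames(values))  # column of each year
  values[cbind(rows, cols)]                            # one cell per (row, col) pair
}

deals.df$DealYearL1Turnover <- lookup_turnover(deals.df$CompanyName, deals.df$DealYear - 1, turnover.df)
deals.df$DealYearTurnover   <- lookup_turnover(deals.df$CompanyName, deals.df$DealYear,     turnover.df)
deals.df$DealYearP1Turnover <- lookup_turnover(deals.df$CompanyName, deals.df$DealYear + 1, turnover.df)

deals.df
```

Matrix indexing with cbind(rows, cols) avoids a loop over deals, and each offset year is just a different years argument.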



Tags: r, lookup

Solution


One possibility using the tidyverse (this assumes the turnover variables in turnover.df are in year order; if they are not, use arrange() before the mutate() step):

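library(tidyverse)

long.df <- turnover.df %>%
  # "Turnover2011" -> "Turnover_2011" so the name can be split at "_"
  rename_with(~ str_replace(., "^(Turnover)", "\\1_"), starts_with("Turnover")) %>%
  pivot_longer(cols = starts_with("Turnover"),
               names_to = c(".value", "DealYear"),
               names_sep = "_",
               # in current tidyr, names_ptypes only checks types;
               # names_transform converts "2011" to a number
               names_transform = list(DealYear = as.numeric)) %>%
  group_by(CompanyName) %>%
  # lag()/lead() use the previous/next row, so sort by year within each company
  arrange(DealYear, .by_group = TRUE) %>%
  mutate(DealYearL1Turnover = lag(Turnover),
         DealYearP1Turnover = lead(Turnover)) %>%
  ungroup() %>%
  rename(DealYearTurnover = Turnover)

deals.df %>%
  left_join(long.df, by = c("CompanyName", "DealYear"))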

Output

  DealID CompanyName DealYear DealYearL1 DealYearP1 DealYearTurnover DealYearL1Turnover DealYearP1Turnover
1 100101         ABC     2013       2012       2014              125                110                135
2 100102         ABC     2014       2013       2015              135                125                145
3 100103         DEF     2015       2014       2016              200                180                220
4 100104         HIJ     2013       2012       2014              210                200                230
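lag() and lead() return the neighbouring row, not the neighbouring year. If a year were missing from turnover.df, they would silently return the value from two years away. A variant that avoids this is sketched below. It is an alternative to the answer above, not part of it, and it assumes the wide columns are always named Turnover<year>. It pivots with names_prefix and then joins the long table once per target column on the exact year (DealYearL1, DealYear, DealYearP1), so a missing year shows up as NA.

```r
library(dplyr)
library(tidyr)

deals.df <- data.frame(DealID = c(100101, 100102, 100103, 100104),
                       CompanyName = c("ABC", "ABC", "DEF", "HIJ"),
                       DealYear = c(2013, 2014, 2015, 2013))
deals.df$DealYearL1 <- deals.df$DealYear - 1
deals.df$DealYearP1 <- deals.df$DealYear + 1

turnover.df <- data.frame(CompanyName = c("ABC", "DEF", "HIJ"),
                          Turnover2011 = c(100, 150, 180),
                          Turnover2012 = c(110, 160, 200),
                          Turnover2013 = c(125, 175, 210),
                          Turnover2014 = c(135, 180, 230),
                          Turnover2015 = c(145, 200, 235),
                          Turnover2016 = c(160, 220, 250))

# Long format: one row per company and year ("Turnover2011" -> Year = 2011)
long.df <- turnover.df %>%
  pivot_longer(cols = starts_with("Turnover"),
               names_to = "Year",
               names_prefix = "Turnover",
               names_transform = list(Year = as.numeric),
               values_to = "Turnover")

# One join per target column, each matching on an exact year,
# so no assumption about row order or consecutive years is needed
result <- deals.df %>%
  left_join(rename(long.df, DealYearL1Turnover = Turnover),
            by = c("CompanyName", "DealYearL1" = "Year")) %>%
  left_join(rename(long.df, DealYearTurnover = Turnover),
            by = c("CompanyName", "DealYear" = "Year")) %>%
  left_join(rename(long.df, DealYearP1Turnover = Turnover),
            by = c("CompanyName", "DealYearP1" = "Year"))

result
```

The cost is three joins instead of one. In exchange, the offsets are stated explicitly and the result does not depend on how turnover.df is sorted.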

Data

deals.df <- data.frame(DealID = c(100101, 100102, 100103, 100104),
                       CompanyName= c('ABC', 'ABC', 'DEF', 'HIJ'),
                       DealYear = c(2013,2014,2015,2013))

deals.df$DealYearL1= deals.df$DealYear-1
deals.df$DealYearP1= deals.df$DealYear+1

turnover.df <- data.frame(CompanyName=c('ABC', 'DEF', 'HIJ'),
                          Turnover2011= c(100, 150, 180),
                          Turnover2012= c(110, 160, 200),
                          Turnover2013= c(125, 175, 210),
                          Turnover2014= c(135, 180, 230),
                          Turnover2015= c(145, 200, 235),
                          Turnover2016= c(160, 220, 250))
