首页 > 解决方案 > 如何将一行的值与R中不同列中的另一行直接匹配

问题描述

所以我有这个客户在线购买机票的数据。我想看看有多少人订了回程票。所以基本上我想将起点城市与直接行的目的地城市相匹配,反之亦然,对于同一个人和帐户,这将为我提供他们的双向旅行数据,然后我想计算他们的旅行天数。我正在尝试在 R 中执行此操作,但我无法将来源与直接行的目的地相匹配,反之亦然。

我把客户的账号排序了,手动看看有没有回程,还蛮多的。

数据是这样的:

Account number          origin city Destination city    Date
1                     London    chicago              7/22/2018
2                      Milan    London               7/23/2018
2                      London    Milan               7/28/2018
1                     chicago    london              8/22/2018

标签: r

解决方案


另一种选择是加入自身,并反转字段。

编辑:添加了“trip_num”以更好地处理同一个人的重复旅行。

library(dplyr)
# First, convert date field to Date type
df <- df %>% 
  mutate(Date = lubridate::mdy(Date)) %>%
  # update with M-M's suggestion in comments
  mutate_at(.vars = vars(origin_city, Destination_city), .funs = toupper) %>%
  # EDIT: adding trip_num to protect against extraneous joins for repeat trips
  group_by(Account_number, origin_city, Destination_city) %>%
  mutate(trip_num = row_number()) %>%
  ungroup()

df2 <- df %>%
  left_join(df, by = c("Account_number", "trip_num",
                       "origin_city" = "Destination_city",
                       "Destination_city" = "origin_city")) %>%
  mutate(days = (Date.x - Date.y)/lubridate::ddays(1))


> df2
# A tibble: 6 x 7
  Account_number origin_city Destination_city Date.x     trip_num Date.y      days
           <int> <chr>       <chr>            <date>        <int> <date>     <dbl>
1              1 LONDON      CHICAGO          2018-07-22        1 2018-08-22   -31
2              2 MILAN       LONDON           2018-07-23        1 2018-07-28    -5
3              2 LONDON      MILAN            2018-07-28        1 2018-07-23     5
4              1 CHICAGO     LONDON           2018-08-22        1 2018-07-22    31
5              2 MILAN       LONDON           2018-08-23        2 2018-08-28    -5
6              2 LONDON      MILAN            2018-08-28        2 2018-08-23     5

数据:(由 Account_number 2 添加重复行程)

df <- read.table(
  header = T, 
  stringsAsFactors = F,
  text = "Account_number          origin_city Destination_city    Date
1                     London    chicago              7/22/2018
2                      Milan    London               7/23/2018
2                      London    Milan               7/28/2018
1                     chicago    london              8/22/2018
2                      Milan    London               8/23/2018
2                      London    Milan               8/28/2018")

推荐阅读