How to extend a dataframe based on values in the same dataframe in R

Problem description

I have the following tibble, and I want to sample an arrival time for every passenger from a Poisson distribution with rpois(n, lambda).

# A tibble: 2 x 4
  flight terminal passengers arrivaltime
  <chr>  <chr>         <dbl>       <dbl>
1 LX123  A                3         120
2 UA1    B                2         130
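For a single flight, rpois(n, lambda) already gives one draw per passenger: n is the passenger count and lambda is the flight's arrival time. A minimal sketch using the LX123 row (the seed value is arbitrary and only makes the draws reproducible):

```r
set.seed(42)  # arbitrary seed, only for reproducible draws

# LX123: 3 passengers, lambda = 120
draws <- rpois(3, lambda = 120)
draws          # three integer arrival times scattered around 120
length(draws)  # 3
```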

The final tibble should look like the one below. Every row represents a single passenger, and that passenger's arrival time is a sample from a Poisson distribution whose lambda is the flight's arrival time in the first tibble.

# A tibble: 5 x 3
  flight terminal arrivaltime
  <chr>  <chr>         <dbl>
1 LX123  A              125
2 LX123  A              115
3 LX123  A              118
4 UA1    B              129
5 UA1    B              132

I already have the following code, which computes the rpois values for each row of the tibble:

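# apply() coerces the tibble to a character matrix, so each row x
# arrives as a named character vector and must be converted back
f <- function(x) {
  n <- as.integer(x[["passengers"]])
  lambda <- as.numeric(x[["arrivaltime"]])
  rpois(n, lambda)
}
# returns a list here, because the rows have different passenger counts
apply(df1, MARGIN = 1, FUN = f)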

My question is how to complete this approach so that it produces the second tibble. The real dataset is huge, so computation speed matters.
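One way to finish the row-by-row approach, sketched under the assumption that the input is the two-row df1 from the data section: draw each flight's arrival times, then repeat the flight columns with rep() using the number of draws per flight. lapply() over row indices is swapped in for apply() here, because apply() first coerces the tibble to a character matrix and returns a matrix instead of a list whenever all flights happen to have the same passenger count.

```r
library(tibble)

df1 <- tibble(
  flight      = c("LX123", "UA1"),
  terminal    = c("A", "B"),
  passengers  = c(3, 2),
  arrivaltime = c(120, 130)
)

# One vector of draws per flight; lapply() always returns a list
draws <- lapply(seq_len(nrow(df1)), function(i) {
  rpois(df1$passengers[i], df1$arrivaltime[i])
})

# Repeat each flight's identifiers once per draw, then flatten the draws
n_per_flight <- lengths(draws)
result <- tibble(
  flight      = rep(df1$flight, n_per_flight),
  terminal    = rep(df1$terminal, n_per_flight),
  arrivaltime = unlist(draws)
)
result
```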

Tags: r, function, apply, probability, tibble

Solution


Here is one option with the tidyverse: uncount() the rows based on the 'passengers' column, then, grouped by 'flight', apply rpois() with the number of rows (n()) as n and the first element of 'arrivaltime' as lambda.

library(dplyr)
library(tidyr)
df1 %>%
    uncount(passengers) %>%  # one row per passenger
    group_by(flight) %>%
    # n() draws per flight, lambda = that flight's arrival time
    mutate(arrivaltime = rpois(n(), first(arrivaltime))) %>%
    ungroup()
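Since rpois() recycles its lambda argument element-wise, the group_by() step is not strictly required: after uncount(), a single rpois() call can take the whole arrivaltime column as lambda. A minimal sketch of that variant; it avoids per-group evaluation, which may help on a huge dataset, but it has not been benchmarked here.

```r
library(dplyr)
library(tidyr)

df1 <- tibble(
  flight      = c("LX123", "UA1"),
  terminal    = c("A", "B"),
  passengers  = c(3, 2),
  arrivaltime = c(120, 130)
)

result <- df1 %>%
  uncount(passengers) %>%                        # one row per passenger
  mutate(arrivaltime = rpois(n(), arrivaltime))  # one call, lambda taken row by row
result
```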

Another option is to use map2() to loop over corresponding elements of 'passengers' and 'arrivaltime', apply rpois() to each pair, and then unnest() the resulting list column to expand the rows:

library(dplyr)
library(purrr)
library(tidyr)
df1 %>%
  # list column: one vector of draws per flight
  mutate(arrivaltime = map2(passengers, arrivaltime, rpois)) %>%
  unnest(c(arrivaltime))  # one row per draw
# A tibble: 5 x 4
#  flight terminal passengers arrivaltime
#  <chr>  <chr>         <dbl>       <int>
#1 LX123  A                 3         127
#2 LX123  A                 3         110
#3 LX123  A                 3         131
#4 UA1    B                 2         109
#5 UA1    B                 2         133
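The same expansion can also be done in base R, without the tidyverse: repeat each row index passengers times with rep(), subset the rows, and make one vectorized rpois() call. This is a sketch using a plain data.frame copy of df1; its speed relative to the options above has not been measured here.

```r
df1 <- data.frame(
  flight      = c("LX123", "UA1"),
  terminal    = c("A", "B"),
  passengers  = c(3, 2),
  arrivaltime = c(120, 130),
  stringsAsFactors = FALSE
)

# Row indices repeated once per passenger: 1 1 1 2 2
idx <- rep(seq_len(nrow(df1)), times = df1$passengers)

result <- df1[idx, c("flight", "terminal", "arrivaltime")]
# rpois() recycles lambda element-wise, so each row uses its own flight's mean
result$arrivaltime <- rpois(nrow(result), result$arrivaltime)
rownames(result) <- NULL
result
```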

data

df1 <- structure(list(flight = c("LX123", "UA1"), terminal = c("A", 
"B"), passengers = c(3, 2), arrivaltime = c(120, 130)), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))
