Split columns or vector to pass to Purrr Function

Problem description

I wrote a web-scraping function that gets data from an API, and I pass a column of my data frame to one of its arguments. The problem is that the URL accepts at most 500 numbers in one of its parameters, and my data frame has 2000 rows.

How would I split the rows into groups of 500 in order to pass the values into the function?

I've created a very basic reprex that shows the workflow I am looking for. I want to pass the split df column to the parse function. I'm guessing I would need to wrap the JSON parse with map_dfr.

library(tidyverse)

sample_df <- tibble(id = 1:20,
                    col_2 = rnorm(20))

# parse function
parse_people <- function(ids = c("1", "10"), argument_2 = NULL){
  # Fake Base Url
  base_url <- "https://www.thisisafakeurl.com/api/people?Ids="

  # fix query parameters to collapse Ids to pass to URL
  ids <- stringr::str_c(ids, collapse = ",")

  url <- glue::glue("{base_url}{ids}")

  # Get URL
  resp <- httr::GET(url)

  # Extract the response body as text
  out <- httr::content(resp, as = "text", encoding = "UTF-8")

  # Parse the JSON text into R objects
  jsonlite::fromJSON(out, simplifyDataFrame = TRUE, flatten = TRUE)

}


sample_parse <- parse_people(sample_df$id)

I think I probably need to create two functions: one that parses the data, and one that uses map_dfr over the splits.

Something like:

# Split IDs from the df here. I want blocks of 500 rows to pass below

# Map the split IDs over parse_people
ids %>%
  map_dfr(parse_people)

Tags: r, tidyverse, purrr

Solution


Possible duplicate here.

In the meantime, you can split your 20-row data frame into 5 data frames of 4 rows each via:

sample_df <- tibble(id = 1:20,
                    col_2 = rnorm(20))

split(sample_df, rep(1:5, each = 4))

Then you can pass the resulting list of data frames to a purrr function.
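As a rough sketch of that last step, the list returned by split() can be mapped over and the results row-bound with map_dfr. The function summarise_ids below is a hypothetical stand-in for an API call, not part of the original question. It only summarises the IDs it receives, so the example runs without a network connection.

```r
library(tibble)
library(purrr)

set.seed(1)
sample_df <- tibble(id = 1:20, col_2 = rnorm(20))

# Split into 5 data frames of 4 rows each
pieces <- split(sample_df, rep(1:5, each = 4))

# Hypothetical stand-in for an API call:
# takes a vector of IDs and returns a one-row summary
summarise_ids <- function(ids) {
  tibble(first_id = min(ids), last_id = max(ids), n_ids = length(ids))
}

# Pull the id column out of each piece, apply the function,
# and row-bind the per-piece results into one tibble
result <- map_dfr(pieces, ~ summarise_ids(.x$id))
```

Here result has one row per piece. Replacing summarise_ids with the real request function should give one combined data frame, provided each call returns a data frame with compatible columns.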

Edit: If you don't know the total number of rows in advance and want to split by a given chunk size while still including every row, there's another solution in the link:

chunk <- 3
n <- nrow(sample_df)
r <- rep(seq_len(ceiling(n / chunk)), each = chunk)[seq_len(n)]
d <- split(sample_df, r)

Here I want chunks of 3, and all rows are still included (the last data frame in the list has 2 rows).
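The same idea may be applied directly to the ID vector with the 500-per-request limit from the question. The sketch below is only an illustration: build_request is a hypothetical stand-in for parse_people that assembles the URL but makes no HTTP call, and all_ids stands in for the data frame column.

```r
library(tibble)
library(purrr)

# Hypothetical stand-in for parse_people():
# builds the request URL for one chunk but does not call the API
build_request <- function(ids) {
  base_url <- "https://www.thisisafakeurl.com/api/people?Ids="
  tibble(n_ids = length(ids),
         url   = paste0(base_url, paste(ids, collapse = ",")))
}

all_ids <- 1:2000   # assumed: stands in for the 2000-row df column
chunk   <- 500      # per-request limit stated in the question

# Group label per ID: 1 for the first 500, 2 for the next 500, and so on
group     <- ceiling(seq_along(all_ids) / chunk)
id_chunks <- split(all_ids, group)

# One call per chunk, results row-bound into a single tibble
requests <- map_dfr(id_chunks, build_request)
```

With 2000 IDs this yields four chunks of 500. If the number of IDs is not a multiple of the chunk size, the last chunk is simply shorter, as in the chunks-of-3 example above.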

