r - Split columns or vector to pass to Purrr Function
Problem description
I have a web scrape function that I created that gets data from an API. I pass a df column to one of the arguments of the web scrape function. The issue I'm having is that the URL accepts at most 500 numbers in one of its parameters, and my df has 2000 rows.
How would I split the rows into groups of 500 in order to pass the values into the function?
I've created a very basic reprex that shows the workflow of what I am looking to do. I want to pass the split df column to the parse function. I'm guessing I would need to wrap the JSON parse with map_dfr:
library(tidyverse)
sample_df <- tibble(id = 1:20,
                    col_2 = rnorm(20))
# parse function
parse_people <- function(ids = c("1", "10"), argument_2 = NULL) {
  # Fake base URL
  base_url <- "https://www.thisisafakeurl.com/api/people?Ids="
  # Collapse the ids into one comma-separated string for the query parameter
  ids <- stringr::str_c(ids, collapse = ",")
  url <- glue::glue("{base_url}{ids}")
  # Request the URL
  resp <- httr::GET(url)
  # Extract the response body as text
  out <- httr::content(resp, as = "text", encoding = "UTF-8")
  # Parse the JSON text into a flattened data frame
  jsonlite::fromJSON(out, simplifyDataFrame = TRUE, flatten = TRUE)
}
sample_parse <- parse_people(sample_df$id)
I think I probably need to create two functions: one that parses the data, and one that uses map_dfr over the splits.
Something like:
# Split the IDs from the df here. I want blocks of 500 rows to pass below
# Map the split IDs over parse_people
ids %>%
  map_dfr(parse_people)
Solution
You can split your 20-row data frame into 5 data frames of 4 rows each via:
sample_df <- tibble(id = 1:20,
                    col_2 = rnorm(20))
split(sample_df, rep(1:5, each = 4))
Then you can pass the resulting list of dataframes to a purrr function.
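As a rough sketch of that last step, the snippet below maps over the list and row-binds the results. fake_parse() is a hypothetical stand-in for the question's parse_people() so the example runs without a real API; it is not part of the original code. Note that map_dfr needs dplyr installed, and in purrr 1.0 and later it is superseded by map() followed by list_rbind().

```r
library(purrr)
library(tibble)

sample_df <- tibble(id = 1:20,
                    col_2 = rnorm(20))

# List of 5 data frames with 4 rows each
chunks <- split(sample_df, rep(1:5, each = 4))

# Hypothetical stand-in for parse_people(): no network call,
# it just echoes the ids and the string that would go into the URL
fake_parse <- function(ids) {
  tibble(id = ids,
         query = paste(ids, collapse = ","))
}

# Call the function once per chunk and bind the results by row
result <- map_dfr(chunks, ~ fake_parse(.x$id))
```

With the real function, the last line would become map_dfr(chunks, ~ parse_people(.x$id)), which sends one request per chunk.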
Edit: If you don't know the total number of rows in advance and want to split by a given chunk size while still including every row, there's another solution in the link:
chunk <- 3
n <- nrow(sample_df)
r <- rep(1:ceiling(n / chunk), each = chunk)[1:n]
d <- split(sample_df, r)
Here I want chunks of 3, but it will include all rows (the last data frame in the list has only 2 rows).
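For the 2000-row case in the question, the same idea can be applied to the id vector directly rather than to the whole data frame. This is only a sketch: the ids vector is made up, and ceiling(seq_along(ids) / chunk_size) is an equivalent way of building the grouping labels, not the exact code from the link.

```r
# Assumed stand-in for the 2000-row id column (e.g. my_df$id)
ids <- 1:2000
chunk_size <- 500

# Positions 1..500 get label 1, 501..1000 get label 2, and so on;
# a final partial chunk is kept if the length is not a multiple of chunk_size
id_chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))

# Number of ids in each chunk
lengths(id_chunks)

# With the question's parse_people() defined, each chunk becomes one request:
# people <- purrr::map_dfr(id_chunks, parse_people)
```

Splitting the vector keeps each list element ready to pass straight to the ids argument, so no column extraction is needed inside the map call.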