Convert a `for` loop to purrr using the `map` function in R

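Problem description

I need to download weather data from NASA's POWER (Prediction Of Worldwide Energy Resources) project. The nasapower package was developed for retrieving this data with R. I need to download data for many locations (latitude, longitude coordinates). As a reproducible example, I wrote a simple loop over three locations.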

library(nasapower)

data1 <- read.csv(text = "
location,long,lat
loc1, -56.547, -14.2427
loc2, -57.547, -15.2427
loc3, -58.547, -16.2427")

all.weather <- data.frame()
for (i in seq_len(nrow(data1))) {

        weather.data <- get_power(community = "AG",
                          lonlat = c(data1$long[i],data1$lat[i]),
                          dates = c("2015-01-01", "2015-01-10"),
                          temporal_average = "DAILY",
                          pars = c("T2M_MAX"))
        
        all.weather <-rbind(all.weather, weather.data)
}

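This works perfectly. The problem is that I am trying to replicate it with purrr::map, because I would like a tidyverse alternative. This is what I tried, but it does not work: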

library(dplyr)
library(purrr)

all.weather <- data1 %>%
    group_by(location) %>%
    map(get_power(community = "AG",
                lonlat = c(long, lat),
                dates = c("2015-01-01", "2015-01-10"),
                temporal_average = "DAILY",
                site_elevation = NULL,
                pars = c("T2M_MAX")))

I get the following error:

Error in isFALSE(length(lonlat != 2)) : object 'long' not found

Any hints on how to run this with purrr?

Tags: r, loops, for-loop, tidyverse, purrr

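Solution

To make your code work, use purrr::pmap instead of map, like this:

  1. map is for functions of one argument and map2 is for functions of two arguments. pmap is the most general and allows functions with more than two arguments.

  2. pmap will loop over the rows of your df. Since your df has 3 columns, 3 arguments are passed to the function, even though the first one, location, is not used. To make this work and use the column names, you have to specify the function and its argument names via function(location, long, lat)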
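The difference between the three functions can be tried without any network access. The following is a minimal sketch on toy data: the data frame `toy` and the labelling function are made up for illustration and are not part of the nasapower workflow. It also shows that `...` can absorb columns the function does not use, as an alternative to naming every column.

```r
library(purrr)

# map(): the function is called with one argument per element
squares <- map_dbl(c(1, 2, 3), function(x) x^2)

# map2(): two vectors are iterated in parallel
sums <- map2_dbl(c(1, 2, 3), c(10, 20, 30), function(x, y) x + y)

# pmap(): a data frame is a list of columns, so each row becomes one call,
# and columns are matched to function arguments by name
toy <- data.frame(location = c("loc1", "loc2"),
                  long = c(-56.5, -57.5),
                  lat = c(-14.2, -15.2),
                  stringsAsFactors = FALSE)

labels <- pmap_chr(toy, function(location, long, lat) {
  paste0(location, ": ", long, ", ", lat)
})

# `...` swallows the unused `location` column
coord_sums <- pmap_dbl(toy, function(long, lat, ...) long + lat)
```

Because matching is by name, the order of the columns in the data frame does not matter, but every column must either match an argument name or be caught by `...`.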

library(nasapower)

data1 <- read.csv(text = "
location,long,lat
loc1, -56.547, -14.2427
loc2, -57.547, -15.2427
loc3, -58.547, -16.2427")

library(dplyr)
library(purrr)

all.weather <- data1 %>%
  pmap(function(location, long, lat) get_power(community = "AG",
                                               lonlat = c(long, lat),
                                               dates = c("2015-01-01", "2015-01-10"),
                                               temporal_average = "DAILY",
                                               site_elevation = NULL,
                                               pars = c("T2M_MAX"))) %>% 
  # Name list with locations
  setNames(data1$location) %>% 
  # Add location names as identifiers
  bind_rows(.id = "location")

head(all.weather)
#> NASA/POWER SRB/FLASHFlux/MERRA2/GEOS 5.12.4 (FP-IT) 0.5 x 0.5 Degree Daily Averaged Data  
#>  Dates (month/day/year): 01/01/2015 through 01/10/2015  
#>  Location: Latitude  -14.2427   Longitude -56.547  
#>  Elevation from MERRA-2: Average for 1/2x1/2 degree lat/lon region = 379.25 meters   Site = na  
#>  Climate zone: na (reference Briggs et al: http://www.energycodes.gov)  
#>  Value for missing model data cannot be computed or out of model availability range: NA  
#>  
#>  Parameters: 
#>  T2M_MAX MERRA2 1/2x1/2 Maximum Temperature at 2 Meters (C)  
#>  
#> # A tibble: 6 x 9
#>   location   LON   LAT  YEAR    MM    DD   DOY YYYYMMDD   T2M_MAX
#>   <chr>    <dbl> <dbl> <dbl> <int> <int> <int> <date>       <dbl>
#> 1 loc1     -56.5 -14.2  2015     1     1     1 2015-01-01    29.9
#> 2 loc1     -56.5 -14.2  2015     1     2     2 2015-01-02    30.1
#> 3 loc1     -56.5 -14.2  2015     1     3     3 2015-01-03    27.3
#> 4 loc1     -56.5 -14.2  2015     1     4     4 2015-01-04    28.7
#> 5 loc1     -56.5 -14.2  2015     1     5     5 2015-01-05    30  
#> 6 loc1     -56.5 -14.2  2015     1     6     6 2015-01-06    28.7
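The last two steps of the pipeline (naming the list, then binding with an identifier column) can also be checked offline. This is a small sketch in which `results` is a hypothetical stand-in for the list returned by pmap, with toy values rather than NASA data:

```r
library(dplyr)

# Stand-in for the list of per-location data frames (toy values)
results <- list(
  data.frame(day = 1:2, value = c(1.5, 2.5)),
  data.frame(day = 1:2, value = c(3.5, 4.5))
)

combined <- results %>%
  # Name each list element after its location
  setNames(c("loc1", "loc2")) %>%
  # The list names become the values of the new `location` column
  bind_rows(.id = "location")
```

Here `combined` has four rows, and its `location` column records which list element each row came from.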
