shape file > data frame > using ggplot and joining on GEO ID with other data sets

Problem description

The county shapefile is only available as a national file (2017 TIGER/Line® Shapefiles: Counties (and equivalent)) at https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2017&layergroup=Counties+%28and+equivalent%29

I want to select just one state (e.g. Pennsylvania). So,

# read county polygons
counties <- readOGR(dsn="tl_2017_us_county", layer="tl_2017_us_county") 
# subset to PA counties 
PA_counties <- subset(counties, counties@data$STATEFP == "42")

HOWEVER, when I try and create a data frame and map, I'm getting the error: Error in FUN(X[[i]], ...) : object 'lon' not found

# create a data frame
PA_counties.df <- as.data.frame(PA_counties)
PA_counties.dfFORT <-fortify(PA_counties.df, region = "GEOID")

gg<-ggplot()
gg <- gg +geom_polygon(data =PA_counties.dfFORT, aes(x=lon, y=lat, group=group,
        fill=NA), color = "blue", fill=NA, size = 0.5)
gg <- gg +coord_map()
gg 

Help? I'm hoping to create this map, merge the data with another file by GEOID, and fill some of the counties (e.g. for GEOID xxx, if the value = 1 then fill it blue, etc.).

This must be a very common mapping use case. Grateful for any tips!

Best, Lori

Tags: r, ggplot2, mapping, rgdal, sf

Solution


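I did some troubleshooting, and I encourage you to do the same. First, look at the names of the fortified data frame: you have columns such as INTPTLAT and INTPTLON, but no lat, long, or group. (Also note that fortify() calls the longitude column long, not lon, so aes(x = lon) would fail even on correctly fortified data.)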

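When you call as.data.frame() on the spatial data frame and then call fortify(), you don't get what I think you expect. as.data.frame() returns only the attribute table of the SpatialPolygonsDataFrame (its @data slot), and fortify() on an ordinary data frame hands it back unchanged, so region = "GEOID" has no effect. If you look closely at that output, the only coordinates are INTPTLAT and INTPTLON, the Census "internal point" of each county: one point per county, close to a centroid, and stored as factors rather than numbers. Instead, you need to call fortify() on the spatial data frame itself. You should expect a data frame with thousands of rows, because it takes many points to trace the polygons that make up each county.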
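To see the difference without downloading anything, here is a minimal sketch that uses sf instead of sp (a swap on my part; the answer's code uses sp). The two unit squares and their GEOIDs are invented toy data. st_drop_geometry() gives the one-row-per-county attribute table, the analogue of as.data.frame() on a Spatial*DataFrame, while st_coordinates() gives the one-row-per-vertex table that polygon drawing actually needs.

```r
library(sf)

# Hypothetical toy data: two unit squares standing in for counties.
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
toy <- st_sf(
  GEOID = c("42001", "42003"),
  geometry = st_sfc(square(0), square(1), crs = 4269)
)

# Attribute table only: one row per county and no polygon coordinates,
# comparable to what as.data.frame() returns for a Spatial*DataFrame.
attrs <- st_drop_geometry(toy)

# Vertex table: one row per polygon vertex (each closed square has 5),
# with X/Y coordinates plus L1/L2 ring and feature indices.
verts <- as.data.frame(st_coordinates(toy))

print(attrs)
print(verts)
```

On the real Pennsylvania file the same two calls give 67 rows versus thousands, which is the gap described above.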

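Note that I used tigris::counties() to get the shapefile, because I couldn't read the downloaded file to make a reprex. With cb = TRUE it returns the generalized cartographic boundary file rather than the full-resolution TIGER/Line file, but both have the STATEFP and GEOID columns used here, so the code should work the same on your download.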

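library(tidyverse)
library(sf)
library(rgdal)  # provides readOGR(); rgdal has since been retired from CRAN

# cb = TRUE downloads the generalized cartographic boundary file.
# In 2018 tigris returned an sp SpatialPolygonsDataFrame here; recent versions return sf.
counties <- tigris::counties(cb = TRUE)

# counties <- readOGR(dsn="tl_2017_us_county", layer="tl_2017_us_county") 
PA_counties <- subset(counties, counties@data$STATEFP == "42")
# fortify() the spatial object itself, not as.data.frame() of it
PA_counties.dfFORT <- fortify(PA_counties, region = "GEOID")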

names(PA_counties.dfFORT)
#> [1] "long"  "lat"   "order" "hole"  "piece" "id"    "group"

Then you can use geom_polygon() as expected:

ggplot(PA_counties.dfFORT, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = NA, color = "blue") +
  coord_map()
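Recent ggplot2 releases have deprecated the fortify() methods for sp objects. As a hedged sketch of what fortify() produces, the hypothetical helper sf_to_long() below builds the same kind of long/lat/group table from an sf object with sf::st_coordinates(); the toy squares and GEOIDs are invented. Each ring gets its own group, which is what stops geom_polygon() from joining separate counties with stray lines.

```r
library(sf)
library(ggplot2)

# Invented toy "counties".
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
toy <- st_sf(
  GEOID = c("42001", "42003"),
  geometry = st_sfc(square(0), square(1), crs = 4269)
)

# Hypothetical helper: one row per vertex, similar in shape to fortify() output.
# Holes are not treated specially; they simply become their own groups.
sf_to_long <- function(x, id_col) {
  xy <- as.data.frame(st_coordinates(x))
  l_cols <- grep("^L[0-9]+$", names(xy), value = TRUE)
  feature <- xy[[l_cols[length(l_cols)]]]  # last L column indexes the feature
  data.frame(
    long  = xy$X,
    lat   = xy$Y,
    id    = x[[id_col]][feature],
    group = interaction(xy[l_cols], drop = TRUE)  # one group per ring
  )
}

long_df <- sf_to_long(toy, "GEOID")

p <- ggplot(long_df, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = NA, color = "blue")
```

For MULTIPOLYGON input (like the TIGER files), st_coordinates() adds an L3 column, and the helper picks it up as the feature index automatically.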

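A simpler, more flexible approach is to use sf. Again, you can call sf::read_sf() on the shapefile; to make a reprex, I called sf::st_as_sf() on the spatial data frame I got from tigris. sf lets you filter, add columns, compute summaries, and so on with dplyr-style functions.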

counties_sf <- st_as_sf(counties)
# counties_sf <- read_sf("tl_2017_us_county")
pa_counties_sf <- counties_sf %>%
  filter(STATEFP == "42")
head(pa_counties_sf)
#> Simple feature collection with 6 features and 9 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -80.36087 ymin: 39.72002 xmax: -74.7215 ymax: 40.74368
#> epsg (SRID):    4269
#> proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
#>   STATEFP COUNTYFP COUNTYNS       AFFGEOID GEOID       NAME LSAD
#> 1      42      003 01213657 0500000US42003 42003  Allegheny   06
#> 2      42      017 01209173 0500000US42017 42017      Bucks   06
#> 3      42      041 01209176 0500000US42041 42041 Cumberland   06
#> 4      42      055 01213670 0500000US42055 42055   Franklin   06
#> 5      42      061 01213672 0500000US42061 42061 Huntingdon   06
#> 6      42      071 01209181 0500000US42071 42071  Lancaster   06
#>        ALAND    AWATER                       geometry
#> 1 1890889706  37411488 MULTIPOLYGON (((-80.36078 4...
#> 2 1565408723  45201371 MULTIPOLYGON (((-75.48406 4...
#> 3 1412834155  12521844 MULTIPOLYGON (((-77.62503 4...
#> 4 2000052118   1544300 MULTIPOLYGON (((-78.09914 3...
#> 5 2265337403  37851955 MULTIPOLYGON (((-78.2567 40...
#> 6 2444606396 103423673 MULTIPOLYGON (((-76.72162 4...
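The dplyr-style verbs work because sf registers methods for them. A minimal sketch on invented toy data (the STATEFP, GEOID, region, and ALAND values are all made up): filter() and mutate() keep the geometry column, and group_by() plus summarise() unions the geometries within each group, which is one way to dissolve counties into larger regions.

```r
library(sf)
library(dplyr)

# Invented toy features; all attribute values are made up for illustration.
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
toy <- st_sf(
  STATEFP  = c("42", "42", "39"),
  GEOID    = c("42001", "42003", "39001"),
  region   = c("east", "east", "west"),
  ALAND    = c(100, 250, 80),
  geometry = st_sfc(square(0), square(1), square(5))
)

pa <- toy %>%
  filter(STATEFP == "42") %>%              # drop non-PA rows; geometry travels along
  mutate(land_share = ALAND / sum(ALAND))  # new attribute column

by_region <- pa %>%
  group_by(region) %>%
  summarise(total_aland = sum(ALAND))      # geometries within each group are unioned

print(by_region)
```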

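sf is also easy to use with ggplot: as of ggplot2 3.0.0, released a few weeks ago, ggplot2 ships with geom_sf(), which automatically reads the geometry column of an sf object. You can go directly from reading the shapefile to plotting it.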

ggplot(pa_counties_sf) +
  geom_sf(fill = NA, color = "blue")
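A minimal sketch of the same call on two invented toy squares, using crs = 4269 (NAD83, matching the epsg printed above). No x or y aesthetics are mapped; geom_sf() finds the geometry column itself. layer_data() is used here only to build the plot without drawing it.

```r
library(sf)
library(ggplot2)

# Invented toy "counties".
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
toy <- st_sf(
  GEOID = c("42001", "42003"),
  geometry = st_sfc(square(0), square(1), crs = 4269)
)

# No x/y aesthetics: geom_sf() picks up the geometry column on its own
# and adds coord_sf() so the CRS is respected.
p <- ggplot(toy) +
  geom_sf(fill = NA, color = "blue")

# Build the plot without drawing it: one row of layer data per feature.
ld <- layer_data(p)
```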

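As an example of how you might merge your shapes with another dataset, I made some dummy data with a letter for each county, called dplyr::left_join() to join it to the sf data frame, and then plotted it. Hope this helps you get started!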

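set.seed(1)  # make the random dummy data reproducible
pa_data <- tibble(
  GEOID = pa_counties_sf$GEOID
) %>%
  # dummy attribute: a random "A"/"B" label per county
  mutate(type = sample(c("A", "B"), size = nrow(.), replace = TRUE))

# keep the sf object on the left of the join so the result stays an sf object
pa_counties_sf %>%
  left_join(pa_data, by = "GEOID") %>%
  ggplot() +
    geom_sf(aes(fill = type), color = "blue")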

Created on 2018-07-15 by the reprex package (v0.2.0).
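For the goal in the question (fill a county blue where some joined value equals 1), here is a minimal sketch on invented toy counties and a hypothetical flag column. Counties absent from the other data set get NA after left_join(), so the sketch turns the flag into an explicit TRUE/FALSE before mapping it to fill, and scale_fill_manual() assigns the colours. GEOID stays character on both sides, so codes with leading zeros (e.g. "01001") still match.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Toy counties (invented GEOIDs), standing in for pa_counties_sf.
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
counties_toy <- st_sf(
  GEOID = c("42001", "42003", "42005"),
  geometry = st_sfc(square(0), square(1), square(2), crs = 4269)
)

# Hypothetical external data keyed on GEOID; flag == 1 marks counties to fill blue.
# 42003 is deliberately missing, so it gets NA after the join.
other_data <- tibble(GEOID = c("42001", "42005"), flag = c(1, 0))

joined <- counties_toy %>%
  left_join(other_data, by = "GEOID") %>%
  mutate(highlight = !is.na(flag) & flag == 1)  # NA (no match) becomes FALSE

p <- ggplot(joined) +
  geom_sf(aes(fill = highlight), color = "grey40") +
  scale_fill_manual(values = c(`TRUE` = "blue", `FALSE` = "white"), guide = "none")
```

Mapping a logical column keeps the colour rule in one place; for more categories, swap the logical for a character column and extend the values vector.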

