首页 > 解决方案 > st_read() 中的查询问题

问题描述

我是 sf 包的新手,并试图根据查询读取 shapefile 并对其进行子集化。在这里,我使用了 sf_read()

  load <- st_read(dsn = "~Data", layer = "CBSA_MetroDiv", 
            query = "select * from CBSA_MetroDiv limit 3;")

但它抛出一个错误

Reading layer `CBSA_MetroDiv' from data source `\Data' using driver `ESRI Shapefile'

Error in st_sf(x, ..., agr = agr, sf_column_name = sf_column_name) : 
no simple features geometry column present

有人可以指导我解决这个问题。

标签: rspsf

解决方案


Update: query now implemented

The query option should now work for GDAL/OGR data sources.

Without query:

> s = st_read(f)
Reading layer `uk_LAD_may_2020_with_insets_v1.1' from data source 
  `/nobackup/rowlings/Downloads/uk_LAD_may_2020_with_insets_v1.1.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 464 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -87544.11 ymin: 5344.479 xmax: 980803.6 ymax: 1220302
Projected CRS: OSGB 1936 / British National Grid

gets 464 features. With query (and I have quote up the layer name because its got dots in it, you may not need to):

> s = st_read(f, query="select * from \"uk_LAD_may_2020_with_insets_v1.1\" where scale = 3")
Reading query `select * from "uk_LAD_may_2020_with_insets_v1.1" where scale = 3' from data source `/nobackup/rowlings/Downloads/uk_LAD_may_2020_with_insets_v1.1.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 52 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -87544.11 ymin: 146818.5 xmax: 949029.8 ymax: 986035
Projected CRS: OSGB 1936 / British National Grid

Only 52 features.

Previous Answer: for older sf package versions only:

The query option is only mentioned in the documentation for class DBIObject, for the "default S3" method there's no query parameter, so your query string is being passed into the ... argument which ends up being passed to st_as_sf and then at some later point it throws a spanner in the works.

There may be a way round, but one solution is to create a virtual data set file with the SQL in it. For example, I have a shapefile of French postal regions, and here is a virtual data set file called filter.vrt which applies an SQL select:

<OGRVRTDataSource>
    <OGRVRTLayer name="points">
        <SrcDataSource relativeToVRT="1">codes_postaux_region.shp</SrcDataSource>
        <SrcSQL>select * from codes_postaux_region where POP2010 > 20000</SrcSQL>
 </OGRVRTLayer>
</OGRVRTDataSource>

Create a similar file for your shapefile and SQL using a plain text editor, and then read it. Here you can see i get 6048 features if I read the shapefile, but only 707 when I read the virtual data file:

> fr = st_read("./codes_postaux_region.shp",quiet=TRUE)
> nrow(fr)
[1] 6048
> fr = st_read("./filter.vrt",quiet=TRUE)
> nrow(fr)
[1] 707

You might need to set the coordinate system on the filtered data set, either after reading it in if you know it or possibly via another VRT file parameter.

Might be worth pinging Edzer to see if SQL in st_read for shapefiles can be implemented or if I've missed something. I feel there's a way of telling st_read what the geometry column is...


推荐阅读