首页 > 解决方案 > How can I transform geospatial data read as lat/lon coordinates from a CSV to match data read from a SHP?

问题描述

Introduction

I am reading two files with geospatial data, one as .shp and one as .csv (using options = c('X_POSSIBLE_NAMES=lat', 'Y_POSSIBLE_NAMES=lon'). The .shp file has unusual values in the bbox field when it's read, but graphs well. The .csv file has the right values in the bbox field when it's read, but graphs badly. Here are the read messages and dputs.

Read messages

Reading layer `WWW` from data source `XXX.shp' using driver `ESRI Shapefile'
Simple feature collection with 222 features and 12 fields
geometry type:  LINESTRING
dimension:      XY
bbox:           xmin: 745400.7 ymin: 2402049 xmax: 998753.2 ymax: 2671392
epsg (SRID):    32645
proj4string:    +proj=utm +zone=45 +datum=WGS84 +units=m +no_defs

and

options:        X_POSSIBLE_NAMES=lat Y_POSSIBLE_NAMES=lon 
Reading layer `YYY` from data source `ZZZ`.csv' using driver `CSV'
Simple feature collection with 5 features and 3 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 24.70942 ymin: 87.91372 xmax: 24.70942 ymax: 88.20073
epsg (SRID):    NA
proj4string:    NA

dputs

structure(list(Transporta = 31159.5791334, Transpor_1 = 1, Transpor_2 = 2, 
    Transpor_3 = 0, TransportS = 1, Transpor_4 = 120.8226354, 
    Transpor_5 = 40.37383139, Transpor_6 = 0.334157844, Transpor_7 = 0.6603, 
    Transpor_8 = 0.8694, Transpor_9 = structure(NA_integer_, .Label = character(0), class = "factor"), 
    class = structure(1L, .Label = "0", class = "factor"), geometry = structure(list(
        structure(c(745439.842100001, 745546.5795, 745632.3839, 
        745788.981400001, 745917.343899999, 745985.294699999, 
        746066.299, 746154.675899999, 746368.8333, 746569.5261, 
        746772.2455, 746881.8819, 747004.401999999, 747073.7139, 
        747272.5434, 747470.2489, 747669.9781, 747812.230399999, 
        747975.2242, 748282.6014, 748497.1264, 748706.9185, 749043.816300001, 
        749292.6146, 749610.7259, 749925.135099999, 750001.7305, 
        750624.1876, 750825.9755, 750948.688199999, 751059.9489, 
        751163.9131, 751327.932799999, 751563.730499999, 751814.7569, 
        751971.032300002, 752126.386900001, 752195.9161, 752213.0906, 
        752386.998999999, 752474.422399999, 752562.3503, 752732.1897, 
        752984.445600001, 753125.1189, 753316.914400001, 753451.7143, 
        753804.5972, 753975.4802, 754070.401899998, 754153.675599999, 
        754287.556099999, 754405.755000002, 754531.939100001, 
        754559.107699999, 754588.9883, 754515.016200001, 754515.4296, 
        754545.5989, 754566.464399999, 754662.8412, 754914.5953, 
        755505.306800001, 756006.1144, 757647.819899998, 757818.6545, 
        757857.6266, 757878.117400001, 757888.392300001, 757979.6043, 
        758039.2786, 758104.023299999, 758273.8898, 758353.649, 
        758363.7606, 758425.7181, 758461.5542, 758412.883999999, 
        758361.6557, 758337.5731, 758369.8289, 758400.552799999, 
        758599.791, 758782.7463, 758948.5656, 759066.780899999, 
        759122.1718, 759140.5877, 759163.8683, 759310.9385, 759418.1858, 
        761587.9766, 761697.946600001, 761732.9896, 761735.1635, 
        761771.6095, 761834.645200001, 761880.7744, 761988.7883, 
        762117.364700001, 762363.861100001, 762461.7972, 762577.650000001, 
        2549587.0462, 2549520.9314, 2549439.3669, 2549255.2505, 
        2549117.1188, 2548990.8909, 2548851.1706, 2548730.2398, 
        2548549.1469, 2548369.9096, 2548184.485, 2548055.6101, 
        2547843.9775, 2547702.8617, 2547377.6804, 2547071.7279, 
        2546836.4809, 2546710.2235, 2546607.1267, 2546468.6995, 
        2546397.9558, 2546368.2405, 2546431.9041, 2546481.667, 
        2546520.1404, 2546550.2619, 2546534.1647, 2546540.4899, 
        2546584.3733, 2546647.1947, 2546721.98, 2546784.4899, 
        2546843.9544, 2546872.2121, 2546826.1245, 2546737.2423, 
        2546642.6158, 2546522.2403, 2546392.8828, 2545671.1662, 
        2545403.889, 2545106.76, 2544870.7301, 2544665.9495, 
        2544572.9174, 2544320.907, 2544112.5035, 2543574.9242, 
        2543279.2004, 2543016.8945, 2542750.9041, 2542376.3794, 
        2542042.1049, 2541679.6057, 2541467.5463, 2541190.3413, 
        2540905.4868, 2540780.0052, 2540631.2105, 2540525.5178, 
        2540373.1915, 2540070.0945, 2539482.8831, 2539046.8166, 
        2537839.0783, 2537724.5041, 2537640.0879, 2537539.1494, 
        2537377.2627, 2537091.1622, 2536893.6567, 2536619.2583, 
        2536123.8235, 2535849.6827, 2535679.4848, 2535292.3446, 
        2534815.1725, 2534426.1384, 2534186.3689, 2533976.9259, 
        2533708.7239, 2533530.0804, 2533205.0132, 2532931.801, 
        2532675.3421, 2532349.1915, 2532184.0268, 2531985.8156, 
        2531722.8626, 2531466.0844, 2531342.3258, 2531359.46, 
        2531296.537, 2531224.2141, 2531098.6506, 2530945.319, 
        2530557.4524, 2530278.6875, 2529895.6515, 2529626.4184, 
        2529270.0938, 2529090.8971, 2528794.2838), .Dim = c(103L, 
        2L), class = c("XY", "LINESTRING", "sfg"))), class = c("sfc_LINESTRING", 
    "sfc"), precision = 0, bbox = structure(c(xmin = 745439.842100001, 
    ymin = 2528794.2838, xmax = 762577.650000001, ymax = 2549587.0462
    ), class = "bbox"), crs = structure(list(epsg = 32645L, proj4string = "+proj=utm +zone=45 +datum=WGS84 +units=m +no_defs"), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(Transporta = NA_integer_, 
Transpor_1 = NA_integer_, Transpor_2 = NA_integer_, Transpor_3 = NA_integer_, 
TransportS = NA_integer_, Transpor_4 = NA_integer_, Transpor_5 = NA_integer_, 
Transpor_6 = NA_integer_, Transpor_7 = NA_integer_, Transpor_8 = NA_integer_, 
Transpor_9 = NA_integer_, class = NA_integer_), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), row.names = 1L, class = c("sf", 
"data.frame"))

and

structure(list(width = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("2.4716396025460600", 
"2.6281478054445400", "3.0063089766321100", "3.3519104918569500", 
"4.263698279008270"), class = "factor"), lat = c(24.7094195311052, 
24.7094195311052, 24.7094195311052, 24.7094195311052, 24.7094195311052
), lon = c(87.9137151118854, 87.9139846064706, 88.1161055453975, 
88.1996488668206, 88.2007268451616), geometry = structure(list(
    structure(c(24.7094195311052, 87.9137151118854), class = c("XY", 
    "POINT", "sfg")), structure(c(24.7094195311052, 87.9139846064706
    ), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1161055453975), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1996488668206), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.2007268451616), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT", 
"sfc"), precision = 0, bbox = structure(c(xmin = 24.7094195311052, 
ymin = 87.9137151118854, xmax = 24.7094195311052, ymax = 88.2007268451616
), class = "bbox"), crs = structure(list(epsg = NA_integer_, 
    proj4string = NA_character_), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(width = NA_integer_, 
lat = NA_integer_, lon = NA_integer_), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), row.names = c(NA, 
5L), class = c("sf", "data.frame"))

Attempted fixes with st_crs() and st_transform()

I've tried setting the epsg and proj4string attributes of the second to those of the first, but it didn't fix the graph or the wrong geometry values:

# make a copy of second_sf
second_sf_transformed <- second_sf

# make EPSG of the copy of the second the same as the first
st_crs(second_sf_transformed) <- st_crs(first_sf)

# make the proj4string of the copy of the second the same as the first
second_sf_transformed <- st_transform(second_sf_transformed, '+proj=utm +zone=45 +datum=WGS84 +units=m +no_defs')

Suspected issue

There is a transform that I need to be doing on the data from the csv so that I can analyze both shapefiles together, but I don't know what it is. Does anyone know?

Update

Thanks to @dww I got the coordinates for the shapefile right, but the csv data still won't graph correctly.

Transforms

# make EPSG of the copy of the second the same as the first (keeps ggplot() from complaining)
st_crs(second_sf_transformed) <- st_crs(first_sf)

# reproject data
first_sf_transformed <- st_transform(first_sf, "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0" )

Dputs after transforms

shp data

structure(list(Transporta = 31159.5791334, Transpor_1 = 1, Transpor_2 = 2, 
    Transpor_3 = 0, TransportS = 1, Transpor_4 = 120.8226354, 
    Transpor_5 = 40.37383139, Transpor_6 = 0.334157844, Transpor_7 = 0.6603, 
    Transpor_8 = 0.8694, Transpor_9 = structure(NA_integer_, .Label = character(0), class = "factor"), 
    class = structure(1L, .Label = "0", class = "factor"), geometry = structure(list(
        structure(c(89.3951268474236, 89.3961571846398, 89.3969809093774, 
        89.3984785916188, 89.3997082546062, 89.4003506946848, 
        89.4011182562287, 89.401960703154, 89.4040200160893, 
        89.4059482811558, 89.4078952678924, 89.4089436459468, 
        89.4101043491955, 89.4107575329244, 89.4126439952916, 
        89.4145225108038, 89.4164320555286, 89.4177986910559, 
        89.4193712573017, 89.4223458004554, 89.4244259337919, 
        89.4264665255084, 89.4297614260539, 89.4321951598272, 
        89.4353028498903, 89.4383730983178, 89.439117259213, 
        89.4451869671167, 89.4471614499647, 89.4483680787049, 
        89.4494650144704, 89.4504888199855, 89.4520976579026, 
        89.4544012217856, 89.4568411070362, 89.4583501885189, 
        89.4598493329306, 89.4605074942507, 89.4606537526005, 
        89.4622309990771, 89.4630394820499, 89.4638479664175, 
        89.4654649342105, 89.4678903853704, 89.4692464104709, 
        89.4710746363703, 89.4723543932316, 89.4757057279109, 
        89.4773226951387, 89.4782047141727, 89.4789725685317, 
        89.4802157622041, 89.4813126972553, 89.4824827608323, 
        89.4827125868794, 89.4829581002389, 89.4821902450637, 
        89.4821735976744, 89.4824430933553, 89.4826290191, 89.4835431323413, 
        89.4859465218357, 89.4916059082089, 89.4964138369265, 
        89.5122097031426, 89.5138551060626, 89.5142207513114, 
        89.5144035734198, 89.5144767019004, 89.5153176863904, 
        89.5158661532172, 89.5164511855445, 89.5180234596532, 
        89.5187547490365, 89.518824862045, 89.5193638514563, 
        89.5196333452499, 89.5190943561545, 89.5185553667664, 
        89.5182858719894, 89.5185553670613, 89.5188248615575, 
        89.5207113235949, 89.5224477642401, 89.5240200382999, 
        89.5251169729864, 89.525628876173, 89.5257751340317, 
        89.5259579570345, 89.5273474078494, 89.5283712135035, 
        89.5495054976574, 89.5505658674762, 89.5508949488069, 
        89.5508949489754, 89.5512240291922, 89.5517724970054, 
        89.5521747065285, 89.5531619475244, 89.5543685767302, 
        89.556708704603, 89.557632082246, 89.5587100597639, 23.0366666407601, 
        23.0360541520806, 23.0353053114777, 23.0336204191383, 
        23.0323547265548, 23.0312054063198, 23.0299323767221, 
        23.0288278360431, 23.0271616644131, 23.0255142147017, 
        23.0238106014021, 23.0226311762231, 23.0207029106886, 
        23.0194189874174, 23.0164545471189, 23.013663804625, 
        23.0115108869883, 23.0103501834476, 23.0093954108462, 
        23.0081002152887, 23.0074297034657, 23.0071301670893, 
        23.0076543553864, 23.0080662184381, 23.0083657545691, 
        23.0085904068476, 23.0084336181262, 23.008397053537, 
        23.0087626986679, 23.009311166109, 23.0099693276515, 
        23.0105177951238, 23.0110296978796, 23.0112490852379, 
        23.0107951600909, 23.0099693278917, 23.0090917796211, 
        23.007994844059, 23.006824780279, 23.0002848724084, 22.9978594199624, 
        22.9951644743623, 22.9930085179719, 22.9911220557635, 
        22.9902610598549, 22.9879574957376, 22.9860561421864, 
        22.9811507560206, 22.9784558101354, 22.9760740319897, 
        22.9736607748343, 22.9702602760285, 22.9672254217359, 
        22.9639346170366, 22.9620166398532, 22.9595103113266, 
        22.956950796692, 22.9558182646749, 22.9544707918072, 
        22.9535137332223, 22.9521242822197, 22.949350394888, 
        22.9439605028831, 22.939948302187, 22.928796127859, 22.9277357574655, 
        22.9269679028182, 22.9260537905528, 22.9245912104683, 
        22.9219951306966, 22.9202034702482, 22.9177170836491, 
        22.9132196495687, 22.9107332636081, 22.9091957023515, 
        22.9056922716249, 22.9013803590159, 22.8978769292378, 
        22.8957209717259, 22.893834509633, 22.8914090590013, 
        22.8897920924525, 22.8868276509463, 22.8843336909749, 
        22.8819935625524, 22.8790318381511, 22.8775326931856, 
        22.87574103285, 22.8733643396172, 22.8710242115184, 22.8698907125254, 
        22.8697078900248, 22.8691228573515, 22.8684646968471, 
        22.8673311965241, 22.8659417452486, 22.8624315537002, 
        22.8599086028934, 22.8564349744818, 22.8539851531464, 
        22.8507309120264, 22.8490984092694, 22.846403462933), .Dim = c(103L, 
        2L), class = c("XY", "LINESTRING", "sfg"))), class = c("sfc_LINESTRING", 
    "sfc"), precision = 0, bbox = structure(c(xmin = 89.3951268474236, 
    ymin = 22.846403462933, xmax = 89.5587100597639, ymax = 23.0366666407601
    ), class = "bbox"), crs = structure(list(epsg = 4326L, proj4string = "+proj=longlat +datum=WGS84 +no_defs"), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(Transporta = NA_integer_, 
Transpor_1 = NA_integer_, Transpor_2 = NA_integer_, Transpor_3 = NA_integer_, 
TransportS = NA_integer_, Transpor_4 = NA_integer_, Transpor_5 = NA_integer_, 
Transpor_6 = NA_integer_, Transpor_7 = NA_integer_, Transpor_8 = NA_integer_, 
Transpor_9 = NA_integer_, class = NA_integer_), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), row.names = 1L, class = c("sf", 
"data.frame"))

csv data

structure(list(width = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("2.4716396025460600", 
"2.6281478054445400", "3.0063089766321100", "3.3519104918569500", 
"4.263698279008270"), class = "factor"), lat = c(24.7094195311052, 
24.7094195311052, 24.7094195311052, 24.7094195311052, 24.7094195311052
), lon = c(87.9137151118854, 87.9139846064706, 88.1161055453975, 
88.1996488668206, 88.2007268451616), geometry = structure(list(
    structure(c(24.7094195311052, 87.9137151118854), class = c("XY", 
    "POINT", "sfg")), structure(c(24.7094195311052, 87.9139846064706
    ), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1161055453975), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1996488668206), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.2007268451616), class = c("XY", "POINT", "sfg"))), n_empty = 0L, crs = structure(list(
    epsg = 32645L, proj4string = "+proj=utm +zone=45 +datum=WGS84 +units=m +no_defs"), class = "crs"), class = c("sfc_POINT", 
"sfc"), precision = 0, bbox = structure(c(xmin = 24.7094195311052, 
ymin = 87.9137151118854, xmax = 24.7094195311052, ymax = 88.2007268451616
), class = "bbox"))), row.names = c(NA, -5L), sf_column = "geometry", agr = structure(c(width = NA_integer_, 
lat = NA_integer_, lon = NA_integer_), class = "factor", .Label = c("constant", 
"aggregate", "identity")), class = c("sf", "data.frame"))

标签: rsf

解决方案


问题似乎是sf从 csv 数据创建的对象似乎在几何列中颠倒了纬度和经度:


csv_in <- structure(list(width = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("2.4716396025460600", 
"2.6281478054445400", "3.0063089766321100", "3.3519104918569500", 
"4.263698279008270"), class = "factor"), lat = c(24.7094195311052, 
24.7094195311052, 24.7094195311052, 24.7094195311052, 24.7094195311052
), lon = c(87.9137151118854, 87.9139846064706, 88.1161055453975, 
88.1996488668206, 88.2007268451616), geometry = structure(list(
    structure(c(24.7094195311052, 87.9137151118854), class = c("XY", 
    "POINT", "sfg")), structure(c(24.7094195311052, 87.9139846064706
    ), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1161055453975), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.1996488668206), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052, 
    88.2007268451616), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT", 
"sfc"), precision = 0, bbox = structure(c(xmin = 24.7094195311052, 
ymin = 87.9137151118854, xmax = 24.7094195311052, ymax = 88.2007268451616
), class = "bbox"), crs = structure(list(epsg = NA_integer_, 
    proj4string = NA_character_), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(width = NA_integer_, 
lat = NA_integer_, lon = NA_integer_), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), row.names = c(NA, 
5L), class = c("sf", "data.frame"))


Simple feature collection with 5 features and 3 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 24.70942 ymin: 87.91372 xmax: 24.70942 ymax: 88.20073
epsg (SRID):    NA
proj4string:    NA
               width      lat      lon                  geometry
1 2.4716396025460600 24.70942 87.91372 POINT (24.70942 87.91372)
2 2.6281478054445400 24.70942 87.91398 POINT (24.70942 87.91398)
3  4.263698279008270 24.70942 88.11611 POINT (24.70942 88.11611)
4 3.0063089766321100 24.70942 88.19965 POINT (24.70942 88.19965)
5 3.3519104918569500 24.70942 88.20073 POINT (24.70942 88.20073)

(请注意,在几何列中,第一列是经度)。

您可以通过更正导入脚本然后使用 分配 CRSsf::st_crs(csv_in) <- 4326或使用类似的方法来解决此问题:

csv_in <- sf::st_as_sf(sf::st_drop_geometry(csv_in), coords = c("lon","lat"), remove= FALSE)
sf::st_crs(csv_in) <- 4326
csv_in

Simple feature collection with 5 features and 3 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 87.91372 ymin: 24.70942 xmax: 88.20073 ymax: 24.70942
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
               width      lat      lon                  geometry
1 2.4716396025460600 24.70942 87.91372 POINT (87.91372 24.70942)
2 2.6281478054445400 24.70942 87.91398 POINT (87.91398 24.70942)
3  4.263698279008270 24.70942 88.11611 POINT (88.11611 24.70942)
4 3.0063089766321100 24.70942 88.19965 POINT (88.19965 24.70942)
5 3.3519104918569500 24.70942 88.20073 POINT (88.20073 24.70942)

(请注意,现在几何列中的坐标是“倒置的”)。

试图绘制这个给出:

ggplot(csv_in) + geom_sf() + geom_sf(data = shp_in)

,这似乎没问题

在此处输入图像描述


推荐阅读