r - How can I transform geospatial data read as lat/lon coordinates from a CSV to match data read from a SHP?
问题描述
Introduction
I am reading two files with geospatial data, one as .shp and one as .csv (using options = c('X_POSSIBLE_NAMES=lat', 'Y_POSSIBLE_NAMES=lon'
). The .shp file has unusual values in the bbox field when it's read, but graphs well. The .csv file has the right values in the bbox field when it's read, but graphs badly. Here are the read messages and dputs.
Read messages
Reading layer `WWW` from data source `XXX.shp' using driver `ESRI Shapefile'
Simple feature collection with 222 features and 12 fields
geometry type: LINESTRING
dimension: XY
bbox: xmin: 745400.7 ymin: 2402049 xmax: 998753.2 ymax: 2671392
epsg (SRID): 32645
proj4string: +proj=utm +zone=45 +datum=WGS84 +units=m +no_defs
and
options: X_POSSIBLE_NAMES=lat Y_POSSIBLE_NAMES=lon
Reading layer `YYY` from data source `ZZZ`.csv' using driver `CSV'
Simple feature collection with 5 features and 3 fields
geometry type: POINT
dimension: XY
bbox: xmin: 24.70942 ymin: 87.91372 xmax: 24.70942 ymax: 88.20073
epsg (SRID): NA
proj4string: NA
dputs
structure(list(Transporta = 31159.5791334, Transpor_1 = 1, Transpor_2 = 2,
Transpor_3 = 0, TransportS = 1, Transpor_4 = 120.8226354,
Transpor_5 = 40.37383139, Transpor_6 = 0.334157844, Transpor_7 = 0.6603,
Transpor_8 = 0.8694, Transpor_9 = structure(NA_integer_, .Label = character(0), class = "factor"),
class = structure(1L, .Label = "0", class = "factor"), geometry = structure(list(
structure(c(745439.842100001, 745546.5795, 745632.3839,
745788.981400001, 745917.343899999, 745985.294699999,
746066.299, 746154.675899999, 746368.8333, 746569.5261,
746772.2455, 746881.8819, 747004.401999999, 747073.7139,
747272.5434, 747470.2489, 747669.9781, 747812.230399999,
747975.2242, 748282.6014, 748497.1264, 748706.9185, 749043.816300001,
749292.6146, 749610.7259, 749925.135099999, 750001.7305,
750624.1876, 750825.9755, 750948.688199999, 751059.9489,
751163.9131, 751327.932799999, 751563.730499999, 751814.7569,
751971.032300002, 752126.386900001, 752195.9161, 752213.0906,
752386.998999999, 752474.422399999, 752562.3503, 752732.1897,
752984.445600001, 753125.1189, 753316.914400001, 753451.7143,
753804.5972, 753975.4802, 754070.401899998, 754153.675599999,
754287.556099999, 754405.755000002, 754531.939100001,
754559.107699999, 754588.9883, 754515.016200001, 754515.4296,
754545.5989, 754566.464399999, 754662.8412, 754914.5953,
755505.306800001, 756006.1144, 757647.819899998, 757818.6545,
757857.6266, 757878.117400001, 757888.392300001, 757979.6043,
758039.2786, 758104.023299999, 758273.8898, 758353.649,
758363.7606, 758425.7181, 758461.5542, 758412.883999999,
758361.6557, 758337.5731, 758369.8289, 758400.552799999,
758599.791, 758782.7463, 758948.5656, 759066.780899999,
759122.1718, 759140.5877, 759163.8683, 759310.9385, 759418.1858,
761587.9766, 761697.946600001, 761732.9896, 761735.1635,
761771.6095, 761834.645200001, 761880.7744, 761988.7883,
762117.364700001, 762363.861100001, 762461.7972, 762577.650000001,
2549587.0462, 2549520.9314, 2549439.3669, 2549255.2505,
2549117.1188, 2548990.8909, 2548851.1706, 2548730.2398,
2548549.1469, 2548369.9096, 2548184.485, 2548055.6101,
2547843.9775, 2547702.8617, 2547377.6804, 2547071.7279,
2546836.4809, 2546710.2235, 2546607.1267, 2546468.6995,
2546397.9558, 2546368.2405, 2546431.9041, 2546481.667,
2546520.1404, 2546550.2619, 2546534.1647, 2546540.4899,
2546584.3733, 2546647.1947, 2546721.98, 2546784.4899,
2546843.9544, 2546872.2121, 2546826.1245, 2546737.2423,
2546642.6158, 2546522.2403, 2546392.8828, 2545671.1662,
2545403.889, 2545106.76, 2544870.7301, 2544665.9495,
2544572.9174, 2544320.907, 2544112.5035, 2543574.9242,
2543279.2004, 2543016.8945, 2542750.9041, 2542376.3794,
2542042.1049, 2541679.6057, 2541467.5463, 2541190.3413,
2540905.4868, 2540780.0052, 2540631.2105, 2540525.5178,
2540373.1915, 2540070.0945, 2539482.8831, 2539046.8166,
2537839.0783, 2537724.5041, 2537640.0879, 2537539.1494,
2537377.2627, 2537091.1622, 2536893.6567, 2536619.2583,
2536123.8235, 2535849.6827, 2535679.4848, 2535292.3446,
2534815.1725, 2534426.1384, 2534186.3689, 2533976.9259,
2533708.7239, 2533530.0804, 2533205.0132, 2532931.801,
2532675.3421, 2532349.1915, 2532184.0268, 2531985.8156,
2531722.8626, 2531466.0844, 2531342.3258, 2531359.46,
2531296.537, 2531224.2141, 2531098.6506, 2530945.319,
2530557.4524, 2530278.6875, 2529895.6515, 2529626.4184,
2529270.0938, 2529090.8971, 2528794.2838), .Dim = c(103L,
2L), class = c("XY", "LINESTRING", "sfg"))), class = c("sfc_LINESTRING",
"sfc"), precision = 0, bbox = structure(c(xmin = 745439.842100001,
ymin = 2528794.2838, xmax = 762577.650000001, ymax = 2549587.0462
), class = "bbox"), crs = structure(list(epsg = 32645L, proj4string = "+proj=utm +zone=45 +datum=WGS84 +units=m +no_defs"), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(Transporta = NA_integer_,
Transpor_1 = NA_integer_, Transpor_2 = NA_integer_, Transpor_3 = NA_integer_,
TransportS = NA_integer_, Transpor_4 = NA_integer_, Transpor_5 = NA_integer_,
Transpor_6 = NA_integer_, Transpor_7 = NA_integer_, Transpor_8 = NA_integer_,
Transpor_9 = NA_integer_, class = NA_integer_), .Label = c("constant",
"aggregate", "identity"), class = "factor"), row.names = 1L, class = c("sf",
"data.frame"))
and
structure(list(width = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("2.4716396025460600",
"2.6281478054445400", "3.0063089766321100", "3.3519104918569500",
"4.263698279008270"), class = "factor"), lat = c(24.7094195311052,
24.7094195311052, 24.7094195311052, 24.7094195311052, 24.7094195311052
), lon = c(87.9137151118854, 87.9139846064706, 88.1161055453975,
88.1996488668206, 88.2007268451616), geometry = structure(list(
structure(c(24.7094195311052, 87.9137151118854), class = c("XY",
"POINT", "sfg")), structure(c(24.7094195311052, 87.9139846064706
), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052,
88.1161055453975), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052,
88.1996488668206), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052,
88.2007268451616), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT",
"sfc"), precision = 0, bbox = structure(c(xmin = 24.7094195311052,
ymin = 87.9137151118854, xmax = 24.7094195311052, ymax = 88.2007268451616
), class = "bbox"), crs = structure(list(epsg = NA_integer_,
proj4string = NA_character_), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(width = NA_integer_,
lat = NA_integer_, lon = NA_integer_), .Label = c("constant",
"aggregate", "identity"), class = "factor"), row.names = c(NA,
5L), class = c("sf", "data.frame"))
Attempted fixes with st_crs()
and st_transform()
I've tried setting the epsg and proj4string attributes of the second to those of the first, but it didn't fix the graph or the wrong geometry values:
# make a copy of second_sf
second_sf_transformed <- second_sf
# make EPSG of the copy of the second the same as the first
st_crs(second_sf_transformed) <- st_crs(first_sf)
# make the proj4string of the copy of the second the same as the first
second_sf_transformed <- st_transform(second_sf_transformed, '+proj=utm +zone=45 +datum=WGS84 +units=m +no_defs')
Suspected issue
There is a transform that I need to be doing on the data from the csv so that I can analyze both shapefiles together, but I don't know what it is. Does anyone know?
Update
Thanks to @dww I got the coordinates for the shapefile right, but the csv data still won't graph correctly.
Transforms
# make EPSG of the copy of the second the same as the first (keeps ggplot() from complaining)
st_crs(second_sf_transformed) <- st_crs(first_sf)
# reproject data
first_sf_transformed <- st_transform(first_sf, "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0" )
Dputs after transforms
shp data
structure(list(Transporta = 31159.5791334, Transpor_1 = 1, Transpor_2 = 2,
Transpor_3 = 0, TransportS = 1, Transpor_4 = 120.8226354,
Transpor_5 = 40.37383139, Transpor_6 = 0.334157844, Transpor_7 = 0.6603,
Transpor_8 = 0.8694, Transpor_9 = structure(NA_integer_, .Label = character(0), class = "factor"),
class = structure(1L, .Label = "0", class = "factor"), geometry = structure(list(
structure(c(89.3951268474236, 89.3961571846398, 89.3969809093774,
89.3984785916188, 89.3997082546062, 89.4003506946848,
89.4011182562287, 89.401960703154, 89.4040200160893,
89.4059482811558, 89.4078952678924, 89.4089436459468,
89.4101043491955, 89.4107575329244, 89.4126439952916,
89.4145225108038, 89.4164320555286, 89.4177986910559,
89.4193712573017, 89.4223458004554, 89.4244259337919,
89.4264665255084, 89.4297614260539, 89.4321951598272,
89.4353028498903, 89.4383730983178, 89.439117259213,
89.4451869671167, 89.4471614499647, 89.4483680787049,
89.4494650144704, 89.4504888199855, 89.4520976579026,
89.4544012217856, 89.4568411070362, 89.4583501885189,
89.4598493329306, 89.4605074942507, 89.4606537526005,
89.4622309990771, 89.4630394820499, 89.4638479664175,
89.4654649342105, 89.4678903853704, 89.4692464104709,
89.4710746363703, 89.4723543932316, 89.4757057279109,
89.4773226951387, 89.4782047141727, 89.4789725685317,
89.4802157622041, 89.4813126972553, 89.4824827608323,
89.4827125868794, 89.4829581002389, 89.4821902450637,
89.4821735976744, 89.4824430933553, 89.4826290191, 89.4835431323413,
89.4859465218357, 89.4916059082089, 89.4964138369265,
89.5122097031426, 89.5138551060626, 89.5142207513114,
89.5144035734198, 89.5144767019004, 89.5153176863904,
89.5158661532172, 89.5164511855445, 89.5180234596532,
89.5187547490365, 89.518824862045, 89.5193638514563,
89.5196333452499, 89.5190943561545, 89.5185553667664,
89.5182858719894, 89.5185553670613, 89.5188248615575,
89.5207113235949, 89.5224477642401, 89.5240200382999,
89.5251169729864, 89.525628876173, 89.5257751340317,
89.5259579570345, 89.5273474078494, 89.5283712135035,
89.5495054976574, 89.5505658674762, 89.5508949488069,
89.5508949489754, 89.5512240291922, 89.5517724970054,
89.5521747065285, 89.5531619475244, 89.5543685767302,
89.556708704603, 89.557632082246, 89.5587100597639, 23.0366666407601,
23.0360541520806, 23.0353053114777, 23.0336204191383,
23.0323547265548, 23.0312054063198, 23.0299323767221,
23.0288278360431, 23.0271616644131, 23.0255142147017,
23.0238106014021, 23.0226311762231, 23.0207029106886,
23.0194189874174, 23.0164545471189, 23.013663804625,
23.0115108869883, 23.0103501834476, 23.0093954108462,
23.0081002152887, 23.0074297034657, 23.0071301670893,
23.0076543553864, 23.0080662184381, 23.0083657545691,
23.0085904068476, 23.0084336181262, 23.008397053537,
23.0087626986679, 23.009311166109, 23.0099693276515,
23.0105177951238, 23.0110296978796, 23.0112490852379,
23.0107951600909, 23.0099693278917, 23.0090917796211,
23.007994844059, 23.006824780279, 23.0002848724084, 22.9978594199624,
22.9951644743623, 22.9930085179719, 22.9911220557635,
22.9902610598549, 22.9879574957376, 22.9860561421864,
22.9811507560206, 22.9784558101354, 22.9760740319897,
22.9736607748343, 22.9702602760285, 22.9672254217359,
22.9639346170366, 22.9620166398532, 22.9595103113266,
22.956950796692, 22.9558182646749, 22.9544707918072,
22.9535137332223, 22.9521242822197, 22.949350394888,
22.9439605028831, 22.939948302187, 22.928796127859, 22.9277357574655,
22.9269679028182, 22.9260537905528, 22.9245912104683,
22.9219951306966, 22.9202034702482, 22.9177170836491,
22.9132196495687, 22.9107332636081, 22.9091957023515,
22.9056922716249, 22.9013803590159, 22.8978769292378,
22.8957209717259, 22.893834509633, 22.8914090590013,
22.8897920924525, 22.8868276509463, 22.8843336909749,
22.8819935625524, 22.8790318381511, 22.8775326931856,
22.87574103285, 22.8733643396172, 22.8710242115184, 22.8698907125254,
22.8697078900248, 22.8691228573515, 22.8684646968471,
22.8673311965241, 22.8659417452486, 22.8624315537002,
22.8599086028934, 22.8564349744818, 22.8539851531464,
22.8507309120264, 22.8490984092694, 22.846403462933), .Dim = c(103L,
2L), class = c("XY", "LINESTRING", "sfg"))), class = c("sfc_LINESTRING",
"sfc"), precision = 0, bbox = structure(c(xmin = 89.3951268474236,
ymin = 22.846403462933, xmax = 89.5587100597639, ymax = 23.0366666407601
), class = "bbox"), crs = structure(list(epsg = 4326L, proj4string = "+proj=longlat +datum=WGS84 +no_defs"), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(Transporta = NA_integer_,
Transpor_1 = NA_integer_, Transpor_2 = NA_integer_, Transpor_3 = NA_integer_,
TransportS = NA_integer_, Transpor_4 = NA_integer_, Transpor_5 = NA_integer_,
Transpor_6 = NA_integer_, Transpor_7 = NA_integer_, Transpor_8 = NA_integer_,
Transpor_9 = NA_integer_, class = NA_integer_), .Label = c("constant",
"aggregate", "identity"), class = "factor"), row.names = 1L, class = c("sf",
"data.frame"))
csv data
structure(list(width = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("2.4716396025460600",
"2.6281478054445400", "3.0063089766321100", "3.3519104918569500",
"4.263698279008270"), class = "factor"), lat = c(24.7094195311052,
24.7094195311052, 24.7094195311052, 24.7094195311052, 24.7094195311052
), lon = c(87.9137151118854, 87.9139846064706, 88.1161055453975,
88.1996488668206, 88.2007268451616), geometry = structure(list(
structure(c(24.7094195311052, 87.9137151118854), class = c("XY",
"POINT", "sfg")), structure(c(24.7094195311052, 87.9139846064706
), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052,
88.1161055453975), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052,
88.1996488668206), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052,
88.2007268451616), class = c("XY", "POINT", "sfg"))), n_empty = 0L, crs = structure(list(
epsg = 32645L, proj4string = "+proj=utm +zone=45 +datum=WGS84 +units=m +no_defs"), class = "crs"), class = c("sfc_POINT",
"sfc"), precision = 0, bbox = structure(c(xmin = 24.7094195311052,
ymin = 87.9137151118854, xmax = 24.7094195311052, ymax = 88.2007268451616
), class = "bbox"))), row.names = c(NA, -5L), sf_column = "geometry", agr = structure(c(width = NA_integer_,
lat = NA_integer_, lon = NA_integer_), class = "factor", .Label = c("constant",
"aggregate", "identity")), class = c("sf", "data.frame"))
解决方案
问题似乎是sf
从 csv 数据创建的对象似乎在几何列中颠倒了纬度和经度:
csv_in <- structure(list(width = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("2.4716396025460600",
"2.6281478054445400", "3.0063089766321100", "3.3519104918569500",
"4.263698279008270"), class = "factor"), lat = c(24.7094195311052,
24.7094195311052, 24.7094195311052, 24.7094195311052, 24.7094195311052
), lon = c(87.9137151118854, 87.9139846064706, 88.1161055453975,
88.1996488668206, 88.2007268451616), geometry = structure(list(
structure(c(24.7094195311052, 87.9137151118854), class = c("XY",
"POINT", "sfg")), structure(c(24.7094195311052, 87.9139846064706
), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052,
88.1161055453975), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052,
88.1996488668206), class = c("XY", "POINT", "sfg")), structure(c(24.7094195311052,
88.2007268451616), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT",
"sfc"), precision = 0, bbox = structure(c(xmin = 24.7094195311052,
ymin = 87.9137151118854, xmax = 24.7094195311052, ymax = 88.2007268451616
), class = "bbox"), crs = structure(list(epsg = NA_integer_,
proj4string = NA_character_), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(width = NA_integer_,
lat = NA_integer_, lon = NA_integer_), .Label = c("constant",
"aggregate", "identity"), class = "factor"), row.names = c(NA,
5L), class = c("sf", "data.frame"))
Simple feature collection with 5 features and 3 fields
geometry type: POINT
dimension: XY
bbox: xmin: 24.70942 ymin: 87.91372 xmax: 24.70942 ymax: 88.20073
epsg (SRID): NA
proj4string: NA
width lat lon geometry
1 2.4716396025460600 24.70942 87.91372 POINT (24.70942 87.91372)
2 2.6281478054445400 24.70942 87.91398 POINT (24.70942 87.91398)
3 4.263698279008270 24.70942 88.11611 POINT (24.70942 88.11611)
4 3.0063089766321100 24.70942 88.19965 POINT (24.70942 88.19965)
5 3.3519104918569500 24.70942 88.20073 POINT (24.70942 88.20073)
(请注意,在几何列中,第一列是经度)。
您可以通过更正导入脚本然后使用 分配 CRSsf::st_crs(csv_in) <- 4326
或使用类似的方法来解决此问题:
csv_in <- sf::st_as_sf(sf::st_drop_geometry(csv_in), coords = c("lon","lat"), remove= FALSE)
sf::st_crs(csv_in) <- 4326
csv_in
Simple feature collection with 5 features and 3 fields
geometry type: POINT
dimension: XY
bbox: xmin: 87.91372 ymin: 24.70942 xmax: 88.20073 ymax: 24.70942
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
width lat lon geometry
1 2.4716396025460600 24.70942 87.91372 POINT (87.91372 24.70942)
2 2.6281478054445400 24.70942 87.91398 POINT (87.91398 24.70942)
3 4.263698279008270 24.70942 88.11611 POINT (88.11611 24.70942)
4 3.0063089766321100 24.70942 88.19965 POINT (88.19965 24.70942)
5 3.3519104918569500 24.70942 88.20073 POINT (88.20073 24.70942)
(请注意,现在几何列中的坐标是“倒置的”)。
试图绘制这个给出:
ggplot(csv_in) + geom_sf() + geom_sf(data = shp_in)
,这似乎没问题
推荐阅读
- azure - 在存储帐户中授予“存储 blob 数据贡献者”的应用程序角色
- vba - 如何从模板填充电子邮件的发件人、收件人、抄送?
- ruby-on-rails - 如何从数组中的控制器获取 ID
- makefile - glibc-2.14 make error: 遇到 shlib.lds 语法错误
- php - 如何每 24 小时从 .txt 文件中获取字符串?
- python - 尝试使用 Peewee (python) 连接到 Microsoft SQL 数据库
- shell - 如何将文本文件中的参数逐行传递给包含在字符串中的命令
- apache-kafka - 重新平衡分区时 Apache Kafka 代理的 OOM
- networking - 无法从 WSL2 访问网络
- mysql - SQL Groupby -- 自定义聚合函数:最大值减去每个组中不同的值