r - Aggregate rows with conditions, sum quantitative values, and keep character values of the max
问题描述
I have a sf object of points within same polygons. I want to aggregate all info by polygon Id and keep only one point per polygon : summing quantitative values and keeping character values of the max population with a fcode starting with "P".
I don't know how I can do that.
Here, an exemple with a sample of my sf (6 points in one polygon)
#My sf with 6 points in the same polygon (Poly.id =1), so I want to aggregate by Poly.id
str(Paris)
Classes ‘sf’, ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of 15 variables:
$ geonameId : chr "2988507" "9844895" "2988623" "9843563" ...
$ asciiName : chr "Paris" "Paris 01 Ancien - Quartier Champs-Elysees" "Paris 05 Pantheon" "Paris 09 Ancien - Quartier Hotel-de-Ville" ...
$ members_etmun : num 30 0 0 0 0 0
$ members_urbact : num 3 0 0 0 0 0
$ participations_eucicop: num 175 1 9 13 1 6
$ partners_eucicop : num 151 1 7 8 1 4
$ projects_eucicop : num 156 1 8 13 1 6
$ fcodeName : chr "capital of a political entity" "historical populated place" "populated place" "historical populated place" ...
$ fcode : chr "PPLC" "PPLH" "PPL" "PPLH" ...
$ countryCode : chr "FR" "FR" "FR" "FR" ...
$ continentCode : chr "EU" "EU" "EU" "EU" ...
$ population : chr "2138551" "0" "0" "0" ...
$ CityId : int 2 3 4 5 6 7
$ Poly.id : int 1 1 1 1 1 1
$ geometry :sfc_POINT of length 6; first list element: 'XY' num 3760488 2889158
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr "geonameId" "asciiName" "members_etmun" "members_urbact" ...
Here, the output I want (Get all character variables of the first row, because it's the max of "population" and the fcode starts with "P", then sum variables(3:7)). So something like that :
str(Paris)
Classes ‘sf’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 15 variables:
$ geonameId : chr "2988507"
$ asciiName : chr "Paris"
$ members_etmun : num 30
$ members_urbact : num 3
$ participations_eucicop: num 205
$ partners_eucicop : num 172
$ projects_eucicop : num 185
$ fcodeName : chr "capital of a political entity"
$ fcode : chr "PPLC"
$ countryCode : chr "FR"
$ continentCode : chr "EU"
$ population : chr "2138551"
$ CityId : int 2
$ Poly.id : int 1
$ geometry :sfc_POINT of length 1; first list element: 'XY' num 3760488 2889158
...
I've tried several things with dplyr or the function aggregate but I don't know how to implement conditions (sum of some quantitative variables, and conditional selection of qualitative values)
Below : the info for the reprex
Paris <-
structure(
list(
geonameId = c(
"2988507",
"9844895",
"2988623",
"9843563",
"9843589",
"2970479"
),
asciiName = c(
"Paris",
"Paris 01 Ancien - Quartier Champs-Elysees",
"Paris 05 Pantheon",
"Paris 09 Ancien - Quartier Hotel-de-Ville",
"Paris 10 Ancien - Quartier Invalides",
"Paris 15 Vaugirard"
),
members_etmun = c(30, 0, 0, 0, 0, 0),
members_urbact = c(3,
0, 0, 0, 0, 0),
participations_eucicop = c(175, 1, 9, 13,
1, 6),
partners_eucicop = c(151, 1, 7, 8, 1, 4),
projects_eucicop = c(156,
1, 8, 13, 1, 6),
fcodeName = c(
"capital of a political entity",
"historical populated place",
"populated place",
"historical populated place",
"historical populated place",
"populated place"
),
fcode = c("PPLC",
"PPLH", "PPL", "PPLH", "PPLH", "PPL"),
countryCode = c("FR",
"FR", "FR", "FR", "FR", "FR"),
continentCode = c("EU", "EU",
"EU", "EU", "EU", "EU"),
population = c("2138551", "0", "0",
"0", "0", "0"),
CityId = 2:7,
Poly.id = c(1L, 1L, 1L, 1L,
1L, 1L),
geometry = structure(
list(
structure(
c(3760487.79387524,
2889157.59339236),
class = c("XY", "POINT", "sfg")
),
structure(
c(3757309.96074126,
2891227.55260787),
class = c("XY", "POINT", "sfg")
),
structure(
c(3760266.48357105,
2888217.53980758),
class = c("XY", "POINT", "sfg")
),
structure(
c(3760964.29140438,
2889376.36175204),
class = c("XY", "POINT", "sfg")
),
structure(
c(3757255.00710537,
2889909.86397746),
class = c("XY", "POINT", "sfg")
),
structure(
c(3756810.81051837,
2888176.61058656),
class = c("XY", "POINT", "sfg")
)
),
class = c("sfc_POINT",
"sfc"),
precision = 0,
bbox = structure(
c(
xmin = 3756810.81051837,
ymin = 2888176.61058656,
xmax = 3760964.29140438,
ymax = 2891227.55260787
),
class = "bbox"
),
crs = structure(
list(epsg = 3035L, proj4string = "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"),
class = "crs"
),
n_empty = 0L
)
),
row.names = c(NA,
6L),
sf_column = "geometry",
agr = structure(
c(
geonameId = NA_integer_,
asciiName = NA_integer_,
members_etmun = NA_integer_,
members_urbact = NA_integer_,
participations_eucicop = NA_integer_,
partners_eucicop = NA_integer_,
projects_eucicop = NA_integer_,
fcodeName = NA_integer_,
fcode = NA_integer_,
countryCode = NA_integer_,
continentCode = NA_integer_,
population = NA_integer_,
CityId = NA_integer_,
Poly.id = NA_integer_
),
class = "factor",
.Label = c("constant",
"aggregate", "identity")
),
class = c("sf", "tbl_df", "tbl", "data.frame")
)
What kind of function do I have to use in order to do something like this ? Thank you in advance
解决方案
There is probably an easier way to do this but...
You could use by
to create the row of the dataframe for each polygon id, then rbind
to bind them back together into a single dataframe.
I don't fully understand your 'keeping character values of the max population with an fcode starting with p' but you can adapt the code below if I haven't quite got it right:
## First change the population to a numeric:
Paris$population <- as.numeric(Paris$population)
# This will apply the function in FUN to each subset of d defined by a unique poly.id
polygons <- by(data=Paris, INDICES = list(Paris$Poly.id) , FUN = function(d){
# Identify all the rows with fcode that starts with P
startswithP <- d[substr(d$fcode,1,1)=="P",]
# Now get the row with the highest population from those rows
# There might be more than one of these so just take the first row.
maxpop <- startswithP[startswithP$population == max(startswithP$population),][1,]
# Now construct the row of my dataframe.
# First sum all the numeric rows of d.
# Then identify all the character rows of maxpop
data.frame( as.list(apply(d[sapply(d, is.numeric)],2,sum)),
maxpop[sapply(maxpop, is.character)])
})
# Now I have a list of one-row dataframes I can rbind them together.
Paris.perpolygon <- do.call(rbind, polygons)
Paris.perpolygon
推荐阅读
- vhdl - VHDL fullAdder 模拟?
- woocommerce - 在结帐时移动 WooCommerce 密码字段
- tensorflow - 使用对象检测 tensorflow API 在录制的视频中进行预测
- javascript - 无法在 Angular 中使用子组件的选择器
- electron - Electron:如何检测应用程序是否处于未聚焦/后台
- html - 如何使用 Bulma 完成像 Google Keep 这样的砌体布局?
- javascript - 如何将 JUPYTERHUB 嵌入 iframe?
- python - 在熊猫数据框中展平嵌套的 Json
- android - 显示/隐藏 Android 覆盖问题
- sql - 我可以使用 sql 获得唯一的观看时间吗