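r - Ranking a variable by the weighted value of another variable?
Problem description
Super R beginner here. I'm trying to rank one variable by the weighted values of another column/variable. For example, I have a dataset that looks like this: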
State <- rep(c("MN", "MN", "OR", "OR", "ME", "ME", "CO", "CO", "HI", "HI"), each = 3)
PopA <- c("145", "215", "200", "300", "177", "155", "2013", "89", "102", "3451",
"565", "805", "204", "650", "975", "145", "2045", "789", "226", "398",
"763","346","987","1236","765","876","95","45","3457","4557")
PopB <- c("190", "7410", "523", "963", "1254", "235", "3140", "4041", "896", "7458",
"105", "40", "5673", "638", "1444", "673", "257", "4211", "869", "245",
"8545","8553","8853","234","635","963","3456","6754","234","2244")
inc1 <- c("55000", "67000", "34000", "17000", "135000", "98000", "54000", "55000", "102000", "170000",
"75000", "12000", "345000", "23000", "13000", "78000", "112000", "48000", "45000", "89000",
"10000", "12000", "16000", "23000", "98000", "96000", "34000", "65000", "59000", "39000" )
inc2 <- c("23000", "98000", "45000", "92000", "87000", "55000", "29000", "65000", "59000", "155000",
"65000", "23000", "95000", "134000", "76000", "69000", "45000", "95000", "230000", "125000",
"48000", "97000", "65000", "23000", "16000", "76000", "34500", "76000", "98000", "35000")
data <- data.frame(State, PopA, PopB, inc1, inc2)
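I'm trying to get 4 new columns named Overall_rank1_PopA, Overall_rank2_PopB, Rank_by_state1_PopA, and Rank_by_state2_PopB. In these columns, I want the rank of inc1 and inc2 across the whole dataset, weighted by population A and population B respectively, and then the same ranks grouped by state. I'd like to do this using weighted percentiles (weighted quantiles?) of PopA and PopB.
Currently, I have: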
library(dplyr)

ranking <- data %>%
  arrange(inc1, inc2) %>%
  mutate(overall_rank1 = rank(inc1, ties.method = "average"),
         overall_rank2 = rank(inc2, ties.method = "average"))
ranking2 <- ranking %>%
  group_by(State) %>%
  mutate(state_rank1 = rank(inc1, ties.method = "average"),
         state_rank2 = rank(inc2, ties.method = "average"))
However, this only gives me ordinal, unweighted ranks.
Does anyone know how to do this?
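Solution
Step 1: Remove all the quotes around the integers in the original data frame (the quotes make them behave as characters, so they cannot be ranked correctly)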
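If retyping the vectors without quotes is inconvenient, the columns can also be converted after the fact. Below is a minimal sketch on a small stand-in data frame (the three rows are made up for illustration, not taken from the question); it shows why character columns rank incorrectly and converts them with base R. With dplyr, `mutate(across(c(PopA, inc1), as.numeric))` should do the same thing.

```r
# Small made-up stand-in for the question's data frame: numbers stored as strings
data <- data.frame(
  State = c("MN", "MN", "OR"),
  PopA  = c("145", "215", "2013"),
  inc1  = c("55000", "9000", "135000")
)

# Character values are ranked in string order, so "9000" comes out on top
rank_as_text <- rank(data$inc1)     # 2 3 1

# Convert the quoted columns to numeric in one go
num_cols <- c("PopA", "inc1")
data[num_cols] <- lapply(data[num_cols], as.numeric)

# Numeric values are ranked by magnitude
rank_as_number <- rank(data$inc1)   # 2 1 3
```

For the full dataset, `num_cols` would list all four columns: PopA, PopB, inc1, and inc2.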
Step 2: Create new columns for the weighted population growth
data %>% mutate(popAGrowth = inc1/PopA) %>% mutate(popBGrowth = inc2/PopB) -> data
Step 3: Rank every row by its growth value (rank 1 is the highest percentage growth)
data %>% mutate(popAGrowthRank = rank(-popAGrowth)) -> data
data %>% mutate(popBGrowthRank = rank(-popBGrowth)) -> data
Step 4: Rank the rows within each State by "popAGrowth" and "popBGrowth"
data %>% group_by(State) %>% mutate(stateRank1 = rank(-popAGrowth), stateRank2 = rank(-popBGrowth))
I hope this helps! (If you want to drop the weighted columns I created, you can use "select()" in another pipe.)
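The steps above rank by a ratio (inc divided by Pop). If what is wanted is closer to the weighted percentile mentioned in the question, one possible reading is: for each row, the share of the total population that sits in rows with an income at or below that row's income. The sketch below illustrates that reading only; `weighted_pct_rank` is a hypothetical helper written for this example, not a function from dplyr or base R, and the sample values are the first four rows of inc1 and PopA.

```r
# Hypothetical helper: weighted percentile rank of x using weights w.
# For each value of x, returns the fraction of total weight carried by
# rows whose x is less than or equal to that value.
weighted_pct_rank <- function(x, w) {
  vapply(x, function(v) sum(w[x <= v]) / sum(w), numeric(1))
}

inc <- c(55000, 67000, 34000, 17000)  # income values (numeric, no quotes)
pop <- c(145, 215, 200, 300)          # population weights

wrank <- weighted_pct_rank(inc, pop)
wrank  # 0.75 for the first row: 645 of the 860 people have income <= 55000
```

Inside a dplyr pipeline this could be called as `mutate(Overall_rank1_PopA = weighted_pct_rank(inc1, PopA))`, with a preceding `group_by(State)` for the per-state columns.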