Find the row with the minimum value for each factor group

Problem description

I am trying to find the lowest income in the state.x77 dataset, grouped by the state.region variable.

df1 <- data.frame(state.region,state.x77,row.names = state.name)
tapply(state.x77,state.region,min)

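I want the output to show which state has the lowest income in each region, for example something like "in the South, Alabama has the lowest income". I tried to use tapply, but I keep getting this error message: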

Error in tapply(state.x77, state.region, min) : 
  arguments must have same length

What is the problem?

Tags: r, dataframe

Solution


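Here is one solution. First extract the Income column and turn it into a vector named by state. Then use tapply with a function that returns the name of the element with the lowest income in each region.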

state <- setNames(state.x77[, "Income"], rownames(state.x77))
tapply(state, state.region, function(x) names(x)[which.min(x)])
#     Northeast          South  North Central           West 
#       "Maine"  "Mississippi" "South Dakota"   "New Mexico" 
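
As for the error message itself: state.x77 is a numeric matrix, so tapply treats it as one long vector of all its cells, which is much longer than the 50-element state.region factor. The following minimal sketch checks the lengths and then passes a single column instead; the names income and min_income are illustrative only and do not come from the question.

```r
# state.x77 is a 50 x 8 matrix, so as a vector it has 400 elements
dim(state.x77)
length(state.x77)      # number of cells in the whole matrix
length(state.region)   # one entry per state

# Selecting one column makes the lengths agree, so tapply() works
income <- state.x77[, "Income"]
min_income <- tapply(income, state.region, min)
min_income
```

This returns only the minimum values per region; the named-vector approach above is what recovers the state names.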

The following, more involved code outputs the state name, the region, and the income.

df1 <- data.frame(
  State = rownames(state.x77),
  Income = state.x77[, "Income"],
  Region = state.region
)
merge(aggregate(Income ~ Region, df1, min), df1)[c(3, 1, 2)]
#         State        Region Income
#1 South Dakota North Central   4167
#2        Maine     Northeast   3694
#3  Mississippi         South   3098
#4   New Mexico          West   3601
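
The one-liner above relies on merge joining on all columns the two data frames share (here Region and Income) and on selecting columns by position. A more explicit sketch of the same steps might look like this; the intermediate names mins and res are hypothetical, and row.names = NULL is an added assumption to keep automatic row names.

```r
df1 <- data.frame(
  State = rownames(state.x77),
  Income = state.x77[, "Income"],
  Region = state.region,
  row.names = NULL
)

# Step 1: minimum income per region (one row per region)
mins <- aggregate(Income ~ Region, data = df1, FUN = min)

# Step 2: join back on both shared columns to recover the State
res <- merge(mins, df1, by = c("Region", "Income"))

# Step 3: select the columns by name rather than by position
res <- res[, c("State", "Region", "Income")]
res
```

Selecting columns by name keeps the result stable if the column order of the merged data frame ever changes.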

Here is another solution that uses aggregate but avoids merge.

agg <- aggregate(Income ~ Region, df1, min)
i <- match(agg$Income, df1$Income)
data.frame(
  State = df1$State[i],
  Region = df1$Region[i],
  Income = df1$Income[i]
)
#         State        Region Income
#1        Maine     Northeast   3694
#2  Mississippi         South   3098
#3 South Dakota North Central   4167
#4   New Mexico          West   3601
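
Note that match looks up each minimum by its Income value alone and returns the first hit, so it depends on the minimum incomes not also occurring in another region. A possible alternative, sketched here under the assumption that returning every tied row is acceptable, compares each income with its own group minimum using ave; the names is_min and res are illustrative only.

```r
df1 <- data.frame(
  State = rownames(state.x77),
  Income = state.x77[, "Income"],
  Region = state.region,
  row.names = NULL
)

# ave() returns, for every row, the minimum income of that row's region
is_min <- df1$Income == ave(df1$Income, df1$Region, FUN = min)

# Keep the rows that attain their region's minimum (ties are all kept)
res <- df1[is_min, ]
res
```

The rows come back in the original order of df1 rather than sorted by region.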
