Calculate the mean of rows with similar values in R

Problem description

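I'm working with a dataset of unemployment rates for UK regions over 22 months.

I split the original dataset in two: one part with the regions that have higher unemployment (df1) and one with the regions that have lower unemployment (df2).

The desired output is the same for both, so I'll only post the structure of df1.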

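df1 currently contains the monthly unemployment rates for five regions (London, East Midlands, West Midlands, Yorkshire and The Humber, North East).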

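For each month, I want to calculate the mean unemployment rate across the regions (i.e. the mean of North East, London, etc. for Jan 19, Feb 19, and so on up to Oct 20).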

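The point is that once I have combined all the regions into a single mean unemployment rate, I can make one plot instead of five separate ones.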

Expected output:

Date    | Region | Unemployment rate
01-2019 | ABC    | (A_Jan19 + B_Jan19 + C_Jan19) / 3
02-2019 | ABC    | (A_Feb19 + B_Feb19 + C_Feb19) / 3
03-2019 | ABC    | (A_Mar19 + B_Mar19 + C_Mar19) / 3

and so on.
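The calculation in each row of the expected output can be written as one formula. The symbols are hypothetical notation: $u_{r,t}$ is the unemployment rate of region $r$ in month $t$, and $R$ is the number of regions (3 in the ABC example, 5 in df1).

$$
\bar{u}_t = \frac{1}{R} \sum_{r=1}^{R} u_{r,t}, \qquad t = \text{Jan 2019}, \dots, \text{Oct 2020}
$$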

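So instead of having 5 values per month (one per region), I want to add up the regions' values and divide the sum by the number of regions for that month.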
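One way to sketch the "add up and divide" step literally is to reshape the long data (one row per Date and Region, as in df1) into a wide table with one column per region, then take row-wise means. This is an alternative illustration, not the asker's or the answerer's code; the data frame `long` is a hypothetical stand-in holding the first two months of df1 from the dput() output, with rates rounded to four decimals.

```r
# Hypothetical extract of df1: Jan and Feb 2019, one row per Date and Region,
# rates rounded to 4 decimals
regions <- c("London", "East Midlands", "West Midlands",
             "Yorkshire and The Humber", "North East")
long <- data.frame(
  Date = rep(as.Date(c("2019-01-01", "2019-02-01")), each = 5),
  Region = rep(regions, times = 2),
  Unemployment.rate = c(4.2103, 4.6825, 5.0708, 5.2311, 5.0563,
                        4.4585, 4.2409, 5.2043, 4.9065, 5.5812)
)

# Wide layout: one row per month, one Unemployment.rate.<Region> column per region
wide <- reshape(long, idvar = "Date", timevar = "Region", direction = "wide")

# Sum the regional columns and divide by the number of regions, row by row
rate_cols <- grep("^Unemployment.rate", names(wide))
wide$mean_rate <- rowSums(wide[rate_cols]) / length(rate_cols)

# rowMeans() computes the same thing in one call
stopifnot(isTRUE(all.equal(wide$mean_rate, rowMeans(wide[rate_cols]))))
wide[, c("Date", "mean_rate")]
```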

Here is the structure of df1:

df1 <- structure(list(
Date = structure(c(17897, 17897, 17897, 17897, 
  17897, 17928, 17928, 17928, 17928, 17928, 17956, 17956, 17956, 
  17956, 17956, 17987, 17987, 17987, 17987, 17987, 18017, 18017, 
  18017, 18017, 18017, 18048, 18048, 18048, 18048, 18048, 18078, 
  18078, 18078, 18078, 18078, 18109, 18109, 18109, 18109, 18109, 
  18140, 18140, 18140, 18140, 18140, 18170, 18170, 18170, 18170, 
  18170, 18201, 18201, 18201, 18201, 18201, 18231, 18231, 18231, 
  18231, 18231, 18262, 18262, 18262, 18262, 18262, 18293, 18293, 
  18293, 18293, 18293, 18322, 18322, 18322, 18322, 18322, 18353, 
  18353, 18353, 18353, 18353, 18383, 18383, 18383, 18383, 18383, 
  18414, 18414, 18414, 18414, 18414, 18444, 18444, 18444, 18444, 
  18444, 18475, 18475, 18475, 18475, 18475, 18506, 18506, 18506, 
  18506, 18506, 18536, 18536, 18536, 18536, 18536), class = "Date"), 
Region = structure(c(4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 
  9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 
  9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 
  9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 
  9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 
  9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 
  9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 
  9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L), 
.Label = c("England", 
    "South East", "South West", "London", "East of England", 
    "East Midlands", "West Midlands", "Yorkshire and The Humber", 
    "North East", "North West"), class = "factor"), 
Unemployment.rate = c(4.2102766429572, 
    4.68247349426148, 5.0708122696351, 5.23113585152962, 5.05625777763551, 
    4.45850956493638, 4.24086209425895, 5.20425572086481, 4.90649662696461, 
    5.58119346747183, 4.36960549219723, 4.02517515965457, 5.07463979478007, 
    4.74861899849302, 5.41295614949722, 4.2765275404374, 4.29397104451947, 
    4.95863831882363, 4.92741739593892, 5.69156027694963, 4.2650375361128, 
    4.23454968410189, 4.79139912788739, 5.02305883708418, 5.5878529496241, 
    4.54049887070026, 4.28118824655063, 4.56621383409869, 5.02948552097342, 
    5.34849310422496, 4.63523851140925, 4.63665149464923, 4.15610221124255, 
    4.28827168334814, 4.97071907922267, 4.63148007856079, 4.50379542173275, 
    3.98279027057451, 4.00981283870947, 5.80674097480643, 4.5449089097835, 
    4.46358064141772, 4.09111105457073, 3.90122545742185, 5.85180583091048, 
    4.50615604436695, 3.65653388653173, 4.4653881330391, 4.08974888999112, 
    6.11361138828401, 4.31177130663949, 3.86911315140672, 4.31748261760943, 
    4.34062792253313, 6.21086689536757, 4.28854311714984, 3.58533538113168, 
    4.43826006085208, 4.47398990035041, 6.11583334445995, 4.4614986334698, 
    3.93320874039025, 4.50210360585639, 4.58329815843159, 6.1811363458787, 
    4.4993016103369, 4.02503140646339, 4.81764323428107, 4.71840892982655, 
    5.61192961811575, 4.66797282030472, 3.76788548732822, 5.02382063022771, 
    4.27033347501753, 5.40098295976569, 4.63121679655635, 3.67161258712684, 
    4.80322174913054, 3.91339590231661, 5.20229523339659, 5.10845457998552, 
    3.97182605242641, 4.85515814694348, 3.78242013517353, 4.97115704468143, 
    4.6437916194869, 4.3194319371037, 4.41226516242903, 3.75797094178592, 
    5.16820059074221, 4.98077486925899, 4.38753537321373, 4.37107017836121, 
    3.98499236263049, 5.15965087736712, 5.2511686249283, 4.39271393019063, 
    4.62628095567074, 4.16298001615593, 6.62714213785116, 5.95104220347072, 
    4.89588411607636, 4.9378241924801, 4.65307341597827, 6.67088507450695, 
    6.33714099073375, 5.32040137455687, 5.402969264185, 5.15177120913334, 
    6.56889233919367)), 
    row.names = c(NA, -110L), class = c("tbl_df", "tbl", "data.frame"))

Tags: r, mean

Solution


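Using base R, aggregate by Date only, so that all regions in each month are averaged together:

# Turn each date into a month-year label such as "Jan-2019"
df1$Date <- format(df1$Date, '%b-%Y')
# Group by Date alone; Unemployment.rate ~ . would also group by Region,
# leaving one row per group and nothing averaged
out <- aggregate(Unemployment.rate ~ Date, data = df1, FUN = mean, na.rm = TRUE)

Output:

head(out)
      Date Unemployment.rate
1 Apr-2019          4.829623
2 Apr-2020          4.444348
3 Aug-2019          4.586924
4 Aug-2020          5.012057
5 Dec-2019          4.580392
6 Feb-2019          4.878263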
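Formatting the dates as strings makes aggregate() sort the months alphabetically (Apr, Aug, Dec, ...), which is awkward for the single plot the question is after. Every Date in df1 is the first day of its month, so grouping by the Date column itself (without Region) already gives one group per month and keeps them in chronological order. A minimal sketch, where `df_small` is a hypothetical three-month extract of df1 with rates rounded to four decimals:

```r
# Hypothetical extract of df1: Jan to Mar 2019, Date kept as class Date
regions <- c("London", "East Midlands", "West Midlands",
             "Yorkshire and The Humber", "North East")
df_small <- data.frame(
  Date = rep(as.Date(c("2019-01-01", "2019-02-01", "2019-03-01")), each = 5),
  Region = rep(regions, times = 3),
  Unemployment.rate = c(4.2103, 4.6825, 5.0708, 5.2311, 5.0563,
                        4.4585, 4.2409, 5.2043, 4.9065, 5.5812,
                        4.3696, 4.0252, 5.0746, 4.7486, 5.4130)
)

# Group by Date only, so the five regions in each month are averaged;
# the Date column of the result stays a Date and sorts chronologically
monthly <- aggregate(Unemployment.rate ~ Date, data = df_small, FUN = mean)
monthly

# One line for all regions combined:
# plot(monthly$Date, monthly$Unemployment.rate, type = "l")
```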

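Alternatively, to average by calendar month only (pooling 2019 and 2020), start again from the original df1, because the code above has already replaced Date with text labels:

# Keep only the abbreviated month name, e.g. "Jan"
df1$Date <- format(df1$Date, '%b')
# Average over all regions and both years for each calendar month
out <- aggregate(Unemployment.rate ~ Date, data = df1, FUN = mean, na.rm = TRUE)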
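Dropping the year pools 2019 and 2020 into one group per calendar month. Because df1 runs from January 2019 to October 2020, the January to October groups each collect two years of rows while November and December collect only one, so their means rest on half as many observations. The sketch below rebuilds only the date column of df1 (22 monthly dates, 5 regions, as in the question) and counts the rows per group. It uses "%m" (month number) instead of "%b", since "%b" returns month names in the current locale.

```r
# The 22 monthly dates in df1 (Jan 2019 to Oct 2020), repeated once per region
dates <- seq(as.Date("2019-01-01"), as.Date("2020-10-01"), by = "month")
all_dates <- rep(dates, each = 5)

# Number of rows falling into each calendar-month group once the year is dropped
rows_per_month <- table(format(all_dates, "%m"))
rows_per_month
```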
