Plotting multiple lines with geom_line (based on grouping)

Problem description

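Please help me with a problem I run into when trying to plot multiple grouped lines with geom_line in ggplot2. The problem appears when I try to group the lines by one variable/column (namely Region).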

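library(dplyr)
library(ggplot2)

GDP_time_series_analysis %>% 
  group_by(Region) %>%   # note: ggplot2 ignores dplyr groups; only the group aesthetic matters
  ggplot() + geom_line(aes(Year, Total_GDP, group = Region, color = Region))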

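The code above produces a chart that is correct only for the one region (the purple line) that consists of a single county (another variable), but not for the other 3 regions, which contain several counties. I suspect there is a problem with the grouping: I can't get the other 3 regions to form proper groups in the chart (even though, as you can see, I did use group_by(Region) in my code).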

Sorry if this question doesn't fully meet the standards (it's my first question), and thanks. Here is a subset of the data:

GDP_time_series_analysis <- structure(list(County = c("City of Zagreb", "City of Zagreb",
  "City of Zagreb", "City of Zagreb", "City of Zagreb", "City of Zagreb",
  "City of Zagreb", "City of Zagreb", "City of Zagreb", "City of Zagreb",
  "City of Zagreb", "City of Zagreb", "City of Zagreb", "City of Zagreb",
  "City of Zagreb", "City of Zagreb", "City of Zagreb", "City of Zagreb",
  "Zagreb County", "Zagreb County", "Zagreb County", "Zagreb County",
  "Zagreb County", "Zagreb County", "Zagreb County", "Zagreb County",
  "Zagreb County", "Zagreb County", "Zagreb County", "Zagreb County"
  ), Region = c("Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb",
  "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb",
  "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", "North Croatia",
  "North Croatia", "North Croatia", "North Croatia", "North Croatia",
  "North Croatia", "North Croatia", "North Croatia", "North Croatia",
  "North Croatia", "North Croatia", "North Croatia"), Year = c(2000,
  2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011,
  2012, 2013, 2014, 2015, 2016, 2017, 2000, 2001, 2002, 2003, 2004,
  2005, 2006, 2007, 2008, 2009, 2010, 2011), Population = c(771000,
  771000, 772000, 772000, 775000, 776000, 778000, 780000, 783000,
  785000, 788000, 790000, 792000, 795000, 798000, 8e+05, 802000,
  803000, 296000, 296000, 299000, 302000, 305000, 307000, 310000,
  312000, 314000, 315000, 317000, 317000), GDP_percap_EUR = c(8975.53835599625,
  10168.0040269207, 11091.6676199461, 12240.0345558531, 13421.0447587177,
  15085.3049042075, 16647.4994908354, 18025.966664434, 19706.5391945802,
  18534.1115208295, 19739.3466772558, 19408.6216726494, 18961.2735614516,
  18546.0140474649, 18477.4378485715, 18994.6373722612, 19710.3754557913,
  20849.7073006642, 4335.38213876616, 4307.23697694032, 5278.97949713334,
  5459.93196849043, 5967.08989896781, 6687.19494658443, 6861.43232701965,
  7759.05700432905, 8446.22608743048, 8086.60105100451, 7541.08792074132,
  7667.23597749996), GDP_percap_PPP_EU_100 = c(80.0982702062271,
  82.6988344044675, 85.4138484640405, 91.204873884138, 93.9216165828703,
  99.0724656137407, 104.305150969215, 107.963791825045, 111.305636873515,
  109.91689646398, 111.438020798517, 110.735014385039, 110.140140004045,
  107.718076160351, 105.910224718338, 106.327225119802, 107.021331220602,
  108.151130040081, 38.6892235568413, 35.0317994125204, 40.6519533638096,
  40.6839052888146, 41.7582043486098, 43.9180311969089, 42.9904043624586,
  46.4716944599064, 47.7056151035234, 47.9577394076775, 42.5730357896448,
  43.7450685876577), Total_GDP = c(6920140072.47311, 7839531104.75587,
  8562767402.59836, 9449306677.11856, 10401309688.0062, 11706196605.665,
  12951754603.8699, 14060253998.2585, 15430220189.3563, 14549277543.8512,
  15554605181.6776, 15332811121.393, 15017328660.6697, 14744081167.7346,
  14744995403.16, 15195709897.809, 15807721115.5446, 16742314962.4333,
  1283273113.07478, 1274942145.17433, 1578414869.64287, 1648899454.48411,
  1819962419.18518, 2052968848.60142, 2127044021.37609, 2420825785.35066,
  2652114991.45317, 2547279331.06642, 2390524870.875, 2430513804.86749
  )), row.names = c(NA, -30L), class = c("tbl_df", "tbl", "data.frame"
  ))

Tags: r, ggplot2

Solution


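The problem is that your data are at the county level, but you plot them at the region level, which is coarser. If you plot the data directly the way you did, you end up with several values per group for the same year (one per county), and geom_line() connects all of them, which produces jagged lines. You have to apply a summary statistic to get a meaningful result.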
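One way to see these "multiple values per group" before plotting is to count the rows for each region and year. The sketch below assumes dplyr is installed and uses a small hypothetical data frame `gdp` shaped like the question's data (one row per county and year, with made-up GDP values):

```r
library(dplyr)

# Hypothetical county-level data: two counties share the region "North Croatia"
gdp <- data.frame(
  County    = rep(c("Krapina-Zagorje", "Varaždin", "Zagreb"), each = 3),
  Region    = rep(c("North Croatia", "North Croatia", "Zagreb"), each = 3),
  Year      = rep(2015:2017, 3),
  Total_GDP = 1:9
)

# Rows per Region and Year: n > 1 means geom_line() gets several y values at one x
rows_per_point <- gdp %>% count(Region, Year)
rows_per_point
```

Any region with `n` greater than 1 will be drawn as a zigzag unless the values are summarised first.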

Here is a small illustration using some dummy data:

library(tibble)
library(ggplot2)

# Two counties map to "North Croatia", so that region has two GDP values per year
df <- tibble(County = rep(c("Krapina-Zagorje", "Varaždin", "Zagreb"), each = 3),
             Region = rep(c("North Croatia", "North Croatia", "Zagreb"), each = 3),
             Year   = rep(2015:2017, 3),
             GDP    = 1:9)
ggplot(df, aes(x = Year, y = GDP, colour = Region, group = Region)) +
  geom_line() + geom_point()
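If what you actually want is one line per county rather than one per region, a common alternative (not part of the original answer) is to group by `County` and use `Region` only for the colour. A sketch on the same dummy data, assuming ggplot2 and tibble are installed; `layer_data()` is used only to check how many separate lines ggplot2 builds:

```r
library(tibble)
library(ggplot2)

df <- tibble(County = rep(c("Krapina-Zagorje", "Varaždin", "Zagreb"), each = 3),
             Region = rep(c("North Croatia", "North Croatia", "Zagreb"), each = 3),
             Year   = rep(2015:2017, 3),
             GDP    = 1:9)

# group = County draws one line per county; colour = Region shades counties by region
p_county <- ggplot(df, aes(x = Year, y = GDP, colour = Region, group = County)) +
  geom_line() + geom_point()

# Data computed for the line layer: one group id per county
line_data <- layer_data(p_county, 1)
length(unique(line_data$group))  # number of separate lines
```

This keeps every county visible instead of collapsing them, at the cost of several same-coloured lines per region.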


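Since you want only one value per group (per region and year), you have to summarise your data accordingly (I assume you are interested in the sum for each group):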

ggplot(df, aes(x = Year, y = GDP, colour = Region, group = Region)) +
  stat_summary(fun = sum, geom = "line")  # sums GDP over counties for each Region and Year
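Equivalently, you can aggregate first and plot the summarised data with a plain `geom_line()`, which makes the summary explicit and lets you inspect it. A minimal sketch, assuming dplyr, tibble and ggplot2 are installed and again summing GDP over the counties in each region and year; with the question's data you would use `Total_GDP` in place of `GDP`:

```r
library(dplyr)
library(tibble)
library(ggplot2)

df <- tibble(County = rep(c("Krapina-Zagorje", "Varaždin", "Zagreb"), each = 3),
             Region = rep(c("North Croatia", "North Croatia", "Zagreb"), each = 3),
             Year   = rep(2015:2017, 3),
             GDP    = 1:9)

# One row per Region and Year: sum GDP over the counties in each region
region_gdp <- df %>%
  group_by(Region, Year) %>%
  summarise(GDP = sum(GDP), .groups = "drop")

p_region <- ggplot(region_gdp, aes(x = Year, y = GDP, colour = Region, group = Region)) +
  geom_line() + geom_point()
p_region
```

Swapping `sum()` for `mean()` or another function changes the summary without touching the plotting code.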


