r - Why does the stacked bar chart not match values in table?
问题描述
I have a dataset containing cities and prices for specific items in those cities for going out (e.g. taxi, drinks, dinner etc.) - dataset can be found here: https://data.world/makeovermonday/2018w48
I calculated the total sum to pay for a party night and a date night:
CostNightPrepared <- CostNight %>%
group_by(City, Category) %>%
mutate(TotalCost = sum(Cost, na.rm = TRUE))%>%
arrange(desc(Category), TotalCost)
To plot it out:
Visual <- ggplot(CostNightPrepared, aes(TotalCost, fct_rev(fct_reorder(City, TotalCost)), fill=Category)) +
geom_col(position = "stack") +
geom_text(aes(label = round(TotalCost, 1)), position = position_dodge(1))
it gives me the following output:
If you notice, for example, the last city, Zurich, has the value for 179 for "Party Night", however, the column reachers to around 800 on x axis! The same goes for all other columns - they do not match their values for both the "Date night" and "Party Night". What is the issue here?
If I do the same code, but using position = dodge for geom_col(), then it works:
Visual <- ggplot(CostNightPrepared, aes(TotalCost, fct_rev(fct_reorder(City, TotalCost)), fill=Category)) +
geom_col(position = "dodge") +
geom_text(aes(label = round(TotalCost, 1)), position = position_dodge(1))
Here's the output:
As you can see, the values match their correspond column sizes (lenghts) on the x axis.
so why, when using position = "dodge", my columns don't match the actual values in the dataset and have arbitrary values on the x axis?
解决方案
I think you want summarize
instead of mutate
. By using mutate
, you got the City/Category total for every row, and then fed each of those rows into ggplot2.
You really want one row for each City/Category combination, which is what summarize produces.
Reproducible example:
mtcars %>%
head() %>%
group_by(carb, gear) %>%
mutate(total_wt = sum(wt)) %>%
ungroup() -> mtcars_summary
#mtcars_summary
## A tibble: 6 x 12
# mpg cyl disp hp drat wt qsec vs am gear carb total_wt
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 5.50
#2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 5.50
#3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 2.32
#4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 6.68
#5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 3.44
#6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 6.68
Note above that mutate
gives each row its group's total weight. ggplot2::geom_col will then stack all the rows it receives, leading to longer bars than you wanted. (Another hint is that the text looks "overplotted" -- this is because every character is being printed once for each row in the group -- ie you might have ten copies of the same text on top of each other, leading to poor anti-aliasing appearance.)
ggplot(mtcars_summary, aes(total_wt,
carb %>% as_factor %>% fct_reorder(total_wt),
fill = as.character(gear))) +
geom_col(position = "stack") +
geom_text(aes(label = round(total_wt, 1)), position = position_dodge(1))
If we replace mutate
with summarize
, we get more what you expected, where the input going into the bars is not repeated for each element in your original data:
#mtcars_summary
## A tibble: 4 x 3
# carb gear total_wt
# <dbl> <dbl> <dbl>
#1 1 3 6.68
#2 1 4 2.32
#3 2 3 3.44
#4 4 4 5.50