Labels at the end of curves (matplotlib-seaborn)

Question

I have several dataframes in this format:

year    count   cum_sum
2001    5   5
2002    15  20
2003    14  34
2004    21  55
2005    44  99
2006    37  136
2007    55  191
2008    69  260
2009    133 393
2010    94  487
2011    133 620
2012    141 761
2013    206 967
2014    243 1210
2015    336 1546
2016    278 1824
2017    285 2109
2018    178 2287
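The cum_sum column is the running total of count, which is easy to confirm once the table is loaded. A minimal sketch that rebuilds the table above as a pandas DataFrame (the variable name df is only for illustration):

```python
import pandas as pd

# Rebuild the table from the question
df = pd.DataFrame({
    "year": list(range(2001, 2019)),
    "count": [5, 15, 14, 21, 44, 37, 55, 69, 133, 94,
              133, 141, 206, 243, 336, 278, 285, 178],
    "cum_sum": [5, 20, 34, 55, 99, 136, 191, 260, 393, 487,
                620, 761, 967, 1210, 1546, 1824, 2109, 2287],
})

# cum_sum should equal the cumulative sum of count, row by row
assert (df["count"].cumsum() == df["cum_sum"]).all()
```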

I generated the following plot (image not included):

I used the following code for it:

import matplotlib.pyplot as plt
import seaborn as sns

# china_papers_by_year_sorted etc. are dataframes in the format shown above
fig, ax = plt.subplots(figsize=(12,8))

sns.pointplot(x="year", y="cum_sum", data=china_papers_by_year_sorted, color='red', ax=ax)
sns.pointplot(x="year", y="cum_sum", data=usa_papers_by_year_sorted, color='blue', ax=ax)
sns.pointplot(x="year", y="cum_sum", data=korea_papers_by_year_sorted, color='lightblue', ax=ax)
sns.pointplot(x="year", y="cum_sum", data=japan_papers_by_year_sorted, color='yellow', ax=ax)
sns.pointplot(x="year", y="cum_sum", data=brazil_papers_by_year_sorted, color='green', ax=ax)

ax.set_ylim([0,2000])
ax.set_ylabel("Cumulative frequency")

# Here I had to specify the x and y coordinates manually (figure fractions, not data)
fig.text(x=0.91, y=0.76, s="China", color="red", weight="bold")
fig.text(x=0.91, y=0.72, s="South Korea", color="lightblue", weight="bold")

plt.show()

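The problem is that the method I use to add text to the plot does not understand data coordinates: fig.text positions text in figure coordinates (fractions of the figure from 0 to 1), so I had to work out the coordinates of each dataframe's label by hand (see "China" and "South Korea"). Is there a smarter way? I have seen an example that uses the .last_valid_index() method, but it did not work here because the coordinates were still not interpreted as data coordinates.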
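A minimal sketch of placing the label in data coordinates, using plain matplotlib (ax.plot) instead of sns.pointplot so that the x-axis holds the actual years. It reuses the cum_sum values from the table above and adds a hypothetical 2019 row with a missing value to show what last_valid_index() skips over; the label text " example" is a placeholder. The key point is that ax.text reads its x and y in data coordinates, whereas fig.text uses fractions of the figure. One caveat if you keep sns.pointplot: it treats x as categorical by default and draws the years at positions 0, 1, 2, ..., so for this table the 2018 point sits at x=17, not at x=2018.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# cum_sum values from the table above, plus a hypothetical 2019 row whose
# value is missing (NaN), to show what last_valid_index() skips over
df = pd.DataFrame({
    "year": list(range(2001, 2020)),
    "cum_sum": [5, 20, 34, 55, 99, 136, 191, 260, 393, 487,
                620, 761, 967, 1210, 1546, 1824, 2109, 2287, np.nan],
})

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(df["year"], df["cum_sum"], marker="o", color="red")
ax.set_ylabel("Cumulative frequency")

# Index label of the last row whose cum_sum is not NaN (row 17, year 2018)
last = df["cum_sum"].last_valid_index()
x_end = df.loc[last, "year"]
y_end = df.loc[last, "cum_sum"]

# ax.text reads (x, y) in data coordinates by default (transform=ax.transData),
# unlike fig.text, which uses fractions of the figure between 0 and 1
label = ax.text(x_end, y_end, " example", color="red", weight="bold",
                va="center")
```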

Tags: python, matplotlib, seaborn

Answer


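You don't need to call pointplot repeatedly and add the labels by hand. Instead, add a country column to each dataframe to indicate the country, combine the dataframes, and pass country as hue. Do the following: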

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Add a country label to each dataframe itself
china_papers_by_year_sorted['country'] = 'China'
usa_papers_by_year_sorted['country'] = 'USA'
korea_papers_by_year_sorted['country'] = 'Korea'
japan_papers_by_year_sorted['country'] = 'Japan'
brazil_papers_by_year_sorted['country'] = 'Brazil'

# List of dataframes with same columns
frames = [china_papers_by_year_sorted, usa_papers_by_year_sorted,
          korea_papers_by_year_sorted, japan_papers_by_year_sorted,
          brazil_papers_by_year_sorted]

# Combine into one dataframe
result = pd.concat(frames, ignore_index=True)

# Plot: hue draws one line per country and uses the country names as legend labels
ax = sns.pointplot(x="year", y="cum_sum", hue="country", data=result)
ax.set_ylim([0,2000])
ax.set_ylabel("Cumulative frequency")
plt.show()
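If you would rather not add a column to each original dataframe, pandas can attach the country name while concatenating: passing a dict to pd.concat turns its keys into an outer index level (named via names=), and reset_index then converts that level into an ordinary country column. A sketch assuming every frame has year and cum_sum columns; the two small frames hold placeholder values, not the question's data.

```python
import pandas as pd

# Hypothetical stand-ins for two of the question's dataframes
# (placeholder values, not real data)
china = pd.DataFrame({"year": [2016, 2017, 2018], "cum_sum": [10, 20, 30]})
usa = pd.DataFrame({"year": [2016, 2017, 2018], "cum_sum": [5, 15, 25]})

# Dict keys become an outer index level named "country"; reset_index turns that
# level into a column, so the input frames themselves are never modified
result = (
    pd.concat({"China": china, "USA": usa}, names=["country"])
    .reset_index(level="country")
    .reset_index(drop=True)
)
```

To keep the colors from the question, pointplot's palette argument also accepts a dict mapping hue levels to colors, e.g. palette={"China": "red", "USA": "blue", "Korea": "lightblue", "Japan": "yellow", "Brazil": "green"}.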

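Edit: if you want to label the lines themselves instead of using a legend, the answers to this existing question show how to annotate the end of a line.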
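Labelling each line at its last point, instead of using a legend, can be sketched as follows. This again swaps sns.pointplot for plain ax.plot so that the x data coordinate is the year itself; the helper name plot_with_end_labels and the two frames are hypothetical placeholders. ax.annotate takes xy in data coordinates, and textcoords="offset points" shifts the text a few points to the right so it does not sit on top of the end marker.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_with_end_labels(ax, frames, colors):
    """Plot cum_sum against year for each frame and label the end of each line.

    frames maps a label (e.g. a country name) to a dataframe with 'year' and
    'cum_sum' columns; colors maps the same labels to matplotlib colors.
    Returns the created Annotation objects.
    """
    annotations = []
    for name, df in frames.items():
        color = colors[name]
        ax.plot(df["year"], df["cum_sum"], marker="o", color=color)
        # Last non-NaN point of this line, in data coordinates
        last = df["cum_sum"].last_valid_index()
        xy = (df.loc[last, "year"], df.loc[last, "cum_sum"])
        # The text is offset 5 points to the right of that point
        annotations.append(ax.annotate(name, xy=xy, xytext=(5, 0),
                                       textcoords="offset points",
                                       color=color, weight="bold", va="center"))
    return annotations


# Placeholder frames with made-up values, only to exercise the helper
frames = {
    "China": pd.DataFrame({"year": [2016, 2017, 2018], "cum_sum": [10, 20, 30]}),
    "South Korea": pd.DataFrame({"year": [2016, 2017, 2018], "cum_sum": [5, 8, 12]}),
}
colors = {"China": "red", "South Korea": "lightblue"}

fig, ax = plt.subplots(figsize=(12, 8))
labels = plot_with_end_labels(ax, frames, colors)
ax.set_ylabel("Cumulative frequency")
```

If the labels run past the right edge of the figure, widening the x-limits (for example with ax.set_xlim) or calling fig.subplots_adjust(right=0.85) leaves room for them.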

