Drawing a line in matplotlib based on a comparison of 2 pandas columns

Problem description

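This should be a simple question, but hours of searching and reading the documentation haven't turned up the answer to this simple but age-old pandas question: "how do I do something if a value in one column is < the value in another column?"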

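I'm struggling with the syntax and nomenclature that pandas uses. It is so unintuitive to me that I can't even come up with the right question or apply useful tags when searching.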

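My project is to plot a series of time-cycle charts so that I can visually see when cycles start and end within a calendar year, and whether they spill over into another calendar year, as in the following example code: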

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

data = {'Period':['Q1','Q2','Q3','Q4'], 'Start':[338,78,190,273], 'End':[117,166,299,17]}

df = pd.DataFrame(data)

#print (df) # for testing

fig, ax = plt.subplots()

plt.xlim([-5,370])   # set x-axis limits to be number of days in calendar year with spacer of 5 days on either side for ease of viewing

ax.scatter(df['Start'],df['Period'],color = 'green', label = 'Cycle Start', marker = '|', s = 100, zorder = 2)  # plot cycle start markers
ax.scatter(df['End'],df['Period'],color = 'red', label = 'Cycle End', marker = '|', s = 100, zorder = 2)

ax.vlines(0,-1,len(df['Period']),color='purple', label = 'Calendar Year Start / End',linewidth = 2, zorder = 1)   # put vertical line at Day 0 of calendar year
ax.vlines(365,-1,len(df['Period']),color='purple', linewidth = 2, zorder = 1) # put vertical line at Day 365 of calendar year


##
#   Need to execute only one of the following options to draw the horizontal line(s) for each period / row in the dataframe, not both.
##

##  Option 1: Draw a line between the start and end points if the start and end dates are in the same calendar year (green marker to the left of red).
ax.hlines(df['Period'], xmin=df['Start'], xmax=df['End'], color='blue', label = 'Cycle', linewidth = 2, zorder = 0) 

##  Option 2: Draw 2 lines if the start and end dates are not in the same calendar year (red marker to the left of green).
ax.hlines(df['Period'], xmin=df['Start'], xmax=365, color='orange', linewidth = 2, zorder = 0)  # end date is in next calendar year
ax.hlines(df['Period'], xmin=0, xmax=df['End'], color='orange', linewidth = 2, zorder = 0)  # start date is in previous calendar year


ax.legend(ncol=2, loc = 'upper center')

## set the x axis to show the month names instead of day numbers
plt.xticks(np.linspace(0,365,13)[:-1], ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov', 'Dec'))

plt.show()

What I don't know how to do is the simple pandas conditional magic needed to do the following:

if df['Start'] < df['End']:
    'draw one line as shown above'
else:
    'draw 2 lines as shown above'
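The pseudocode above cannot be run as written. `df['Start'] < df['End']` compares the two columns row by row and returns a boolean Series with one True/False per row. A plain `if` needs a single True/False, so pandas raises a ValueError. Here is a small sketch that uses the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'Period': ['Q1', 'Q2', 'Q3', 'Q4'],
                   'Start': [338, 78, 190, 273],
                   'End': [117, 166, 299, 17]})

# Element-wise comparison: one True/False per row, aligned on the row index
same_year = df['Start'] < df['End']
print(same_year.tolist())

# A plain `if` needs one bool, so passing the whole Series raises ValueError
try:
    if same_year:
        pass
    error_raised = False
except ValueError as exc:
    error_raised = True
    print(exc)
```

Reducing the Series with `.any()` or `.all()` does give a single bool. That answers "is any/every row a same-year cycle?", though, and not the per-row question. The mask itself is what you would normally pass on to `df.loc` or to the plotting calls.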

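What is the pandas syntax for doing something based on comparing the value in one column with the value in another column of the same row?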

Do I need to use a for loop and plot each line separately, or can this be done with some form of df.loc[df[... or another pandas-style comparison statement?
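Here is a minimal sketch of the `df.loc` style the question asks about, again using the sample df. The boolean Series works directly as a row mask, so `df.loc[mask]` and `df.loc[~mask]` split the frame into same-year and year-crossing cycles without any loop. `np.where` is one way to turn the same comparison into a per-row value. The `Color` column it creates here is purely illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Period': ['Q1', 'Q2', 'Q3', 'Q4'],
                   'Start': [338, 78, 190, 273],
                   'End': [117, 166, 299, 17]})

same_year = df['Start'] < df['End']  # boolean mask, one value per row

# Boolean indexing with .loc keeps only the rows where the mask is True
same = df.loc[same_year]     # cycles that start and end in the same calendar year
wraps = df.loc[~same_year]   # cycles that cross into another calendar year

# The same comparison can also produce a per-row value without a loop
# (hypothetical 'Color' column, for illustration only)
df['Color'] = np.where(same_year, 'blue', 'orange')
print(same)
print(wraps)
print(df)
```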

This should be simple, but I can't see the solution.

Tags: pandas, if-statement, conditional-statements

Solution


Found a solution that works (not perfect, but I'll take it when a miracle happens).

Use a for loop and draw the correct line(s) row by row.

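The only problem is that it adds a "Cycle" label to the legend every time it draws a blue line (a cycle that starts and ends in the same calendar year). So I had to remove the label from hlines and add a separate legend entry to keep the legend readable for any dataset with multiple years of data. It also doesn't necessarily put that label where I want it in the legend. I'd like it last, so that Cycle Start and Cycle End end up in the same column.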
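One possible way to handle both legend issues while keeping the loop is sketched below. Matplotlib leaves out handles whose label starts with an underscore, so only the first blue line needs the "Cycle" label. `ax.get_legend_handles_labels()` returns the collected handles, and they can be passed back to `ax.legend()` in an explicit order. Matplotlib fills a multi-column legend column by column, so listing Cycle Start and Cycle End first should place them in the same column.

A few choices here are assumptions of this sketch, not part of the original answer:

- the `order` list;
- the numeric y positions relabelled with the Period names, used so that every plotting call works on a purely numeric y-axis;
- the Agg backend, used only so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Period': ['Q1', 'Q2', 'Q3', 'Q4'],
                   'Start': [338, 78, 190, 273],
                   'End': [117, 166, 299, 17]})
y = np.arange(len(df))  # assumption: numeric y positions, relabelled with the Period names below

fig, ax = plt.subplots()
ax.set_xlim(-5, 370)
ax.scatter(df['Start'], y, color='green', label='Cycle Start', marker='|', s=100, zorder=2)
ax.scatter(df['End'], y, color='red', label='Cycle End', marker='|', s=100, zorder=2)
ax.vlines([0, 365], -1, len(df), color='purple', label='Calendar Year Start / End', linewidth=2, zorder=1)
ax.set_yticks(y)
ax.set_yticklabels(df['Period'].tolist())

cycle_labelled = False
for i in range(len(df)):
    start, end = df.loc[i, 'Start'], df.loc[i, 'End']
    if start < end:
        # Label only the first blue line; labels starting with '_' are left out of the legend
        label = '_nolegend_' if cycle_labelled else 'Cycle'
        ax.hlines(y[i], xmin=start, xmax=end, color='blue', label=label, linewidth=2, zorder=0)
        cycle_labelled = True
    else:
        ax.hlines(y[i], xmin=start, xmax=365, color='orange', linewidth=2, zorder=0)
        ax.hlines(y[i], xmin=0, xmax=end, color='orange', linewidth=2, zorder=0)

# Rebuild the legend in an explicit order instead of relying on the order artists were added
handles, labels = ax.get_legend_handles_labels()
handle_for = dict(zip(labels, handles))
order = ['Cycle Start', 'Cycle End', 'Calendar Year Start / End', 'Cycle']
legend = ax.legend([handle_for[name] for name in order], order, ncol=2, loc='upper center')
```

The empty `ax.plot([], [])` trick in the answer also works. Labelling one real line avoids creating an extra artist, and the explicit `order` list makes the legend layout independent of drawing order.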

Feel free to offer input.

Here is the complete code that does what I want:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

data = {'Period':['Q1','Q2','Q3','Q4'], 'Start':[338,78,190,273], 'End':[117,166,299,17]}

df = pd.DataFrame(data)

#print (df) # for testing

fig, ax = plt.subplots()

plt.xlim([-5,370])   # set x-axis limits to be number of days in calendar year with spacer of 5 days on either side for ease of viewing

ax.scatter(df['Start'],df['Period'],color = 'green', label = 'Cycle Start', marker = '|', s = 100, zorder = 2)  # plot cycle start markers
ax.scatter(df['End'],df['Period'],color = 'red', label = 'Cycle End', marker = '|', s = 100, zorder = 2)

ax.vlines(0,-1,len(df['Period']),color='purple', label = 'Calendar Year Start / End',linewidth = 2, zorder = 1)   # put vertical line at Day 0 of calendar year
ax.vlines(365,-1,len(df['Period']),color='purple', linewidth = 2, zorder = 1) # put vertical line at Day 365 of calendar year


##
#   Draw horizontal line(s) to show the cycle for each period / row in dataframe.
##

for i in range(len(df)):

    ##  Option 1: Draw a line between the start and end points if the start and end dates are in the same calendar year (green marker to the left of red).
    if df.loc[i, 'Start'] < df.loc[i, 'End']:
        ax.hlines(df.loc[i, 'Period'], xmin=df.loc[i, 'Start'], xmax=df.loc[i, 'End'], color='blue', linewidth = 2, zorder = 0)

    ##  Option 2: Draw 2 lines if the start and end dates are not in the same calendar year (red marker to the left of green).
    else:
        ax.hlines(df.loc[i, 'Period'], xmin=df.loc[i, 'Start'], xmax=365, color='orange', linewidth = 2, zorder = 0)  # end date is in next calendar year
        ax.hlines(df.loc[i, 'Period'], xmin=0, xmax=df.loc[i, 'End'], color='orange', linewidth = 2, zorder = 0)  # start date is in previous calendar year

ax.plot([],[],linewidth=2, label='Cycle', color='blue')  # empty line that only adds a single 'Cycle' entry to the legend

ax.legend(ncol=2, loc = 'upper center')

## set the x axis to show the month names instead of day numbers
plt.xticks(np.linspace(0,365,13)[:-1], ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov', 'Dec'))

plt.show()

Does anyone know a better, more pandas-like approach than the for loop and if-else statement?

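Every article I've read says this is the "worst" and slowest way to iterate through a dataframe. That could become a problem when I apply it to hundreds of companies with hundreds of financial cycles each.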
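Here is a minimal vectorized sketch of the same plot, assuming the sample df from the question. `ax.hlines` accepts arrays for `y`, `xmin` and `xmax`, so the boolean mask can split the rows once and each group can be drawn with a single call. There is no per-row loop, and the "Cycle" label is added only once. The helper column `y` (numeric row positions relabelled with the Period names) is an assumption of this sketch, not part of the original code. It keeps every plotting call on a purely numeric y-axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Period': ['Q1', 'Q2', 'Q3', 'Q4'],
                   'Start': [338, 78, 190, 273],
                   'End': [117, 166, 299, 17]})
df['y'] = np.arange(len(df))  # hypothetical helper column: numeric y position of each period

same_year = df['Start'] < df['End']  # one True/False per row
same = df.loc[same_year]             # cycles contained in one calendar year
wraps = df.loc[~same_year]           # cycles that cross the year boundary

fig, ax = plt.subplots()
ax.set_xlim(-5, 370)
ax.scatter(df['Start'], df['y'], color='green', label='Cycle Start', marker='|', s=100, zorder=2)
ax.scatter(df['End'], df['y'], color='red', label='Cycle End', marker='|', s=100, zorder=2)
ax.vlines([0, 365], -1, len(df), color='purple', label='Calendar Year Start / End', linewidth=2, zorder=1)

# One hlines call per group instead of one call per row
cycle = ax.hlines(same['y'], xmin=same['Start'], xmax=same['End'],
                  color='blue', label='Cycle', linewidth=2, zorder=0)
to_year_end = ax.hlines(wraps['y'], xmin=wraps['Start'], xmax=365,
                        color='orange', linewidth=2, zorder=0)
from_year_start = ax.hlines(wraps['y'], xmin=0, xmax=wraps['End'],
                            color='orange', linewidth=2, zorder=0)

ax.set_yticks(df['y'])
ax.set_yticklabels(df['Period'].tolist())
ax.set_xticks(np.linspace(0, 365, 13)[:-1])
ax.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
ax.legend(ncol=2, loc='upper center')
```

The number of plotting calls stays at three however many rows the frame has. Only the mask computation scales with the data, and pandas does that in a single vectorized operation. In an interactive session, finish with `plt.show()` as in the question.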

