How to get a bar chart of a pandas time series DataFrame using Bokeh?

Question

I am trying to get a bar chart for time series data, similar to the following example:

from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource,FactorRange
from bokeh.palettes import Spectral6
from bokeh.plotting import figure

output_file("bars.html")

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits,
    '2015'   : [2, 1, 4, 3, 2, 4],
    '2016'   : [5, 3, 3, 2, 4, 6],
    '2017'   : [3, 2, 4, 4, 5, 3]}

# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015"), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack

source = ColumnDataSource(data=dict(x=x, counts=counts))

p = figure(x_range=FactorRange(*x), plot_height=250, title="Fruit Counts by Year",
       toolbar_location=None, tools="")

p.vbar(x='x', top='counts', width=0.9, source=source)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)        

Here is my data:

import pandas as pd
import numpy as np

dates = pd.date_range('20190101', periods=100)

dfr = pd.DataFrame(np.random.randn(100, 6), index=dates, columns=list('ABCDEF'))

dfr=dfr.resample('M').sum() 

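I can't figure out how to convert dfr into a dictionary so that I get a bar chart like the working example above. Please suggest a way forward. Thanks in advance.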

Tags: python, pandas, dataframe, bokeh

Answer


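You need to reshape the DataFrame into a Series with stack, then convert the first level of the resulting MultiIndex to strings in YYYY-MM-DD format and pass the result to the dictionary: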

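import numpy as np
import pandas as pd
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure

output_file("bars.html")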

dates = pd.date_range('20190101', periods=100)

dfr = pd.DataFrame(np.random.randn(100, 6), index=dates, columns=list('ABCDEF'))

s = dfr.resample('M').sum().stack()
s.index = [s.index.get_level_values(0).strftime('%Y-%m-%d'),
           s.index.get_level_values(1)]

x = s.index.values
print(x)
# [('2019-01-31', 'A') ('2019-01-31', 'B') ('2019-01-31', 'C')
#  ('2019-01-31', 'D') ('2019-01-31', 'E') ('2019-01-31', 'F')
#  ('2019-02-28', 'A') ('2019-02-28', 'B') ('2019-02-28', 'C')
#  ('2019-02-28', 'D') ('2019-02-28', 'E') ('2019-02-28', 'F')
#  ('2019-03-31', 'A') ('2019-03-31', 'B') ('2019-03-31', 'C')
#  ('2019-03-31', 'D') ('2019-03-31', 'E') ('2019-03-31', 'F')
#  ('2019-04-30', 'A') ('2019-04-30', 'B') ('2019-04-30', 'C')
#  ('2019-04-30', 'D') ('2019-04-30', 'E') ('2019-04-30', 'F')]

counts = s.values
print(counts)
# [ 5.8759305  -7.52857928  2.74794675  9.91942791  1.49860961  0.16046735
#   0.15459667  3.86407105  0.79097565 -2.65899131  1.86548175  1.41251127
#  -3.67053891 13.90439142  2.80744458  2.51583516 -2.37587758  4.49826959
#  -0.7661524  -6.22533991  5.90391326  4.40654035  1.93598738  2.49407506]

source = ColumnDataSource(data=dict(x=x, counts=counts))

p = figure(x_range=FactorRange(*x), plot_height=250, title="Sums by Months",
       toolbar_location=None, tools="")

p.vbar(x='x', top='counts', width=0.9, source=source)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)        
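
To see the reshaping step on its own, here is a minimal sketch that uses a tiny hand-written frame instead of random data, so the result is easy to check by eye. The helper name `to_nested_factors` and the two-row frame `small` are made up for illustration and are not part of the original answer. It only assumes pandas is installed and does not touch Bokeh.

```python
import pandas as pd


def to_nested_factors(df, date_format="%Y-%m-%d"):
    """Hypothetical helper: turn a datetime-indexed frame into
    (factors, counts) lists shaped like the fruit example's x and counts."""
    # stack() moves the column labels into a second index level,
    # giving a Series indexed by (date, column) in row-major order.
    s = df.stack()
    # Nested categorical factors must be strings, so format the dates.
    dates = s.index.get_level_values(0).strftime(date_format)
    cols = s.index.get_level_values(1)
    factors = list(zip(dates, cols))
    counts = s.to_numpy().tolist()
    return factors, counts


# Small deterministic stand-in for the resampled frame.
idx = pd.to_datetime(["2019-01-31", "2019-02-28"])
small = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=idx, columns=["A", "B"])

factors, counts = to_nested_factors(small)
print(factors)
print(counts)
```

The two lists should be usable the same way as `x` and `counts` above, i.e. `FactorRange(*factors)` and `ColumnDataSource(data=dict(x=factors, counts=counts))`. One thing to watch: the monthly sums here can be negative, so keeping `p.y_range.start = 0` from the fruit example will likely hide the bars that extend below zero. Dropping that line lets Bokeh choose the range automatically.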
