How to speed up a Dash app that uses Pandas groupby

Problem description

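My Dash app runs, but I have a DataFrame of about 10k rows, and to reload the plot and the data table I have to re-run a groupby statement that takes a long time to load. Users can select filters on the left to update the dashboard. In the dashboard I need the number of customers grouped by city. Because I have both a dcc.Graph and a dash_table, I am basically referencing the same underlying DataFrame twice and therefore updating it twice, which I think is very inefficient.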

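Is there a way to update the DataFrame only once and then send the result to both the dcc.Graph and the dash_table in one go? Also, are there other ways to speed up the app?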

import pandas as pd
import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_table

df = pd.DataFrame.from_dict({'Customer': [111, 222, 555, 666],
        'zip_city': ['Aguadilla', 'Aguadilla', 'Arecibo', 'Wrangell'],
        'zip_latitude':[18.498987, 18.498987, 18.449732,56.409507],
        'zip_longitude':[-67.13699,-67.13699,-66.69879,-132.33822],
        'Gender':['m','f','m','f']})

df["CustomerCount"] = df.groupby(["zip_city"], as_index=False)["Customer"].transform("count")

gender_options = []
for gender in df['Gender'].unique():
    gender_options.append({'label':str(gender),
                           'value':gender})

app = dash.Dash()


app.css.append_css({'external_url': 'https://codepen.io/chriddyp/pen/bWLwgP.css'})

app.layout = html.Div([html.H1('A Dashboard', style={'textAlign':'center'}),
                       html.Div(children=[
                            html.H1('Input', style={'textAlign':'center'}),
                            html.H6('Gender'),
                            html.P(
                                    dcc.Checklist(id='gender-picker',
                                    options=gender_options,
                                    values=['m','f']
                                    )
                                )
                                ],
                               style = {'float':'left'},
                                className = "two columns"
                            ),
                        html.Div([dcc.Tabs(children=[dcc.Tab(label='Map',
                                                            children=html.Div([
                                                                    dcc.Graph(id='CustomerMap')
                                                                    ])
                                                            ),
                                                    dcc.Tab(label='Data',
                                                            children=[html.Div([dash_table.DataTable(
                                                                                id='table',
                                                                                columns = [{"name": i, "id": i} for i in df.columns],
                                                                                data = df.to_dict("records")
                                                                                )])
                                                                    ]
                                                            )
                                                    ]
                                            )
                                ])
                        ]
                    )

@app.callback(
    dash.dependencies.Output('CustomerMap', 'figure'),
    [dash.dependencies.Input('gender-picker', 'values')])
def update_figure(selected_gender):
    # Copy the filtered slice so that adding a column does not raise SettingWithCopyWarning.
    filtered_df = df[df['Gender'].isin(selected_gender)].copy()
    filtered_df["CustomerCount"] = filtered_df.groupby(["zip_city"])["Customer"].transform("count")

    customerCount = filtered_df['CustomerCount'].tolist()
    zipcity = filtered_df['zip_city'].tolist()
    hovertext = []
    for i in range(len(customerCount)):
        k = str(zipcity[i]) + ':' + str(customerCount[i])
        hovertext.append(k)
    return {'data':[dict(
                        type = 'scattergeo',
                        locationmode = 'USA-states',
                        lon = filtered_df['zip_longitude'],
                        lat = filtered_df['zip_latitude'],
                        text = hovertext,
                        hoverinfo = 'text',
                        marker = dict(
                        size = filtered_df['CustomerCount'],
                        line = dict(width=0.5, color='rgb(40,40,40)'),
                        sizemode = 'area'
                        ),
                        transforms = [dict(
                        type = 'aggregate',
                        groups = filtered_df['zip_city'],
                        aggregations = [dict(target = filtered_df['Customer'], func = 'count', enabled = True)]
                                        )
                                        ]
                        )
                    ]
            }


@app.callback(
    dash.dependencies.Output('table', 'data'),
    [dash.dependencies.Input('gender-picker', 'values')])
def update_table(selected_gender):
    # Same filter and groupby as update_figure; this is the duplicated work the question asks about.
    filtered_df = df[df['Gender'].isin(selected_gender)].copy()
    filtered_df["CustomerCount"] = filtered_df.groupby(["zip_city"])["Customer"].transform("count")
    return filtered_df.to_dict("records")

if __name__ == '__main__':
    app.run_server()

Tags: python, pandas, plotly-dash

Solution
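One possible approach, offered as a minimal sketch rather than a definitive answer: do the filter and groupby in a single helper and derive both the figure and the table rows from that one result. The names `filter_and_count` and `build_outputs` are hypothetical, and the figure is a trimmed-down version of the one in the question (the plotly `aggregate` transform is left out on the assumption that the precomputed `CustomerCount` column already carries the count). The sketch uses only pandas so that it runs without Dash installed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": [111, 222, 555, 666],
    "zip_city": ["Aguadilla", "Aguadilla", "Arecibo", "Wrangell"],
    "zip_latitude": [18.498987, 18.498987, 18.449732, 56.409507],
    "zip_longitude": [-67.13699, -67.13699, -66.69879, -132.33822],
    "Gender": ["m", "f", "m", "f"],
})


def filter_and_count(data, selected_gender):
    """Filter by gender and attach the per-city customer count (one groupby)."""
    # .copy() avoids SettingWithCopyWarning when adding a column to a slice.
    filtered = data[data["Gender"].isin(selected_gender)].copy()
    filtered["CustomerCount"] = (
        filtered.groupby("zip_city")["Customer"].transform("count")
    )
    return filtered


def build_outputs(selected_gender):
    """Hypothetical helper: build (figure, table_rows) from one filtered frame."""
    filtered = filter_and_count(df, selected_gender)
    # Vectorized string concatenation replaces the Python loop over rows.
    hovertext = (
        filtered["zip_city"] + ":" + filtered["CustomerCount"].astype(str)
    ).tolist()
    figure = {
        "data": [dict(
            type="scattergeo",
            lon=filtered["zip_longitude"].tolist(),
            lat=filtered["zip_latitude"].tolist(),
            text=hovertext,
            hoverinfo="text",
            marker=dict(size=filtered["CustomerCount"].tolist(), sizemode="area"),
        )]
    }
    return figure, filtered.to_dict("records")


# Assumption: on a Dash release that supports several outputs per callback,
# the helper could be wired up roughly like this (not executed here):
#
# @app.callback(
#     [dash.dependencies.Output('CustomerMap', 'figure'),
#      dash.dependencies.Output('table', 'data')],
#     [dash.dependencies.Input('gender-picker', 'values')])
# def update_dashboard(selected_gender):
#     return build_outputs(selected_gender)

figure, rows = build_outputs(["m", "f"])
print(figure["data"][0]["text"])
print(len(rows))
```

With this layout the groupby runs once per filter change instead of once per component. If the installed Dash version does not support multiple outputs in one callback, the two existing callbacks could still share `filter_and_count`, although the work would then be done twice unless the result is cached.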

