python - How to speed up a Dash app that uses pandas groupby
Problem description
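My Dash app runs, but I have a DataFrame of roughly 10k rows, and to reload the plot and the data table I have to re-run a groupby statement that takes a long time. Users can pick filters on the left to update the dashboard. In the dashboard I need the number of customers grouped per city. Because I have both a dcc.Graph and a dash_table, I basically reference the same underlying DataFrame twice and therefore update it twice, which I think is very inefficient.
Is there a way to update the DataFrame only once and then send the result to both the dcc.Graph and the dash_table in one go? Also, are there other ways to speed up the app?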
import pandas as pd
import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_table

df = pd.DataFrame.from_dict({'Customer': [111, 222, 555, 666],
                             'zip_city': ['Aguadilla', 'Aguadilla', 'Arecibo', 'Wrangell'],
                             'zip_latitude': [18.498987, 18.498987, 18.449732, 56.409507],
                             'zip_longitude': [-67.13699, -67.13699, -66.69879, -132.33822],
                             'Gender': ['m', 'f', 'm', 'f']})
# Number of customers per city, repeated on every row of that city
df["CustomerCount"] = df.groupby(["zip_city"], as_index=False)["Customer"].transform("count")

# One checklist option per distinct gender
gender_options = []
for gender in df['Gender'].unique():
    gender_options.append({'label': str(gender),
                           'value': gender})

app = dash.Dash()
app.css.append_css({'external_url': 'https://codepen.io/chriddyp/pen/bWLwgP.css'})

app.layout = html.Div([
    html.H1('A Dashboard', style={'textAlign': 'center'}),
    # Left column: filter inputs
    html.Div(
        children=[
            html.H1('Input', style={'textAlign': 'center'}),
            html.H6('Gender'),
            html.P(
                # Note: Dash 1.0 renamed the Checklist property `values` to `value`
                dcc.Checklist(id='gender-picker',
                              options=gender_options,
                              values=['m', 'f'])
            )
        ],
        style={'float': 'left'},
        className="two columns"
    ),
    # Right side: map and data table in separate tabs
    html.Div([
        dcc.Tabs(children=[
            dcc.Tab(label='Map',
                    children=html.Div([
                        dcc.Graph(id='CustomerMap')
                    ])),
            dcc.Tab(label='Data',
                    children=[html.Div([dash_table.DataTable(
                        id='table',
                        columns=[{"name": i, "id": i} for i in df.columns],
                        # "records" is the supported spelling; "rows" was removed in pandas 2.0
                        data=df.to_dict("records")
                    )])])
        ])
    ])
])

@app.callback(
    dash.dependencies.Output('CustomerMap', 'figure'),
    [dash.dependencies.Input('gender-picker', 'values')])
def update_figure(selected_gender):
    # .copy() so that adding a column does not write into a view of df
    filtered_df = df[df['Gender'].isin(selected_gender)].copy()
    filtered_df["CustomerCount"] = filtered_df.groupby(["zip_city"], as_index=False)["Customer"].transform("count")
    customerCount = filtered_df['CustomerCount'].tolist()
    zipcity = filtered_df['zip_city'].tolist()
    # Hover label per point, e.g. "Aguadilla:2"
    hovertext = []
    for i in range(len(customerCount)):
        k = str(zipcity[i]) + ':' + str(customerCount[i])
        hovertext.append(k)
    return {'data': [dict(
        type='scattergeo',
        locationmode='USA-states',
        lon=filtered_df['zip_longitude'],
        lat=filtered_df['zip_latitude'],
        text=hovertext,
        hoverinfo='text',
        marker=dict(
            size=filtered_df['CustomerCount'],
            line=dict(width=0.5, color='rgb(40,40,40)'),
            sizemode='area'
        ),
        transforms=[dict(
            type='aggregate',
            groups=filtered_df['zip_city'],
            aggregations=[dict(target=filtered_df['Customer'], func='count', enabled=True)]
        )]
    )]}

@app.callback(
    dash.dependencies.Output('table', 'data'),
    [dash.dependencies.Input('gender-picker', 'values')])
def update_table(selected_gender):
    # Same filter and groupby as in update_figure, repeated for the table
    filtered_df = df[df['Gender'].isin(selected_gender)].copy()
    filtered_df["CustomerCount"] = filtered_df.groupby(["zip_city"], as_index=False)["Customer"].transform("count")
    return filtered_df.to_dict("records")

if __name__ == '__main__':
    app.run_server()
Solution