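python - Tableau error "All fields must be aggregate or constant" when calling TabPy SCRIPT_REAL
Question
I call a TabPy server from a calculated field in a Tableau worksheet to run a hypothesis test: does the booking rate differ by group?
I have a table such as: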
      Group  Bookings
0     A      1
1     A      0
...
3998  B      1
3999  B      0
In Python, on the same server (using a Python 2.7 Docker image), the test I want is simple:
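import pandas as pd
from scipy.stats import fisher_exact

# df is the table shown above, with columns 'Group' and 'Bookings'
df_cont_tbl = pd.crosstab(df['Group'], df['Bookings'])
prop_test = fisher_exact(df_cont_tbl)  # (odds ratio, p-value)
print('Fisher exact test: Odds ratio = {:.2f}, p-value = {:.3f}'.format(*prop_test))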
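This returns: Fisher exact test: Odds ratio = 1.21, p-value = 0.102
I connected Tableau to the TabPy server and can execute a hello-world calculated field. For example, this calculated field returns 42: SCRIPT_REAL("return 42", ATTR([Group]), ATTR([Bookings]))
However, when I try to call the stats function above from a calculated field to extract the p-value: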
SCRIPT_REAL("
import pandas as pd
from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(_arg1, _arg2)
prop_test = fisher_exact(df_cont_tbl)
return prop_test[1]
", [Group], [Bookings] )
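I get the notice "The calculation contains errors", and the drop-down reads: "All fields must be aggregate or constant when using table calculation functions or fields from multiple data sources."
I tried wrapping the inputs in ATTR(), as follows: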
SCRIPT_REAL("
import pandas as pd
from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(_arg1, _arg2)
prop_test = fisher_exact(df_cont_tbl)
return prop_test[1]
",ATTR([Group]), ATTR([Bookings])
)
That changes the notice to "The calculation is valid", but the server returns a pandas ValueError:
An error occurred while communicating with the External Service.
Error processing script
Error when POST /evaluate: Traceback
Traceback (most recent call last):
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tabpy_server/tabpy.py", line 467, in post
result = yield self.call_subprocess(function_to_evaluate, arguments)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tabpy_server/tabpy.py", line 488, in call_subprocess
ret = yield future
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_base.py", line 400, in result
return self.__get_result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_base.py", line 359, in __get_result
reraise(self._exception, self._traceback)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_compat.py", line 107, in reraise
exec('raise exc_type, exc_value, traceback', {}, locals_)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/thread.py", line 61, in run
result = self.fn(*self.args, **self.kwargs)
File "<string>", line 5, in _user_script
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/tools/pivot.py", line 479, in crosstab
df = DataFrame(data)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 266, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 402, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 5398, in _arrays_to_mgr
index = extract_index(arrays)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 5437, in extract_index
raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index
Error type : ValueError
Error message : If using all scalar values, you must pass an index
Sample dataset:
To generate the CSV I connect to:
import os
import pandas as pd
import numpy as np
from collections import namedtuple
OUTPUT_LOC = os.path.expanduser('~/TabPy_demo/ab_test_demo_results.csv')
GroupObs = namedtuple('GroupObs', ['name','n','p'])
obs = [GroupObs('A',3000,.10),GroupObs('B',1000,.13)]
# note true odds ratio = (13/87)/(10/90) = 1.345
np.random.seed(2019)
df = pd.concat(
    [pd.DataFrame({'Group': grp.name,
                   'Bookings': pd.Series(np.random.binomial(n=1, p=grp.p, size=grp.n))})
     for grp in obs],
    ignore_index=True)
df.to_csv(OUTPUT_LOC,index=False)
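The odds ratio quoted in the generator's comment can be reproduced with a few lines of arithmetic. This is only a small sketch; the helper name `odds` is made up for illustration. The estimate from a simulated sample (1.21 in the run reported above) will generally differ from this value because of sampling noise.

```python
def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


# Booking probabilities used for groups A and B in the generator above.
p_a, p_b = 0.10, 0.13

true_odds_ratio = odds(p_b) / odds(p_a)
print(round(true_odds_ratio, 3))  # prints 1.345
```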
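Answer
An old question, but maybe this will help someone. There are a couple of issues. The first is with how the data is passed to pd.crosstab. Tableau passes plain lists of values to the TabPy server, and pd.crosstab does not treat them as two columns here (hence the "all scalar values" error), so wrap each one in a NumPy array to fix the error you are getting.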
SCRIPT_REAL(
"
import pandas as pd
import numpy as np
from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(np.array(_arg1), np.array(_arg2))
prop_test = fisher_exact(df_cont_tbl)
return prop_test[1]
",
attr([Group]), attr([Bookings])
)
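The script body can be tried outside Tableau by calling it with plain Python lists, which is how the arguments arrive on the TabPy side. The sketch below assumes a hypothetical helper name (`fisher_p_value`) and made-up toy data rather than the CSV from the question. Some newer pandas releases may accept plain lists directly, but converting with `np.array` is harmless either way.

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact


def fisher_p_value(arg1, arg2):
    """Mimic the SCRIPT_REAL body: arg1 and arg2 are plain Python lists."""
    # Converting each list to an array makes crosstab treat it as one column.
    table = pd.crosstab(np.array(arg1), np.array(arg2))
    odds_ratio, p_value = fisher_exact(table)
    return float(p_value)


# Hypothetical toy data: 6 rows per group, bookings coded 0/1.
groups = ['A'] * 6 + ['B'] * 6
bookings = [1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]

toy_table = pd.crosstab(np.array(groups), np.array(bookings))
p = fisher_p_value(groups, bookings)
print(toy_table)
print(p)
```

The function returns a single float, matching what SCRIPT_REAL expects back from the script.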
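The other issue is how the table calculation is executed. You want to send TabPy two lists, each as long as your table. By default, Tableau wants to do the calculation at the row level, which will not work.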
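One way to see why a row-level call cannot work is to compare the contingency table built from the full lists with the one built from a single row. This is a sketch under the assumption that a row-level evaluation hands the script one-element lists. The helper name and the four-row data are hypothetical.

```python
import numpy as np
import pandas as pd


def contingency_shape(arg1, arg2):
    """Shape of the Group x Bookings table built from the given lists."""
    return pd.crosstab(np.array(arg1), np.array(arg2)).shape


# Hypothetical four-row table.
groups = ['A', 'A', 'B', 'B']
bookings = [1, 0, 1, 0]

# Whole table sent at once: a proper 2x2 contingency table.
full_shape = contingency_shape(groups, bookings)

# One row at a time: each call sees one group and one outcome.
row_shapes = [contingency_shape([g], [b]) for g, b in zip(groups, bookings)]

print(full_shape)
print(row_shapes)
```

Only the full-length call yields the 2x2 table that a two-group Fisher exact test needs. A single row collapses to a 1x1 table.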
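I included the row number as column F1 in the CSV I built the workbook from, and made sure the Python calculation is computed along that field (the table calculation's Compute Using setting).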
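Now, when you put F1 on the worksheet, the calculation returns the p-value as many times as there are rows. The workaround is to wrap your calculation in a second calculation that returns the value only on the first row.
You can then place that second calculation on the worksheet.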
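In Tableau the wrapper is a second calculated field. Since FIRST() returns the offset of the current row from the first row of the partition, something along the lines of IF FIRST() = 0 THEN [P Value] END would be the usual shape, where [P Value] is a hypothetical name for the SCRIPT_REAL field. The Python below only emulates the effect of such a wrapper on a column of repeated values. It is not code that runs inside Tableau.

```python
def first_row_only(values):
    """Keep the value at partition offset 0 and blank out every other row."""
    return [v if offset == 0 else None for offset, v in enumerate(values)]


# The p-value reported in the question, repeated once per row (5 rows here).
repeated = [0.102] * 5
shown = first_row_only(repeated)
print(shown)  # prints [0.102, None, None, None, None]
```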