google-bigquery - How can I use a UDF (and other functions) when running a BigQuery query with the Dataflow engine?
Problem description
I'm trying to run a query in BigQuery using the Cloud Dataflow engine. The query uses a custom function (Levenshtein distance) written in JavaScript. I'm also experiencing the same issue when using other functions such as ST_GeogPoint or ARRAY_AGG.
I'm getting this error: Function not found: ST_GeogPoint. If I delete the column that uses that function, I get the same error with LevenshteinDistance, then ARRAY_AGG, and so on.
The query looks like this:
WITH
  directory AS (
    SELECT
      TRIM(dir) AS street,
      lat,
      lon
    FROM
      bigquery.table.`project-id`.`dataset-name`.`table-name_1`),
  cruza AS (
    SELECT
      name,
      TRIM(p.dir) AS dir,
      TRIM(directory.dir) AS street,
      directory.lat AS lat,
      directory.lon AS lon,
      ST_GeogPoint(lat,lon) AS latlon,
      CAST(FLOOR(DATE_DIFF(CURRENT_DATE(),birth_day,DAY)/362.25) AS int64) AS age,
      dataset-name.LevenshteinDistance(TRIM(dir),TRIM(directory.dir)) AS lv_score
    FROM
      bigquery.table.`project-id`.`dataset-name`.`table-name_2` AS p,
      directory
    WHERE
      p.com = 'my_com' and name is not null)
SELECT
  AS value ARRAY_AGG(c ORDER BY lv_score LIMIT 1)[OFFSET(0)] AS col
FROM
  cruza c
WHERE
  lv_score <= 10
GROUP BY
  dir
ORDER BY
  col.lv_score
How can I use these functions?
Solution
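I don't think you will be able to. Dataflow SQL uses a variant of ZetaSQL, and it supports only a subset of it. The supported functions are listed here:
https://cloud.google.com/dataflow/docs/reference/sql
ZetaSQL itself does have an ARRAY_AGG function, but it does not yet appear to be supported in Dataflow SQL.
https://github.com/google/zetasql
Also, what is the use case for the Dataflow engine here? Typically you would use it to query a Pub/Sub subscription directly for streaming analytics.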
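If the matching step has to run outside an engine that provides LevenshteinDistance and ARRAY_AGG, one possible workaround is to compute it in ordinary code. The following is only a minimal sketch in plain Python, not something Dataflow SQL offers: the function names `levenshtein` and `best_match_per_dir` and the dict-shaped rows are hypothetical, and the `p.com` and `name is not null` filters from the query are left out. It mirrors the intent of the query: score every person address against every directory street, keep scores of at most 10, and retain the lowest-scoring row per `dir`.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def best_match_per_dir(people, directory, max_score=10):
    """Hypothetical helper: for each distinct dir, keep the closest street.

    `people` rows are assumed to be dicts with "name" and "dir";
    `directory` rows are assumed to be dicts with "street", "lat", "lon".
    """
    best = {}
    for p in people:
        dir_ = p["dir"].strip()
        for d in directory:
            street = d["street"].strip()
            score = levenshtein(dir_, street)
            if score > max_score:
                continue  # same role as WHERE lv_score <= 10
            if dir_ not in best or score < best[dir_]["lv_score"]:
                best[dir_] = {
                    "name": p["name"],
                    "dir": dir_,
                    "street": street,
                    "lat": d["lat"],
                    "lon": d["lon"],
                    "lv_score": score,
                }
    # same role as ORDER BY col.lv_score
    return sorted(best.values(), key=lambda row: row["lv_score"])
```

Note that this compares every pair of rows, just as the cross join in the query does, so it is only practical for small inputs.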