python - Finding the difference between two timestamps in PySpark SQL
Problem description
cal_avg_latency = spark.sql("SELECT UnitType, ROUND(AVG(TIMESTAMP_DIFF(OnSceneDtTmTS, ReceivedDtTmTS, MINUTE)), 2) as latency, count(*) as total_count FROM `SFSC_Incident_Census_view` WHERE EXTRACT(DATE from ReceivedDtTmTS) == EXTRACT(DATE from OnSceneDtTmTS) GROUP BY UnitType ORDER BY latency ASC")
Error:
ParseException: "\nmismatched input 'FROM' expecting <EOF>(line 1, pos 122)\n\n== SQL ==\nSELECT UnitType, ROUND(AVG(TIMESTAMP_DIFF(OnSceneDtTmTS, ReceivedDtTmTS, MINUTE)), 2) as latency, count(*) as total_count FROM SFSC_Incident_Census_view WHERE EXTRACT((DATE FROM ReceivedDtTmTS) == EXTRACT(DATE FROM OnSceneDtTmTS)) GROUP BY UnitType ORDER BY latency ASC\n--------------------------------------------------------------------------------------------------------------------------^^^\n"
The error is in the WHERE clause, but even without that clause my TIMESTAMP_DIFF call does not work:
cal_avg_latency = spark.sql("SELECT UnitType, ROUND(AVG(TIMESTAMP_DIFF(OnSceneDtTmTS, ReceivedDtTmTS, MINUTE)), 2) as latency, count(*) as total_count FROM SFSC_Incident_Census_view GROUP BY UnitType ORDER BY latency ASC")
Error:
AnalysisException: "Undefined function: 'TIMESTAMP_DIFF'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 27"
Solution
The error message is clear: Hive does not have a TIMESTAMP_DIFF function, and Spark SQL does not register one under that name either.
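If your columns have already been properly cast to a timestamp type, you can subtract them directly. Otherwise, you can cast them explicitly and then take the difference: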
SELECT ROUND(AVG(MINUTE(CAST(OnSceneDtTmTS AS timestamp) - CAST(ReceivedDtTmTS AS timestamp))), 2) AS latency
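Direct subtraction of timestamps is not accepted by every Spark version, and MINUTE returns a minute field rather than a total duration, so a more portable variant may be worth trying. The following is a minimal sketch, not the original answer's query: it assumes both columns already have the timestamp type, uses the built-in `unix_timestamp` to get epoch seconds (difference divided by 60 gives elapsed minutes), and swaps `to_date` in for `EXTRACT(DATE FROM ...)` in the WHERE clause. The helper name `average_latency` and the constant `LATENCY_QUERY` are hypothetical; the view and column names come from the question.

```python
# Sketch: average on-scene latency in minutes per unit type, using only
# built-in Spark SQL functions (unix_timestamp, to_date).
# Assumption: OnSceneDtTmTS and ReceivedDtTmTS are timestamp columns.
LATENCY_QUERY = """
SELECT UnitType,
       ROUND(AVG((unix_timestamp(OnSceneDtTmTS) - unix_timestamp(ReceivedDtTmTS)) / 60), 2) AS latency,
       COUNT(*) AS total_count
FROM SFSC_Incident_Census_view
WHERE to_date(ReceivedDtTmTS) = to_date(OnSceneDtTmTS)
GROUP BY UnitType
ORDER BY latency ASC
"""


def average_latency(spark):
    """Run the query on an existing SparkSession (hypothetical helper)."""
    return spark.sql(LATENCY_QUERY)
```

With a session named `spark`, as in the question, this would be called as `cal_avg_latency = average_latency(spark)` followed by `cal_avg_latency.show()`.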