org.apache.spark.sql.AnalysisException: Generators are not supported outside the SELECT clause, but got: 'Agg

Problem description

I am calling a UDF from Spark SQL, but when I select columns from the DataFrame produced by the UDF I get an error. I don't know what the real cause is. Is there any way to fix this?

SQL:

select parsedData.raw_data_arr.priority,
       parsedData.raw_data_arr.status,
       parsedData.raw_data_arr.correlationtype,
       parsedData.raw_data_arr.source,
       max(parsedData.raw_data_arr.starttime),
       parsedData.raw_data_arr.uniquerefrence,
       parsedData.raw_data_arr.creationtime,
       parsedData.raw_data_arr.incidentname,
       parsedData.raw_data_arr.impactedentitycount,
       parsedData.raw_data_arr.serviceaffected,
       parsedData.raw_data_arr.category,
       parsedData.raw_data_arr.locationtype,
       parsedData.raw_data_arr.locationname,
       parsedData.raw_data_arr.technology,
       parsedData.raw_data_arr.domain,
       parsedData.raw_data_arr.vendor,
       parsedData.raw_data_arr.causename,
       parsedData.raw_data_arr.causecode,
       parsedData.raw_data_arr.alarmid,
       parsedData.raw_data_arr.classification,
       parsedData.raw_data_arr.configjson,
       parsedData.raw_data_arr.impactedentityname
from (select explode(ParseIncidentData(fileName)) as raw_data_arr
      from KafkaInput
      group by impactedentityname, causename, causecode) parsedData


Exception:

19-08-21 20:05:06:559 [INFO] com.inn.sparkrunner.processor.audit.ProcessorAudit Inside @method ProcessorAudit.exceptionLog for processor [Repartition Files] com.inn.sparkrunner.processor.audit.ProcessorAudit.exceptionLog(ProcessorAudit.java:234)
19-08-21 20:05:06:559 [INFO] com.inn.sparkrunner.processors.transformation.ExecuteSparkSQL [Repartition Files] ignoreNullDataset :[false] , error : org.apache.spark.sql.AnalysisException: Generators are not supported outside the SELECT clause, but got: 'Aggregate ['impactedentityname, 'causename, 'causecode], [explode(UDF:ParseIncidentData(fileName#15)) AS raw_data_arr#23];
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$23.applyOrElse(Analyzer.scala:1649)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$23.applyOrElse(Analyzer.scala:1606)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)

Tags: scala, apache-spark, apache-spark-sql, spark-streaming

Solution
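The analyzer is rejecting the query because `explode` is a generator, and the `group by` in the subquery turns that subquery into an Aggregate node; Spark only allows generators in a plain SELECT. The grouping columns (`impactedentityname`, `causename`, `causecode`) also do not exist yet at that point, since they are fields of the struct that `explode` produces. The usual fix is to do the explode in a plain subquery and move the `group by` to the outer query. A sketch of the rewritten query, abbreviated to a few columns (every non-aggregated column in the outer SELECT must either appear in the `group by` or be wrapped in an aggregate such as `first()`, which is used here as an illustrative choice):

```sql
select first(parsedData.raw_data_arr.priority)  as priority,
       max(parsedData.raw_data_arr.starttime)   as starttime,
       parsedData.raw_data_arr.impactedentityname,
       parsedData.raw_data_arr.causename,
       parsedData.raw_data_arr.causecode
from (select explode(ParseIncidentData(fileName)) as raw_data_arr
      from KafkaInput) parsedData                 -- generator in a plain SELECT: allowed
group by parsedData.raw_data_arr.impactedentityname,
         parsedData.raw_data_arr.causename,
         parsedData.raw_data_arr.causecode
```

The remaining columns from the original query follow the same pattern: add each one to the `group by`, or aggregate it with `first()`/`max()` as appropriate for the data.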

