Reference 'column_name' is ambiguous in SparkSQL

Problem Description

I am new to Spark and SQL. I am trying to run a SQL query with Spark Scala. Here is the SQL query:

SELECT 
        a.*
    FROM
    (   SELECT 
                a1.id_bu,
                a1.nama,
                a1.id_Bentuk_bu,
                a1.id_bentuk_usaha,
                a1.id_Jenis_bu,
                a1.id_Jenis_bu_kbli,
                a1.alamat,
                a1.kodepos,
                a1.telepon,
                a1.fax,
                a1.email,
                a1.website,
                a1.id_kabupaten,
                a1.id_propinsi,
                a1.npwp,
                a1.no_spt as modal_dasar,
                a1.log,
                a2.bu_nomor
            FROM 
                bu a1,
                bu_nomor a2
            where
                    a1.id_bu = a2.id_bu
                AND a1.id_propinsi = a2.id_propinsi 
    ) as a,
    
    ( SELECT 
            b.id_bu,
            b.id_sub_klasifikasi_kbli,
            b.kualifikasi_kbli,
            b.id_asosiasi_bu,
            b.propinsi,
            b.tgl_permohonan,
            c.tgl_habis
        FROM
            ( SELECT 
                    b1.id_bu,
                    b1.id_sub_klasifikasi_kbli,
                    b1.kualifikasi_kbli,
                    b1.id_asosiasi_bu,
                    b1.propinsi,
                    b1.tgl_permohonan
                FROM 
                    bu_registrasi_history_kbli b1
                WHERE 
                        b1.id_status = '4'
                    AND b1.tgl_proses < '2018-03-01' ) as b,
            ( SELECT 
                    c1.id_bu,
                    c1.id_klasifikasi,
                    c1.id_asosiasi_bu,
                    c1.tgl_habis
                FROM 
                    bu_sbu_kbli c1
                WHERE 
                    c1.tgl_habis >= '2018-03-01' ) as c
        WHERE 
                b.id_bu = c.id_bu
            AND SUBSTR( b.id_sub_klasifikasi_kbli, 1, 3) = c.id_klasifikasi
            AND b.id_asosiasi_bu = c.id_asosiasi_bu
    UNION all 
    SELECT 
            d.id_bu,
            d.id_sub_klasifikasi_kbli,
            d.kualifikasi_kbli,
            d.id_asosiasi_bu,
            d.propinsi,
            d.tgl_permohonan,
            e.tgl_habis
        FROM
            ( SELECT 
                    d1.id_bu,
                    d1.id_sub_klasifikasi_kbli,
                    d1.kualifikasi_kbli,
                    d1.id_asosiasi_bu,
                    d1.propinsi,
                    d1.tgl_permohonan
                FROM 
                    bu_registrasi_history_kbli_hapus d1
                WHERE 
                        d1.id_status='4'
                    AND d1.tgl_proses<'2018-03-01' ) as d,
            ( SELECT 
                    e1.id_bu,
                    e1.id_klasifikasi,
                    e1.id_asosiasi_bu,
                    e1.tgl_habis
                FROM 
                    bu_sbu_kbli_hapus e1
                WHERE
                    e1.tgl_habis >= '2018-03-01' ) as e
        WHERE 
                d.id_bu = e.id_bu
            AND SUBSTR( d.id_sub_klasifikasi_kbli, 1, 3) = e.id_klasifikasi
            AND d.id_asosiasi_bu = e.id_asosiasi_bu
        GROUP BY 
            id_bu,
            id_sub_klasifikasi_kbli
        ORDER BY 
            tgl_habis,
            tgl_permohonan DESC) x1
    WHERE 
        a.id_bu = x1.id_bu
    GROUP BY 
        x1.id_bu

I get the following error:

org.apache.spark.sql.AnalysisException: Reference 'id_bu' is ambiguous, could be: d.id_bu, e.id_bu.; line 81 pos 12
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213)

I'm not sure what this error means. Is it about two columns having the same name? If I use d.id_bu and d.id_sub_klasifikasi_kbli in the second-to-last GROUP BY, as the error suggests, it then says:

'd.`kualifikasi_kbli`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [id_bu#21], [id_bu#1, nama#2, id_Bentuk_bu#3, id_bentuk_usaha#4, id_Jenis_bu#5, id_Jenis_bu_kbli#6, alamat#7, kodepos#8, telepon#9, fax#10, email#11, website#12, id_kabupaten#13, id_propinsi#14, npwp#15, modal_dasar#0, log#17, bu_nomor#19]

Any idea how I can fix this? Thank you!

Tags: mysql, sql, scala, apache-spark

Solution


You have to qualify the columns in the GROUP BY clause with their table alias. The GROUP BY at the end of the second UNION branch can see both d.id_bu and e.id_bu, so a bare id_bu is ambiguous. The second error is a separate rule: unlike MySQL, Spark SQL does not accept a SELECT-list column that is neither in the GROUP BY nor inside an aggregate, so each remaining column must either be added to the GROUP BY or wrapped in an aggregate such as first().
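
For the second UNION branch (where the error points), a minimal sketch of the fix might look like the following, assuming an arbitrary value per group is acceptable for the non-grouped columns (the d and e subqueries are unchanged from the question and elided here):

SELECT 
        d.id_bu,
        d.id_sub_klasifikasi_kbli,
        first(d.kualifikasi_kbli) AS kualifikasi_kbli,
        first(d.id_asosiasi_bu)   AS id_asosiasi_bu,
        first(d.propinsi)         AS propinsi,
        first(d.tgl_permohonan)   AS tgl_permohonan,
        first(e.tgl_habis)        AS tgl_habis
    FROM
        ( /* bu_registrasi_history_kbli_hapus subquery, as in the question */ ) as d,
        ( /* bu_sbu_kbli_hapus subquery, as in the question */ ) as e
    WHERE 
            d.id_bu = e.id_bu
        AND SUBSTR( d.id_sub_klasifikasi_kbli, 1, 3) = e.id_klasifikasi
        AND d.id_asosiasi_bu = e.id_asosiasi_bu
    GROUP BY 
        d.id_bu,
        d.id_sub_klasifikasi_kbli

The same rule applies to the outer query: SELECT a.* with GROUP BY x1.id_bu will fail for the same reason, so every column of a must also be listed in the GROUP BY or wrapped in first(). Be aware that first() in Spark is non-deterministic, so the ORDER BY inside the subquery does not guarantee which row's value it returns per group.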

