首页 > 解决方案 > SQL/Hive 查询以计算特定值每天的行数

问题描述

我目前正在编写一个 Python 脚本,该脚本使用查询从我们的 Hive 服务器中提取数据。我期望输出将根据“TxnCount”的输入过滤每天有 x 次或更多交易的卡号。

输入为:DateTime1、DateTime2、MerchantID、CardNum、terminalID 和 TxnCount。

我的代码(不工作):

Query = "SELECT TRIM(i002_number) as CardNum, i004_amt_trxn, TRIM(i042_merch_id) as MerchantID, i043a_merch_name, TRIM(i041_pos_id) as TerminalID, \
i049_cur_trxn, i062v2_trans_id, i003_proc_code, i006_amt_bill, i051_cur_bill, amt_card, cardcurrency, ltimestamp, \
i039_rsp_cd, i018_merch_type, i043b_merch_city, i043c_merch_cnt, i022_pos_entry, i032_acquirer_id, trxntype, reasoncode, \
SUBSTRING(i002_number, 1, 6) AS issuer_bin, COUNT(i002_number) as txncount\
CASE \
    WHEN SUBSTRING(i002_number,1,1) = 5 THEN 'MasterCard' \
    WHEN SUBSTRING(i002_number,1,1) = 4 THEN 'VISA' \
END AS source \
FROM tsys.ods_authorizations \
WHERE ltimestamp >= '"+DateTime1+"' AND ltimestamp <= '"+DateTime2+"' AND i042_merch_id = "+MerchantID+" \
AND i002_number = "+CardNum+" AND i041_pos_id = "+terminalID+""
HAVING txncount >= '"+TxnCount+"'

样本预期数据(截断):

CardNum         TimeStamp           TxnCount
123      2019-06-01 00:00:30.00        2   
123      2019-06-01 05:00:20.00        2
123      2019-06-03 20:00:00.00        1
456      2019-06-04 06:00:00.00        2
456      2019-06-04 00:00:10.91        2
789      2019-06-01 12:00:40.51        1

我认为我的问题是它无法计算每个卡号,因为我遇到了 GROUP BY 子句的问题。另外,我还没有拆分日期和时间,查询还不能识别日期之间的差异。

标签: pythonsqlhivedata-sciencehiveql

解决方案


您的查询格式不正确。您有COUNT()一堆其他列 - 而您没有GROUP BY. 这在 SQL 中是不允许的。

我建议您也使用参数,而不是修改查询字符串。所以,你大概想要这样的东西。

您的结果似乎需要每笔交易的详细信息,而不是摘要(因此您有两行计数为“2”而不是一个)。这表明您确实需要窗口函数:

SELECT a.*
FROM (SELECT a.*,
             (CASE WHEN i002_number LIKE '5%' THEN 'MasterCard'
                   WHEN i002_number LIKE '4%' THEN 'VISA'
              END) AS source
             COUNT(*) OVER (PARTITION BY i002_number, TRUNC(ltimestamp, 'DAY')) as txncount
      FROM FROM tsys.ods_authorizations a
      WHERE ltimestamp >= :timestamp1 AND
            ltimestamp <= :timestamp2 AND
            i042_merch_id = :MerchantID AND
            i002_number = :CardNum AND 
            i041_pos_id = :terminalID
     ) a
WHERE txncount >= :TxnCount

推荐阅读