首页 > 解决方案 > 在函数名称上拆分 SQL 语句,但在 Python 中保留分隔符

问题描述

假设我有以下string包含从SELECT子句中提取的 SQL 语句(实际上这是一个包含数百个此类语句的巨大 SQL 语句);

  SUM(case when(A.money-B.money>1000
                and A.unixtime-B.unixtime<=890769
                and B.col10 = "A"
                and B.col11 = "12"
                and B.col12 = "V") then 10
      end) as finalCond0,
  MAX(case when(A.money-B.money<0
                and A.unixtime-B.unixtime<=6786000
                and B.cond1 = "A"
                and B.cond2 = "4321"
                and B.cond3 in ("E", "F", "G")) then A.col10
        end) as finalCond1,
  SUM(case when(A.money-B.money>0
                and A.unixtime-B.unixtime<=6786000
                and B.cond1 = "A"
                and B.cond2 = "1234"
                and B.cond3 in ("A", "B", "C")) then 2
      end) as finalCond2    

如何在函数(即SUM, MAX,等)上拆分此查询,以便我可以提取最后一个查询但不删除分隔符(在本例中为MIN)?MEANSUM

因此,所需的输出将是一个字符串,如下所示:

  SUM(case when(A.money-B.money>0
                and A.unixtime-B.unixtime<=6786000
                and B.cond1 = "A"
                and B.cond2 = "1234"
                and B.cond3 in ("A", "B", "C")) then 2
      end) as finalCond2

PS:出于演示目的,我提供了某种缩进,但实际上这些语句用逗号分隔,这意味着原始形式中不会出现空格或换行符。

标签: pythonsqlregexsplit

解决方案


您不能在此处使用正则表达式,因为 SQL 语法不会形成您可以与 Pythonre引擎匹配的正则模式。您必须将字符串实际解析为令牌流或语法树;SUM(...)毕竟,您可以包含多种语法,包括子选择。

sqlparse可以做到这一点,即使它有点缺乏文档并且对外部使用不那么友好

重新使用walk_tokens我在链接到的另一篇文章中定义的函数:

from collections import deque
from sqlparse.sql import TokenList

def walk_tokens(token):
    queue = deque([token])
    while queue:
        token = queue.popleft()
        if isinstance(token, TokenList):
            queue.extend(token)
        yield token

SELECT从标识符列表中提取最后一个元素是:

import sqlparse
from sqlparse.sql import IdentifierList

tokens = sqlparse.parse(sql)[0]
for tok in walk_tokens(tokens):
    if isinstance(tok, IdentifierList):
        # iterate to leave the last assigned to `identifier`
        for identifier in tok.get_identifiers():
            pass
        break

print(identifier)

演示:

>>> sql = '''\
...   SUM(case when(A.money-B.money>1000
...                 and A.unixtime-B.unixtime<=890769
...                 and B.col10 = "A"
...                 and B.col11 = "12"
...                 and B.col12 = "V") then 10
...       end) as finalCond0,
...   MAX(case when(A.money-B.money<0
...                 and A.unixtime-B.unixtime<=6786000
...                 and B.cond1 = "A"
...                 and B.cond2 = "4321"
...                 and B.cond3 in ("E", "F", "G")) then A.col10
...         end) as finalCond1,
...   SUM(case when(A.money-B.money>0
...                 and A.unixtime-B.unixtime<=6786000
...                 and B.cond1 = "A"
...                 and B.cond2 = "1234"
...                 and B.cond3 in ("A", "B", "C")) then 2
...       end) as finalCond2
... '''
>>> tokens = sqlparse.parse(sql)[0]
>>> for tok in walk_tokens(tokens):
...     if isinstance(tok, IdentifierList):
...         # iterate to leave the last assigned to `identifier`
...         for identifier in tok.get_identifiers():
...             pass
...         break
...
>>> print(identifier)
SUM(case when(A.money-B.money>0
                and A.unixtime-B.unixtime<=6786000
                and B.cond1 = "A"
                and B.cond2 = "1234"
                and B.cond3 in ("A", "B", "C")) then 2
      end) as finalCond2

identifier是一个sqlparse.sql.Identifier实例,但是再次将其转换为字符串(print()确实如此,或者您可以使用str())再次为您提供该部分的输入 SQL 字符串。


推荐阅读