Parsing a comma-separated string into a dict

Problem description

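I have an input string like the one below, and I want to split it on commas to get the output shown. The problem is that some commas sit inside parentheses, as in the example, and some values are wrapped in quotes that themselves sit inside the quoted string. I'm not very comfortable with regular expressions, so any pointers would be much appreciated.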

Input:

"ty_event_name, from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date,'${hiveconf:run_dt}' as sessions_fy,orders_xy"

Desired output:

{1:'ty_event_name',
2:'from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date',
3:''${hiveconf:run_dt}' as sessions_fy',
4:'orders_xy'}

What I tried:

import pandas as pd
import numpy as np
import re

teststr="ty_event_name, from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date,'${hiveconf:run_dt}' as sessions_fy,orders_xy"

tstr=re.sub('(?!\B"[^"]*),(?![^"]*"\B)',',',teststr).split()

tstr

Output:

['ty_event_name,',
 "from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd')",
 'as',
 "ty_date,'${hiveconf:run_dt}'",
 'as',
 'sessions_fy,orders_xy']

Tags: regex, python-3.x, pandas, numpy

Solution


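This seems to work. It splits on a comma (plus any whitespace after it) only when the text that follows reaches a "(" or the end of the string before it reaches a ")":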

Code:

re.split(r',\s*(?=[^)]*(?:\(|$))', teststr) 

Output:

['ty_event_name',
 "from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date",
 "'${hiveconf:run_dt}' as sessions_fy",
 'orders_xy']
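
The lookahead only inspects what follows each comma, so it can misfire when another call appears after a comma inside an outer call, for example `f(a, g(b)), c`. As a rough alternative, here is a minimal sketch that scans the string once while tracking parenthesis depth and quote state, then numbers the pieces from 1 to build the dict asked for in the question. The helper name `split_top_level` is made up for this illustration, and the sketch assumes balanced parentheses and quotes with no escaped quote characters.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside (...) or quotes.

    Hypothetical helper for illustration; assumes balanced parentheses
    and no escaped quote characters.
    """
    parts = []
    current = []
    depth = 0      # current parenthesis nesting level
    quote = None   # the open quote character, or None when outside quotes
    for ch in text:
        if quote:
            # Inside a quoted run: copy everything until the matching quote.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth = max(depth - 1, 0)
            current.append(ch)
        elif ch == sep and depth == 0:
            # Top-level separator: close the current piece.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


teststr = "ty_event_name, from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date,'${hiveconf:run_dt}' as sessions_fy,orders_xy"

# Number the pieces starting at 1, as in the desired output.
result = dict(enumerate(split_top_level(teststr), start=1))

# A nested call after an inner comma stays in one piece.
nested = split_top_level("f(a, g(b)), c")  # ['f(a, g(b))', 'c']
```

For the input in the question this yields the same four pieces as the `re.split` call above, keyed 1 through 4. The trade-off is a few more lines of code in exchange for not depending on what happens to follow each comma.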
