首页 > 解决方案 > Strange behaviour of Regexp_replace in a Hive SQL query

问题描述

I have some input information where I'm trying to remove the part .0 from my input where an ID string ends with .0.

select student_id, regexp_replace(student_id, '.0','') from school_result.credit_records where student_id like '%.0';

Input:

01-0230984.03
12345098.0
34567.0

Expected output:

01-0230984.03 
12345098
34567

But the result I'm getting is as follows: It's removing any character having with a 0 next to it instead of removing only the occurrences that end with .0

0129843
123498
34567

What am I doing wrong? Can someone please help?

标签: regexhivehiveqlregexp-replace

解决方案


正则表达式中的点具有特殊含义(表示任何字符)。如果您从字面上需要点 (.),则应使用双斜杠(在 Hive 中)对其进行屏蔽。还要添加行尾锚($):

with mydata as (
select stack(3,
'01-0230984.03',
'12345098.0',
'34567.0'
) as str
)

select regexp_replace(str,'\\.0$','') from mydata;

结果:

01-0230984.03
12345098
34567

正则表达式的字面意思'\\.0$'是点零 ( .0),行尾 ( $)。


推荐阅读