How to subtract a column of string based on other columns in Hive?

Problem description

Given this table, I am trying to remove the parts of address that also appear in zip_code and city.

+----------------------------------------------+----------+------------+
| address                                      | zip_code | city       |
+----------------------------------------------+----------+------------+
| Oceans Group, 12 Pear Tree Road, Derby       | DE23 6PY | Derby      |
| 970 Stockport Road                           | M19 3NN  | Manchester |
| Cartridge World Guiseley                     |          | Edinburgh  |
| 33-41 Kelvin Avenue                          | G52 4LT  | Glasgow    |
| Cartridge World Haymarket, 54 Dalry Road, UK | EH5 1HX  | Edinburgh  |
| 50 Otley Road, Leeds, LS20 8AH, UK           | LS20 8AH |            |
+----------------------------------------------+----------+------------+

Something like this hypothetical function:

SUBSTR('Oceans Group, 12 Pear Tree Road, Derby', 'DE23 6PY', 'Derby') returns 'Oceans Group, 12 Pear Tree Road, '
SUBSTR('50 Otley Road, Leeds, LS20 8AH, UK', 'LS20 8AH', '') returns '50 Otley Road, Leeds, , UK'

Hopefully this setup code saves you some time.

CREATE TABLE address_table(
      address    STRING
    , zip_code   STRING
    , city       STRING
);

INSERT INTO address_table VALUES ("Oceans Group, 12 Pear Tree Road, Derby", "DE23 6PY", "Derby");
INSERT INTO address_table VALUES ("970 Stockport Road", "M19 3NN", "Manchester");
INSERT INTO address_table VALUES ("Cartridge World Guiseley", "", "Edinburgh");
INSERT INTO address_table VALUES ("33-41 Kelvin Avenue", "G52 4LT", "Glasgow");
INSERT INTO address_table VALUES ("Cartridge World Haymarket, 54 Dalry Road, UK", "EH5 1HX", "Edinburgh");
INSERT INTO address_table VALUES ("50 Otley Road, Leeds, LS20 8AH, UK", "LS20 8AH", "");

Tags: sql, string, select, hive, sql-update

Solution


Hive has traditionally had no plain string-replace function (newer releases do ship replace()), but you can use regexp_replace():

select
    a.*,
    regexp_replace(address, zip_code, '') new_address
from address_table a
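
The query above strips only zip_code, while the question also asks about city. Here is a minimal sketch that nests two regexp_replace() calls to cover both columns. Wrapping each column value in \Q...\E is my own addition, assuming Hive hands the pattern to Java's regex engine, so that characters such as "." or "(" in a value are matched literally rather than as regex syntax.

```sql
-- Sketch: remove both zip_code and city from address.
-- \Q...\E asks the Java regex engine to treat the column value literally
-- (assumption: the values never contain the sequence \E themselves).
select
    a.*,
    regexp_replace(
        regexp_replace(a.address, concat('\\Q', a.zip_code, '\\E'), ''),
        concat('\\Q', a.city, '\\E'),
        ''
    ) as new_address
from address_table a;
```

On the sample rows this should give 'Oceans Group, 12 Pear Tree Road, ' for the first row and '50 Otley Road, Leeds, , UK' for the last one, matching the results asked for in the question. An empty zip_code or city leaves the address unchanged. A NULL in either column would make the whole result NULL, so coalesce(zip_code, '') and coalesce(city, '') may be worth adding if the real data contains NULLs.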

If you want an update statement:

update address_table
set address = regexp_replace(address, zip_code, '')
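
One caveat: Hive's UPDATE only works on transactional (ACID) tables, and the CREATE TABLE in the question does not declare one explicitly. Assuming a plain non-transactional table, a rough alternative is to rewrite the table with INSERT OVERWRITE, selecting the cleaned address together with the untouched columns.

```sql
-- Sketch for a non-ACID table: rewrite every row with the cleaned address.
-- The select list must follow the table's column order (address, zip_code, city).
insert overwrite table address_table
select
    regexp_replace(address, zip_code, '') as address,
    zip_code,
    city
from address_table;
```

This replaces the full contents of address_table, so it may be safer to write into a copy first and compare the results before overwriting the original.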
