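hive - Splitting a line of words into groups of words in Hive
Problem description
I have some text that I want to break into groups of two, three, or even four words at a time. I am trying to extract meaningful phrases. I have used split and explode to retrieve what I need, but I would like to split each line into two or three words at a time. Here is what I have so far, which splits the line only one word at a time.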
select explode(a.text) text
from (select split(text," ") text
from abc
where id = 123
and `date` = '2019-08-16'
) a
The output I get:
text
----
thank
you
for
calling
your
tv
is
not
working
?
I would like output like this:
text
----
Thank you
for calling
your tv
is not
working?
Or something like this:
text
----
thank you for calling
your
tv is not working
?
Solution
CREATE TABLE IF NOT EXISTS db.test_string
(
text string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS orc
;
INSERT INTO TABLE db.test_string VALUES
('thank you for calling your tv is not working ?');
Here is the query:
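-- Explode the words twice, each time with their positions, then pair every
-- even-positioned word (k at position j) with the word right after it (s at position i).
-- A trailing word without a partner (odd word count) is not returned.
select k,s from db.test_string
lateral view posexplode(split(text,' ')) pe as i,s
lateral view posexplode(split(text,' ')) ne as j,k
where ne.j=pe.i-1
and ne.j%2=0
;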
thank you
for calling
your tv
is not
working ?
Time taken: 0.248 seconds, Fetched: 5 row(s)
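The question also asks about groups of three or four words. A minimal sketch, assuming the same db.test_string table, extends the idea with one more lateral view per extra word. The aliases a, b, c and the columns i0..i2, w0..w2 are illustrative names, not part of the original answer.

```sql
-- Sketch: non-overlapping groups of three words (an assumed extension of the query above).
select concat_ws(' ', w0, w1, w2) as text
from db.test_string
lateral view posexplode(split(text, ' ')) a as i0, w0
lateral view posexplode(split(text, ' ')) b as i1, w1
lateral view posexplode(split(text, ' ')) c as i2, w2
where b.i1 = a.i0 + 1   -- second word directly follows the first
  and c.i2 = a.i0 + 2   -- third word directly follows the second
  and a.i0 % 3 = 0      -- start a new group every three words
;
```

On the sample row this should return "thank you for", "calling your tv" and "is not working"; the trailing "?" has no complete group and is dropped. Removing the % 3 condition gives overlapping three-word windows instead. Each additional lateral view multiplies the intermediate rows by the word count, so this approach is best suited to short lines.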
Apply the pairing query above to your actual table, keeping your own where clause, and let me know how it goes.
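Applied to the table from the question, this might look like the sketch below. It assumes the table is named abc with columns text, id and date, as the question implies, and that date can be compared against a 'yyyy-MM-dd' string. The use of concat_ws to return each pair as a single text column is an addition here, chosen to match the desired output in the question.

```sql
-- Sketch: two-word groups from the asker's table (table and column names assumed from the question).
select concat_ws(' ', ne.k, pe.s) as text
from abc
lateral view posexplode(split(text, ' ')) pe as i, s
lateral view posexplode(split(text, ' ')) ne as j, k
where abc.id = 123
  and abc.`date` = '2019-08-16'   -- quoted so it is not evaluated as 2019 - 8 - 16
  and ne.j = pe.i - 1             -- k is the word immediately before s
  and ne.j % 2 = 0                -- keep only pairs that start at an even position
;
```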