Extracting records from a Hive array

Question

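I have a Hive table in which the data for each tripid is loaded as tripid, gps_location_1, gps_location_2. One trip may have 10 GPS locations while another may have 500. When I query the data, my results look like this:

select tripid, size(gps) as count from trip;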

tripid | gps_location_1 | gps_location_2 | count |

1451f2b3d | 44.1145 | 44.1148 | 9 |

select tripid, gps.gps_location_1, gps.gps_location_2, size(gps) as count from trip;

+-----------+-------------------------------------------------------------------+-----------------------------------------------------------+-------+
| tripid    | gps_location_1                                                    | gps_location_2                                            | count |
+-----------+-------------------------------------------------------------------+-----------------------------------------------------------+-------+
| 1451f2b3d | [44.1145,44.1146,44.1147,44.1148,44.1148,44.1129,44.1127,44.1121] | [44.1148,44.1146,44.1146,44.1141,44.1138,44.1129,44.1127] | 9     |
+-----------+-------------------------------------------------------------------+-----------------------------------------------------------+-------+

I can read the first element of the array from the trip table:

select tripid, gps[0].gps_location_1, gps[0].gps_location_2 from trip;

tripid gps_location_1 gps_location_2

1451f2b3d 44.1145 44.1148

The second element of the array:

select tripid, gps[1].gps_location_1, gps[1].gps_location_2 from trip;

tripid gps_location_1 gps_location_2

1451f2b3d 44.1146 44.1146

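The last element of the array (Hive arrays are zero-indexed, so the last index is size(gps) - 1):

select tripid, gps[size(gps) - 1].gps_location_1, gps[size(gps) - 1].gps_location_2 from trip;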

1451f2b3d 44.1121 44.1127

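I need to store each array element as its own row in my new target_trip table: loop over all the elements for one tripid in the trip table and insert them into the target table as shown below.

How can I achieve this?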

tripid gps_location_1 gps_location_2

1451f2b3d 44.1145 44.1148

1451f2b3d 44.1146 44.1146

1451f2b3d 44.1147 44.1146

1451f2b3d 44.1148 44.1141

1451f2b3d 44.1129 44.1138

1451f2b3d 44.1127 44.1129

1451f2b3d 44.1121 44.1127

Tags: hive, hiveql

Solution


Use lateral view explode:

select tripid, coordinates.gps_location_1, coordinates.gps_location_2
  from trip
       lateral view outer explode(gps) s as coordinates;

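The explode() UDTF generates one row for each array element. Lateral view applies the UDTF to each row of the base table and then joins the resulting rows to the input row, forming a virtual table with the supplied table alias. With the outer keyword, a trip whose gps array is empty or NULL still produces one row, with NULL coordinates.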
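To load the exploded rows into target_trip, the same select can feed an insert. The following is a minimal sketch, not a definitive implementation: the column types (STRING, DOUBLE) and the layout of gps as an array of structs are assumptions inferred from the queries in the question, and the CREATE TABLE statements are only there to make the example self-contained.

```sql
-- Assumed source schema: gps is an array of structs (types are a guess).
CREATE TABLE IF NOT EXISTS trip (
  tripid STRING,
  gps    ARRAY<STRUCT<gps_location_1: DOUBLE, gps_location_2: DOUBLE>>
);

-- Hypothetical target table: one row per GPS point.
CREATE TABLE IF NOT EXISTS target_trip (
  tripid         STRING,
  gps_location_1 DOUBLE,
  gps_location_2 DOUBLE
);

-- One output row per element of gps; no explicit loop is needed.
INSERT INTO TABLE target_trip
SELECT tripid,
       coordinates.gps_location_1,
       coordinates.gps_location_2
  FROM trip
       LATERAL VIEW explode(gps) s AS coordinates;
```

This sketch omits outer, so trips with an empty or NULL gps array add no rows to target_trip. Keep outer if such trips should appear with NULL coordinates.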

For more about lateral view, see also this answer.
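If the position of each GPS point within the trip matters, posexplode() returns the zero-based array index alongside each element. A sketch, assuming the same trip table with gps as an array of structs; the alias names pos and coordinates are arbitrary:

```sql
-- posexplode() emits (index, element) pairs; pos starts at 0.
SELECT tripid,
       pos,
       coordinates.gps_location_1,
       coordinates.gps_location_2
  FROM trip
       LATERAL VIEW posexplode(gps) s AS pos, coordinates
 ORDER BY tripid, pos;
```

Storing pos in the target table makes it possible to recover the original order of the points later, since rows in a Hive table have no inherent order.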

