Using bincond to get an ID?

Problem description

I have this table excerpt, called joinTbl:

PRODUCT_ID  PRODUCT_NAME    ORDER_ID     PRODUCT_ID      CUSTOMER_ID     SALESPERSON_ID      UNIT_PRICE 
   11          CAKE           10946          11               83              1                  31
   11          CAKE           10949          11               10              2                  31
   11          CAKE           11020          11               56              2                  31
   14          CHICKEN        11076          14               9               4                  23.25
   11          CAKE           11077          11               65              1                  31
   14          CHICKEN        11077          14               65              1                  23.25

In Apache Pig, I am trying to get the ORDER_ID of every order that contains both cake and chicken. The expected result is:

11077

However, I am having trouble writing a bincond that returns this ORDER_ID. This is the syntax I used:

cakeChicken = FOREACH joinedTbl GENERATE ((PRODUCT_NAME == 'CAKE' AND PRODUCT_NAME == 'CHICKEN') ? ORDER_ID : 0) AS order_both;

This returns 0 for every row, which is the else branch.

What am I doing wrong?

Tags: hadoop, apache-pig

Solution


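Your bincond is working correctly. Because it sits inside a FOREACH, it is evaluated separately for each row of data. Each row has exactly one value of PRODUCT_NAME, so it can never be both 'CAKE' and 'CHICKEN' at the same time.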
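
To make the row-by-row behaviour concrete, here is a minimal sketch that tests each product separately. It assumes joinTbl is already loaded with the columns shown in the question; rowFlags, is_cake and is_chicken are hypothetical names introduced only for this illustration.

```pig
-- Sketch: assumes joinTbl is already loaded with the columns shown in the question.
-- rowFlags, is_cake and is_chicken are hypothetical names used only for illustration.
rowFlags = FOREACH joinTbl GENERATE
    ORDER_ID,
    ((PRODUCT_NAME == 'CAKE') ? 1 : 0) AS is_cake,
    ((PRODUCT_NAME == 'CHICKEN') ? 1 : 0) AS is_chicken;

-- For order 11077 this yields two separate rows:
--   (11077, 1, 0)   from the CAKE row
--   (11077, 0, 1)   from the CHICKEN row
-- No single row has both flags set, so a per-row AND of the two tests is always false.
```

The two products of an order live in different rows, so any test that needs to see both has to run after the rows of that order have been brought together.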

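Based on what you want to do, I would GROUP BY ORDER_ID, then use a nested FOREACH to filter each group's bag by PRODUCT_NAME into one bag containing only 'CAKE' rows and one containing only 'CHICKEN' rows. Finally, filter the data down to the orders where both bags are non-empty. Something like this: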

groupedData = GROUP  joinTbl BY ORDER_ID;

/* Structure:
---------------------------------------------------------------------------------------------------------------------------------
| groupedData | group | joinTbl: bag({PRODUCT_ID, PRODUCT_NAME, ORDER_ID, PRODUCT_ID, CUSTOMER_ID, SALESPERSON_ID, UNIT_PRICE}) |
---------------------------------------------------------------------------------------------------------------------------------
|             | 10946 | {(11, CAKE, 10946, 11, 83, 1, 31)}                                                                      |
|             | 11077 | {(11, CAKE, 11077, 11, 65, 1, 31), (14, CHICKEN,  11077, 14, 65, 1, 23.25)}                             |
---------------------------------------------------------------------------------------------------------------------------------
*/

cakeChickenIds = FOREACH groupedData {
    cakes = FILTER joinTbl BY PRODUCT_NAME == 'CAKE';
    chickens = FILTER joinTbl BY PRODUCT_NAME == 'CHICKEN';
    GENERATE group AS ORDER_ID,
    cakes,
    chickens;
}

/* Structure:
   (the bags hold full joinTbl tuples; only PRODUCT_NAME is shown for brevity)
------------------------------------------------------------------------------------------
| cakeChickenIds | ORDER_ID | cakes: bag({PRODUCT_NAME}) | chickens: bag({PRODUCT_NAME}) |
------------------------------------------------------------------------------------------
|                | 10946    | {(CAKE)}                   | {}                            |
|                | 11077    | {(CAKE)}                   | {(CHICKEN)}                   |
------------------------------------------------------------------------------------------
*/

-- Both cakes and chickens bags will not be empty if ordered both
cakeChickenOrders = FILTER cakeChickenIds BY NOT IsEmpty(cakes) AND NOT IsEmpty(chickens);
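
The filtered relation still carries the two bags alongside each ORDER_ID. If only the IDs are wanted, as in the expected result from the question, one more projection may be enough. This is a minimal sketch that assumes the script above has already run; orderIds is a hypothetical alias introduced for this example.

```pig
-- Sketch: assumes cakeChickenOrders from the script above is in scope.
-- orderIds is a hypothetical alias introduced for this example.
orderIds = FOREACH cakeChickenOrders GENERATE ORDER_ID;

-- Print the remaining order IDs to the console.
DUMP orderIds;
```

In the sample data from the question, 11077 is the only order with both a CAKE row and a CHICKEN row, so it is the only ID that should remain.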
