Maximum value of a column in Apache Pig

Problem description

I am trying to find the maximum value of the column ratingTime using Pig. I am running the script below:

    ratings = LOAD '/user/maria_dev/ml-100k/u.data' AS (userid:int,movieID:int,rating:int, ratingTime:int);
    maxrating = MAX(ratings.ratingTime);
    DUMP maxrating

The sample input data is:

    196 242 3   881250949
    186 302 3   891717742
    22  377 1   878887116
    244 51  2   880606923

I get the following error:

     2018-08-05 07:02:05,247 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook 

     2018-08-05 07:02:05,914 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. <file script.pi    

Tags: hadoop, apache-pig

Solution


You need a GROUP ALL before applying MAX:

    ratings = LOAD '/user/maria_dev/ml-100k/u.data' USING PigStorage('\t') AS (userid:int,movieID:int,rating:int, ratingTime:int);
    -- Collect every row into a single group so MAX can aggregate over all of them
    ratings_group = GROUP ratings ALL;
    maxrating = FOREACH ratings_group GENERATE MAX(ratings.ratingTime);
    DUMP maxrating;
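
To illustrate why the grouping step matters: MAX is an aggregate function that takes a bag as input, and GROUP ... ALL is what gathers all rows of the relation into one bag. The following is a minimal sketch, not a definitive script. It reuses the path and schema from the question; the aliases all_ratings and time_range and the field names min_time and max_time are made up for this example. It prints the grouped schema and then computes the minimum alongside the maximum:

    ratings = LOAD '/user/maria_dev/ml-100k/u.data' USING PigStorage('\t') AS (userid:int, movieID:int, rating:int, ratingTime:int);
    -- GROUP ... ALL places every row into one group
    all_ratings = GROUP ratings ALL;
    -- Print the schema: a 'group' field plus a bag named 'ratings' holding the original tuples
    DESCRIBE all_ratings;
    -- Aggregate functions such as MIN and MAX operate on that bag
    time_range = FOREACH all_ratings GENERATE MIN(ratings.ratingTime) AS min_time, MAX(ratings.ratingTime) AS max_time;
    DUMP time_range;

The bag inside the grouped relation takes the name of the original relation (ratings here), which is why the expression is written ratings.ratingTime inside the FOREACH.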
