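apache-pig - Getting the MAX value per year in Apache Pig
Problem description
I have been trying to get the maximum temperature for each year from the data below. The actual data looks like this, but I am only interested in the first column, which contains the year, and the fourth column, which is the temperature.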
2016-11-03 12:00:00.000 +0100,Mostly Cloudy,rain,10.594444444444443,10.594444444444443,0.73,13.2664,174.0,10.1913,0.0,1019.74,Partly cloudy throughout the day.
2016-11-03 13:00:00.000 +0100,Mostly Cloudy,rain,11.072222222222223,11.072222222222223,0.72,13.1698,176.0,12.4131,0.0,1019.45,Partly cloudy throughout the day.
2016-11-03 14:00:00.000 +0100,Mostly Cloudy,rain,11.172222222222222,11.172222222222222,0.71,12.654600000000002,175.0,10.835300000000002,0.0,1019.16,Partly cloudy throughout the day.
2016-11-03 15:00:00.000 +0100,Mostly Cloudy,rain,10.911111111111111,10.911111111111111,0.72,11.753,170.0,10.867500000000001,0.0,1018.94,Partly cloudy throughout the day.
2016-11-03 16:00:00.000 +0100,Mostly Cloudy,rain,10.350000000000001,10.350000000000001,0.72,10.6582,161.0,11.592,0.0,1018.81,Partly cloudy throughout the day.
The output of DUMP B looks like this:
(2014,12.038889)
(2014,21.055555)
(2016,29.905556)
(2016,30.605556)
(2016,29.95)
(2016,29.972221)
The code I wrote is shown below. However, it raises an error at D. I also tried the ToDate function, but that did not seem to work either.
A = load 'file.csv' using PigStorage(',')......
B = foreach A GENERATE SUBSTRING(year,0,4) as year1, Atemp;
C = group B by year1;
D = foreach C GENERATE group,MAX(Atemp);
The error I got:
Invalid field projection. Projected field [year1] does not exist in schema: group:chararray,B:bag{:tuple(year1:chararray,Atemp:float)}.
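Solution
I figured it out myself right after posting the question on Stack Overflow :) I still wonder why, though! Instead of D = foreach C GENERATE group,MAX(Atemp); I used D = foreach C GENERATE group, MAX(B.Atemp) as max; and it works!
If anyone would like me to delete the post, I'm happy to do so. Please let me know.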
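As for why the fix works: the schema printed in the error message shows that after the GROUP, relation C has only two fields, group and a bag named B that holds the original (year1, Atemp) tuples. Atemp is therefore not a top-level field of C. It has to be reached through the bag as B.Atemp, which also gives MAX the bag argument it expects. Below is a minimal sketch of the whole script under some assumptions: the original LOAD statement is elided, so no schema is declared and the columns are addressed by position, and the alias max_temp is a hypothetical name chosen for illustration.

```pig
-- Assumption: no schema on LOAD (the original statement is elided), so
-- columns are referenced by position: $0 = timestamp, $3 = temperature.
A = LOAD 'file.csv' USING PigStorage(',');

-- Keep only the year (first four characters of the timestamp) and the temperature.
B = FOREACH A GENERATE SUBSTRING((chararray)$0, 0, 4) AS year1, (float)$3 AS Atemp;

-- After grouping, C has the schema
--   {group: chararray, B: {(year1: chararray, Atemp: float)}}
C = GROUP B BY year1;

-- Atemp lives inside the bag B, so it is projected as B.Atemp before MAX is applied.
D = FOREACH C GENERATE group AS year1, MAX(B.Atemp) AS max_temp;

DUMP D;
```

Running DESCRIBE C; before writing the final FOREACH is a quick way to see which fields are available at the top level and which are nested inside the bag.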