Error when using FILTER in Pig: dumping the result fails

Problem description

The Pig code I am using is:

studentsR = LOAD 'hdfs://quickstart.cloudera:8020/students/students' using PigStorage() as (name:chararray,rollno:int);
resultR = LOAD 'hdfs://quickstart.cloudera:8020/students/results' using PigStorage() as (rollno:int,result:chararray);
joniR = JOIN studentsR BY rollno,resultR BY rollno;
filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result) ;
filterRPass = FILTER filterR BY resultR.result == 'pass';
dump filterRPass;

The error is:

ERROR 0: Scalar has more than one row in the output. 1st : (1,fail), 2nd :(2,fail)

Tags: hadoop, apache-pig

Solution


Try running DUMP and DESCRIBE on each intermediate relation to see the output and schema of every alias used.

Reference: Scalar has more than one row in the output

studentsR = LOAD '/home/user/students' using PigStorage(' ') as (name:chararray,rollno:int);
dump studentsR;
resultR = LOAD '/home/user/results' using PigStorage(' ') as (rollno:int,result:chararray);
dump resultR;
joniR = JOIN studentsR BY rollno,resultR BY rollno;
dump joniR;
filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
dump filterR;
filterRPass = FILTER filterR BY resultR::result == 'pass';
dump filterRPass;

Changes:

My input files use a space as the delimiter, so I used PigStorage(' ').

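In filterR, I removed the opening and closing parentheses () around studentsR::name,studentsR::rollno,resultR::result. The parentheses wrap the three fields in a single tuple, so the dumped output had an extra pair of parentheses.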

grunt> filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
grunt> describe  filterR;
filterR: {org.apache.pig.builtin.totuple_studentsR::name_100: (studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray)}
grunt> filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
grunt> describe  filterR;
filterR: {studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray}

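In filterRPass, use resultR::result instead of resultR.result. Pig reads resultR.result as a scalar projection of the relation resultR, which is only valid when that relation has exactly one row; resultR has several rows, hence the "Scalar has more than one row" error.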
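
For contrast, the relation.field form is legitimate when the relation is guaranteed to hold exactly one row. The following is a minimal sketch under that assumption, reusing the results file from this answer; the aliases allR, totalR and withTotal and the field names cnt and total are hypothetical names introduced only for illustration.

```pig
-- Load the results file (space-delimited, as in the answer above).
resultR = LOAD '/home/user/results' USING PigStorage(' ') AS (rollno:int, result:chararray);

-- GROUP ... ALL collapses the relation into a single group,
-- so the FOREACH below produces exactly one row.
allR = GROUP resultR ALL;
totalR = FOREACH allR GENERATE COUNT(resultR) AS cnt;

-- totalR has one row, so totalR.cnt can be used as a scalar here.
-- (allR, totalR, withTotal, cnt and total are illustrative names.)
withTotal = FOREACH resultR GENERATE rollno, result, totalR.cnt AS total;
DUMP withTotal;
```

By the same rule, resultR.result cannot work in the original FILTER, because resultR contains one row per student result rather than a single row.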

I tested with a set of local files, running Pig in local mode.

cat students
a 1
b 2
c 3

cat results
3 pass
2 fail
5 pass

Dump output:

dump studentsR
(a,1)
(b,2)
(c,3)

dump resultR
(3,pass)
(2,fail)
(5,pass)

dump joniR
(b,2,2,fail)
(c,3,3,pass)

dump filterR --filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
((b,2,fail))
((c,3,pass))

dump filterR --filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
(b,2,fail)
(c,3,pass)

dump filterRPass; --filterRPass = FILTER filterR BY resultR::result == 'pass';  --or-- filterRPass = FILTER filterR BY $2 == 'pass';
(c,3,pass)
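
Another option, sketched here as an assumption rather than as part of the tested answer, is to rename the joined fields with AS in the FOREACH so that later statements do not need the :: prefix at all. The short names name, rollno and result on the projected fields are my own choice for illustration.

```pig
studentsR = LOAD '/home/user/students' USING PigStorage(' ') AS (name:chararray, rollno:int);
resultR = LOAD '/home/user/results' USING PigStorage(' ') AS (rollno:int, result:chararray);
joniR = JOIN studentsR BY rollno, resultR BY rollno;

-- Give each projected field a plain name; no parentheses around the
-- field list, so the fields are not wrapped in an extra tuple.
filterR = FOREACH joniR GENERATE
    studentsR::name AS name,
    studentsR::rollno AS rollno,
    resultR::result AS result;
DESCRIBE filterR;

-- The filter can now refer to the field by its short name.
filterRPass = FILTER filterR BY result == 'pass';
DUMP filterRPass;
```

With the sample files shown above, this should keep only the (c,3,pass) row, the same result as filtering on resultR::result or $2.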
