Output schema of a Python UDF in Apache Pig

Problem description

My Python UDF returns a list of tuples like this:

[(0.01, 12), (0.02, 6), (0.03, 12), (0.04, 19), (0.05, 29), (0.06, 42)]

The output above was printed to the mapper's stdout and copied from there.

The two values in each tuple are cast to float and int respectively. I also printed the types, and they are indeed cast correctly:

(<type 'float'>, <type 'int'>)

Here is the decorator: @outputSchema("stats:bag{improvement:tuple(percent:float,entityCount:int)}")
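For reference, a minimal sketch of what such a UDF might look like. The function name `improvement_stats` and its input format are assumptions for illustration; the `outputSchema` stub below only exists so the sketch runs standalone — inside Pig's Jython UDF runtime the real decorator is provided for you.

```python
# Stub of Pig's outputSchema decorator, so this sketch runs outside Pig.
# In a real Jython UDF script, Pig supplies the decorator itself.
def outputSchema(schema):
    def wrapper(func):
        func.outputSchema = schema  # record the declared Pig schema
        return func
    return wrapper

@outputSchema("stats:bag{improvement:tuple(percent:float,entityCount:int)}")
def improvement_stats(entries):
    # Return a bag (list) of (percent, entityCount) tuples,
    # casting each field explicitly as described above.
    return [(float(p), int(c)) for p, c in entries]
```

Calling `improvement_stats([("0.01", "12"), ("0.02", "6")])` returns `[(0.01, 12), (0.02, 6)]`, matching the printed output shown earlier.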

Here is the error message:

Error: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum (0.01,12) is not in union ["null",{"type":"record","name":"TUPLE_1","fields":[{"name":"percent","type":["null","float"],"doc":"autogenerated from Pig Field Schema","default":null},{"name":"entityCount","type":["null","int"],"doc":"autogenerated from Pig Field Schema","default":null}]}]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:479)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:442)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:422)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:269)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum (0.01,12) is not in union ["null",{"type":"record","name":"TUPLE_1","fields":[{"name":"percent","type":["null","float"],"doc":"autogenerated from Pig Field Schema","default":null},{"name":"entityCount","type":["null","int"],"doc":"autogenerated from Pig Field Schema","default":null}]}]
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:646)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:477)
    ... 11 more
Caused by: java.lang.RuntimeException: Datum (0.01,12) is not in union ["null",{"type":"record","name":"TUPLE_1","fields":[{"name":"percent","type":["null","float"],"doc":"autogenerated from Pig Field Schema","default":null},{"name":"entityCount","type":["null","int"],"doc":"autogenerated from Pig Field Schema","default":null}]}]
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.resolveUnion(PigAvroDatumWriter.java:132)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:111)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:131)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:113)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:378)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
    ...
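Since Avro's union resolution rejects any datum whose runtime type does not match the declared schema, one way to narrow the problem down is to assert, before the bag ever reaches AvroStorage, that every bag element really is a 2-tuple of (float, int). The checker below is a hypothetical debugging aid (the name `check_bag` is illustrative), not the fix itself:

```python
def check_bag(bag):
    """Raise TypeError if any bag element is not a (float, int) 2-tuple."""
    for datum in bag:
        # Avro sees a Pig tuple only if the element is an actual tuple;
        # a list such as [0.01, 12] would not match the record schema.
        if not (isinstance(datum, tuple) and len(datum) == 2):
            raise TypeError("bag element %r is not a 2-tuple" % (datum,))
        percent, entity_count = datum
        if not isinstance(percent, float):
            raise TypeError("percent %r is not a float" % (percent,))
        if not isinstance(entity_count, int):
            raise TypeError("entityCount %r is not an int" % (entity_count,))
    return True
```

For example, `check_bag([(0.01, 12), (0.02, 6)])` passes, while `check_bag([[0.01, 12]])` raises a TypeError.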

Does anyone know what I am doing wrong in the schema?

Tags: apache-pig

Solution

