Hadoop chained jobs error: expected org.apache.hadoop.io.DoubleWritable, received org.apache.hadoop.io.LongWritable

Problem description

I am learning Hadoop and trying to reproduce a job-chaining example. The first job sums the sales of each video game; the second mapper only swaps key and value so the output is sorted by sales instead of by title.

I get this error on the second mapper: Type mismatch in key from map: expected org.apache.hadoop.io.DoubleWritable, received org.apache.hadoop.io.LongWritable

I am confused about where the LongWritable comes from, since I only use Text and DoubleWritable everywhere. What am I missing?

// First mapper

public class MonMap extends Mapper<Object, Text, Text, DoubleWritable>{

public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {


    String line = value.toString();
    // regex to avoid split on a comma between double quotes
    String [] tokens = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
    String jeux = tokens[1];
    Double sales = Double.parseDouble(tokens[10]);
    
    context.write(new Text(jeux), new DoubleWritable(sales));
}

}

// Reducer

public class MonReduce extends Reducer<Text,DoubleWritable,Text,DoubleWritable> {

public void reduce(Text key, Iterable<DoubleWritable> values,
                   Context context
                   ) throws IOException, InterruptedException {

    Double somme = 0.0 ;
    for (DoubleWritable val : values){
        somme += val.get();
    }
    
    context.write(key, new DoubleWritable(somme));
}

}

// Second mapper

public class KeyValueSwapper extends Mapper<Text, DoubleWritable, DoubleWritable, Text>{

public void map(Text key, DoubleWritable value, Context context
                ) throws IOException, InterruptedException {
    
    context.write(value, key);
}

}

// Main (driver)

   Configuration conf = new Configuration();
   
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(MonMap.class);       
    job.setCombinerClass(MonReduce.class);
    job.setReducerClass(MonReduce.class);
    
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    if (!job.waitForCompletion(true)) {
        System.exit(1);
      }
    
    Job job2 = Job.getInstance(conf, "sort by sales");
    job2.setJarByClass(WordCount.class);
    job2.setMapperClass(KeyValueSwapper.class);
    job2.setOutputKeyClass(DoubleWritable.class);
    job2.setOutputValueClass(Text.class);
    
    FileInputFormat.addInputPath(job2, new Path(args[1]));
    FileOutputFormat.setOutputPath(job2, new Path(args[2]));
    
    if (!job2.waitForCompletion(true)) {
      System.exit(1);
    }
    
   }

Thanks!

Tags: java, hadoop

Solution


Both of your mappers read their input through FileInputFormat (with the default TextInputFormat), which produces <LongWritable, Text> records: the key is the byte offset of the line and the value is the line itself. That is where the LongWritable comes from.

That said, the second mapper has to accept LongWritable/Text input and then read, split, and parse the text line (the output of the first job) itself.
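
A minimal sketch of what that could look like, assuming the first job wrote its output with the default TextOutputFormat (one tab-separated "title TAB total sales" line per game); the parsing details and the malformed-line check are mine:

    import java.io.IOException;

    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class KeyValueSwapper extends Mapper<LongWritable, Text, DoubleWritable, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            // key is the byte offset of the line; the data we need is in the value
            String[] parts = value.toString().split("\t");
            if (parts.length < 2) {
                return; // skip malformed lines
            }
            String title = parts[0];
            double sales = Double.parseDouble(parts[1]);

            // emit sales as the key so the shuffle/sort orders records by sales
            context.write(new DoubleWritable(sales), new Text(title));
        }
    }

With this signature the driver from the question can stay as it is: job2 sets no reducer, so the identity Reducer runs and setOutputKeyClass(DoubleWritable.class) / setOutputValueClass(Text.class) already describe the map output. Note that DoubleWritable sorts ascending by default, so the lowest sales come first unless you supply a descending comparator via setSortComparatorClass.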

More generally, I would suggest Pig, Hive, Spark, or Flink... or just Pandas if Hadoop is not a requirement. In my experience, few people write plain MapReduce anymore.
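
For comparison, a rough sketch of the same two steps (sum per title, sort by sales) with Spark's Java API; the column names Name and Global_Sales and the CSV-header assumption are guesses about the dataset, not something taken from the question:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.*;

    public class GameSales {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("game sales").getOrCreate();

            // read the CSV with a header row (assumed), sum sales per title,
            // then sort by the total in descending order and write the result
            Dataset<Row> df = spark.read().option("header", "true").csv(args[0]);
            df.groupBy("Name")
              .agg(sum(col("Global_Sales").cast("double")).alias("total_sales"))
              .orderBy(desc("total_sales"))
              .write().csv(args[1]);

            spark.stop();
        }
    }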

