Hadoop MapReduce: making the mapper read multiple lines instead of one

Problem description

I'm new to Hadoop MapReduce. I'm working on a project that reads a data file that looks like this:

[Event "Rated Classical game"]
[Site "https://lichess.org/j1dkb5dw"]
[White "BFG9k"]
[Black "mamalak"]
[Result "1-0"]
[UTCDate "2012.12.31"]
[UTCTime "23:01:03"]
[WhiteElo "1639"]
[BlackElo "1403"]
[WhiteRatingDiff "+5"]
[BlackRatingDiff "-8"]
[ECO "C00"]
[Opening "French Defense: Normal Variation"]
[TimeControl "600+8"]
[Termination "Normal"]

1. e4 e6 2. d4 b6 3. a3 Bb7 4. Nc3 Nh6 5. Bxh6 gxh6 6. Be2 Qg5 7. Bg4 h5 8. Nf3 Qg6 9. Nh4 Qg5 10. Bxh5 Qxh4 11. Qf3 Kd8 12. Qxf7 Nc6 13. Qe8# 1-0
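Each game in this file is one logical record spread over several physical lines: a block of `[Tag "value"]` header lines, a blank line, and the movetext. As a minimal sketch (plain Java, not Hadoop code), the hypothetical class `PgnGame` below parses one such record, assumed to be already gathered into a single String, into its tags and its moves:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): parses ONE PGN game held in a single String.
public class PgnGame {
    // Matches a header tag such as [WhiteElo "1639"]
    private static final Pattern TAG = Pattern.compile("^\\[(\\w+) \"(.*)\"\\]$");

    public final Map<String, String> tags = new LinkedHashMap<>();
    public String moves = "";

    public static PgnGame parse(String record) {
        PgnGame game = new PgnGame();
        StringBuilder moves = new StringBuilder();
        for (String raw : record.split("\\R")) {      // split on any line break
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                              // skip blank separator lines
            }
            Matcher m = TAG.matcher(line);
            if (m.matches()) {
                game.tags.put(m.group(1), m.group(2)); // header tag: name -> value
            } else {
                if (moves.length() > 0) {
                    moves.append(' ');
                }
                moves.append(line);                    // movetext, possibly spanning lines
            }
        }
        game.moves = moves.toString();
        return game;
    }
}
```

For the sample above, `tags.get("WhiteElo")` would be `"1639"` and `moves` would hold the whole `1. e4 e6 ... 1-0` line.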

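But I don't know how to read multiple lines in the mapper, because a mapper normally reads one line at a time. I tried using NLineInputFormat, but it didn't work.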
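NLineInputFormat only controls how many lines go into each input split (and therefore each map task); its record reader still hands the mapper one line per `map()` call. Whatever reads the data therefore has to group lines into games itself. A minimal sketch of that grouping in plain Java, assuming every game starts with an `[Event ` tag as in the sample; the class `PgnSplitter` is hypothetical and not a Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): groups PGN lines into one String per game,
// assuming each game begins with an "[Event ..." tag line.
public class PgnSplitter {
    public static List<String> splitGames(List<String> lines) {
        List<String> games = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            // A new [Event tag marks the start of the next game: emit the previous one.
            if (line.startsWith("[Event ")) {
                flush(current, games);
            }
            current.append(line).append('\n');
        }
        flush(current, games); // emit the last game
        return games;
    }

    // Adds the buffered game if it contains anything besides whitespace, then clears the buffer.
    private static void flush(StringBuilder current, List<String> games) {
        if (current.toString().trim().length() > 0) {
            games.add(current.toString());
        }
        current.setLength(0);
    }
}
```

Keying on the `[Event ` line rather than on blank lines avoids depending on how many blank lines separate games.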

Below is my driver code (I'm only testing the mapper for now, so I set the number of reduce tasks to zero):

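import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChessAnalysis {

  // ChessMapper and ChessReducer are my own classes (not shown here).
  public static void main(String[] args) throws Exception {

    if (args.length != 2) {
      System.err.printf("Usage: ChessAnalysis <input dir> <output dir>%n");
      System.exit(-1);
    }

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Project: Chess Analysis");

    // NLineInputFormat gives each map task N lines (one split per N lines),
    // but map() is still called once per line within that split.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    // 18 lines = one game in the sample above (15 tags, blank line, moves, blank line).
    NLineInputFormat.setNumLinesPerMap(job, 18);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setJarByClass(ChessAnalysis.class);
    job.setMapperClass(ChessMapper.class);
    job.setReducerClass(ChessReducer.class); // not used while reduce tasks = 0

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    // Map-only job: the mapper's output is written directly by the output format.
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}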
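The driver asks for 18 lines per map task, which matches the sample game exactly (15 tag lines, a blank line, the movetext line and a trailing blank line). The plain-Java simulation below is not Hadoop code, and the class `NLineChunkDemo` and its methods are hypothetical. It mimics fixed 18-line chunking to show the assumption this relies on: if any game has a different number of tag lines, the chunks that follow are shifted off the game boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (not Hadoop code): cuts a file's lines into fixed-size chunks,
// roughly the way NLineInputFormat assigns N lines to each input split.
public class NLineChunkDemo {
    public static List<List<String>> chunk(List<String> lines, int n) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            chunks.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return chunks;
    }

    // True only if every chunk begins with an [Event tag, i.e. chunks line up with games.
    public static boolean alignedWithGames(List<List<String>> chunks) {
        for (List<String> c : chunks) {
            if (c.isEmpty() || !c.get(0).startsWith("[Event ")) {
                return false;
            }
        }
        return true;
    }
}
```

Two 18-line games split cleanly into two chunks; a 17-line game followed by an 18-line game does not, because the second chunk starts on the second line of the second game.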

Any help would be appreciated. Thanks.

Tags: hadoop mapreduce
