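hadoop - Getting numbers instead of words in the Hadoop word count example on Cloudera
Question
We used the code below. The mapper class is WCMapper and the reducer class is WCReducer.
It is not clear why the output contains numbers instead of the words and their counts.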
    public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = key.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
        }
    }
    public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable x : values) {
                sum += x.get();
            }
            result.set(sum);
            System.out.println("Key: " + key + " Value: " + sum);
            context.write(key, result);
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        // Remove any previous output directory so the job can be rerun.
        outputPath.getFileSystem(conf).delete(outputPath, true);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
Input file:
    this is cloudera
    this is smart
Expected output:
    this 2
    is 2
    cloudera 1
    smart 1
Actual output:
    0 1
    17 1
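Answer
The problem is in your mapper:
    String line = key.toString();
Here key is a LongWritable holding the byte offset of the line within the file, not the line's text. With the two-line input above, the offsets are 0 and 17 (the first line is 16 bytes plus a newline), so the mapper tokenizes "0" and "17" and emits each once, which is exactly the output you got. Change that line to read from value, and stop reusing value as the output key below it, and you will get the right answer.
New mapper: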
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        Text word = new Text();
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, new IntWritable(1));
        }
    }
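To check the difference without a cluster, here is a minimal plain-Java sketch that imitates what TextInputFormat hands to the mapper: a byte offset as the key and the line text as the value. It is not Hadoop code. The class OffsetVsLineDemo, the method wordCount, and the tokenizeKey flag are hypothetical names made up for this illustration, and the sketch assumes UTF-8 input with "\n" line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class OffsetVsLineDemo {

    // Hypothetical helper: counts tokens taken either from the byte offset
    // (tokenizeKey = true, the original bug) or from the line text
    // (tokenizeKey = false, the fix).
    static Map<String, Integer> wordCount(String fileContent, boolean tokenizeKey) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable printing
        long offset = 0; // byte offset of the current line within the "file"
        for (String line : fileContent.split("\n")) {
            String text = tokenizeKey ? Long.toString(offset) : line;
            StringTokenizer tokenizer = new StringTokenizer(text);
            while (tokenizer.hasMoreTokens()) {
                // Stands in for the map step emitting (token, 1) and the reduce step summing.
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
            // Advance past this line's bytes plus the assumed one-byte '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "this is cloudera\nthis is smart\n";
        System.out.println(wordCount(input, true));  // {0=1, 17=1}
        System.out.println(wordCount(input, false)); // {cloudera=1, is=2, smart=1, this=2}
    }
}
```

Tokenizing the key reproduces the 0 and 17 from the question, while tokenizing the value yields the expected word counts.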