java - Counting keys for the same object in MapReduce
Problem description
I'm trying to count the number of times an object/node is updated in Hadoop MapReduce. My XML file looks like the sample below; as you can see, it contains different node ids. I'm trying to count the total number of versions for each unique ID, but I'm a bit lost. The XML file and the MapReduce code are shown below.
</node>
<node id="1024219306" visible="true" version="1" changeset="6558971" timestamp="2010-12-06T01:34:53Z" user="tusvik" uid="203227" lat="59.2079125" lon="10.9487952">
<tag k="source" v="Bing"/>
</node>
<node id="1024219307" visible="true" version="2" changeset="6590128" timestamp="2010-12-08T22:03:37Z" user="jrj" uid="148636" lat="59.2099530" lon="10.9455866">
<tag k="source" v="Bing"/>
</node>
<node id="1024219308" visible="true" version="1" changeset="6558971" timestamp="2010-12-06T01:34:53Z" user="tusvik" uid="203227" lat="59.2131168" lon="10.9433018">
<tag k="source" v="Bing"/>
</node>
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CountMU {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String cb = itr.nextToken();
                if (cb.startsWith("<node")) {
                    String moveTonxt = itr.nextToken();
                    word.set(moveTonxt);
                }
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context
                ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
What is a reliable way to count the occurrences of each distinct ID in the XML? Would a shortcut be to take the highest version="x" value for each unique ID?
Solution
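One possible approach, sketched here under assumptions rather than taken from an accepted answer. In the posted mapper, `context.write(word, one)` sits outside the `if`, so it runs for every whitespace-separated token, and because `word` is a field it keeps its previous value on lines such as `<tag .../>`. The token stored is also the whole `id="..."` attribute rather than the bare ID. The sketch below shows, in plain Java without Hadoop dependencies, how the per-line extraction could look and what the two reduce strategies from the question (summing ones versus keeping the maximum `version`) would compute. The class `NodeVersionCounter`, its method names, and the regular expressions are hypothetical and not from the original post. It assumes each `<node ...>` opening tag fits on a single line, as in the sample.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original post.
class NodeVersionCounter {

    // Captures the id attribute of an opening <node ...> tag.
    // The \b before "id" keeps the pattern from matching inside uid="...".
    private static final Pattern NODE_ID =
            Pattern.compile("<node\\b[^>]*?\\bid=\"([^\"]*)\"");

    // Captures the numeric version attribute of an opening <node ...> tag.
    private static final Pattern NODE_VERSION =
            Pattern.compile("<node\\b[^>]*?\\bversion=\"(\\d+)\"");

    // Returns the node id on this line, or null if the line has no <node> tag.
    static String extractId(String line) {
        Matcher m = NODE_ID.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Returns the node version on this line, or -1 if none is present.
    static int extractVersion(String line) {
        Matcher m = NODE_VERSION.matcher(line);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Strategy 1: one increment per <node> element,
    // equivalent to emitting (id, 1) and summing in the reducer.
    static Map<String, Integer> countById(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String id = extractId(line);
            if (id != null) {
                counts.merge(id, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Strategy 2: keep the highest version seen per id,
    // equivalent to emitting (id, version) and taking the max in the reducer.
    static Map<String, Integer> maxVersionById(List<String> lines) {
        Map<String, Integer> max = new TreeMap<>();
        for (String line : lines) {
            String id = extractId(line);
            int version = extractVersion(line);
            if (id != null && version >= 0) {
                max.merge(id, version, Integer::max);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Two node elements copied from the sample in the question (attributes shortened).
        List<String> lines = Arrays.asList(
                "<node id=\"1024219306\" visible=\"true\" version=\"1\" uid=\"203227\">",
                "<tag k=\"source\" v=\"Bing\"/>",
                "</node>",
                "<node id=\"1024219307\" visible=\"true\" version=\"2\" uid=\"148636\">",
                "<tag k=\"source\" v=\"Bing\"/>",
                "</node>");
        System.out.println("elements per id:    " + countById(lines));
        System.out.println("max version per id: " + maxVersionById(lines));
    }
}
```

In the Hadoop job, the same extraction could be done inside `map`, with `word.set(id)` and `context.write(word, one)` executed only when the ID is non-null; `IntSumReducer` would then work unchanged. For the maximum-version shortcut, the mapper would emit the version as the value and the reducer would keep the largest value instead of summing. Which of the two results answers the question depends on whether the file holds every historical version of a node or only the latest one, which the post does not state.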