Counting keys for the same object in MapReduce

Problem description

So I've run into this problem: I'm trying to count the number of times an object/node is updated in MapReduce on Hadoop. My XML file looks like this; as you can see, there are different node IDs. I'm trying to count the total number of versions for each unique ID, but I'm a bit lost. The XML file and the MapReduce code are below.

</node>
 <node id="1024219306" visible="true" version="1" changeset="6558971" timestamp="2010-12-06T01:34:53Z" user="tusvik" uid="203227" lat="59.2079125" lon="10.9487952">
  <tag k="source" v="Bing"/>
 </node>

 <node id="1024219307" visible="true" version="2" changeset="6590128" timestamp="2010-12-08T22:03:37Z" user="jrj" uid="148636" lat="59.2099530" lon="10.9455866">
  <tag k="source" v="Bing"/>
 </node>

 <node id="1024219308" visible="true" version="1" changeset="6558971" timestamp="2010-12-06T01:34:53Z" user="tusvik" uid="203227" lat="59.2131168" lon="10.9433018">
  <tag k="source" v="Bing"/>
</node>
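The mapper below uses the token right after <node as the key, which only works while id is the first attribute in each start tag. A sturdier alternative is to match the id and version attributes by name with regular expressions. This is a minimal sketch that assumes each <node ...> start tag fits on one line, as in the excerpt above; NodeAttrs and parse are hypothetical names, not part of Hadoop or any XML library:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the id and version attributes from one <node ...> line.
class NodeAttrs {
    // \b keeps uid="..." from being matched as id="..."; attributes are matched by name,
    // so their order within the tag does not matter.
    private static final Pattern ID = Pattern.compile("\\bid=\"(\\d+)\"");
    private static final Pattern VERSION = Pattern.compile("\\bversion=\"(\\d+)\"");

    final String id;
    final int version;

    NodeAttrs(String id, int version) {
        this.id = id;
        this.version = version;
    }

    // Returns empty for lines that are not <node> start tags or that lack either attribute.
    static Optional<NodeAttrs> parse(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("<node ")) {
            return Optional.empty();
        }
        Matcher idMatch = ID.matcher(trimmed);
        Matcher versionMatch = VERSION.matcher(trimmed);
        if (!idMatch.find() || !versionMatch.find()) {
            return Optional.empty();
        }
        return Optional.of(new NodeAttrs(idMatch.group(1), Integer.parseInt(versionMatch.group(1))));
    }
}
```

Lines such as <tag k="source" v="Bing"/> and </node> are skipped, so a mapper built on this helper could emit the numeric id (or the version) only for actual node records.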
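
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CountMU {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // With the default TextInputFormat, each value is one line of the XML file.
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String cb = itr.nextToken();
                // The token after "<node" is the id attribute, e.g. id="1024219306".
                if (cb.startsWith("<node") && itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    // Emit only for <node> start tags, not once for every token on the line.
                    context.write(word, one);
                }
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        // Sums the 1s emitted for each id, i.e. the number of <node> records with that id.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: args[0] = input path, args[1] = output path (must not exist yet).
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "count node versions");
        job.setJarByClass(CountMU.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}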
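To check what a per-ID count job should output without a cluster, the map, shuffle and reduce steps can be imitated in plain Java on a few in-memory lines. This is a hypothetical local test harness (LocalCount and countById are illustrative names), assuming the same key rule as the mapper above: the token after <node becomes the key, and each occurrence adds 1.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local check: mimics map -> shuffle -> reduce for the per-ID count
// on in-memory lines, without Hadoop.
class LocalCount {
    static Map<String, Integer> countById(List<String> lines) {
        // TreeMap stands in for the shuffle: keys are grouped and sorted, as reducers see them.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                String token = itr.nextToken();
                // Key rule: the token after "<node" is the key, e.g. id="1024219306".
                if (token.startsWith("<node") && itr.hasMoreTokens()) {
                    // merge(..., Integer::sum) plays the role of a summing reducer.
                    counts.merge(itr.nextToken(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Fed two <node> lines with id="1" and one with id="2", plus <tag> and </node> lines, this returns a count of 2 for id="1" and 1 for id="2"; the tag and closing lines contribute nothing.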

What is a reliable way to count how many times each distinct ID occurs in the XML? Could a shortcut be to take the highest version="x" value for each unique ID?
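The max-version shortcut amounts to a reducer that keeps the largest value per key instead of summing. A minimal sketch of that reduce step in plain Java, assuming a mapper has already emitted (id, version) pairs, for example by parsing the version attribute; MaxVersion and maxVersionById are hypothetical names:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reduce logic for the "max version" shortcut, run on in-memory
// (id, version) pairs such as a mapper could emit.
class MaxVersion {
    static Map<String, Integer> maxVersionById(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> max = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            // Keep the larger version per ID, like a reducer taking Math.max over its values.
            max.merge(p.getKey(), p.getValue(), Math::max);
        }
        return max;
    }
}
```

The two approaches can give different answers. In the excerpt above, node 1024219307 appears once but has version="2", so counting occurrences yields 1 while the maximum version yields 2. Counting matches the version number only if the file holds every version of each node; taking the maximum assumes versions are numbered consecutively from 1.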

Tags: java, xml, hadoop, count, mapreduce
