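java - Find the sales breakdown by product category across all of our stores
Problem description
I have a sales file that contains the store name, location, sale price, product name, and so on. The file format looks like this: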
2012-01-01 09:00 San Jose Men's Clothing 214.05 Amex
2012-01-01 09:00 Fort Worth Women's Clothing 153.57 Visa
2012-01-01 09:00 San Diego Music 66.08 Cash
2012-01-01 09:00 Pittsburgh Pet Supplies 493.51 Discover
2012-01-01 09:00 Omaha Children's Clothing 235.63 MasterCard
2012-01-01 09:00 Stockton Men's Clothing 247.18 MasterCard
I want to write a MapReduce job that finds the sales breakdown by product category across all of our stores. My code (both the Mapper and the Reducer) is below:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public final class P1Q1 {

    // Emits (product category, sale amount) for every well-formed input line.
    public static final class P1Q1Map extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        private final Text word = new Text();

        public final void map(final LongWritable key, final Text value, final Context context)
                throws IOException, InterruptedException {
            final String line = value.toString();
            final String[] data = line.trim().split("\t");
            if (data.length == 6) {
                final String product = data[3];
                final double sales = Double.parseDouble(data[4]);
                word.set(product);
                context.write(word, new DoubleWritable(sales));
            }
        }
    }

    // Sums all sale amounts for one product category.
    public static final class P1Q1Reduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        public final void reduce(final Text key, final Iterable<DoubleWritable> values, final Context context)
                throws IOException, InterruptedException {
            double sum = 0.0;
            for (final DoubleWritable val : values) {
                sum += val.get();
            }
            context.write(key, new DoubleWritable(sum));
        }
    }

    public final static void main(final String[] args) throws Exception {
        final Configuration conf = new Configuration();
        final Job job = new Job(conf, "P1Q1");
        job.setJarByClass(P1Q1.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setMapperClass(P1Q1Map.class);
        job.setCombinerClass(P1Q1Reduce.class);
        job.setReducerClass(P1Q1Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
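The code produces an answer that is incorrect and does not match the Udacity result.
Does anyone know whether this is the right approach, and how to do it correctly?
Note
I get a completely wrong result in the output file: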
Baby 5.749180844000035E7
Books 5.745075790999787E7
CDs 5.741075304000156E7
Cameras 5.7299046639999785E7
Children's Clothing 5.762482094000117E7
Computers 5.7315406319999576E7
Consumer Electronics 5.745237412999948E7
Crafts 5.7418154499999225E7
DVDs 5.764921213999939E7
Garden 5.7539833110000335E7
Health and Beauty 5.748158956000019E7
Men's Clothing 5.76212790400011E7
Music 5.749548970000038E7
Pet Supplies 5.71972502400004E7
Sporting Goods 5.7599085889999546E7
Toys 5.746347710999843E7
Video Games 5.7513165580000155E7
Women's Clothing 5.74344489699993E7
I thought that commenting out the combiner line below would fix this. I tried it, and the result did not change.
job.setCombinerClass(P1Q1Reduce.class);
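Solution
For the most part, I would say your code looks fine. The combiner is only an optimization, so excluding it should produce the same output as including it.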
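To see why a summing reducer can double as a combiner, here is a minimal sketch in plain Java with no Hadoop involved. The class name `CombinerSketch` and the sample amounts are made up for illustration. It compares summing all raw values in one pass against summing per-split partial sums first, which is roughly what a combiner does.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {

    // Same logic as the reducer in the question: add up every value for one key.
    public static double sum(List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical "Men's Clothing" amounts seen by two different map tasks.
        List<Double> split1 = Arrays.asList(214.05, 247.18);
        List<Double> split2 = Arrays.asList(100.00, 50.25);

        // Without a combiner: the reducer receives all four raw values.
        double direct = sum(Arrays.asList(214.05, 247.18, 100.00, 50.25));

        // With a combiner: each map task pre-sums its values,
        // and the reducer then adds the partial sums.
        double combined = sum(Arrays.asList(sum(split1), sum(split2)));

        System.out.println("direct   = " + direct);
        System.out.println("combined = " + combined);
    }
}
```

Because floating-point addition is not exactly associative, the two totals can differ in the last few bits, but not by anything visible at the scale of cents.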
I wrote my own MR job, and for the given input I get this output:
Children's Clothing 235.63
Men's Clothing 461.23
Music 66.08
Pet Supplies 493.51
Women's Clothing 153.57
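Obviously, if you have hundreds or thousands of stores, the totals will run into the millions of currency units, as your output shows. Note that your values are printed in scientific notation: 5.749180844000035E7 is roughly 57.49 million.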
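The following sketch shows how one of the totals from the question looks once it is formatted as a plain decimal. It uses the same `"#.00"` pattern as the reducer in this answer. The class name `SalesFormatSketch` and the helper `plain` are hypothetical, and `Locale.ROOT` is pinned here only so the decimal separator does not depend on the machine's locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class SalesFormatSketch {

    // Formats an amount with exactly two decimals and no exponent.
    public static String plain(double amount) {
        DecimalFormat df = new DecimalFormat("#.00", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return df.format(amount);
    }

    public static void main(String[] args) {
        // The "Baby" total copied from the question's output.
        double baby = 5.749180844000035E7;

        // Default double-to-string conversion uses scientific notation for large values.
        System.out.println(Double.toString(baby));

        // The same number as a plain decimal.
        System.out.println(plain(baby)); // prints 57491808.44
    }
}
```

Seen this way, the job output is a plausible total for a large data set rather than garbage. Whether it matches the expected Udacity figure is a separate question that this sketch does not settle.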
Code
@Override
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    Job job = Job.getInstance(conf, APP_NAME);
    job.setJarByClass(StoreSumRunner.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(CurrencyReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
}
static class TokenizerMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    private final Text key = new Text();
    private final DoubleWritable sales = new DoubleWritable();

    @Override
    protected void map(LongWritable offset, Text value, Context context) throws IOException, InterruptedException {
        final String line = value.toString();
        // Split on runs of two or more whitespace characters.
        final String[] data = line.trim().split("\\s\\s+");
        if (data.length < 6) {
            System.err.printf("mapper: not enough records for %s%n", Arrays.toString(data));
            return;
        }
        key.set(data[3]);
        try {
            sales.set(Double.parseDouble(data[4]));
            context.write(key, sales);
        } catch (NumberFormatException ex) {
            System.err.printf("mapper: invalid value format %s%n", data[4]);
        }
    }
}
static class CurrencyReducer extends Reducer<Text, DoubleWritable, Text, Text> {
    private final Text output = new Text();
    private final DecimalFormat df = new DecimalFormat("#.00");

    @Override
    protected void reduce(Text category, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
        // Sum every sale for this product category.
        double sum = 0;
        for (DoubleWritable value : values) {
            sum += value.get();
        }
        // Emit the total as a plain two-decimal string instead of a raw double.
        output.set(df.format(sum));
        context.write(category, output);
    }
}
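One difference between the two mappers is the delimiter. The question splits on a single tab (`"\t"`), while the mapper above splits on `"\\s\\s+"`, which only matches runs of two or more whitespace characters. The sketch below is a standalone illustration, not part of either job. The class name `SalesLineSketch`, the helper `fieldCount`, and both sample lines are assumptions, since the page does not show which delimiter the real file uses.

```java
public class SalesLineSketch {

    // Returns how many fields a line yields for a given split regex.
    public static int fieldCount(String line, String regex) {
        return line.trim().split(regex).length;
    }

    public static void main(String[] args) {
        // Hypothetical tab-delimited record.
        String tabbed = "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex";
        // Hypothetical record whose fields are separated by two spaces.
        String spaced = "2012-01-01  09:00  San Jose  Men's Clothing  214.05  Amex";

        System.out.println(fieldCount(tabbed, "\t"));        // 6 fields
        System.out.println(fieldCount(tabbed, "\\s\\s+"));   // 1 field: a lone tab is not 2+ whitespace
        System.out.println(fieldCount(spaced, "\\s\\s+"));   // 6 fields
        System.out.println(fieldCount(spaced, "\t"));        // 1 field: the line contains no tabs

        // With the matching delimiter, index 3 is the category and index 4 the amount.
        String[] data = tabbed.trim().split("\t");
        System.out.println(data[3] + " -> " + Double.parseDouble(data[4]));
    }
}
```

If the split regex does not match the file's actual delimiter, each line yields a single field and both mappers skip it: the one in the question silently, the one above with a message on stderr. It is worth confirming the delimiter before comparing totals.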