Finding the sales breakdown by product category across all of our stores

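I have a sales file in which each line holds the date, time, store location, product category, sale amount and payment method. The format looks like this: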


2012-01-01  09:00   San Jose    Men's Clothing  214.05  Amex
2012-01-01  09:00   Fort Worth  Women's Clothing    153.57  Visa
2012-01-01  09:00   San Diego   Music   66.08   Cash
2012-01-01  09:00   Pittsburgh  Pet Supplies    493.51  Discover
2012-01-01  09:00   Omaha   Children's Clothing 235.63  MasterCard
2012-01-01  09:00   Stockton    Men's Clothing  247.18  MasterCard
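A minimal sketch of parsing one line of this file, assuming (as the mapper below does) that the six fields are separated by single tab characters even though the forum rendering shows spaces. The names `PurchaseLine` and `parse` are hypothetical, chosen only for illustration.

```java
import java.util.Optional;

/** Hypothetical holder for the two fields the job needs: category and amount. */
final class PurchaseLine {
    final String category;
    final double amount;

    private PurchaseLine(String category, double amount) {
        this.category = category;
        this.amount = amount;
    }

    /**
     * Parses "date, time, store, category, amount, payment" (assumed tab-separated).
     * Returns empty for lines with the wrong field count or a non-numeric amount.
     */
    static Optional<PurchaseLine> parse(String line) {
        String[] fields = line.trim().split("\t");
        if (fields.length != 6) {
            return Optional.empty();
        }
        try {
            return Optional.of(new PurchaseLine(fields[3], Double.parseDouble(fields[4])));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

A space-padded line like the ones displayed above contains no tab, so it would be rejected by this parser; checking the real file's separator is worth doing before anything else.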

I want to write a MapReduce job that computes the sales breakdown by product category across all of our stores. My code (Mapper and Reducer) is below:


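import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public final class P1Q1 {

    // Emits (product category, sale amount) for every record with exactly six fields.
    public static final class P1Q1Map extends Mapper<LongWritable, Text, Text, DoubleWritable> {

        private final Text word = new Text();

        @Override
        public final void map(final LongWritable key, final Text value, final Context context)
                throws IOException, InterruptedException {
            // Fields: date, time, store, category, amount, payment type (tab-separated)
            final String line = value.toString();
            final String[] data = line.trim().split("\t");

            if (data.length == 6) {
                final String product = data[3];
                final double sales = Double.parseDouble(data[4]);
                word.set(product);
                context.write(word, new DoubleWritable(sales));
            }
        }
    }

    // Sums all sale amounts that share a product category.
    public static final class P1Q1Reduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

        @Override
        public final void reduce(final Text key, final Iterable<DoubleWritable> values, final Context context)
                throws IOException, InterruptedException {
            double sum = 0.0;
            for (final DoubleWritable val : values) {
                sum += val.get();
            }
            context.write(key, new DoubleWritable(sum));
        }
    }
}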
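Before rerunning on the cluster, it can help to check the expected totals locally. The sketch below imitates the map, shuffle (group by key) and reduce phases with plain Java collections; it is not Hadoop code, and the class name `LocalCategoryTotals` is hypothetical. It deliberately swaps the job's `double` arithmetic for `BigDecimal` so that cent amounts add exactly.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LocalCategoryTotals {
    /** Map + shuffle: emit (category, amount) per line and group amounts by category. */
    static Map<String, List<BigDecimal>> mapAndGroup(List<String> lines) {
        Map<String, List<BigDecimal>> grouped = new TreeMap<>(); // sorted keys, like the shuffle
        for (String line : lines) {
            String[] data = line.trim().split("\t");
            if (data.length != 6) {
                continue; // skip malformed records, as the mapper does
            }
            // new BigDecimal throws NumberFormatException on a bad amount,
            // just as Double.parseDouble would in the mapper
            grouped.computeIfAbsent(data[3], k -> new ArrayList<>()).add(new BigDecimal(data[4]));
        }
        return grouped;
    }

    /** Reduce: sum every amount that shares a category key. */
    static Map<String, BigDecimal> reduce(Map<String, List<BigDecimal>> grouped) {
        Map<String, BigDecimal> totals = new TreeMap<>();
        for (Map.Entry<String, List<BigDecimal>> e : grouped.entrySet()) {
            BigDecimal sum = BigDecimal.ZERO;
            for (BigDecimal v : e.getValue()) {
                sum = sum.add(v);
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }
}
```

Fed the six sample lines (with tabs between fields), this yields 461.23 for Men's Clothing (214.05 + 247.18) and the single record's amount for each of the other four categories.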

The code produces an incorrect answer that does not match the Udacity results.
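One possible cause of a mismatch, not confirmed by the question: `DoubleWritable.toString()` delegates to `Double.toString`, which switches to scientific notation for magnitudes of 10^7 and above, so a category total in the millions is written as something like `1.0E7` rather than `10000000.00`. Summing many doubles can also leave trailing rounding digits. The sketch compares the two renderings; `SalesFormat` is a hypothetical helper name.

```java
import java.util.Locale;

final class SalesFormat {
    /** How a DoubleWritable value appears in text output (via Double.toString). */
    static String asDoubleWritableText(double total) {
        return Double.toString(total);
    }

    /** Fixed two-decimal rendering; Locale.ROOT keeps '.' as the decimal separator. */
    static String asCurrency(double total) {
        return String.format(Locale.ROOT, "%.2f", total);
    }
}
```

Which format the expected answer uses is not stated in the question, so this is a hypothesis to check against the grader's expected output.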


Does anyone know whether this is the right approach, and how to do it?


GCT1015

凤凰求蛊

For the most part I'd say your code looks fine, and a combiner is only an optimization, so leaving it out should produce the same output as including it. I wrote my own MR job, and for the given input I get this output:

Children's Clothing    235.63
Men's Clothing    461.23
Music    66.08
Pet Supplies    493.51
Women's Clothing    153.57

Obviously, if you have hundreds or thousands of stores, the totals will run into millions of currency units, as your output shows.

Code:

@Override
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    Job job = Job.getInstance(conf, APP_NAME);
    job.setJarByClass(StoreSumRunner.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(CurrencyReducer.class);
    job.setOutputKeyClass(Text.class);
    // The mapper emits DoubleWritable values; CurrencyReducer emits formatted Text
    job.setMapOutputValueClass(DoubleWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
}

static class TokenizerMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    private final Text key = new Text();
    private final DoubleWritable sales = new DoubleWritable();

    @Override
    protected void map(LongWritable offset, Text value, Context context) throws IOException, InterruptedException {
        final String line = value.toString();
        // Splits on runs of two or more whitespace characters; a single tab
        // or single space between fields would not act as a separator.
        final String[] data = line.trim().split("\\s\\s+");
        if (data.length < 6) {
            System.err.printf("mapper: not enough records for %s%n", Arrays.toString(data));
            return;
        }
        key.set(data[3]);
        try {
            sales.set(Double.parseDouble(data[4]));
            context.write(key, sales);
        } catch (NumberFormatException ex) {
            System.err.printf("mapper: invalid value format %s%n", data[4]);
        }
    }
}

static class CurrencyReducer extends Reducer<Text, DoubleWritable, Text, Text> {
    private final Text output = new Text();
    private final DecimalFormat df = new DecimalFormat("#.00");

    @Override
    protected void reduce(Text category, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
        double sum = 0;
        for (DoubleWritable value : values) {
            sum += value.get();
        }
        // Two fixed decimals, never scientific notation
        output.set(df.format(sum));
        context.write(category, output);
    }
}
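The point that a combiner is only an optimization can be illustrated without Hadoop: summing partial totals per input split (what a combiner does) and then summing those partials gives the same per-category totals as summing every record at once. The sketch uses `BigDecimal` so the equality is exact; with `double`, the different addition order can change the last digits, because floating-point addition is not associative. A related Hadoop constraint: a combiner's output types must equal the map output types (`Text`, `DoubleWritable` here), so `CurrencyReducer`, which emits `Text`, could not be reused as the combiner as-is. The class `CombinerCheck` and its methods are hypothetical.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CombinerCheck {
    /** One (category, amount) record as the mapper would emit it. */
    static final class Sale {
        final String category;
        final BigDecimal amount;

        Sale(String category, String amount) {
            this.category = category;
            this.amount = new BigDecimal(amount);
        }
    }

    /** Sums amounts per category; plays the role of both combiner and reducer. */
    static Map<String, BigDecimal> sumByCategory(List<Sale> sales) {
        Map<String, BigDecimal> totals = new TreeMap<>();
        for (Sale s : sales) {
            totals.merge(s.category, s.amount, BigDecimal::add);
        }
        return totals;
    }

    /** Combines each split locally, then merges the partial totals (the reduce step). */
    static Map<String, BigDecimal> sumWithCombiner(List<List<Sale>> splits) {
        Map<String, BigDecimal> totals = new TreeMap<>();
        for (List<Sale> split : splits) {
            for (Map.Entry<String, BigDecimal> partial : sumByCategory(split).entrySet()) {
                totals.merge(partial.getKey(), partial.getValue(), BigDecimal::add);
            }
        }
        return totals;
    }
}
```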