Requirement: given a set of text files, count and output the total number of occurrences of each word.
The figure below illustrates the MapReduce WordCount data flow:

In the map phase, the input is read line by line: the byte offset of each line within the file becomes the key and the line's text becomes the value. The mapper emits one key/value pair per word; these pairs are shuffled and grouped by key, and the reduce phase then calls the reduce method exactly once per distinct key to aggregate its values.
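As a concrete illustration (this example is not in the original post), suppose one input split contains the two lines "hadoop map" and "map reduce". The job processes them as follows:

map input:      (0, "hadoop map"), (11, "map reduce")
map output:     (hadoop, 1), (map, 1), (map, 1), (reduce, 1)
after shuffle:  (hadoop, [1]), (map, [1, 1]), (reduce, [1])
reduce output:  hadoop 1, map 2, reduce 1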
The code is as follows:
WordCountMapper.java
package com.lxj.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map phase: the byte offset of each input line is the key, the line's text is the value
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text k = new Text();
    private IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, java.lang.InterruptedException {
        // 1. Convert the current line to a String
        String line = value.toString();
        // 2. Split the line on spaces
        String[] words = line.split(" ");
        // 3. Emit one key/value pair per word, skipping empty tokens
        for (String word : words) {
            String trim = word.trim();
            if (!"".equals(trim)) {
                k.set(trim);
                // 4. The map phase only splits words, it does not aggregate, so the value is always 1
                context.write(k, v);
            }
        }
    }
}
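To sanity-check the mapper in isolation, a unit test along the following lines can be used. This is a minimal sketch, not part of the original post, assuming the (now retired) Apache MRUnit library and JUnit 4 are on the classpath; the test class name is arbitrary.

package com.lxj.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {

    @Test
    public void mapperSplitsAndEmitsOnes() throws Exception {
        // Feed one line to the mapper and assert on the emitted pairs, in order
        MapDriver<LongWritable, Text, Text, IntWritable> driver =
                MapDriver.newMapDriver(new WordCountMapper());
        driver.withInput(new LongWritable(0), new Text("hadoop map map"))
              .withOutput(new Text("hadoop"), new IntWritable(1))
              .withOutput(new Text("map"), new IntWritable(1))
              .withOutput(new Text("map"), new IntWritable(1))
              .runTest();
    }
}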
WordCountReducer.java
package com.lxj.wc;

import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The reduce phase takes the map phase's output as its input
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // The reduce method is called exactly once per distinct key
    @Override
    protected void reduce(Text text, Iterable<IntWritable> iterable, Context context)
            throws java.io.IOException, java.lang.InterruptedException {
        // 1. Sum all the values the map phase produced for this key
        int sum = 0;
        Iterator<IntWritable> iterator = iterable.iterator();
        while (iterator.hasNext()) {
            sum += iterator.next().get();
        }
        // 2. Write out the word and its total count, skipping empty keys
        if (!text.toString().trim().equals("")) {
            context.write(text, new IntWritable(sum));
        }
    }
}
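One optional optimization, not in the original post: because this summing logic is associative and commutative, the reducer class can double as a combiner, pre-aggregating counts on the map side and shrinking the data shuffled across the network. Enabling it takes a single extra line in the driver shown below:

job.setCombinerClass(WordCountReducer.class);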
WordCountDriver.java
package com.lxj.wc;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
// Driver class: wires the mapper and reducer into a single job
public class WordCountDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get the configuration and create the job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // 2. Locate the jar to ship with the job by passing the driver class itself
        job.setJarByClass(WordCountDriver.class);
        // 3. Set the mapper and reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // 4. Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 6. Set the input and output paths (taken from the command line)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7. Submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}

Prepare the following input files:



1. Results of a local run:
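A local run executes the whole job in a single JVM via Hadoop's local job runner: run WordCountDriver#main directly (for example from the IDE) with two program arguments, an input directory and an output directory that does not yet exist, e.g. (hypothetical paths):

./input ./output

Note that FileOutputFormat refuses to start the job if the output directory already exists, so delete it between runs.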

Astonished	1
At	1
But	1
Fate	1
He	2
Immediately	1
Many	1
O	1
Phoenix	1
a	1
admired,	1
again	1
ages	1
al	1
amongst	1
an	1
and	5
animals,	1
appeared	1
around	1
at	1
away	1
beasts,	1
beauty,	1
been	2
began	1
being	1
birds	1
both	1
broke	1
compassion,	1
different	1
elasticserach	1
euraka	1
eye	1
flocked	1
friend	1
great	1
had	2
hadoop	1
hard	1
has	2
he	1
him	3
his	1
in	2
into	1
javaee	1
kinds	1
know	1
last	1
look	1
loved	1
loving	1
map	1
mate	1
most	1
mysql	1
neither	1
never	1
nor	1
now	1
of	4
or	1
out	1
passed	1
phoenix	1
pleasure	1
praise.	1
prudent	1
redis	2
reduce	1
seen	1
shiro	1
short	1
sighed	1
since	1
spark	1
ssh	1
ssm	1
stared	1
the	5
them	1
they	2
time,	1
to	2
unhappy	1
upon	1
will	1
wisest	1
with	1
world.	1
yarn	1
zookeeper	1
2. Running on a Hadoop cluster:
First package the project as a jar and copy it to the cluster, with the input files already uploaded to HDFS, then run the following command:
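The exact command in the screenshot is not recoverable; a typical invocation looks like the following, where the jar name and HDFS paths are placeholders:

hadoop jar wc.jar com.lxj.wc.WordCountDriver /user/lxj/input /user/lxj/output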

After the job completes successfully, view the results:
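For example, using the placeholder output path from above, the file written by the single reducer can be printed with:

hdfs dfs -cat /user/lxj/output/part-r-00000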


Of course, you can also download and view the output directly from the web UI:
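(This refers to the NameNode web UI's file browser, under Utilities → Browse the file system, typically at http://<namenode-host>:50070 on Hadoop 2.x or port 9870 on Hadoop 3.x; the exact address depends on the cluster configuration.)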
