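I wrote a small MapReduce job to find the second-highest salary in a dataset. I believe the second-highest-salary logic is correct, but I get multiple output records and the values are wrong. There should be only a single output record with a name and a salary, for example John 9000. The dataset and the code are given below.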
hh,0,Jeet,3000
hk,1,Mayukh,4000
nn,2,Antara,3500
mm,3,Shubu,6000
ii,4,Parsi,8000
The output should be Shubu,6000, but I get the following output:
Antara -2147483648
Mayukh -2147483648
Parsi 3500
Shubu 4000
The code I am using is:
public class SecondHigestMapper extends Mapper<LongWritable,Text,Text,Text>{
    private Text salary = new Text();
    private Text name = new Text();

    public void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException{
        if(key.get()!=0){
            String split[]= value.toString().split(",");
            salary.set(split[2]+";"+split[3]);
            name.set("ignore");
            context.write(name,salary);
        }
    }
}
public class SecondHigestReducer extends Reducer<Text,Text,Text,IntWritable>{
    public void reduce(Text key,Iterable<Text> values,Context context) throws IOException, InterruptedException{
        int highest = 0;
        int second_highest = 0;
        int salary;
        for(Text val:values){
            String[] fn = val.toString().split("\\;");
            salary = Integer.parseInt(fn[3]);
            if(highest < salary){
                second_highest = highest;
                highest = salary;
            } else if(second_highest < salary){
                second_highest = salary;
            }
        }
        String seconHigest = String.valueOf(second_highest);
        context.write(new Text(key),new Text(seconHigest));
    }
}
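
For comparison, here is a minimal plain-Java sketch of the selection step, with no Hadoop dependency so it can be run on its own. It assumes the values arrive in the `name;salary` form that the mapper above emits. Splitting such a string on `;` yields only two fields, so the salary sits at index 1 rather than index 3. The sketch also carries the name along with each salary, which the posted reducer does not do. The class name `SecondHighestSketch` and the method `secondHighest` are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SecondHighestSketch {

    /**
     * Returns "name,salary" for the second-highest salary among values of the
     * form "name;salary", or null if fewer than two records are supplied.
     * A record that ties with the top salary is treated as the second highest.
     */
    static String secondHighest(Iterable<String> values) {
        String highestName = null;
        String secondName = null;
        int highest = Integer.MIN_VALUE;
        int second = Integer.MIN_VALUE;

        for (String val : values) {
            String[] fn = val.split(";");          // fn[0] = name, fn[1] = salary
            String name = fn[0];
            int salary = Integer.parseInt(fn[1].trim());

            if (highestName == null || salary > highest) {
                // The previous top record is demoted to second place.
                second = highest;
                secondName = highestName;
                highest = salary;
                highestName = name;
            } else if (secondName == null || salary > second) {
                second = salary;
                secondName = name;
            }
        }
        return secondName == null ? null : secondName + "," + second;
    }

    public static void main(String[] args) {
        // The five records from the dataset above, as "name;salary" values.
        List<String> values = Arrays.asList(
                "Jeet;3000", "Mayukh;4000", "Antara;3500", "Shubu;6000", "Parsi;8000");
        System.out.println(secondHighest(values)); // prints Shubu,6000
    }
}
```

In the job itself this loop would live inside `reduce`, and because every record is written under the single key `ignore`, the reducer would be called once and write one record. Two further points are worth checking, assuming the default `TextInputFormat`: the map key is the byte offset of the line, so `key.get()!=0` drops the first data row (the dataset has no header line), and the reducer is declared with `IntWritable` as its output value type while `context.write` is passed a `Text`.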
MYYA