Calling a MapReduce job from a simple Java program

I have been trying to call a MapReduce job from a simple Java program in the same package. I referenced the MapReduce jar file in my Java program and invoked it with the RunJar(String args[]) method, passing the input and output paths of the MapReduce job, but the program did not work.


How do I run such a program, where I just use a main method that passes the input, output and jar paths? Is it possible to run a MapReduce job (jar) through it? I want to do this because I want to run several MapReduce jobs one after another, where my Java program will call each such job by referring to its jar file. If this is possible, I might as well use a simple servlet to make such calls and reference their output files for graphing purposes.


/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

/**
 *
 * @author root
 */
import org.apache.hadoop.util.RunJar;

import java.util.ArrayList;
import java.util.List;

public class callOther {

    public static void main(String args[]) throws Throwable {
        List<String> arg = new ArrayList<>();

        String output = "/root/Desktop/output";

        // first argument is the job jar, followed by the input and output paths
        arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");
        arg.add("/root/Desktop/input");
        arg.add(output);

        RunJar.main(arg.toArray(new String[0]));
    }
}


手掌心

854 views · 3 Answers

慕运维8079593

Oh, please don't use RunJar, the Java API is very good. Here is how to start a job from normal code:

// create a configuration
Configuration conf = new Configuration();
// create a new job based on the configuration
Job job = new Job(conf);
// here you have to put your mapper class
job.setMapperClass(Mapper.class);
// here you have to put your reducer class
job.setReducerClass(Reducer.class);
// here you have to set the jar which is containing your
// map/reduce class, so you can use the mapper class
job.setJarByClass(Mapper.class);
// key/value of your reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// this is setting the format of your input, can be TextInputFormat
job.setInputFormatClass(SequenceFileInputFormat.class);
// same with output
job.setOutputFormatClass(TextOutputFormat.class);
// here you can set the path of your input
SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
// this deletes possible output paths to prevent job failures
FileSystem fs = FileSystem.get(conf);
Path out = new Path("files/out/processed/");
fs.delete(out, true);
// finally set the empty out path
TextOutputFormat.setOutputPath(job, out);
// this waits until the job completes and prints debug out to STDOUT or whatever
// has been configured in your log4j properties.
job.waitForCompletion(true);

If you are using an external cluster, you have to put the following information into your configuration:

// this should be like defined in your mapred-site.xml
conf.set("mapred.job.tracker", "jobtracker.com:50001");
// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");

This should be no problem when hadoop-core.jar is on your application container's classpath. But I think you should put some kind of progress indicator on your web page, because completing a Hadoop job can take anywhere from minutes to hours ;)

For YARN (> Hadoop 2), the following configuration needs to be set instead:

// this should be like defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");
// framework is now "yarn", should be defined like this in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");
// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");
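The question also asks about running several jobs one after another. Since waitForCompletion(true) blocks until a job finishes, a driver can simply submit jobs sequentially and feed one job's output directory into the next. A minimal sketch, assuming Hadoop 2.x on the classpath; the identity Mapper/Reducer, job names, and paths below are placeholders, not the asker's actual classes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1] + "/intermediate");
        Path output = new Path(args[1] + "/final");

        // First job: reads the raw input, writes to an intermediate directory.
        Job first = Job.getInstance(conf, "first-job");
        first.setJarByClass(ChainDriver.class);
        first.setMapperClass(Mapper.class);    // identity mapper as a stand-in
        first.setReducerClass(Reducer.class);  // identity reducer as a stand-in
        // default TextInputFormat feeds (LongWritable offset, Text line) pairs,
        // which the identity classes pass straight through
        first.setOutputKeyClass(LongWritable.class);
        first.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(first, input);
        FileOutputFormat.setOutputPath(first, intermediate);

        // waitForCompletion blocks, so the jobs run strictly one after another;
        // stop the chain as soon as a job fails.
        if (!first.waitForCompletion(true)) System.exit(1);

        // Second job: consumes the first job's output directory.
        Job second = Job.getInstance(conf, "second-job");
        second.setJarByClass(ChainDriver.class);
        second.setMapperClass(Mapper.class);
        second.setReducerClass(Reducer.class);
        second.setOutputKeyClass(LongWritable.class);
        second.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(second, intermediate);
        FileOutputFormat.setOutputPath(second, output);

        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}
```

The same pattern extends to any number of jobs; for a web frontend you would run this driver off the request thread and poll Job.getStatus() for progress.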

尚方宝剑之说

Because map and reduce run on different machines, all your referenced classes and jars must move from machine to machine. If you have a packaged jar and run it on your desktop, @ThomasJungblut's answer is fine. But if you run inside Eclipse by right-clicking your class and hitting run, it doesn't work.

Instead of:

job.setJarByClass(Mapper.class);

use:

job.setJar("build/libs/hdfs-javac-1.0.jar");

At the same time, your jar's manifest must include a Main-Class attribute, which is your main class. Gradle users can put these lines in build.gradle:

jar {
    manifest {
        attributes("Main-Class": mainClassName)
    }
}
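The Main-Class attribute mentioned above lives in the jar's META-INF/MANIFEST.MF. If you want to check programmatically what ends up there, the JDK's java.util.jar classes can write and read it; a small stdlib-only sketch (the class name com.example.WordCount is made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestDemo {
    // Build a manifest with a Main-Class entry, serialize it to bytes,
    // then parse it back and return the Main-Class value it recorded.
    public static String roundTripMainClass(String mainClass) throws Exception {
        Manifest m = new Manifest();
        // Manifest-Version is mandatory; without it write() rejects the manifest
        m.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        m.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        m.write(out);

        Manifest parsed = new Manifest(new ByteArrayInputStream(out.toByteArray()));
        return parsed.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTripMainClass("com.example.WordCount"));
    }
}
```

For an already-built jar you would instead open it with new JarFile(path) and call getManifest() to inspect the same attribute.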
