1.1 Environment
Item | Value |
---|---|
OS | CentOS |
JDK | 1.8.0_144 |
Scala | 2.11.8 |
Hadoop | 2.7.3 |
Spark | 2.1.0 |
1.2 Packaging tools
IDEA + sbt
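2. Packaging
2.1 Install the plugin
Install the Scala plugin first: File -> Settings -> Plugins -> type "scala" in the search box -> Install.
Restart the IDE once the installation finishes.
2.2 Create the project
File -> New Project -> Scala -> SBT -> select the appropriate versions -> Finish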
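2.3 Write the code
Add the Spark dependency to build.sbt. The Scala version should match the one listed in the environment (2.11.8), and the Spark artifact must be built for the same Scala binary version. Using %% lets sbt append the right suffix (spark-core_2.11) instead of hard-coding the mismatched spark-core_2.10:
name := "scalaWorkspace"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"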
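Create WordCount.scala with the following code. The job reads the test file from HDFS, counts each space-separated word, and writes the result back to HDFS:
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // The application name is only a label shown in the Spark UI.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    // Read the file uploaded in section 3.2; HDFS URIs take the form hdfs://host:port/path.
    val input = sc.textFile("hdfs://master:9000/test/test.txt")
    // Split each line on spaces, pair each word with 1, then sum the counts per word.
    val words = input.flatMap(line => line.split(" "))
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
    // saveAsTextFile returns Unit and fails if the output directory already exists.
    counts.saveAsTextFile("hdfs://master:9000/test/result")
    sc.stop()
  }
}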
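To sanity-check the job's result on a small input, the same count can be reproduced locally with standard Unix tools. This is only a sketch: the function name `local_wordcount` is hypothetical, and unlike `line.split(" ")` in the Spark job, it drops the empty tokens that consecutive spaces would produce.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Spark job): count the words in a
# local file, splitting on single spaces.
local_wordcount() {
    # 1. Put every space-separated token on its own line.
    # 2. Drop empty tokens (the Spark job would count these as a word "").
    # 3. Sort, count identical lines, and print "<word> <count>".
    tr ' ' '\n' < "$1" | grep -v '^$' | sort | uniq -c | awk '{print $2, $1}'
}
```

Running `local_wordcount test.txt` prints one `word count` pair per line. The Spark job writes the same pairs as tuples, for example `(word,3)`, into the part files under /test/result.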
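2.4 Package
File -> Project Structure -> Artifacts -> click + -> JAR -> the second option (From modules with dependencies) -> specify the Module and Main Class -> for "JAR files from libraries", choose the second option (copy to the output directory and link via manifest) -> click OK
From the menu bar, choose Build -> Build Artifacts -> Build
Packaging has succeeded once the corresponding JAR appears in the out directory under the project root.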
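Before uploading the JAR, it can help to confirm that the main class was actually packaged. The sketch below uses the JDK's `jar tf` listing; the function name `jar_has_main_class` is hypothetical, and it assumes the class sits in the default package, as WordCount does here.

```shell
#!/bin/sh
# Hypothetical check: succeed if the JAR contains <MainClass>.class at its root.
# $1: path to the JAR, $2: main class name (defaults to WordCount).
jar_has_main_class() {
    # -x matches the whole line and -F treats the pattern literally, so the
    # companion entry WordCount$.class does not count as a match.
    jar tf "$1" | grep -qxF "${2:-WordCount}.class"
}
```

For example, `jar_has_main_class path/to/your.jar && echo ok`, where the path is whatever JAR IDEA produced under out.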
3. Submit the job
3.1 Start Hadoop
# Go to the sbin directory
cd $HADOOP_HOME/sbin
# Start the Hadoop cluster
./start-all.sh
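Once the start script returns, it is worth confirming that the daemons actually came up. A minimal sketch using the JDK's `jps` tool follows. The function name `check_hadoop_daemons` is hypothetical, and the daemon list assumes every daemon runs on this one host; on a multi-node cluster, DataNode and NodeManager would normally run on the worker machines instead.

```shell
#!/bin/sh
# Hypothetical check: report which expected Hadoop daemons appear in `jps`.
# Returns 0 if all are present, 1 otherwise.
check_hadoop_daemons() {
    running=$(jps)
    missing=0
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # -w matches whole words only, so SecondaryNameNode does not
        # satisfy the NameNode check.
        if printf '%s\n' "$running" | grep -qw "$daemon"; then
            echo "$daemon: running"
        else
            echo "$daemon: NOT running"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_hadoop_daemons` prints one status line per daemon, which makes a failed startup visible before any file is uploaded.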
3.2 Upload the test file to HDFS
hadoop fs -put test.txt /test/test.txt
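The upload can fail if the /test directory does not exist yet, so a slightly more defensive version creates it first and lists it afterwards. This is a sketch: the function name `upload_test_file` is hypothetical, and it assumes test.txt is in the current directory and that the `hadoop` command is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper wrapping the upload step.
upload_test_file() {
    # -p creates the directory only if it is missing.
    hadoop fs -mkdir -p /test &&
    hadoop fs -put test.txt /test/test.txt &&
    # List the directory to confirm that the file arrived.
    hadoop fs -ls /test
}
```

Calling `upload_test_file` stops at the first failing step, so a missing local file or an unreachable NameNode shows up immediately.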
3.3 Upload the program JAR
Upload the program JAR to the server with FileZilla, sftp, or the rz -y command.
3.4 Submit the job
3.4.1 Start the Master
$SPARK_HOME/sbin/start-master.sh
Then open localhost:8080 and note the master URL, which has the form spark://xxx:7077.
3.4.2 Start a Worker
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker spark://xxx:7077
3.4.3 Submit the application
$SPARK_HOME/bin/spark-submit --master spark://xxx:7077 --class WordCount /xxx/wordCount.jar
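Because the job writes to a fixed output directory, a second run fails unless the old output is removed first. The sketch below wraps the submit step with that cleanup and prints the result afterwards. The function name `submit_wordcount` and the variables `MASTER_URL` and `APP_JAR` are hypothetical; they stand in for the spark://xxx:7077 URL and the /xxx/wordCount.jar path. It also assumes that HDFS's default file system is the one the job writes to.

```shell
#!/bin/sh
# Hypothetical wrapper around the submit step.
# The caller must set SPARK_HOME, MASTER_URL and APP_JAR.
submit_wordcount() {
    : "${SPARK_HOME:?SPARK_HOME is not set}"
    : "${MASTER_URL:?MASTER_URL is not set}"
    : "${APP_JAR:?APP_JAR is not set}"

    # Remove output from a previous run; -f keeps this quiet when the
    # directory does not exist yet.
    hadoop fs -rm -r -f /test/result

    "$SPARK_HOME/bin/spark-submit" --master "$MASTER_URL" \
        --class WordCount "$APP_JAR" || return 1

    # Print the result. The glob is quoted so that HDFS, not the local
    # shell, expands it.
    hadoop fs -cat '/test/result/part-*'
}
```

With the three variables set to real values, `submit_wordcount` runs the job and prints the counted words once it finishes.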