Continuing from the previous post on deployment; this post covers testing that deployment.
Testing
Spark-shell test
./spark-shell
...
scala> val days = List("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
days: List[String] = List(Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday)

scala> val daysRDD = sc.parallelize(days)
daysRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:14

scala> daysRDD.count()
res0: Long = 7
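In the transcript above, `sc.parallelize(days).count()` simply returns the number of elements in the source collection. A quick shell analogue of the same count, as a sanity check:

```shell
#!/bin/sh
# the seven elements parallelized in the spark-shell session above;
# count() on the resulting RDD returns the same number wc -l prints here
printf '%s\n' Sunday Monday Tuesday Wednesday Thursday Friday Saturday | wc -l
# prints 7
```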
Script test
YARN test (cluster mode and client mode)
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster ../lib/spark-examples*.jar 10
Visit http://localhost:8088/ (localhost can be replaced with the server address) to see the result.
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client ../lib/spark-examples*.jar 10
Again visit localhost:8088 to see the result.
[Note] Mind the path to *.jar.
Standalone mode
./spark-submit --class org.apache.spark.examples.SparkPi --master spark://127.0.0.1:7077 ../lib/spark-examples-1.4.0-hadoop2.6.0.jar 100
./bin/run-example org.apache.spark.examples.SparkPi 2 spark://localhost:7077
[Note] Mind the 127.0.0.1 master address and the path to *.jar.
Local mode
./bin/run-example SparkPi 10 --master local[2]
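SparkPi, used in all of the submit commands above, estimates π by Monte Carlo sampling: it throws random points into the unit square and counts how many land inside the quarter circle. The final argument (10 or 100) sets how many partitions Spark splits the sampling across. A single-machine sketch of the same computation in awk (just the underlying idea, not Spark's actual code):

```shell
#!/bin/sh
# Monte Carlo estimate of pi: inside/total approximates pi/4
awk 'BEGIN {
    srand(42)                       # fixed seed so the run is reproducible
    n = 1000000
    inside = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x*x + y*y <= 1.0) inside++   # point falls in the quarter circle
    }
    printf "Pi is roughly %.4f\n", 4 * inside / n
}'
```

With a million samples the estimate typically lands within a couple of hundredths of π; Spark's version parallelizes exactly this loop across partitions.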
Data test
Shell script (generates the test data):

getNum(){
    c=1
    while [[ $c -le 5000000 ]]
    do
        echo $(($RANDOM/500))
        ((c++))
    done
}

for i in `seq 30`
do
    getNum >> ${i}.txt &
    # getNum
done
wait

echo "------------------DONE-----------------"
cat [0-9]*.txt > num.txt

Create the HDFS directory (the executable is hadoop/bin/hdfs; the HDFS root is hdfs://localhost:9000).
Run: ./bin/hdfs dfs -mkdir -p /user/hadoop/datatest
Write the generated data into the newly created HDFS directory.
Run: ./bin/hdfs dfs -put /root/num.txt /user/hadoop/datatest
Scala test code:
Run: spark/bin/spark-shell

scala> val file = sc.textFile("hdfs://localhost:9000/user/hadoop/datatest/num.txt")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
scala> count.sortBy(_._2).map(x => x._1 + "\t" + x._2).saveAsTextFile("hdfs://localhost:9000/user/hadoop/datatest/numCount")

Check the result with the Hadoop command (from hadoop/bin/):
./hadoop fs -cat hdfs://localhost:9000/user/hadoop/datatest/numCount/p*|sort -k2n
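The flatMap/map/reduceByKey pipeline in the Scala code has a classic shell equivalent: split into words, count duplicates, sort by count. A small sketch against made-up input (the real job reads num.txt from HDFS):

```shell
#!/bin/sh
# stand-in for the file contents; the real data is the generated num.txt
printf '3 17 3\n17 3 42\n' |
  tr ' ' '\n' |                # flatMap(line => line.split(" ")): one word per line
  sort |                       # group identical words together
  uniq -c |                    # reduceByKey(_+_): count each group
  sort -k1n |                  # sortBy(_._2): ascending by count
  awk '{print $2 "\t" $1}'     # emit "word<TAB>count" like the Scala job
```

For this input the pipeline prints `42 1`, `17 2`, `3 3` (tab-separated), matching what saveAsTextFile would write into the numCount output files.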
Author: popsheng
Link: https://www.jianshu.com/p/06e7b1fb9c42