On the 148.169 server (CentOS 6.2), download the installation files and dependencies and set up the environment.
References:
http://spark.apache.org/docs/latest/index.html
https://www.python.org/downloads/release/python-2710/
1. Create a directory for the installation files
mkdir /data/soft
cd /data/soft
2. Download the Spark source package
wget -c http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2.tgz
3. Spark dependencies
Spark runs on Java 7+, Python 2.6+ (with NumPy), and R 3.1+. For the Scala API, Spark 1.5.2 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).
4. Install Python 2.7
wget -c https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
tar -xzf Python-2.7.10.tgz
yum groupinstall "Development tools"
yum install zlib-devel
yum install bzip2-devel
yum install openssl-devel
yum install ncurses-devel
cd Python-2.7.10
./configure --prefix=/usr/local
make && make altinstall
(make altinstall installs the interpreter as python2.7 instead of overwriting the system python.)
Create a symlink so the system default python points to Python 2.7.
Normally, even after Python 2.7 installs successfully, the system default python still points to version 2.6.6:
mv /usr/bin/python /usr/bin/python.bak
ln -s /usr/local/bin/python2.7 /usr/bin/python
Pointing the system python symlink at Python 2.7 breaks yum, because yum on CentOS 6 depends on Python 2.6.
Fix:
vi /usr/bin/yum
Change the shebang line #!/usr/bin/python to #!/usr/bin/python2.6 and save.
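With the symlink in place and yum patched, confirm which interpreter the bare python command now resolves to (a minimal check; the expected version reflects the 2.7.10 build above):
python -c "import sys; print(sys.version)"
The output should start with 2.7.10; if it still shows 2.6.6, the /usr/bin/python symlink was not updated.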
5. Install NumPy & SciPy
First install pip, the Python package manager:
wget -c https://bootstrap.pypa.io/get-pip.py --no-check-certificate
python get-pip.py
pip install numpy
Installing SciPy requires the following dependencies:
yum install lapack lapack-devel blas blas-devel
pip install scipy
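To verify that both packages import cleanly under the new Python 2.7 (the exact versions printed depend on what pip resolved):
python -c "import numpy; print(numpy.__version__)"
python -c "import scipy; print(scipy.__version__)"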
6. Install Java 7+
yum search openjdk-devel
yum install java-1.7.0-openjdk-devel.x86_64
/usr/sbin/alternatives --config java
/usr/sbin/alternatives --config javac
vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.19.x86_64
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
7. Install Scala 2.11.7
wget -c http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz
tar -xzf scala-2.11.7.tgz
cp -R /data/soft/scala-2.11.7 /usr/local/
vim /etc/profile
export SCALA_HOME=/usr/local/scala-2.11.7
export PATH=$PATH:$SCALA_HOME/bin
8. Install R 3.1+
wget -c http://mirror.bjtu.edu.cn/cran/src/base/R-3/R-3.2.2.tar.gz
tar -xzf R-3.2.2.tar.gz
cd R-3.2.2
yum install readline-devel
yum install libXt-devel
./configure --prefix=/usr/local
make && make install
9. Install Spark
wget -c http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2-bin-hadoop2.3.tgz
tar -xzf /data/soft/spark-1.5.2-bin-hadoop2.3.tgz -C /data/hadoop/
10. Configure Spark
cp /data/hadoop/spark-1.5.2-bin-hadoop2.3/conf/spark-env.sh.template /data/hadoop/spark-1.5.2-bin-hadoop2.3/conf/spark-env.sh
vim spark-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
export HADOOP_HOME=/data/hadoop/hadoop-2.3.0-cdh5.1.0
export HADOOP_CONF_DIR=/data/hadoop/hadoop-2.3.0-cdh5.1.0/etc/hadoop
export YARN_CONF_DIR=/data/hadoop/hadoop-2.3.0-cdh5.1.0/etc/hadoop
export HIVE_HOME=/data/hadoop/hive-0.12.0-cdh5.1.0
export SCALA_HOME=/usr/local/scala-2.11.7
export SPARK_HOME=/data/hadoop/spark-1.5.2-bin-hadoop2.3
export SPARK_LOCAL_DIRS=/data/hadoop/app/tmp
export SPARK_PID_DIR=/data/hadoop/app/pids
export SPARK_MASTER_IP=host169
export SPARK_MASTER_PORT=7077
export PYSPARK_PYTHON=/usr/local/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7
export SPARK_LOCAL_IP=host177
export SPARK_YARN_QUEUE=hadoop
export SPARK_WORKER_CORES=10
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=30G
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=5G
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.20-bin.jar
Note: SPARK_LOCAL_IP must be set to each node's own hostname (host177 above is one worker's value), so adjust it on every server.
11. Configure /etc/hosts on every server
vim /etc/hosts
192.168.6.86 host86
192.168.6.87 host87
192.168.6.88 host88
192.168.6.89 host89
192.168.6.164 host164
192.168.6.165 host165
192.168.6.166 host166
192.168.6.167 host167
192.168.6.168 host168
192.168.6.169 host169
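Before starting any daemons, it helps to confirm every alias resolves on the current node. A minimal Python sketch (check_hosts.py is a hypothetical file name; the host list mirrors the entries above):
import socket
for h in ['host86', 'host87', 'host88', 'host89', 'host164',
          'host165', 'host166', 'host167', 'host168', 'host169']:
    print('%s -> %s' % (h, socket.gethostbyname(h)))  # should echo the /etc/hosts entries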
12. Stop the iptables service
service iptables stop
chkconfig iptables off
13. Set up the cluster
On each of the other servers, pull the files from 192.168.6.169:
rsync -avz --include "soft/" --exclude "/*" 192.168.6.169::data /data
Repeat steps 4 through 11 above on the other server nodes, then copy the prepared spark-env.sh into place:
cp /data/soft/spark-env.sh /data/hadoop/spark-1.5.2-bin-hadoop2.3/conf/
14. Start the Spark cluster
host169 serves as the master node.
Start the master on host169:
$SPARK_HOME/sbin/start-master.sh
Once started, the cluster web UI is available at http://host169:8080/; the spark://HOST:PORT shown on that page is the address workers use to register with the master.
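The standalone master also exposes this state as JSON at http://host169:8080/json, which is handy for scripts. A minimal sketch using Python 2's urllib2 (the exact field names are an assumption and may vary across Spark versions):
import json, urllib2
status = json.load(urllib2.urlopen('http://host169:8080/json'))
print(status['url'])           # the spark://HOST:PORT registration address
print(len(status['workers']))  # grows as workers register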
host169 also doubles as a worker:
$SPARK_HOME/sbin/start-slave.sh spark://host169:7077
On the other server nodes, start workers and register them with the master:
$SPARK_HOME/sbin/start-slave.sh spark://host169:7077
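Once the workers are registered, a quick end-to-end job confirms the cluster schedules work. A minimal PySpark sketch (the app name is arbitrary; run it with $SPARK_HOME/bin/spark-submit):
from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName('smoke-test').setMaster('spark://host169:7077')
sc = SparkContext(conf=conf)
print(sc.parallelize(range(1000), 4).sum())  # expect 499500
sc.stop()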
15. Start the Spark interactive shells
Scala shell:
./bin/spark-shell --master spark://IP:PORT
Python shell:
./bin/pyspark
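Inside the pyspark shell a SparkContext is already created as sc, so a one-liner is enough to confirm it works:
data = sc.parallelize([1, 2, 3, 4, 5])
print(data.map(lambda x: x * x).collect())  # [1, 4, 9, 16, 25]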
16. Test the Spark cluster
Run locally in single-threaded mode:
./bin/run-example org.apache.spark.examples.SparkPi
Run locally in parallel (multi-threaded) mode:
MASTER=local[2] ./bin/run-example org.apache.spark.examples.SparkPi
Run on the Spark cluster:
MASTER=spark://host169:7077 ./bin/run-example org.apache.spark.examples.SparkPi
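The same Pi estimation can be written in PySpark and submitted to the cluster (pi_estimate.py is a hypothetical file name; Spark ships an equivalent example in examples/src/main/python/pi.py):
import random
from pyspark import SparkContext

sc = SparkContext(appName='PythonPi')
n = 100000

def inside(_):
    # sample a point in the unit square; count it if it lands inside the quarter circle
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1 else 0

count = sc.parallelize(range(n)).map(inside).reduce(lambda a, b: a + b)
print('Pi is roughly %f' % (4.0 * count / n))
sc.stop()
Submit it with: $SPARK_HOME/bin/spark-submit --master spark://host169:7077 pi_estimate.py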
Author: Jogging
Source: https://www.jianshu.com/p/9a5896aae4a6