Hive and Spark versions have a strict correspondence: Hive on Spark only works with the Spark release that each Hive version was built against.
Apache Hive / Spark version mapping
| Hive Version | Spark Version |
|---|---|
| master | 2.3.0 |
| 3.0.x | 2.3.0 |
| 2.3.x | 2.0.0 |
| 2.2.x | 1.6.0 |
| 2.1.x | 1.6.0 |
| 2.0.x | 1.5.0 |
| 1.2.x | 1.3.1 |
| 1.1.x | 1.2.0 |
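The same mapping can also be read from the Hive source itself: the root pom.xml declares a spark.version property. A quick check, assuming a checkout of the matching Hive release:

```bash
# In a Hive source checkout, print the Spark version Hive was built against
grep -m1 '<spark.version>' pom.xml
# e.g. <spark.version>1.6.0</spark.version> for Hive 2.1.x / 2.2.x
```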
CDH Hive / Spark version mapping
http://archive.cloudera.com/cdh5/cdh/5/
Preparing the build environment
Download Scala 2.11
```bash
# Add environment variables
vim /etc/profile

export SCALA_HOME=/root/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin
```
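After reloading the profile, a quick check that Scala resolves on the PATH:

```bash
source /etc/profile
scala -version   # should report 2.11.12
```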
Download Maven (3.3 or later)
```bash
# Add environment variables
vim /etc/profile

export MAVEN_HOME=/root/apache-maven-3.5.3
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m -XX:MaxPermSize=2014M"
export PATH=$PATH:$MAVEN_HOME/bin

source /etc/profile
```
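Likewise for Maven, which should report 3.5.3 and the JDK it picked up:

```bash
mvn -version
```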
Download the source code and build it
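A sketch of fetching the Spark 1.6.0 source from the Apache archive (URL assumed; any mirror of the same release works):

```bash
# Fetch and unpack the Spark 1.6.0 source release
wget https://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0.tgz
tar -zxvf spark-1.6.0.tgz
cd spark-1.6.0
```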
Check the Hadoop version
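On a CDH node:

```bash
hadoop version
# e.g. Hadoop 2.6.0-cdh5.14.2 — this is the value passed to -Dhadoop.version below
```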
Build against that Hadoop version
```bash
# Pin the Hadoop version and build without bundling the Hive jars
./make-distribution.sh --name hadoop2-without-hive --tgz \
  -Pyarn -Phadoop-provided -Phadoop-2.6 -Porc-provided \
  -Dhadoop.version=2.6.0-cdh5.14.2
```
The build produces spark-1.6.0-bin-hadoop2-without-hive.tgz.
Extract spark-1.6.0-bin-hadoop2-without-hive.tgz to a directory (e.g. /root/spark-1.6.0-bin-hadoop2-without-hive).
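For example:

```bash
tar -zxvf spark-1.6.0-bin-hadoop2-without-hive.tgz -C /root/
```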
Add the Spark configuration
Spark directories on HDFS
```bash
sudo -u hdfs hdfs dfs -mkdir -p /spark/jars
sudo -u hdfs hdfs dfs -mkdir -p /spark/log/event-log

# Upload the assembly jar
hdfs dfs -put /root/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar /user/root
sudo -u hdfs hdfs dfs -mv /user/root/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar /spark/jars
sudo -u hdfs hdfs dfs -chown hdfs /spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
sudo -u hdfs hdfs dfs -chmod 777 /spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
```
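A quick sanity check that the jar and the event-log directory landed where expected:

```bash
hdfs dfs -ls /spark/jars
hdfs dfs -ls /spark/log
```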
spark-env.sh
```bash
vim spark-1.6.0-bin-hadoop2-without-hive/conf/spark-env.sh

export JAVA_HOME=/usr/java/default
export SPARK_HOME=/root/spark-1.6.0-bin-hadoop2-without-hive
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HOME/lib/*
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=17777 -Dspark.history.fs.logDirectory=hdfs://xiwu-cluster/spark/log/event-log"
```
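With SPARK_HISTORY_OPTS set as above, the history server can be started from this distribution and reached on port 17777:

```bash
$SPARK_HOME/sbin/start-history-server.sh
```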
spark-defaults.conf
```bash
vim spark-1.6.0-bin-hadoop2-without-hive/conf/spark-defaults.conf

spark.yarn.archive      hdfs://xiwu-cluster/spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://xiwu-cluster/spark/log/event-log
spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.driver.memory     1g
```
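Before wiring Hive in, the build can be smoke-tested on YARN. The examples jar path is an assumption about what this custom build places in lib/; adjust the glob to the actual file name:

```bash
cd /root/spark-1.6.0-bin-hadoop2-without-hive
# SparkPi as a minimal end-to-end check of the YARN submission path
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  lib/spark-examples-*.jar 10
```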
Edit hive-site.xml
```xml
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>hive.enable.spark.execution.engine</name>
  <value>true</value>
</property>
<property>
  <name>spark.home</name>
  <value>/root/spark-1.6.0-bin-hadoop2-without-hive</value>
</property>
<property>
  <name>spark.yarn.jar</name>
  <value>hdfs://xiwu-cluster/spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
```
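To confirm Hive actually dispatches to Spark, run any query that spawns a job; the database and table names here are placeholders:

```bash
hive -e "set hive.execution.engine=spark; select count(*) from your_db.your_table;"
# the console should show a Spark application being submitted instead of MapReduce stages
```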
Author: 阿武z
Link: https://www.jianshu.com/p/69e4ea167885